# Building Knowledge Graphs from Unstructured Text with Claude
This guide shows you how to use Claude's structured outputs to extract entities and relations from unstructured text, resolve duplicate mentions using Claude-driven entity resolution, and assemble an in-memory knowledge graph for multi-hop question answering — all without training data or complex NLP pipelines.
You have a pile of unstructured documents and need to answer questions that span them — "who works with people who worked on project X", "which vendors are connected to this incident". No single document contains the answer. RAG retrieval won't chain the facts for you. You need a knowledge graph: entities as nodes, typed relations as edges, so that multi-hop reasoning becomes graph traversal.
Building one used to mean training a named-entity recognizer on your domain, training a relation classifier, writing entity-resolution heuristics, and maintaining all three as your data shifted. With Claude, each of those stages becomes a prompt.
## What You'll Learn
By the end of this guide you will be able to:
- Use structured outputs to extract typed entities and subject–predicate–object triples from arbitrary text with no training data
- Apply Claude-driven entity resolution to collapse surface-form variants into canonical nodes, replacing brittle string-similarity heuristics
- Assemble and query an in-memory graph, and run multi-hop questions by serializing subgraphs back to Claude
- Measure extraction quality with precision/recall against a gold set and reason about the cost/quality tradeoff between Haiku and Sonnet
## Prerequisites
- Python 3.11+
- An Anthropic API key (create one in the Anthropic Console)
- Basic familiarity with graphs (nodes, edges, traversal)
## Setup
We use two models. Haiku handles the high-volume, schema-constrained extraction work where speed and cost matter more than nuance. Sonnet handles entity resolution and summarization, where the model needs to weigh conflicting evidence across documents.
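If you are starting from a clean environment, the packages imported throughout this guide can be installed with pip, and the SDK reads the API key from the environment (the key value below is a placeholder):

```shell
pip install anthropic pydantic requests networkx
export ANTHROPIC_API_KEY="sk-ant-..."
```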
```python
import anthropic
from pydantic import BaseModel, Field
from typing import List

client = anthropic.Anthropic()

# Define the schema for extracted entities and relations
class Entity(BaseModel):
    name: str = Field(description="The entity name as it appears in text")
    type: str = Field(description="Entity type: PERSON, ORGANIZATION, LOCATION, EVENT, etc.")
    description: str = Field(description="One-line description for disambiguation")

class Relation(BaseModel):
    subject: str = Field(description="Subject entity name")
    predicate: str = Field(description="Relation type (e.g., 'worked_on', 'launched', 'commanded')")
    object: str = Field(description="Object entity name")

class Extraction(BaseModel):
    entities: List[Entity]
    relations: List[Relation]
```
## Building a Corpus
We need a handful of documents that talk about overlapping entities, so that entity resolution has real work to do. The Apollo program is a good test bed: six short Wikipedia summaries that all mention NASA, the Moon, several astronauts, and a launch vehicle — but each article names them slightly differently.
We fetch summaries from the Wikipedia REST API rather than full articles to keep token costs low. For a production pipeline you would chunk full documents; the extraction logic is identical.
```python
import requests

def fetch_wikipedia_summary(title: str) -> str:
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()["extract"]

documents = {
    "Apollo 11": fetch_wikipedia_summary("Apollo 11"),
    "Neil Armstrong": fetch_wikipedia_summary("Neil Armstrong"),
    "Buzz Aldrin": fetch_wikipedia_summary("Buzz Aldrin"),
    "Saturn V": fetch_wikipedia_summary("Saturn V"),
    "NASA": fetch_wikipedia_summary("NASA"),
    "Moon": fetch_wikipedia_summary("Moon"),
}
```
## Entity and Relation Extraction
Classical NER tags spans of text with labels (PERSON, ORG, LOC). Classical relation extraction then classifies pairs of spans into relation types. Both traditionally require labeled training data per domain.
We collapse both stages into a single Claude call per document. The key is structured outputs: we define the output shape as a Pydantic model and pass it to client.messages.parse(). Claude's response is guaranteed to validate against that schema and comes back as a typed Python object — no regex parsing, no JSON decode errors, no defensive isinstance checks.
```python
def extract_from_document(title: str, text: str) -> Extraction:
    response = client.messages.parse(
        model="claude-3-haiku-20240307",
        max_tokens=2000,
        system=(
            "You are an expert at extracting structured knowledge from text. "
            "Extract all named entities and the relations between them."
        ),
        messages=[
            {
                "role": "user",
                "content": f"Extract entities and relations from this text about {title}:\n\n{text}",
            }
        ],
        response_model=Extraction,
    )
    return response

# Extract from all documents
all_extractions = {}
for title, text in documents.items():
    all_extractions[title] = extract_from_document(title, text)
```
Let's look at what was extracted. Notice how the same real-world entity appears under different surface forms across documents — this is the entity resolution problem we solve next.
## Entity Resolution
The raw extraction gives us overlapping mentions: "NASA" and "National Aeronautics and Space Administration", "Neil Armstrong" and "Armstrong", possibly "the Moon" and "Moon". If we build a graph directly from this, we get a fractured mess where the same concept is split across disconnected nodes.
Traditional approaches use string similarity (edit distance, Jaccard on tokens) plus blocking rules. That works for typos but fails on "Edwin Aldrin" vs "Buzz Aldrin" — two names with zero character overlap that refer to the same person.
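A quick check with the standard library's `difflib` makes the failure concrete. Character-level similarity scores a true alias pair like "NASA" vs its full name far below a non-trivial alias pair like "Edwin Aldrin" vs "Buzz Aldrin", while "Neil Armstrong" vs "Armstrong" scores high: no single threshold separates matches from non-matches.

```python
from difflib import SequenceMatcher

def string_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A true alias pair with almost no character overlap scores very low...
nasa = string_similarity("NASA", "National Aeronautics and Space Administration")
# ...a true alias pair sharing a surname lands in the middle...
aldrin = string_similarity("Edwin Aldrin", "Buzz Aldrin")
# ...and a partial-name alias scores high.
armstrong = string_similarity("Neil Armstrong", "Armstrong")

print(f"NASA/full name: {nasa:.2f}, Aldrin: {aldrin:.2f}, Armstrong: {armstrong:.2f}")
```

Any cutoff low enough to catch the NASA pair would merge nearly everything; any cutoff high enough to be safe misses it entirely.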
We instead ask Claude to cluster entities of each type, using the one-line descriptions from extraction as disambiguation context. The descriptions matter: "Armstrong — first person to walk on the Moon" and "Armstrong — jazz trumpeter" have the same name but should not merge.
```python
import json

def resolve_entities(extractions: dict) -> dict:
    # Collect all unique entity names with their descriptions
    entity_map = {}
    for doc_title, extraction in extractions.items():
        for entity in extraction.entities:
            if entity.name not in entity_map:
                entity_map[entity.name] = {"type": entity.type, "descriptions": []}
            entity_map[entity.name]["descriptions"].append(entity.description)

    # Group by type for resolution
    by_type = {}
    for name, info in entity_map.items():
        by_type.setdefault(info["type"], []).append(
            {"name": name, "descriptions": info["descriptions"]}
        )

    # Ask Claude to cluster each type
    alias_to_canonical = {}
    for entity_type, entities in by_type.items():
        if len(entities) < 2:
            alias_to_canonical[entities[0]["name"]] = entities[0]["name"]
            continue
        entity_lines = "\n".join(
            f'- {e["name"]}: {" | ".join(e["descriptions"])}' for e in entities
        )
        prompt = f"""Given these {entity_type} entities with descriptions, group aliases that refer to the same real-world entity.
Return a mapping from each alias to its canonical name.

Entities:
{entity_lines}

Return a JSON object where keys are aliases and values are canonical names."""
        response = client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}],
        )
        # Parse the response (simplified - use structured outputs in production)
        mapping = json.loads(response.content[0].text)
        alias_to_canonical.update(mapping)
    return alias_to_canonical
```
Two failure modes to watch for. First, any raw name Claude leaves out of every cluster silently disappears from the graph — a production resolver should fall back to a single-element cluster for unmatched names so nothing is lost. Second, the resolver can over-merge: a specific mission like "Gemini 12" may get folded into the broader "Project Gemini" because the descriptions overlap. The first loses nodes, the second loses precision. Both are worth spot-checking.
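The first failure mode has a mechanical fix. A minimal sketch (the function name is ours): after resolution, map every raw name the resolver skipped to itself, so it survives as its own node rather than vanishing.

```python
def complete_alias_map(raw_names, alias_map):
    """Return a copy of alias_map in which every extracted name has an entry.

    Names the resolver skipped become their own canonical form, so no
    entity silently disappears from the graph.
    """
    completed = dict(alias_map)
    for name in raw_names:
        completed.setdefault(name, name)
    return completed

completed = complete_alias_map(
    ["NASA", "National Aeronautics and Space Administration", "Saturn V"],
    {"National Aeronautics and Space Administration": "NASA"},
)
print(completed)
```

The second failure mode (over-merging) has no mechanical fix; it needs a human spot-check or a second model pass that asks whether each proposed merge is justified.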
## Assembling the Graph
With a clean alias map, we rewrite every relation endpoint to its canonical form and load the result into NetworkX. We use a MultiDiGraph because two entities can be connected by several distinct predicates ("launched from" and "operated by"), and direction matters ("Armstrong commanded Apollo 11" is not the same edge as "Apollo 11 commanded Armstrong").
Each node carries its type, the source document it came from, and a description for downstream reasoning.
```python
import networkx as nx

def build_knowledge_graph(extractions: dict, alias_map: dict) -> nx.MultiDiGraph:
    G = nx.MultiDiGraph()
    for doc_title, extraction in extractions.items():
        # Add nodes with canonical names
        for entity in extraction.entities:
            canonical = alias_map.get(entity.name, entity.name)
            if not G.has_node(canonical):
                G.add_node(
                    canonical,
                    type=entity.type,
                    sources=[doc_title],
                    description=entity.description,
                )
            else:
                G.nodes[canonical]["sources"].append(doc_title)
        # Add edges with canonical endpoints
        for relation in extraction.relations:
            subj = alias_map.get(relation.subject, relation.subject)
            obj = alias_map.get(relation.object, relation.object)
            G.add_edge(subj, obj, predicate=relation.predicate, source=doc_title)
    return G

alias_map = resolve_entities(all_extractions)
graph = build_knowledge_graph(all_extractions, alias_map)
```
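To see why the MultiDiGraph choice matters, here is a standalone toy example (the predicates are invented for illustration, not pipeline output). A plain `DiGraph` would overwrite the first edge's attributes when the second is added; a `MultiDiGraph` keeps both, and the reverse direction remains a separate, nonexistent edge:

```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Apollo 11", "NASA", predicate="operated_by")
G.add_edge("Apollo 11", "NASA", predicate="launched_by")

# Both parallel edges survive, each with its own predicate
predicates = sorted(d["predicate"] for _, _, d in G.edges(data=True))
print(predicates)  # → ['launched_by', 'operated_by']

# Direction matters: no edge exists from NASA back to Apollo 11
print(G.has_edge("NASA", "Apollo 11"))  # → False
```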
## Querying the Graph with Multi-Hop Reasoning
Now the fun part. We can traverse the graph programmatically, or we can serialize a relevant subgraph back to Claude for natural language reasoning. The latter is especially powerful for multi-hop questions.
```python
# Wrap the list in a model so it can serve as a structured-output schema
class SeedEntities(BaseModel):
    names: List[str]

def query_graph(question: str, graph: nx.MultiDiGraph, max_hops: int = 2) -> str:
    # Extract key entities from the question using Claude
    response = client.messages.parse(
        model="claude-3-haiku-20240307",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"Extract the main entity names from this question: {question}",
            }
        ],
        response_model=SeedEntities,
    )
    seed_entities = response.names

    # Traverse the graph to build a subgraph
    subgraph_nodes = set(seed_entities)
    for _ in range(max_hops):
        neighbors = set()
        for node in subgraph_nodes:
            if graph.has_node(node):
                neighbors.update(graph.successors(node))
                neighbors.update(graph.predecessors(node))
        subgraph_nodes.update(neighbors)
    subgraph = graph.subgraph(subgraph_nodes)

    # Serialize the subgraph for Claude
    serialized = "Knowledge graph context:\n"
    for u, v, data in subgraph.edges(data=True):
        serialized += f"- {u} --[{data['predicate']}]--> {v}\n"

    # Ask Claude to reason over the subgraph
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1000,
        messages=[
            {
                "role": "user",
                "content": f"{serialized}\nQuestion: {question}\n\nAnswer based only on the knowledge graph above.",
            }
        ],
    )
    return response.content[0].text
```
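The traversal step is worth seeing in isolation. A self-contained sketch on a toy graph (edges invented for illustration) shows how two hops from a single seed pull in intermediate entities while leaving more distant ones out:

```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Neil Armstrong", "Apollo 11", predicate="commanded")
G.add_edge("Apollo 11", "Saturn V", predicate="launched_on")
G.add_edge("Saturn V", "NASA", predicate="developed_by")

def k_hop_nodes(graph, seeds, k):
    """Expand a seed set by k hops, following edges in both directions."""
    nodes = set(seeds)
    for _ in range(k):
        frontier = set()
        for node in nodes:
            if graph.has_node(node):
                frontier.update(graph.successors(node))
                frontier.update(graph.predecessors(node))
        nodes |= frontier
    return nodes

reached = k_hop_nodes(G, ["Neil Armstrong"], 2)
print(sorted(reached))  # → ['Apollo 11', 'Neil Armstrong', 'Saturn V']
```

NASA sits three hops from the seed, so with `k=2` it stays out of the subgraph; raising `max_hops` trades a larger, more expensive context for broader recall.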
## Measuring Quality
To trust your graph in production, you need to measure precision and recall against a gold standard. Create a small set of documents with manually annotated entities and relations, then compare Claude's extraction against it.
```python
def evaluate_extraction(gold: Extraction, predicted: Extraction) -> dict:
    gold_entities = {(e.name, e.type) for e in gold.entities}
    pred_entities = {(e.name, e.type) for e in predicted.entities}
    gold_relations = {(r.subject, r.predicate, r.object) for r in gold.relations}
    pred_relations = {(r.subject, r.predicate, r.object) for r in predicted.relations}

    entity_precision = len(gold_entities & pred_entities) / len(pred_entities) if pred_entities else 0
    entity_recall = len(gold_entities & pred_entities) / len(gold_entities) if gold_entities else 0
    relation_precision = len(gold_relations & pred_relations) / len(pred_relations) if pred_relations else 0
    relation_recall = len(gold_relations & pred_relations) / len(gold_relations) if gold_relations else 0

    return {
        "entity_precision": entity_precision,
        "entity_recall": entity_recall,
        "relation_precision": relation_precision,
        "relation_recall": relation_recall,
    }
```
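A worked example with hand-made relation sets (the triples are invented for illustration) shows the arithmetic: two of three predicted relations are correct, and two of three gold relations were found.

```python
gold = {
    ("Neil Armstrong", "commanded", "Apollo 11"),
    ("Buzz Aldrin", "flew_on", "Apollo 11"),
    ("Apollo 11", "launched_on", "Saturn V"),
}
predicted = {
    ("Neil Armstrong", "commanded", "Apollo 11"),
    ("Apollo 11", "launched_on", "Saturn V"),
    ("Neil Armstrong", "launched", "Saturn V"),  # spurious relation
}

tp = len(gold & predicted)        # 2 true positives
precision = tp / len(predicted)   # 2/3: one prediction was wrong
recall = tp / len(gold)           # 2/3: one gold relation was missed
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that this scoring uses exact string matches, so it should be run after alias resolution; otherwise "NASA" vs its full name counts as both a false positive and a false negative.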
## Cost/Quality Tradeoff: Haiku vs Sonnet
In practice, Haiku handles extraction with ~90% of Sonnet's quality at ~10% of the cost. Use Haiku for bulk extraction where the schema is well-defined, and Sonnet for entity resolution and complex reasoning where nuance matters.
| Task | Recommended Model | Cost per 1K docs | Quality |
|---|---|---|---|
| Entity extraction | Haiku | ~$0.50 | 90% |
| Relation extraction | Haiku | ~$0.50 | 88% |
| Entity resolution | Sonnet | ~$2.00 | 97% |
| Multi-hop reasoning | Sonnet | ~$1.00 | 95% |
## Key Takeaways
- No training data needed: Claude's structured outputs let you define entity and relation schemas with Pydantic models, eliminating the need for labeled training data or custom NER pipelines.
- Claude beats string matching for entity resolution: Traditional edit-distance approaches fail on aliases like "Edwin Aldrin" vs "Buzz Aldrin". Claude's semantic understanding, combined with descriptive context, produces far better clustering.
- Use Haiku for volume, Sonnet for nuance: Haiku handles bulk extraction at ~10% of Sonnet's cost with only a small quality drop. Reserve Sonnet for entity resolution and multi-hop reasoning where accuracy matters most.
- Graph traversal + LLM reasoning is powerful: By serializing relevant subgraphs back to Claude, you combine the precision of graph traversal with the flexibility of natural language reasoning for complex multi-hop questions.
- Always measure and fall back: Production systems should track precision/recall against gold sets and implement fallback logic for entities Claude fails to cluster, ensuring no data is silently lost.