Guide · 2026-04-27

Building a Knowledge Graph from Unstructured Text with Claude

Learn how to use Claude to extract entities and relations from unstructured text, resolve duplicates, and build a queryable knowledge graph for multi-hop reasoning — no training data required.

Quick Answer

This guide shows you how to use Claude's structured outputs to extract typed entities and relations from unstructured text, resolve duplicate mentions with Claude-driven entity resolution, and assemble an in-memory knowledge graph for multi-hop question answering — all without training data or a database.

Knowledge Graph, Entity Extraction, Structured Outputs, Entity Resolution, Claude API

You have a pile of unstructured documents and need to answer questions that span them — "who works with people who worked on project X", "which vendors are connected to this incident". No single document contains the answer. RAG retrieval won't chain the facts for you. You need a knowledge graph: entities as nodes, typed relations as edges, so that multi-hop reasoning becomes graph traversal.

Building one used to mean training a named-entity recognizer on your domain, training a relation classifier, writing entity-resolution heuristics, and maintaining all three as your data shifted. With Claude, each of those stages becomes a prompt.

What You'll Learn

By the end of this guide you will be able to:

Use structured outputs to extract typed entities and subject–predicate–object triples from arbitrary text with no training data
Apply Claude-driven entity resolution to collapse surface-form variants into canonical nodes, replacing brittle string-similarity heuristics
Assemble and query an in-memory graph, and run multi-hop questions by serializing subgraphs back to Claude
Measure extraction quality with precision/recall against a gold set and reason about the cost/quality tradeoff between Haiku and Sonnet

Everything runs in memory with no database. The techniques transfer directly to Neo4j, Neptune, or a Postgres adjacency table when you need to scale.

Prerequisites

Python 3.11+
Anthropic API key (available from the Anthropic Console)
Basic familiarity with graphs (nodes, edges, traversal)

Setup

We use two models. Haiku handles the high-volume, schema-constrained extraction work where speed and cost matter more than nuance. Sonnet handles entity resolution and summarization, where the model needs to weigh conflicting evidence across documents.

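import anthropic
import networkx as nx
from pydantic import BaseModel, Field
from typing import List, Optional

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Model IDs as given in this guide. Structured outputs need a model that
# supports them, so substitute current Haiku/Sonnet IDs if these are rejected.
HAIKU = "claude-3-haiku-20240307"
SONNET = "claude-3-sonnet-20240229"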

Building a Corpus

We need a handful of documents that talk about overlapping entities, so that entity resolution has real work to do. The Apollo program is a good test bed: six short Wikipedia summaries that all mention NASA, the Moon, several astronauts, and a launch vehicle — but each article names them slightly differently.

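import requests

def fetch_wikipedia_summary(title: str) -> str:
    """Return the plain-text summary ("extract") of a Wikipedia article."""
    # Wikipedia page titles use underscores in place of spaces in URLs.
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title.replace(' ', '_')}"
    # A timeout keeps a stalled request from hanging the whole pipeline.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()["extract"]
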
documents = {
    "Apollo 11": fetch_wikipedia_summary("Apollo 11"),
    "Neil Armstrong": fetch_wikipedia_summary("Neil Armstrong"),
    "Buzz Aldrin": fetch_wikipedia_summary("Buzz Aldrin"),
    "Saturn V": fetch_wikipedia_summary("Saturn V"),
    "NASA": fetch_wikipedia_summary("NASA"),
    "Moon": fetch_wikipedia_summary("Moon"),
}

We fetch summaries from the Wikipedia REST API rather than full articles to keep token costs low. For a production pipeline you would chunk full documents; the extraction logic is identical.
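For full articles, the chunking step could look something like the sketch below. It is not part of the original pipeline: `chunk_text` and `max_chars` are hypothetical names, paragraphs are assumed to be separated by blank lines, and a character count stands in for a real token budget.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk
    rather than being split mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Either the paragraph fits, or it starts a fresh chunk on its own.
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Three short paragraphs with a 10-character budget: the first two fit together.
print(chunk_text("aaaa\n\nbbbb\n\ncccc", max_chars=10))
```

Each chunk would then go through the same extraction call as a summary does, with the resulting entities and relations pooled under the source document's title.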

Entity and Relation Extraction

Classical NER tags spans of text with labels (PERSON, ORG, LOC). Classical relation extraction then classifies pairs of spans into relation types. Both traditionally require labeled training data per domain.

We collapse both stages into a single Claude call per document. The key is structured outputs: we define the output shape as a Pydantic model and pass it to client.messages.parse(). The response is constrained to that schema and comes back as a typed Python object, so there is no regex parsing and no defensive isinstance checks. The case still worth handling is a response cut off by max_tokens, which can leave the output incomplete.

class Entity(BaseModel):
    name: str = Field(description="The canonical name of the entity")
    type: str = Field(description="Entity type: PERSON, ORGANIZATION, LOCATION, EVENT, VEHICLE, etc.")
    description: str = Field(description="A one-line description for disambiguation")
class Relation(BaseModel):
    subject: str = Field(description="Name of the subject entity")
    predicate: str = Field(description="Relation type as a snake_case verb phrase, e.g., 'commanded', 'launched_from'")
    object: str = Field(description="Name of the object entity")
class Extraction(BaseModel):
    entities: List[Entity]
    relations: List[Relation]
def extract_from_text(text: str, model: str = HAIKU) -> Extraction:
    response = client.messages.parse(
        model=model,
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Extract all entities and their relations from this text.\n\n{text}"
        }],
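        # Keyword and result attribute follow the Anthropic SDK's structured-output
        # interface as we understand it; confirm against your installed version.
        output_format=Extraction,
    )
    return response.parsed_output  # the validated Extraction object, not the raw response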
# Extract from all documents
all_extractions = {}
for title, text in documents.items():
    all_extractions[title] = extract_from_text(text)

Let's look at what was extracted. Notice how the same real-world entity appears under different surface forms across documents — this is the entity resolution problem we solve next.
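One way to see the overlap is to index every extracted name by the documents that mention it. The sketch below is illustrative only: `surface_forms_by_type` is a hypothetical helper, and it takes plain (name, type) pairs per document so it runs without an API call. With the real pipeline you would build those pairs from each extraction's entities.

```python
from collections import defaultdict


def surface_forms_by_type(
    mentions: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, list[str]]]:
    """Map entity type -> surface form -> sorted list of documents mentioning it."""
    index: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc_title, pairs in mentions.items():
        for name, etype in pairs:
            index[etype][name].add(doc_title)
    return {
        etype: {name: sorted(docs) for name, docs in sorted(forms.items())}
        for etype, forms in sorted(index.items())
    }


# Hand-written sample mentions, not real model output.
sample = {
    "Apollo 11": [("NASA", "ORGANIZATION"), ("Neil Armstrong", "PERSON")],
    "NASA": [("National Aeronautics and Space Administration", "ORGANIZATION")],
    "Buzz Aldrin": [("Armstrong", "PERSON"), ("NASA", "ORGANIZATION")],
}

for etype, forms in surface_forms_by_type(sample).items():
    print(etype)
    for name, docs in forms.items():
        print(f"  {name}: {', '.join(docs)}")
```

Reading the listing type by type makes variant spellings easy to spot, since names that should collapse into one node end up next to each other.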

Entity Resolution

The raw extraction gives us overlapping mentions: "NASA" and "National Aeronautics and Space Administration", "Neil Armstrong" and "Armstrong", possibly "the Moon" and "Moon". If we build a graph directly from this, we get a fractured mess where the same concept is split across disconnected nodes.

Traditional approaches use string similarity (edit distance, Jaccard on tokens) plus blocking rules. That works for typos but fails on "Edwin Aldrin" vs "Buzz Aldrin": the given names share no characters at all, yet both refer to the same person. The shared surname gives only partial credit, the same credit any two unrelated Aldrins would get.
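The limitation is easy to reproduce with the standard library. The following sketch scores a few name pairs with token-level Jaccard and with difflib's character-based ratio, used here as a stand-in for edit distance. The helper names `token_jaccard` and `char_ratio` are hypothetical.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


pairs = [
    ("Neil Armstrong", "Neil Armstrog"),  # typo: string similarity does well
    ("Edwin Aldrin", "Buzz Aldrin"),      # nickname: only the surname matches
    ("NASA", "National Aeronautics and Space Administration"),  # acronym
]

for a, b in pairs:
    print(f"{a!r} vs {b!r}: jaccard={token_jaccard(a, b):.2f}, ratio={char_ratio(a, b):.2f}")
```

The nickname pair scores one third on Jaccard and the acronym pair scores zero, so any threshold loose enough to merge them would also merge many unrelated names.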

We instead ask Claude to cluster entities of each type, using the one-line descriptions from extraction as disambiguation context. The descriptions matter: "Armstrong — first person to walk on the Moon" and "Armstrong — jazz trumpeter" have the same name but should not merge.

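# Illustrative schema names (not from an external library). Structured outputs
# need a fixed schema, so the mapping is requested as a list of alias/canonical
# pairs rather than a free-form dict.
class AliasPair(BaseModel):
    alias: str = Field(description="An entity name exactly as listed in the prompt")
    canonical: str = Field(description="The canonical name chosen for its group")

class AliasMapping(BaseModel):
    pairs: List[AliasPair]

def resolve_entities(extractions: dict, model: str = SONNET) -> dict:
    # Collect each unique (name, type) pair with its first-seen description
    entity_map = {}
    for doc_title, extraction in extractions.items():
        for entity in extraction.entities:
            key = (entity.name, entity.type)
            if key not in entity_map:
                entity_map[key] = entity.description

    # Group by type so that, say, a PERSON is never merged with a LOCATION
    by_type = {}
    for (name, etype), desc in entity_map.items():
        by_type.setdefault(etype, []).append((name, desc))

    alias_to_canonical = {}

    for etype, entities in by_type.items():
        listing = "\n".join(f"- {name}: {desc}" for name, desc in entities)
        prompt = (
            f"Group these {etype} entities that refer to the same real-world entity.\n"
            "For each group, choose a canonical name. Return one alias/canonical "
            "pair for every entity listed.\n\n"
            f"Entities:\n{listing}"
        )

        response = client.messages.parse(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
            # Keyword per the Anthropic SDK as we understand it; confirm for your version.
            output_format=AliasMapping,
        )
        for pair in response.parsed_output.pairs:
            alias_to_canonical[pair.alias] = pair.canonical

    return alias_to_canonical

alias_map = resolve_entities(all_extractions)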

Two failure modes to watch for. First, any raw name Claude leaves out of every cluster gets no entry in alias_to_canonical. The graph-building step below looks names up with alias_map.get(name, name), so such a name survives as its own node, but it stays unmerged, and any code that indexes the map directly would raise a KeyError. A production resolver should make the fallback explicit by giving every unmatched name a single-element cluster. Second, the resolver can over-merge: a specific mission like "Gemini 12" may get folded into the broader "Project Gemini" because the descriptions overlap. The first leaves stray duplicate nodes, the second loses precision. Both are worth spot-checking in the alias map before building the graph.
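Both checks can be run on the alias map alone. The sketch below assumes the map is a plain dict from alias to canonical name. The helpers `unmatched_names`, `complete_alias_map`, and `clusters_for_review` are hypothetical, and the sample data is hand-written rather than model output.

```python
from collections import defaultdict


def unmatched_names(raw_names: list[str], alias_map: dict[str, str]) -> list[str]:
    """Names the resolver never assigned to any cluster."""
    return sorted(set(raw_names) - set(alias_map))


def complete_alias_map(raw_names: list[str], alias_map: dict[str, str]) -> dict[str, str]:
    """Copy alias_map, giving every unmatched name a single-element cluster."""
    completed = dict(alias_map)
    for name in raw_names:
        completed.setdefault(name, name)
    return completed


def clusters_for_review(alias_map: dict[str, str]) -> dict[str, list[str]]:
    """Invert alias -> canonical into canonical -> sorted aliases for eyeballing."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for alias, canonical in alias_map.items():
        clusters[canonical].add(alias)
    return {canonical: sorted(aliases) for canonical, aliases in sorted(clusters.items())}


# Hand-written example: "Saturn V" was skipped, "Gemini 12" was over-merged.
raw_names = ["NASA", "National Aeronautics and Space Administration",
             "Gemini 12", "Project Gemini", "Saturn V"]
sample_map = {
    "NASA": "NASA",
    "National Aeronautics and Space Administration": "NASA",
    "Gemini 12": "Project Gemini",
    "Project Gemini": "Project Gemini",
}

print("unmatched:", unmatched_names(raw_names, sample_map))
for canonical, aliases in clusters_for_review(complete_alias_map(raw_names, sample_map)).items():
    print(f"{canonical}: {aliases}")
```

Clusters with more than one alias are the ones to read by hand. A merge like "Gemini 12" into "Project Gemini" shows up there and can be overridden before the graph is built.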

Assembling the Graph

With a clean alias map, we rewrite every relation endpoint to its canonical form and load the result into NetworkX. We use a MultiDiGraph because two entities can be connected by several distinct predicates ("launched from" and "operated by"), and direction matters ("Armstrong commanded Apollo 11" is not the same edge as "Apollo 11 commanded Armstrong").

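Each node carries its type, the source document it came from, and the original description. When several documents mention the same canonical entity, add_node updates the existing node, so the attributes from the last document processed are the ones kept.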

def build_graph(extractions: dict, alias_map: dict) -> nx.MultiDiGraph:
    G = nx.MultiDiGraph()
    
    for doc_title, extraction in extractions.items():
        for entity in extraction.entities:
            canonical = alias_map.get(entity.name, entity.name)
            G.add_node(
                canonical,
                type=entity.type,
                source=doc_title,
                description=entity.description
            )
        
        for relation in extraction.relations:
            subj = alias_map.get(relation.subject, relation.subject)
            obj = alias_map.get(relation.object, relation.object)
            G.add_edge(subj, obj, predicate=relation.predicate, source=doc_title)
    
    return G
graph = build_graph(all_extractions, alias_map)
print(f"Graph has {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

Querying the Graph with Multi-Hop Reasoning

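Now for the payoff: answering questions that require traversing multiple relations. We serialize the graph's edges as plain-text facts, pass them to Claude as context, and let it reason over the connections. This corpus is small enough to send every edge; a larger graph would need the relevant subgraph selected first.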

def query_graph(question: str, G: nx.MultiDiGraph, model: str = SONNET) -> str:
    # Serialize the graph as a list of facts
    facts = []
    for u, v, data in G.edges(data=True):
        facts.append(f"{u} --[{data['predicate']}]--> {v}")
    
    graph_context = "\n".join(facts)
    
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Answer the question using only the facts below.
Facts:
{graph_context}
Question: {question}
Answer concisely, citing the facts you used."""
        }]
    )
    return response.content[0].text
# Example: multi-hop question
answer = query_graph(
    "Which astronauts were involved in missions that used the Saturn V rocket?",
    graph
)
print(answer)
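When the graph outgrows the context window, one option is to send only the neighborhood around the entities named in the question. The sketch below works on plain (subject, predicate, object) triples so it runs without NetworkX. `k_hop_facts` is a hypothetical helper, choosing the seed entities is left to the caller, and the sample triples are hand-written.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]  # (subject, predicate, object)


def k_hop_facts(triples: list[Triple], seeds: set[str], hops: int = 2) -> list[Triple]:
    """Return triples whose endpoints both lie within `hops` edges of a seed node.

    Edge direction is ignored during the search and preserved in the output.
    """
    neighbors: dict[str, set[str]] = defaultdict(set)
    for subj, _, obj in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)

    # Breadth-first search outward from the seeds, stopping at depth `hops`.
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue
        for nxt in neighbors[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    return [t for t in triples if t[0] in depth and t[2] in depth]


sample_triples = [
    ("Neil Armstrong", "commanded", "Apollo 11"),
    ("Apollo 11", "launched_on", "Saturn V"),
    ("Saturn V", "developed_by", "NASA"),
    ("NASA", "headquartered_in", "Washington, D.C."),
    ("Moon", "orbits", "Earth"),
]

for subj, pred, obj in k_hop_facts(sample_triples, {"Saturn V"}, hops=2):
    print(f"{subj} --[{pred}]--> {obj}")
```

With the graph built earlier, the triples could be read off the edge list (source node, predicate attribute, target node), and the filtered facts passed to the model in place of the full serialization.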

Measuring Extraction Quality

To trust your graph in production, you need to measure precision and recall against a gold standard. Here's a simple evaluation framework:

def evaluate_extraction(gold_entities: set, gold_relations: set, 
                        extracted_entities: set, extracted_relations: set):
    """Compute precision, recall, and F1 for entity and relation extraction."""
    
    # Entity metrics
    true_pos_entities = gold_entities & extracted_entities
    precision_entities = len(true_pos_entities) / len(extracted_entities) if extracted_entities else 0
    recall_entities = len(true_pos_entities) / len(gold_entities) if gold_entities else 0
    f1_entities = 2 * precision_entities * recall_entities / (precision_entities + recall_entities) if (precision_entities + recall_entities) else 0
    
    # Relation metrics
    true_pos_relations = gold_relations & extracted_relations
    precision_relations = len(true_pos_relations) / len(extracted_relations) if extracted_relations else 0
    recall_relations = len(true_pos_relations) / len(gold_relations) if gold_relations else 0
    f1_relations = 2 * precision_relations * recall_relations / (precision_relations + recall_relations) if (precision_relations + recall_relations) else 0
    
    return {
        "entity_precision": precision_entities,
        "entity_recall": recall_entities,
        "entity_f1": f1_entities,
        "relation_precision": precision_relations,
        "relation_recall": recall_relations,
        "relation_f1": f1_relations
    }
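Exact set matching is strict: "NASA" and "nasa" count as different entities unless names are normalized first. The worked example below shows the arithmetic on a tiny hand-written gold set. `normalize` and `prf` are hypothetical helpers, with `prf` restating the same precision, recall, and F1 formulas in compact form.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 of `predicted` against `gold` by exact match."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hand-written sets for illustration, not real extraction results.
gold = {normalize(n) for n in ["NASA", "Neil Armstrong", "Buzz Aldrin", "Saturn V"]}
predicted = {normalize(n) for n in ["nasa", "Neil  Armstrong", "Michael Collins"]}

precision, recall, f1 = prf(gold, predicted)
# 2 of 3 predictions are correct, and 2 of 4 gold entities were found.
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.67 recall=0.50 f1=0.57
```

For relations, the same functions apply to sets of (subject, predicate, object) tuples with each element normalized, ideally after mapping subjects and objects to their canonical names.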

Cost/Quality Tradeoff: Haiku vs. Sonnet

In practice, you'll find:

Haiku is ideal for high-volume extraction where the schema is well-defined and you need speed. It's ~5x cheaper than Sonnet and handles structured output reliably.
Sonnet shines for entity resolution and complex reasoning where nuance matters. It's better at disambiguating entities with overlapping names and catching subtle relations.

A common pattern: use Haiku for the initial extraction pass across thousands of documents, then use Sonnet for entity resolution and final query answering.
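To put numbers on the tradeoff for your own corpus, a back-of-the-envelope estimate is usually enough. In the sketch below, `estimate_cost` is a hypothetical helper, and the prices are placeholders chosen only to illustrate a 5x gap. They are not real list prices, so substitute the current per-token rates for the models you use.

```python
def estimate_cost(
    n_docs: int,
    input_tokens_per_doc: int,
    output_tokens_per_doc: int,
    price_in_per_mtok: float,
    price_out_per_mtok: float,
) -> float:
    """Total cost in dollars for one extraction call per document."""
    input_cost = n_docs * input_tokens_per_doc / 1_000_000 * price_in_per_mtok
    output_cost = n_docs * output_tokens_per_doc / 1_000_000 * price_out_per_mtok
    return input_cost + output_cost


# PLACEHOLDER prices in dollars per million tokens (assumed, not real pricing).
SMALL_MODEL = {"price_in_per_mtok": 1.0, "price_out_per_mtok": 5.0}
LARGE_MODEL = {"price_in_per_mtok": 5.0, "price_out_per_mtok": 25.0}

# Assumed workload: 10,000 documents, 500 input and 200 output tokens each.
workload = {"n_docs": 10_000, "input_tokens_per_doc": 500, "output_tokens_per_doc": 200}

small = estimate_cost(**workload, **SMALL_MODEL)
large = estimate_cost(**workload, **LARGE_MODEL)
print(f"small model: ${small:.2f}, large model: ${large:.2f}")
```

Entity resolution runs once per entity type rather than once per document, so its share of the bill is typically small next to extraction, which is what makes the split between the two models attractive.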

Key Takeaways

Structured outputs eliminate parsing headaches. Passing a Pydantic model to client.messages.parse() returns validated, typed data from each extraction call, with no regex or hand-written JSON parsing.
Claude replaces multiple ML pipelines. Entity extraction, relation classification, and entity resolution all become single prompts, eliminating the need for domain-specific training data.
Entity resolution is the critical bottleneck. Without it, your graph is a fractured mess. Claude's ability to use semantic context (descriptions) makes it far more robust than string-similarity approaches.
Multi-hop reasoning becomes graph traversal. By serializing your knowledge graph back to Claude as context, you can answer questions that span multiple documents and relations.
Choose your model based on the task. Use Haiku for high-volume, schema-constrained extraction; use Sonnet for nuanced entity resolution and complex query answering.