Building Knowledge Graphs from Unstructured Text with Claude
Learn how to use Claude to extract entities and relations from unstructured documents, resolve duplicates, and build queryable knowledge graphs — no training data required.
This guide shows you how to turn piles of unstructured documents into a queryable knowledge graph using Claude. You'll extract entities and relations with structured outputs, resolve duplicates with Claude-driven entity resolution, and run multi-hop queries — all without training data or a database.
You have a stack of documents — incident reports, vendor contracts, project wikis — and you need to answer questions that span them. "Who worked with people who worked on Project X?" "Which vendors are connected to this incident?" No single document holds the answer. Traditional RAG retrieval won't chain the facts for you.
What you need is a knowledge graph: entities as nodes, typed relations as edges, so multi-hop reasoning becomes graph traversal.
Building one used to mean training a named-entity recognizer on your domain, training a relation classifier, writing entity-resolution heuristics, and maintaining all three as your data shifted. With Claude, each of those stages becomes a prompt.
What You'll Learn
By the end of this guide you will be able to:
- Use structured outputs to extract typed entities and subject–predicate–object triples from arbitrary text with no training data
- Apply Claude-driven entity resolution to collapse surface-form variants into canonical nodes, replacing brittle string-similarity heuristics
- Assemble and query an in-memory graph, and run multi-hop questions by serializing subgraphs back to Claude
- Measure extraction quality with precision/recall against a gold set and reason about the cost/quality tradeoff between Haiku and Sonnet
Prerequisites
- Python 3.11+
- An Anthropic API key (from the Anthropic Console)
- Basic familiarity with graphs (nodes, edges, traversal)
Setup
We use two models. Haiku handles the high-volume, schema-constrained extraction work where speed and cost matter more than nuance. Sonnet handles entity resolution and summarization, where the model needs to weigh conflicting evidence across documents.
import anthropic
from pydantic import BaseModel, Field
from typing import List

# Both handles point at the same API; the model is chosen per request.
# Separate names just make it obvious which tier each call is meant to use.
haiku = anthropic.Anthropic()
sonnet = anthropic.Anthropic()
Step 1: Building a Corpus
We need a handful of documents that talk about overlapping entities, so that entity resolution has real work to do. The Apollo program is a good test bed: six short Wikipedia summaries that all mention NASA, the Moon, several astronauts, and a launch vehicle — but each article names them slightly differently.
import requests

def fetch_wikipedia_summary(title):
    """Fetch the plain-text summary of a Wikipedia article via the REST API."""
    url = "https://en.wikipedia.org/api/rest_v1/page/summary/" + title
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()["extract"]

documents = {
    "Apollo 11": fetch_wikipedia_summary("Apollo 11"),
    "Neil Armstrong": fetch_wikipedia_summary("Neil Armstrong"),
    "Buzz Aldrin": fetch_wikipedia_summary("Buzz Aldrin"),
    "Saturn V": fetch_wikipedia_summary("Saturn V"),
    "NASA": fetch_wikipedia_summary("NASA"),
    "Moon": fetch_wikipedia_summary("Moon"),
}
For a production pipeline you would chunk full documents; the extraction logic is identical.
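If you do chunk, a fixed-size character window with overlap is enough to start with. The chunk_text helper and the 1,500-character window below are illustrative defaults, not part of the pipeline above:

def chunk_text(text: str, size: int = 1500, overlap: int = 200) -> List[str]:
    """Split a long document into overlapping windows for per-chunk extraction."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks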
Step 2: Entity and Relation Extraction
Classical NER tags spans of text with labels (PERSON, ORG, LOC). Classical relation extraction then classifies pairs of spans into relation types. Both traditionally require labeled training data per domain.
We collapse both stages into a single Claude call per document. The key is structured outputs: we define the output shape as a Pydantic model and pass it to client.messages.parse(). Claude's response is guaranteed to validate against that schema and comes back as a typed Python object — no regex parsing, no JSON decode errors, no defensive isinstance checks.
class Entity(BaseModel):
    name: str = Field(description="The surface form of the entity as it appears in text")
    type: str = Field(description="Entity type: PERSON, ORG, LOC, MISSION, VEHICLE, etc.")
    description: str = Field(description="One-line context for disambiguation")

class Relation(BaseModel):
    subject: str = Field(description="Subject entity name (must match an extracted entity)")
    predicate: str = Field(description="Relation type as a short verb phrase, e.g. 'commanded', 'launched_from'")
    object: str = Field(description="Object entity name (must match an extracted entity)")

class Extraction(BaseModel):
    entities: List[Entity]
    relations: List[Relation]
def extract_from_document(text: str) -> Extraction:
    response = haiku.messages.parse(
        model="claude-3-haiku-20240307",
        max_tokens=2000,
        system="Extract all entities and their relationships from the text. Use the provided schema.",
        messages=[{"role": "user", "content": text}],
        response_model=Extraction,
    )
    return response
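With the extractor in place, one call per document gives us the raw material for the graph. A simple loop over the corpus (storing the results in an extractions dict that we reuse in later steps) looks like this:

extractions = {title: extract_from_document(text) for title, text in documents.items()}

for title, extraction in extractions.items():
    print(f"{title}: {len(extraction.entities)} entities, {len(extraction.relations)} relations")
    for relation in extraction.relations[:3]:
        print(f"  {relation.subject} --[{relation.predicate}]--> {relation.object}")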
Let's look at what was extracted. Notice how the same real-world entity appears under different surface forms across documents — this is the entity resolution problem we solve next.
Step 3: Entity Resolution
The raw extraction gives us overlapping mentions: "NASA" and "National Aeronautics and Space Administration", "Neil Armstrong" and "Armstrong", possibly "the Moon" and "Moon". If we build a graph directly from this, we get a fractured mess where the same concept is split across disconnected nodes.
Traditional approaches use string similarity (edit distance, Jaccard on tokens) plus blocking rules. That works for typos but fails on "Edwin Aldrin" vs "Buzz Aldrin" — two names with zero character overlap that refer to the same person.
We instead ask Claude to cluster entities of each type, using the one-line descriptions from extraction as disambiguation context.
class EntityCluster(BaseModel):
    canonical_name: str = Field(description="The canonical name for this entity")
    aliases: List[str] = Field(description="All surface forms that refer to this entity")

def resolve_entities(entities: List[Entity]) -> dict:
    """Returns a mapping from alias -> canonical name."""
    # Group entities by type so each resolution prompt stays small and focused
    by_type = {}
    for e in entities:
        by_type.setdefault(e.type, []).append(e)

    alias_map = {}
    for etype, group in by_type.items():
        # Build a prompt listing every entity of this type with its description
        prompt = f"""Group these {etype} entities that refer to the same real-world thing.
For each group, choose the most canonical name.
Entities:
"""
        for i, e in enumerate(group):
            prompt += f"{i}. Name: {e.name} — {e.description}\n"

        response = sonnet.messages.parse(
            model="claude-3-sonnet-20240229",
            max_tokens=1000,
            system="You are an entity resolution system. Group aliases that refer to the same entity.",
            messages=[{"role": "user", "content": prompt}],
            response_model=List[EntityCluster],
        )
        for cluster in response:
            for alias in cluster.aliases:
                alias_map[alias] = cluster.canonical_name
    return alias_map
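Applied to the per-document extractions from Step 2, resolution is one call over the pooled entity list. The snippet below assumes the extractions dict built earlier and prints every alias that was remapped:

all_entities = [e for extraction in extractions.values() for e in extraction.entities]
alias_map = resolve_entities(all_entities)

for alias, canonical in alias_map.items():
    if alias != canonical:
        print(f"{alias} -> {canonical}")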
The descriptions matter: "Armstrong — first person to walk on the Moon" and "Armstrong — jazz trumpeter" have the same name but should not merge.
Two failure modes to watch for:
- Silent drops — any raw name Claude leaves out of every cluster disappears from the graph. A production resolver should fall back to a single-element cluster for unmatched names so nothing is lost (a sketch of this fallback follows the list).
- Over-merging — a specific mission like "Gemini 12" may get folded into the broader "Project Gemini" because the descriptions overlap.

The first loses nodes, the second loses precision. Both are worth spot-checking.
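A minimal version of that fallback, assuming the alias_map and all_entities from the previous snippet (the add_fallback_clusters helper is illustrative), simply gives every unmatched name its own single-element cluster:

def add_fallback_clusters(alias_map: dict, entities: List[Entity]) -> dict:
    """Ensure every extracted name survives resolution, even if Claude skipped it."""
    complete = dict(alias_map)
    for e in entities:
        if e.name not in complete:
            complete[e.name] = e.name  # unmatched names become their own canonical node
    return complete

alias_map = add_fallback_clusters(alias_map, all_entities)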
Step 4: Assembling the Graph
With a clean alias map, we rewrite every relation endpoint to its canonical form and load the result into NetworkX.
import networkx as nx

def build_graph(extractions: List[Extraction], alias_map: dict) -> nx.MultiDiGraph:
    G = nx.MultiDiGraph()
    for extraction in extractions:
        for entity in extraction.entities:
            canonical = alias_map.get(entity.name, entity.name)
            G.add_node(canonical, type=entity.type, description=entity.description)
        for relation in extraction.relations:
            subj = alias_map.get(relation.subject, relation.subject)
            obj = alias_map.get(relation.object, relation.object)
            G.add_edge(subj, obj, predicate=relation.predicate)
    return G
We use a MultiDiGraph because two entities can be connected by several distinct predicates ("launched from" and "operated by"), and direction matters ("Armstrong commanded Apollo 11" is not the same edge as "Apollo 11 commanded Armstrong").
Each node carries its type and description as attributes, so we can filter or serialize them later.
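Putting it together on the Apollo corpus (using the extractions and alias_map built above; exact node and edge counts will vary from run to run):

G = build_graph(list(extractions.values()), alias_map)
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# Node attributes make filtering easy, e.g. every PERSON in the graph
people = [n for n, attrs in G.nodes(data=True) if attrs.get("type") == "PERSON"]
print(people)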
Step 5: Querying the Graph with Multi-Hop Reasoning
Now for the payoff: answering questions that span multiple documents. We serialize a subgraph around the entities mentioned in the question and feed it back to Claude.
def query_graph(G: nx.MultiDiGraph, question: str) -> str:
    # First, identify the entities mentioned in the question (reusing the extractor)
    extraction = extract_from_document(question)
    relevant_names = [e.name for e in extraction.entities]
    # Note: these are surface forms; mapping them through alias_map first
    # improves the chance of matching canonical node names.

    # Build a subgraph: include neighbors up to 2 hops out
    nodes_to_include = set(relevant_names)
    for name in relevant_names:
        if name in G:
            # 1-hop neighbors, in both edge directions
            nodes_to_include.update(G.neighbors(name))
            nodes_to_include.update(G.predecessors(name))
            # 2-hop neighbors
            for neighbor in list(G.neighbors(name)) + list(G.predecessors(name)):
                nodes_to_include.update(G.neighbors(neighbor))
                nodes_to_include.update(G.predecessors(neighbor))
    subgraph = G.subgraph(nodes_to_include)

    # Serialize the subgraph as plain-text triples
    graph_text = "Knowledge graph:\n"
    for u, v, data in subgraph.edges(data=True):
        graph_text += f"{u} --[{data['predicate']}]--> {v}\n"

    prompt = f"""Given this knowledge graph, answer the question.
{graph_text}
Question: {question}
Answer:"""
    response = sonnet.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
This approach lets Claude reason over the graph structure without needing to embed the entire document corpus. You get answers that chain facts across documents — something pure RAG struggles with.
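For example, a question whose answer chains facts from the Apollo 11, Saturn V, and NASA summaries (the exact wording of the answer depends on what was extracted in your run):

answer = query_graph(G, "Which organization operated the rocket that launched Neil Armstrong to the Moon?")
print(answer)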
Measuring Quality
To trust your knowledge graph, you need to measure extraction quality. Build a small gold set of documents with manually annotated entities and relations, then compare Claude's output:
def evaluate_extraction(gold: Extraction, predicted: Extraction):
    gold_entities = set((e.name, e.type) for e in gold.entities)
    pred_entities = set((e.name, e.type) for e in predicted.entities)
    true_positives = gold_entities & pred_entities
    precision = len(true_positives) / len(pred_entities) if pred_entities else 0
    recall = len(true_positives) / len(gold_entities) if gold_entities else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0
    return {"precision": precision, "recall": recall, "f1": f1}
Cost/quality tradeoff: Haiku is substantially cheaper per token than Sonnet but may miss subtle relations. Use Haiku for high-volume extraction where the schema is simple; use Sonnet for entity resolution and complex queries where accuracy matters more.
Production Considerations
- Scaling storage: Replace NetworkX with Neo4j or Neptune when your graph exceeds memory
- Incremental updates: Re-run extraction only on new or changed documents; re-resolve entities periodically
- Confidence scores: Ask Claude to include a confidence score (1-10) for each relation, then filter low-confidence edges (see the sketch after this list)
- Human review: Build a dashboard to spot-check extractions and entity clusters
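For the confidence-score idea, one sketch is to extend the Relation schema and drop weak edges before assembling the graph. You would also point Extraction.relations at the new model; the ScoredRelation name and the threshold of 7 are arbitrary starting points:

class ScoredRelation(Relation):
    confidence: int = Field(description="How well the text supports this relation, 1 (guess) to 10 (stated explicitly)")

def filter_relations(relations: List[ScoredRelation], min_confidence: int = 7) -> List[ScoredRelation]:
    """Drop edges Claude itself marked as weakly supported."""
    return [r for r in relations if r.confidence >= min_confidence]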
Key Takeaways
- No training data needed — Claude extracts entities and relations directly from unstructured text using structured outputs, replacing traditional NER and relation classification pipelines
- Entity resolution is the bottleneck — Claude-driven clustering handles semantic aliases ("Buzz Aldrin" vs "Edwin Aldrin") that string similarity misses, but watch for over-merging and silent drops
- Multi-hop reasoning works — By serializing subgraphs back to Claude, you can answer questions that chain facts across documents, something RAG alone cannot do
- Choose your model wisely — Haiku for high-volume extraction where cost matters; Sonnet for entity resolution and complex queries where accuracy is critical
- Measure before you trust — Build a small gold set and track precision/recall to catch regressions as your data evolves