
Building Knowledge Graphs from Unstructured Text with Claude: A Practical Guide

Learn how to use Claude to extract entities and relations from unstructured documents, resolve aliases, and build queryable knowledge graphs without training data.

Quick Answer

This guide shows how to use Claude's structured outputs to extract entities and relations from text, resolve duplicate mentions into canonical nodes, and build an in-memory knowledge graph for multi-hop question answering—all without labeled training data.

Knowledge Graphs · Entity Extraction · Structured Outputs · Claude API · Data Engineering


You have a pile of unstructured documents and need to answer questions that span them — "who works with people who worked on project X", "which vendors are connected to this incident". No single document contains the answer. Traditional RAG retrieval won't chain the facts for you. You need a knowledge graph: entities as nodes, typed relations as edges, so that multi-hop reasoning becomes graph traversal.

Building one used to mean training a named-entity recognizer on your domain, training a relation classifier, writing entity-resolution heuristics, and maintaining all three as your data shifted. With Claude, each of those stages becomes a prompt.

What You'll Learn

By the end of this guide you will be able to:

  • Use structured outputs to extract typed entities and subject–predicate–object triples from arbitrary text with no training data
  • Apply Claude-driven entity resolution to collapse surface-form variants into canonical nodes, replacing brittle string-similarity heuristics
  • Assemble and query an in-memory graph, and run multi-hop questions by serializing subgraphs back to Claude
  • Measure extraction quality with precision/recall against a gold set and reason about the cost/quality tradeoff between Haiku and Sonnet

Everything runs in memory with no database. The techniques transfer directly to Neo4j, Neptune, or a Postgres adjacency table when you need to scale.

Prerequisites

  • Python 3.11+
  • Anthropic API key (create one in the Anthropic Console)
  • Basic familiarity with graphs (nodes, edges, traversal)

Setup

We use two models. Haiku handles the high-volume, schema-constrained extraction work where speed and cost matter more than nuance. Sonnet handles entity resolution and summarization, where the model needs to weigh conflicting evidence across documents.

import anthropic
from pydantic import BaseModel, Field
from typing import List

# Initialize clients. The model is selected per request, so a single client
# would suffice; two names just make the intent explicit.
haiku_client = anthropic.Anthropic()   # high-volume extraction (Haiku)
sonnet_client = anthropic.Anthropic()  # resolution and reasoning (Sonnet)

Building a Corpus

We need a handful of documents that talk about overlapping entities, so that entity resolution has real work to do. The Apollo program is a good test bed: six short Wikipedia summaries that all mention NASA, the Moon, several astronauts, and a launch vehicle — but each article names them slightly differently.

import requests

# Fetch plain-text summaries from the Wikipedia REST API
topics = ["Apollo 11", "Neil Armstrong", "Buzz Aldrin", "Saturn V", "NASA", "Moon landing"]
documents = {}
for topic in topics:
    # The REST API uses underscores in page titles
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{topic.replace(' ', '_')}"
    response = requests.get(url)
    documents[topic] = response.json()["extract"]

For a production pipeline you would chunk full documents; the extraction logic is identical.
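
If you go that route, a naive paragraph-packing chunker is enough to start. A sketch (chunk_text and the 2,000-character budget are arbitrary choices made here, not from any library):

def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Pack whole paragraphs into chunks until the size budget is hit."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks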

Entity and Relation Extraction

Classical NER tags spans of text with labels (PERSON, ORG, LOC). Classical relation extraction then classifies pairs of spans into relation types. Both traditionally require labeled training data per domain.

We collapse both stages into a single Claude call per document. The key is structured outputs: we define the output shape as a Pydantic model and pass it to client.messages.parse(). Claude's response is guaranteed to validate against that schema and comes back as a typed Python object — no regex parsing, no JSON decode errors, no defensive isinstance checks.

from typing import List
from pydantic import BaseModel, Field

class Entity(BaseModel):
    name: str = Field(description="The entity name as it appears in text")
    type: str = Field(description="Entity type: PERSON, ORG, LOC, EVENT, or CONCEPT")
    description: str = Field(description="One-line description for disambiguation")

class Relation(BaseModel):
    subject: str = Field(description="Subject entity name")
    predicate: str = Field(description="Relation type in present tense, e.g., 'works_for', 'launched'")
    object: str = Field(description="Object entity name")

class Extraction(BaseModel):
    entities: List[Entity]
    relations: List[Relation]

def extract_from_text(text: str, client) -> Extraction:
    response = client.messages.parse(
        model="claude-3-haiku-20240307",
        max_tokens=1000,
        system="Extract all named entities and their relationships from the text. Be thorough.",
        messages=[{"role": "user", "content": text}],
        response_model=Extraction,
    )
    return response

Let's look at what was extracted. Notice how the same real-world entity appears under different surface forms across documents — this is the entity resolution problem we solve next.
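
A minimal driver, assuming the documents dict from the corpus step (the printout format here is ad hoc):

extractions = {topic: extract_from_text(text, haiku_client) for topic, text in documents.items()}

# Inspect one document's extraction
for entity in extractions["Apollo 11"].entities:
    print(f"{entity.type:8} {entity.name}: {entity.description}")
for relation in extractions["Apollo 11"].relations:
    print(f"{relation.subject} --[{relation.predicate}]--> {relation.object}")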

Entity Resolution

The raw extraction gives us overlapping mentions: "NASA" and "National Aeronautics and Space Administration", "Neil Armstrong" and "Armstrong", possibly "the Moon" and "Moon". If we build a graph directly from this, we get a fractured mess where the same concept is split across disconnected nodes.

Traditional approaches use string similarity (edit distance, Jaccard on tokens) plus blocking rules. That works for typos but fails on "Edwin Aldrin" vs "Buzz Aldrin": the first names share no characters, yet both refer to the same person.

We instead ask Claude to cluster entities of each type, using the one-line descriptions from extraction as disambiguation context. The descriptions matter: "Armstrong — first person to walk on the Moon" and "Armstrong — jazz trumpeter" have the same name but should not merge.

class AliasMap(BaseModel):
    """Mapping from each original entity name to its canonical name."""
    mapping: dict[str, str]

def resolve_entities(entities: List[Entity], client) -> dict:
    """Returns a mapping from alias -> canonical name"""
    # Group entities by type for focused resolution
    by_type = {}
    for entity in entities:
        by_type.setdefault(entity.type, []).append(entity)

    alias_to_canonical = {}
    for etype, group in by_type.items():
        # Build a prompt with all entities of this type
        entity_list = "\n".join(f"- {e.name}: {e.description}" for e in group)
        prompt = f"""Group these {etype} entities that refer to the same real-world thing.
For each group, choose the most canonical name.

Entities:
{entity_list}

Return a mapping from each original name to its canonical name."""
        response = client.messages.parse(
            model="claude-3-sonnet-20240229",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}],
            response_model=AliasMap,
        )
        alias_to_canonical.update(response.mapping)
    return alias_to_canonical

Two failure modes to watch for. First, any raw name Claude leaves out of every cluster silently disappears from the graph — a production resolver should fall back to a single-element cluster for unmatched names so nothing is lost. Second, the resolver can over-merge: a specific mission like "Gemini 12" may get folded into the broader "Project Gemini" because the descriptions overlap. The first loses nodes, the second loses precision. Both are worth spot-checking in the output below.
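
The first guard takes a few lines. A minimal sketch (with_fallback is a helper name coined here, not part of the pipeline above):

def with_fallback(alias_map: dict, entities: List[Entity]) -> dict:
    """Ensure every extracted name survives resolution, as its own node if unmatched."""
    safe = dict(alias_map)
    for entity in entities:
        safe.setdefault(entity.name, entity.name)
    return safe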

Assembling the Graph

With a clean alias map, we rewrite every relation endpoint to its canonical form and load the result into NetworkX. We use a MultiDiGraph because two entities can be connected by several distinct predicates ("launched from" and "operated by"), and direction matters ("Armstrong commanded Apollo 11" is not the same edge as "Apollo 11 commanded Armstrong").

import networkx as nx

def build_graph(extractions: List[Extraction], alias_map: dict) -> nx.MultiDiGraph:
    G = nx.MultiDiGraph()
    for extraction in extractions:
        for entity in extraction.entities:
            canonical = alias_map.get(entity.name, entity.name)
            G.add_node(canonical, type=entity.type, description=entity.description)
        for relation in extraction.relations:
            subj = alias_map.get(relation.subject, relation.subject)
            obj = alias_map.get(relation.object, relation.object)
            G.add_edge(subj, obj, predicate=relation.predicate)
    return G

Each node carries its type, the description, and any other metadata you want to preserve. Edges carry the predicate as an attribute, so you can filter by relation type during traversal.
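
Wiring the stages together, assuming the extractions dict from the extraction step and the with_fallback guard sketched above (the "launched" predicate in the filter is illustrative; use whatever predicates your extraction actually produced):

all_extractions = list(extractions.values())
all_entities = [e for ex in all_extractions for e in ex.entities]
alias_map = with_fallback(resolve_entities(all_entities, sonnet_client), all_entities)
G = build_graph(all_extractions, alias_map)
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# Filter traversal by relation type via the edge attribute
launch_edges = [(u, v) for u, v, d in G.edges(data=True) if d["predicate"] == "launched"]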

Querying the Graph with Multi-Hop Questions

Now the fun part: asking questions that require traversing multiple edges. We serialize a subgraph around the entities mentioned in the question and feed it back to Claude for reasoning.

def query_graph(question: str, G: nx.MultiDiGraph, client) -> str:
    # Find relevant nodes by simple keyword matching; skip short tokens
    # ("of", "the") that would otherwise match inside almost every node name
    keywords = {w.strip("?.,") for w in question.lower().split() if len(w) > 3}
    relevant_nodes = [n for n in G.nodes() if any(kw in n.lower() for kw in keywords)]
    
    # Expand to the 1-hop neighborhood around these nodes
    subgraph_nodes = set(relevant_nodes)
    for node in relevant_nodes:
        subgraph_nodes.update(G.predecessors(node))
        subgraph_nodes.update(G.successors(node))
    
    subgraph = G.subgraph(subgraph_nodes)
    
    # Serialize the subgraph as text
    graph_text = "Knowledge graph:\n"
    for u, v, data in subgraph.edges(data=True):
        graph_text += f"- {u} --[{data['predicate']}]--> {v}\n"
    
    prompt = f"""{graph_text}

Question: {question}

Answer based only on the knowledge graph above."""

    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

This approach lets you answer questions like "What did the commander of Apollo 11 do after the mission?" by traversing from "Apollo 11" to its commander, then to subsequent events.
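
A usage sketch with the graph and clients from above:

answer = query_graph(
    "What did the commander of Apollo 11 do after the mission?",
    G,
    sonnet_client,
)
print(answer)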

Measuring Quality

To trust your pipeline, you need to measure precision and recall against a gold set. Create a small annotated dataset of documents with expected entities and relations, then compare:

def evaluate_extraction(gold: Extraction, predicted: Extraction) -> dict:
    gold_entities = set((e.name, e.type) for e in gold.entities)
    pred_entities = set((e.name, e.type) for e in predicted.entities)
    
    true_positives = gold_entities & pred_entities
    precision = len(true_positives) / len(pred_entities) if pred_entities else 0
    recall = len(true_positives) / len(gold_entities) if gold_entities else 0
    
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2  precision  recall / (precision + recall) if (precision + recall) else 0
    }
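
Relations can be scored the same way. A sketch that counts only exact (subject, predicate, object) matches, which will understate quality whenever gold and predicted predicates are paraphrases of each other:

def evaluate_relations(gold: Extraction, predicted: Extraction) -> dict:
    gold_triples = {(r.subject, r.predicate, r.object) for r in gold.relations}
    pred_triples = {(r.subject, r.predicate, r.object) for r in predicted.relations}
    tp = gold_triples & pred_triples
    return {
        "precision": len(tp) / len(pred_triples) if pred_triples else 0,
        "recall": len(tp) / len(gold_triples) if gold_triples else 0,
    }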

Expect Haiku to give ~85-90% of Sonnet's quality at ~1/10th the cost. For many applications, that tradeoff is worth it.

Key Takeaways

  • No training data needed: Claude's structured outputs let you extract entities and relations directly from text, replacing traditional NER and relation classification pipelines
  • LLM-based entity resolution beats heuristics: Claude can resolve aliases like "Edwin Aldrin" → "Buzz Aldrin" that string similarity would miss, using semantic context from descriptions
  • Multi-hop reasoning becomes graph traversal: By serializing subgraphs back to Claude, you can answer questions that span multiple documents without complex retrieval chains
  • Cost/quality tradeoffs matter: Use Haiku for high-volume extraction and Sonnet for resolution and reasoning tasks where nuance is critical
  • Start simple, scale later: The in-memory approach with NetworkX works for prototyping and small datasets; the same extraction patterns transfer to production graph databases