
Building Knowledge Graphs with Claude: From Unstructured Text to Structured Insights

Learn how to build knowledge graphs from unstructured documents using Claude AI. Extract entities, resolve duplicates, and run multi-hop queries without training data.

Quick Answer

This guide shows you how to use Claude to extract entities and relationships from unstructured text, resolve duplicate mentions, and build queryable knowledge graphs — all without training data or complex NLP pipelines.

Tags: knowledge graph, entity extraction, Claude API, structured outputs, entity resolution


You have a pile of unstructured documents and need to answer questions that span them — "who works with people who worked on project X", "which vendors are connected to this incident". No single document contains the answer. RAG retrieval won't chain the facts for you. You need a knowledge graph: entities as nodes, typed relations as edges, so that multi-hop reasoning becomes graph traversal.

Building one used to mean training a named-entity recognizer on your domain, training a relation classifier, writing entity-resolution heuristics, and maintaining all three as your data shifted. With Claude, each of those stages becomes a prompt.

What You'll Learn

By the end of this guide you will be able to:

  • Use structured outputs to extract typed entities and subject–predicate–object triples from arbitrary text with no training data
  • Apply Claude-driven entity resolution to collapse surface-form variants into canonical nodes, replacing brittle string-similarity heuristics
  • Assemble and query an in-memory graph, and run multi-hop questions by serializing subgraphs back to Claude
  • Measure extraction quality with precision/recall against a gold set and reason about the cost/quality tradeoff between Haiku and Sonnet

Everything runs in memory with no database. The techniques transfer directly to Neo4j, Neptune, or a Postgres adjacency table when you need to scale.

Prerequisites

  • Python 3.11+
  • Anthropic API key (create one in the Anthropic Console)
  • Basic familiarity with graphs (nodes, edges, traversal)

Setup

We use two models. Haiku handles the high-volume, schema-constrained extraction work where speed and cost matter more than nuance. Sonnet handles entity resolution and summarization, where the model needs to weigh conflicting evidence across documents.

import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Define model choices
EXTRACTION_MODEL = "claude-3-haiku-20240307"   # Fast, cheap extraction
RESOLUTION_MODEL = "claude-3-sonnet-20240229"  # Nuanced entity resolution

Building a Corpus

We need a handful of documents that talk about overlapping entities, so that entity resolution has real work to do. The Apollo program is a good test bed: six short Wikipedia summaries that all mention NASA, the Moon, several astronauts, and a launch vehicle — but each article names them slightly differently.

import requests

# Fetch Wikipedia summaries for Apollo-related articles
articles = [
    "Apollo 11", "Neil Armstrong", "Buzz Aldrin",
    "Saturn V", "NASA", "Moon landing"
]

corpus = {}
for title in articles:
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    response = requests.get(url).json()
    corpus[title] = response["extract"]

For a production pipeline you would chunk full documents; the extraction logic is identical.
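A minimal chunker if you go that route (paragraph-based splitting with a 4,000-character budget; both are illustrative choices, not recommendations):

def chunk_document(text: str, max_chars: int = 4000) -> list[str]:
    # Pack whole paragraphs into chunks until the character budget is hit
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks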

Entity and Relation Extraction

Classical NER tags spans of text with labels (PERSON, ORG, LOC). Classical relation extraction then classifies pairs of spans into relation types. Both traditionally require labeled training data per domain.

We collapse both stages into a single Claude call per document. The key is structured outputs: we define the output shape as a Pydantic model and pass it to client.messages.parse(). Claude's response is guaranteed to validate against that schema and comes back as a typed Python object — no regex parsing, no JSON decode errors, no defensive isinstance checks.

from pydantic import BaseModel
from typing import List

class Entity(BaseModel):
    name: str
    type: str  # PERSON, ORGANIZATION, LOCATION, EVENT, etc.
    description: str

class Relation(BaseModel):
    subject: str
    predicate: str
    object: str

class Extraction(BaseModel):
    entities: List[Entity]
    relations: List[Relation]

def extract_from_document(text: str) -> Extraction:
    response = client.messages.parse(
        model=EXTRACTION_MODEL,
        max_tokens=2000,
        system="Extract entities and their relationships from the text. "
               "Be thorough but precise.",
        messages=[{"role": "user", "content": text}],
        response_model=Extraction
    )
    return response

# Process all documents
all_extractions = {}
for title, text in corpus.items():
    all_extractions[title] = extract_from_document(text)

Notice how the same real-world entity appears under different surface forms across documents — this is the entity resolution problem we solve next.
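To see the duplication concretely, print each document's raw entity names before any resolution (a quick inspection snippet; exact output varies between runs):

# Show raw surface forms per document, before resolution
for title, extraction in all_extractions.items():
    names = sorted({e.name for e in extraction.entities})
    print(f"{title}: {names}")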

Entity Resolution

The raw extraction gives us overlapping mentions: "NASA" and "National Aeronautics and Space Administration", "Neil Armstrong" and "Armstrong", possibly "the Moon" and "Moon". If we build a graph directly from this, we get a fractured mess where the same concept is split across disconnected nodes.

Traditional approaches use string similarity (edit distance, Jaccard on tokens) plus blocking rules. That works for typos but fails on "Edwin Aldrin" vs "Buzz Aldrin" — two names with zero character overlap that refer to the same person.

We instead ask Claude to cluster entities of each type, using the one-line descriptions from extraction as disambiguation context.

from typing import Dict, List

class EntityCluster(BaseModel):
    canonical_name: str
    aliases: List[str]
    type: str
    description: str

class ResolutionResult(BaseModel):
    clusters: List[EntityCluster]

def resolve_entities(all_extractions: Dict[str, Extraction]) -> Dict[str, str]:
    # Collect all unique entities with their descriptions
    entity_pool = {}
    for doc_title, extraction in all_extractions.items():
        for entity in extraction.entities:
            key = f"{entity.name}|{entity.type}"
            if key not in entity_pool:
                entity_pool[key] = entity

    # Ask Claude to cluster them
    entities_text = "\n".join([
        f"- {e.name} ({e.type}): {e.description}"
        for e in entity_pool.values()
    ])
    response = client.messages.parse(
        model=RESOLUTION_MODEL,
        max_tokens=2000,
        system="Group these entities into clusters where they refer to the same real-world thing. "
               "Use descriptions for disambiguation.",
        messages=[{"role": "user", "content": entities_text}],
        response_model=ResolutionResult
    )

    # Build alias to canonical mapping
    alias_map = {}
    for cluster in response.clusters:
        for alias in cluster.aliases:
            alias_map[alias] = cluster.canonical_name
    return alias_map

The descriptions matter: "Armstrong — first person to walk on the Moon" and "Armstrong — jazz trumpeter" have the same name but should not merge.

Watch Out For

Two failure modes to watch for. First, any raw name Claude leaves out of every cluster silently disappears from the graph — a production resolver should fall back to a single-element cluster for unmatched names so nothing is lost. Second, the resolver can over-merge: a specific mission like "Gemini 12" may get folded into the broader "Project Gemini" because the descriptions overlap. The first loses nodes, the second loses precision. Both are worth spot-checking in the output below.
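A minimal guard against the first failure mode, run right after the resolver (a sketch; it maps any name Claude dropped back to itself):

alias_map = resolve_entities(all_extractions)

# Singleton fallback: any raw name missing from every cluster
# maps to itself instead of vanishing from the graph
for extraction in all_extractions.values():
    for entity in extraction.entities:
        alias_map.setdefault(entity.name, entity.name)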

Assembling the Graph

With a clean alias map, we rewrite every relation endpoint to its canonical form and load the result into NetworkX. We use a MultiDiGraph because two entities can be connected by several distinct predicates ("launched from" and "operated by"), and direction matters ("Armstrong commanded Apollo 11" is not the same edge as "Apollo 11 commanded Armstrong").

import networkx as nx

# Build the graph
graph = nx.MultiDiGraph()

for doc_title, extraction in all_extractions.items():
    for entity in extraction.entities:
        canonical = alias_map.get(entity.name, entity.name)
        graph.add_node(canonical, type=entity.type, description=entity.description)
    for relation in extraction.relations:
        subject = alias_map.get(relation.subject, relation.subject)
        obj = alias_map.get(relation.object, relation.object)
        graph.add_edge(subject, obj, predicate=relation.predicate, source=doc_title)

Each node carries its type and a short description. Each edge carries its predicate and the source document as provenance.
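A quick sanity check on what was built (counts and a few sample edges; the exact numbers depend on the extraction run):

print(f"{graph.number_of_nodes()} nodes, {graph.number_of_edges()} edges")
for u, v, data in list(graph.edges(data=True))[:5]:
    print(f"{u} --[{data['predicate']}]--> {v}  (source: {data['source']})")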

Querying the Graph

Now for the payoff: multi-hop questions. We traverse the graph to find relevant subgraphs, then serialize them back to Claude for reasoning.

def query_graph(question: str, graph: nx.MultiDiGraph, max_hops: int = 2) -> str:
    # Extract key entities from the question
    response = client.messages.parse(
        model=RESOLUTION_MODEL,
        max_tokens=500,
        system="Extract the key entities mentioned in this question.",
        messages=[{"role": "user", "content": question}],
        response_model=Entity
    )
    
    # Find these entities in our graph
    start_nodes = [n for n in graph.nodes() if response.name.lower() in n.lower()]
    
    # BFS to find relevant subgraph
    subgraph_nodes = set(start_nodes)
    frontier = set(start_nodes)
    for _ in range(max_hops):
        new_frontier = set()
        for node in frontier:
            neighbors = set(graph.successors(node)) | set(graph.predecessors(node))
            new_frontier.update(neighbors)
        subgraph_nodes.update(new_frontier)
        frontier = new_frontier
    
    subgraph = graph.subgraph(subgraph_nodes)
    
    # Serialize subgraph for Claude
    serialized = "Relevant knowledge graph:\n"
    for u, v, data in subgraph.edges(data=True):
        serialized += f"{u} --[{data['predicate']}]--> {v}\n"
    
    # Answer the question
    response = client.messages.create(
        model=RESOLUTION_MODEL,
        max_tokens=1000,
        system="Answer the question based only on the knowledge graph provided.",
        messages=[
            {"role": "user", "content": f"{serialized}\n\nQuestion: {question}"}
        ]
    )
    return response.content[0].text
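An example multi-hop call (the question is illustrative, and the answer depends on what extraction and resolution actually produced):

answer = query_graph(
    "Which organization operated the rocket that carried Neil Armstrong?",
    graph
)
print(answer)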

Measuring Quality

To trust your graph, you need to measure precision and recall against a gold standard. Create a small annotated set of documents with known entities and relations, then compare your extraction against it.

def evaluate_extraction(gold_entities: List[str], extracted_entities: List[str]) -> Dict:
    gold_set = set(gold_entities)
    extracted_set = set(extracted_entities)
    
    true_positives = gold_set & extracted_set
    false_positives = extracted_set - gold_set
    false_negatives = gold_set - extracted_set
    
    precision = len(true_positives) / len(extracted_set) if extracted_set else 0
    recall = len(true_positives) / len(gold_set) if gold_set else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) else 0
    
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": list(false_positives),
        "false_negatives": list(false_negatives)
    }
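Usage against a hand-annotated gold list (the names below are illustrative, not a real gold set):

gold = ["Neil Armstrong", "Buzz Aldrin", "NASA", "Apollo 11", "Saturn V"]
extracted = [alias_map.get(e.name, e.name)
             for e in all_extractions["Apollo 11"].entities]
metrics = evaluate_extraction(gold, extracted)
print(f"P={metrics['precision']:.2f}  R={metrics['recall']:.2f}  F1={metrics['f1']:.2f}")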

Cost/Quality Tradeoffs

  • Haiku: Best for high-volume extraction where schema is well-defined. ~$0.25 per 1M tokens. Slightly lower accuracy on ambiguous entities.
  • Sonnet: Better for entity resolution and complex reasoning. ~$3 per 1M tokens. Handles nuanced disambiguation well.
  • Opus: Use for final quality checks or when errors are extremely costly.

Production Considerations

  • Persistence: For graphs larger than memory, export to Neo4j or Neptune
  • Incremental updates: Process new documents and run resolution only on new entities
  • Caching: Cache extraction results to avoid re-processing unchanged documents (see the sketch after this list)
  • Fallback resolution: Always keep unmatched entities as singleton clusters
  • Monitoring: Track precision/recall over time as your corpus evolves
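A minimal content-addressed cache for the extraction step, assuming Pydantic v2's model_dump_json / model_validate_json helpers (the cache directory name is an arbitrary choice):

import hashlib
from pathlib import Path

CACHE_DIR = Path(".kg_cache")
CACHE_DIR.mkdir(exist_ok=True)

def extract_cached(text: str) -> Extraction:
    # Key on a hash of the document text, so unchanged
    # documents never hit the API twice
    key = hashlib.sha256(text.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return Extraction.model_validate_json(path.read_text())
    extraction = extract_from_document(text)
    path.write_text(extraction.model_dump_json())
    return extraction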

Key Takeaways

  • No training data needed: Claude extracts entities and relations directly from text using structured outputs, eliminating the need for domain-specific NER or relation classifiers
  • LLM-powered entity resolution beats string matching: Claude can resolve "Buzz Aldrin" and "Edwin Aldrin" as the same entity, something edit distance would miss entirely
  • Multi-hop reasoning becomes graph traversal: By building a knowledge graph, you can answer questions that span multiple documents without complex retrieval pipelines
  • Choose your model based on the task: Use Haiku for high-volume extraction and Sonnet for nuanced resolution to balance cost and quality
  • Always measure and validate: Track precision/recall against a gold standard to catch over-merging and missed entities before they affect downstream queries