
Building Knowledge Graphs from Unstructured Text with Claude

Learn how to use Claude to extract entities and relations from unstructured documents, resolve duplicates, and build queryable knowledge graphs — no training data required.

Quick Answer

This guide shows you how to use Claude's structured outputs to extract entities and typed relations from unstructured text, resolve duplicate mentions with AI-driven clustering, and build an in-memory knowledge graph for multi-hop question answering — all without training data or complex NLP pipelines.

Tags: knowledge graph, entity extraction, structured outputs, entity resolution, Claude API


You have a pile of unstructured documents and need to answer questions that span them — "who works with people who worked on project X", "which vendors are connected to this incident". No single document contains the answer. RAG retrieval won't chain the facts for you. You need a knowledge graph: entities as nodes, typed relations as edges, so that multi-hop reasoning becomes graph traversal.

Building one used to mean training a named-entity recognizer on your domain, training a relation classifier, writing entity-resolution heuristics, and maintaining all three as your data shifted. With Claude, each of those stages becomes a prompt.

What You'll Learn

By the end of this guide you will be able to:

  • Use structured outputs to extract typed entities and subject–predicate–object triples from arbitrary text with no training data
  • Apply Claude-driven entity resolution to collapse surface-form variants into canonical nodes, replacing brittle string-similarity heuristics
  • Assemble and query an in-memory graph, and run multi-hop questions by serializing subgraphs back to Claude
  • Measure extraction quality with precision/recall against a gold set and reason about the cost/quality tradeoff between Haiku and Sonnet
Everything runs in memory with no database. The techniques transfer directly to Neo4j, Neptune, or a Postgres adjacency table when you need to scale.

Prerequisites

  • Python 3.11+
  • Anthropic API key (create one in the Anthropic Console)
  • Basic familiarity with graphs (nodes, edges, traversal)
  • anthropic Python SDK installed (pip install anthropic)

Setup

We use two models. Haiku handles the high-volume, schema-constrained extraction work where speed and cost matter more than nuance. Sonnet handles entity resolution and summarization, where the model needs to weigh conflicting evidence across documents.

import anthropic

client = anthropic.Anthropic()

EXTRACTION_MODEL = "claude-3-haiku-20240307"
RESOLUTION_MODEL = "claude-3-sonnet-20240229"

Step 1: Building a Corpus

We need a handful of documents that talk about overlapping entities, so that entity resolution has real work to do. The Apollo program is a good test bed: six short Wikipedia summaries that all mention NASA, the Moon, several astronauts, and a launch vehicle — but each article names them slightly differently.

import requests

# Fetch short Wikipedia summaries (the REST API expects underscores in titles)
topics = ["Apollo 11", "Neil Armstrong", "Buzz Aldrin", "Saturn V", "NASA", "Moon landing"]
corpus = {}
for topic in topics:
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{topic.replace(' ', '_')}"
    response = requests.get(url)
    if response.status_code == 200:
        corpus[topic] = response.json()["extract"]

For a production pipeline you would chunk full documents; the extraction logic is identical.
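If you do need chunking, a minimal character-window splitter is enough to feed the extractor. This is a hypothetical helper under simple assumptions (fixed window, fixed overlap), not part of the corpus code above:

def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # overlap keeps entities that straddle a boundary
    return chunks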

Step 2: Entity and Relation Extraction

Classical NER tags spans of text with labels (PERSON, ORG, LOC). Classical relation extraction then classifies pairs of spans into relation types. Both traditionally require labeled training data per domain.

We collapse both stages into a single Claude call per document. The key is structured outputs: we define the output shape as a Pydantic model and pass it to client.messages.parse(). Claude's response is guaranteed to validate against that schema and comes back as a typed Python object — no regex parsing, no JSON decode errors, no defensive isinstance checks.

from pydantic import BaseModel
from typing import List

class Entity(BaseModel):
    name: str
    type: str         # PERSON, ORG, LOC, MISSION, VEHICLE, etc.
    description: str  # one-line context for disambiguation

class Relation(BaseModel):
    subject: str
    predicate: str
    object: str

class Extraction(BaseModel):
    entities: List[Entity]
    relations: List[Relation]

def extract_from_text(text: str) -> Extraction:
    response = client.messages.parse(
        model=EXTRACTION_MODEL,
        max_tokens=2000,
        system="Extract all named entities and their relationships from the text. "
               "Use entity types like PERSON, ORG, LOC, MISSION, VEHICLE, etc. "
               "For relations, use subject-predicate-object triples.",
        messages=[{"role": "user", "content": text}],
        response_model=Extraction,
    )
    return response

# Extract from each document
extractions = {}
for topic, text in corpus.items():
    extractions[topic] = extract_from_text(text)

Let's look at what was extracted. Notice how the same real-world entity appears under different surface forms across documents — this is the entity resolution problem we solve next.
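A few lines of inspection code make the variants visible (a sketch using the models defined above):

# Print the raw entity names per document to spot surface-form variants
for topic, ext in extractions.items():
    print(f"{topic}: {sorted(e.name for e in ext.entities)}")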

Step 3: Entity Resolution

The raw extraction gives us overlapping mentions: "NASA" and "National Aeronautics and Space Administration", "Neil Armstrong" and "Armstrong", possibly "the Moon" and "Moon". If we build a graph directly from this, we get a fractured mess where the same concept is split across disconnected nodes.

Traditional approaches use string similarity (edit distance, Jaccard on tokens) plus blocking rules. That works for typos but fails on "Edwin Aldrin" vs "Buzz Aldrin" — two names with zero character overlap that refer to the same person.

We instead ask Claude to cluster entities of each type, using the one-line descriptions from extraction as disambiguation context.

from typing import Dict, List

class EntityCluster(BaseModel):
    canonical_name: str
    aliases: List[str]
    type: str
    description: str

class ResolutionResult(BaseModel):
    clusters: List[EntityCluster]

def resolve_entities(all_entities: List[Entity]) -> Dict[str, str]:
    """Returns a mapping from alias -> canonical name."""
    # Group entities by type for focused resolution
    by_type: Dict[str, List[Entity]] = {}
    for entity in all_entities:
        by_type.setdefault(entity.type, []).append(entity)

    alias_to_canonical = {}
    for etype, entities in by_type.items():
        # Prepare a summary for Claude
        entity_list = "\n".join(f"- {e.name}: {e.description}" for e in entities)
        prompt = f"""Cluster these {etype} entities that refer to the same real-world thing.
Use the descriptions for disambiguation.

{entity_list}

For each cluster, provide a canonical name and list all aliases."""
        response = client.messages.parse(
            model=RESOLUTION_MODEL,
            max_tokens=2000,
            messages=[{"role": "user", "content": prompt}],
            response_model=ResolutionResult,
        )
        for cluster in response.clusters:
            for alias in cluster.aliases:
                alias_to_canonical[alias] = cluster.canonical_name
    return alias_to_canonical

# Collect all unique entities (Pydantic models aren't hashable, so dedupe by name/type)
unique = {(e.name, e.type): e for ext in extractions.values() for e in ext.entities}
all_entities = list(unique.values())
alias_map = resolve_entities(all_entities)

Two failure modes to watch for. First, any raw name Claude leaves out of every cluster silently disappears from the graph — a production resolver should fall back to a single-element cluster for unmatched names so nothing is lost. Second, the resolver can over-merge: a specific mission like "Gemini 12" may get folded into the broader "Project Gemini" because the descriptions overlap. The first loses nodes, the second loses precision. Both are worth spot-checking in the output below.
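A minimal guard against the first failure mode is to give every unresolved name a singleton entry before building the graph:

# Fallback: any name the resolver left out becomes its own canonical node
for entity in all_entities:
    alias_map.setdefault(entity.name, entity.name)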

Step 4: Assembling the Graph

With a clean alias map, we rewrite every relation endpoint to its canonical form and load the result into NetworkX. We use a MultiDiGraph because two entities can be connected by several distinct predicates ("launched from" and "operated by"), and direction matters ("Armstrong commanded Apollo 11" is not the same edge as "Apollo 11 commanded Armstrong").

import networkx as nx

G = nx.MultiDiGraph()

# Add canonical entities as nodes
for ext in extractions.values():
    for entity in ext.entities:
        canonical = alias_map.get(entity.name, entity.name)
        G.add_node(canonical, type=entity.type, description=entity.description)

# Add edges with canonical names
for ext in extractions.values():
    for rel in ext.relations:
        subj = alias_map.get(rel.subject, rel.subject)
        obj = alias_map.get(rel.object, rel.object)
        G.add_edge(subj, obj, predicate=rel.predicate)
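A quick sanity check confirms that resolution actually collapsed duplicates; exact counts will vary between runs:

# Fewer canonical nodes than raw names means resolution did its job
raw_names = {e.name for ext in extractions.values() for e in ext.entities}
print(f"{len(raw_names)} raw names -> {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")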

Step 5: Querying the Graph

Now for the payoff: multi-hop questions. We serialize the relevant subgraph back to Claude for reasoning.

def query_graph(question: str, graph: nx.MultiDiGraph) -> str:
    # Serialize a relevant subgraph (simplified: use whole graph for small graphs)
    nodes = list(graph.nodes(data=True))
    edges = list(graph.edges(data=True))
    
    graph_text = "Nodes:\n"
    for node, data in nodes:
        graph_text += f"- {node} ({data.get('type', 'unknown')})\n"
    graph_text += "\nEdges:\n"
    for subj, obj, data in edges:
        graph_text += f"- {subj} --[{data['predicate']}]--> {obj}\n"
    
    prompt = f"""Given this knowledge graph, answer the question.

{graph_text}

Question: {question}

Answer concisely based only on the graph."""
    response = client.messages.create(
        model=RESOLUTION_MODEL,
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Example: multi-hop question
answer = query_graph("Which astronauts were involved in Apollo 11?", G)
print(answer)
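query_graph serializes the entire graph, which only works at toy scale. For larger graphs, one option is to seed on entities named in the question and send only their neighborhood. A sketch using NetworkX's ego_graph, where the two-hop radius is an assumption to tune:

def relevant_subgraph(question: str, graph: nx.MultiDiGraph, radius: int = 2) -> nx.MultiDiGraph:
    """Keep only nodes within `radius` hops of entities mentioned in the question."""
    seeds = [n for n in graph.nodes if n.lower() in question.lower()]
    keep = set()
    for seed in seeds:
        # undirected=True follows edges in both directions during the hop expansion
        keep |= set(nx.ego_graph(graph, seed, radius=radius, undirected=True).nodes)
    return graph.subgraph(keep).copy()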

Measuring Quality

To trust your graph in production, measure precision and recall against a gold set of known facts. Create a small set of expected triples, then compare against what your pipeline extracted.

def evaluate_extraction(gold_relations: set, extracted_relations: set):
    true_positives = gold_relations & extracted_relations
    precision = len(true_positives) / len(extracted_relations) if extracted_relations else 0
    recall = len(true_positives) / len(gold_relations) if gold_relations else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0
    return {"precision": precision, "recall": recall, "f1": f1}
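Usage might look like the following. The gold triples here are hypothetical placeholders, and because the comparison is an exact string match, predicates usually need normalizing (lowercasing, synonym collapsing) before the scores are fair:

# Hypothetical gold triples -- replace with hand-verified facts from your corpus
gold = {
    ("Neil Armstrong", "commanded", "Apollo 11"),
    ("Saturn V", "launched", "Apollo 11"),
}

# Canonicalize extracted triples with the alias map before comparing
extracted = {
    (alias_map.get(r.subject, r.subject), r.predicate, alias_map.get(r.object, r.object))
    for ext in extractions.values()
    for r in ext.relations
}

print(evaluate_extraction(gold, extracted))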

Cost/Quality Tradeoff: Haiku vs Sonnet

  • Haiku is ideal for high-volume extraction where speed and cost matter. It handles well-defined schemas reliably.
  • Sonnet excels at entity resolution and complex reasoning where nuance matters. Use it for clustering and final query answering.
A typical pipeline uses Haiku for extraction (cheaper per document) and Sonnet for resolution (fewer calls, higher stakes).

Key Takeaways

  • No training data needed: Claude's structured outputs let you extract entities and relations with a single prompt, replacing traditional NER and relation classification pipelines.
  • AI-driven entity resolution beats heuristics: Claude can resolve "Buzz Aldrin" and "Edwin Aldrin" as the same person using context, where string similarity fails.
  • Graph + LLM = powerful querying: Serializing subgraphs back to Claude enables multi-hop reasoning across documents without complex traversal logic.
  • Model selection matters: Use Haiku for high-volume extraction and Sonnet for nuanced resolution and reasoning to optimize cost and quality.
  • Always validate: Implement precision/recall checks against a gold set to catch over-merging and missing entities before production deployment.