Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG Techniques
Learn how to summarize long documents with Claude AI, including prompt engineering, metadata extraction, handling token limits, and evaluating summary quality with ROUGE scores.
This guide teaches you how to use Claude for effective document summarization, covering basic prompts, multi-shot techniques, guided summarization for legal docs, meta-summarization for long texts, and RAG-based approaches. You'll also learn to evaluate and iteratively improve summary quality.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher synthesizing papers, or a product manager reviewing customer feedback, the ability to condense lengthy documents into actionable insights is invaluable.
Claude excels at summarization tasks thanks to its large context window and nuanced understanding of language. This guide walks you through a complete workflow—from basic summarization to advanced Retrieval-Augmented Generation (RAG) techniques—using real code examples and evaluation strategies.
Why Summarization Is Hard (and Why Claude Helps)
Evaluating summary quality is notoriously subjective. What one reader considers a perfect summary, another may find too vague or too detailed. Traditional metrics like ROUGE scores measure word overlap but miss coherence, factual accuracy, and relevance. Claude's ability to follow detailed instructions and maintain context over long documents makes it uniquely suited to this challenge.
Setting Up Your Environment
First, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
Note that Promptfoo, used later for evaluation, is a Node.js tool rather than a Python package; install it separately with npm install -g promptfoo (or run it via npx).
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."
Data Preparation: Extracting Text from PDFs
Before summarizing, you need clean text. Here's a Python function to extract text from PDFs using pypdf:
import pypdf

def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract text from a PDF file."""
    text = ""
    with open(pdf_path, "rb") as file:
        reader = pypdf.PdfReader(file)
        for page in reader.pages:
            # extract_text() can return None for image-only pages
            text += page.extract_text() or ""
    return text
# Example usage
document_text = extract_text_from_pdf("sublease_agreement.pdf")
For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system. You can substitute any PDF or plain text.
Basic Summarization with Claude
Let's start with a simple summarization function. This uses the Messages API with a system prompt and user message:
import anthropic

client = anthropic.Anthropic()

def summarize_text(text: str, max_tokens: int = 500) -> str:
    """Basic summarization using Claude."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide a concise summary of the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ]
    )
    return response.content[0].text

summary = summarize_text(document_text)
print(summary)
This works, but it's basic. The summary will be generic and may miss important details specific to your use case.
Multi-Shot Basic Summarization
A simple improvement is to use a multi-shot approach—asking Claude to first extract key points, then synthesize them into a summary:
def multi_shot_summarize(text: str) -> str:
    """Two-step summarization: extract then synthesize."""
    # Step 1: Extract key points
    extraction = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[
            {"role": "user", "content": f"Extract the 10 most important points from this document:\n\n{text}"}
        ]
    )
    key_points = extraction.content[0].text

    # Step 2: Synthesize into a summary
    synthesis = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": f"Based on these key points, write a coherent summary:\n\n{key_points}"}
        ]
    )
    return synthesis.content[0].text
This two-step process often produces more structured and comprehensive summaries.
Advanced Techniques
Guided Summarization
Instead of a generic prompt, guide Claude with specific instructions about what to include:
def guided_summarize(text: str, focus_areas: list[str]) -> str:
    """Summarize with specific focus areas."""
    focus_str = "\n".join([f"- {area}" for area in focus_areas])
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document analyst. Provide structured summaries.",
        messages=[
            {"role": "user", "content": f"""
Summarize this legal document with emphasis on:
{focus_str}

Document:
{text}

Provide your summary in the following format:
- Document Type and Parties
- Key Dates and Deadlines
- Financial Obligations
- Termination Conditions
- Risk Factors
"""}
        ]
    )
    return response.content[0].text

# Example for a sublease agreement
summary = guided_summarize(
    document_text,
    focus_areas=["Rent and payment terms", "Maintenance responsibilities", "Early termination clauses"]
)
Domain-Specific Guided Summarization
For legal documents, you can extract specific metadata fields:
import json

def extract_legal_metadata(text: str) -> dict:
    """Extract structured metadata from a legal document."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": f"""
Extract the following fields from this legal document. Return only valid JSON, with no other text:
- document_type
- parties_involved (list of names)
- effective_date
- expiration_date
- governing_law_state
- total_pages

Document:
{text}
"""}
        ]
    )
    # Parse the model's JSON output into a dict (raises if the response isn't valid JSON)
    return json.loads(response.content[0].text)
Meta-Summarization: Handling Long Documents
When a document exceeds Claude's context window (or your token budget), use a chunk-and-summarize approach:
def chunk_text(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into chunks of approximately chunk_size words."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text: str) -> str:
    """Summarize a long document by chunking, summarizing each chunk, then summarizing the summaries."""
    chunks = chunk_text(text)

    # Step 1: Summarize each chunk
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            messages=[
                {"role": "user", "content": f"Summarize chunk {i+1} of {len(chunks)}:\n\n{chunk}"}
            ]
        )
        chunk_summaries.append(summary.content[0].text)

    # Step 2: Summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": f"Synthesize these chunk summaries into a single coherent summary:\n\n{combined}"}
        ]
    )
    return final_summary.content[0].text
Summary Indexed Documents: An Advanced RAG Approach
For very large document collections, use a RAG (Retrieval-Augmented Generation) approach where you pre-summarize chunks and index them for retrieval:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def build_summary_index(documents: list[str]) -> dict:
    """Build a searchable index of document summaries."""
    # Summarize each document
    summaries = []
    for doc in documents:
        summary = summarize_text(doc, max_tokens=200)
        summaries.append(summary)

    # Create TF-IDF vectors for the summaries
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(summaries)

    return {
        "summaries": summaries,
        "vectorizer": vectorizer,
        "tfidf_matrix": tfidf_matrix
    }

def query_summary_index(query: str, index: dict, top_k: int = 3) -> list[str]:
    """Retrieve the most relevant summaries for a query."""
    query_vec = index["vectorizer"].transform([query])
    similarities = cosine_similarity(query_vec, index["tfidf_matrix"]).flatten()
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [index["summaries"][i] for i in top_indices]
Best Practices for Summarization RAG
- Chunk strategically: Align chunk boundaries with document sections (paragraphs, clauses).
- Include metadata: Store document title, date, and source alongside each summary.
- Use hybrid search: Combine semantic search (embeddings) with keyword search (BM25) for better retrieval.
- Re-rank results: After retrieval, use Claude to re-rank summaries by relevance to the query.
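To illustrate the first point, here is a minimal sketch of paragraph-aware chunking. `chunk_by_paragraphs` is a hypothetical variant of the word-based `chunk_text` shown earlier: it packs whole paragraphs into each chunk instead of cutting at arbitrary word offsets.

```python
def chunk_by_paragraphs(text: str, max_words: int = 3000) -> list[str]:
    """Pack whole paragraphs into chunks, never splitting a paragraph mid-way."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the word budget
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A paragraph longer than `max_words` still becomes its own oversized chunk here; for legal text you might split such paragraphs on clause boundaries instead.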
Evaluating Summary Quality
Automated evaluation helps you iterate quickly. Here's how to compute ROUGE scores:
from rouge_score import rouge_scorer

def evaluate_summary(reference: str, generated: str) -> dict:
    """Compute ROUGE-1, ROUGE-2, and ROUGE-L scores."""
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    return {
        "rouge1_f1": scores['rouge1'].fmeasure,
        "rouge2_f1": scores['rouge2'].fmeasure,
        "rougeL_f1": scores['rougeL'].fmeasure
    }
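To build intuition for what these scores measure, ROUGE-1 F1 is essentially an F1 score over overlapping unigrams. Here is a minimal pure-Python illustration (omitting the stemming the library applies):

```python
from collections import Counter

def unigram_f1(reference: str, generated: str) -> float:
    """Roughly what ROUGE-1 F1 computes: F1 over overlapping unigrams (no stemming)."""
    ref = Counter(reference.lower().split())
    gen = Counter(generated.lower().split())
    overlap = sum((ref & gen).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 5 of 6 words match in each direction, so precision = recall = F1 = 5/6
print(round(unigram_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # 0.833
```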
For more nuanced evaluation, use Promptfoo to define custom test cases:
# promptfoo_config.yaml
prompts:
  - "Summarize this document: {{document}}"
  - "Provide a bullet-point summary of key facts from: {{document}}"

tests:
  - vars:
      document: "file://sublease_agreement.txt"
    assert:
      - type: contains-all
        value: ["rent", "term", "termination"]
      - type: latency
        threshold: 5000
Iterative Improvement
To systematically improve your summarization pipeline:
- Create a test set: Collect 10-20 documents with human-written reference summaries.
- Baseline: Run your current prompt and compute ROUGE scores.
- Experiment: Modify prompts (add examples, change structure, adjust focus areas).
- Compare: Use Promptfoo to run A/B tests between prompt variants.
- Analyze failures: Look at low-scoring summaries to identify patterns (missing details, hallucinations, poor structure).
- Refine: Update prompts based on failure analysis and repeat.
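The steps above can be sketched as a small A/B harness. Everything here is illustrative: `compare_variants` is a hypothetical helper, the stub summarizers stand in for real prompt variants, and in practice `score` would be a real metric such as the ROUGE-1 F1 from `evaluate_summary`.

```python
from statistics import mean

def compare_variants(test_set, variants, score):
    """Run each summarizer variant over the test set and report its mean score.

    test_set: list of (document, reference_summary) pairs
    variants: dict mapping variant name -> summarize(document) callable
    score:    callable (reference, generated) -> float
    """
    return {
        name: mean(score(ref, summarize(doc)) for doc, ref in test_set)
        for name, summarize in variants.items()
    }

# Toy run with stub summarizers and an exact-match score
test_set = [("doc text", "short summary")]
variants = {
    "v1": lambda doc: "short summary",   # matches the reference
    "v2": lambda doc: "something else",  # does not
}
exact_match = lambda ref, gen: 1.0 if ref == gen else 0.0
print(compare_variants(test_set, variants, exact_match))  # {'v1': 1.0, 'v2': 0.0}
```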
Conclusion and Best Practices
Summarization with Claude is both powerful and flexible. Here are the key principles to keep in mind:
- Be specific in your prompts: Generic prompts yield generic summaries. Specify format, length, focus areas, and audience.
- Use structured outputs: Request JSON or markdown tables for metadata extraction.
- Chunk wisely: For long documents, chunk at natural boundaries and use meta-summarization.
- Evaluate systematically: Combine automated metrics (ROUGE) with custom assertions (Promptfoo) and human review.
- Iterate: Summarization is rarely perfect on the first try. Build a feedback loop.
Key Takeaways
- Claude's large context window and instruction-following ability make it ideal for summarization, especially for complex documents like legal contracts.
- Guided prompts with specific focus areas and output formats produce significantly better summaries than open-ended requests.
- For documents exceeding context limits, use a chunk-and-meta-summarize strategy to preserve information across the entire text.
- RAG-based summarization with pre-indexed summaries enables fast retrieval from large document collections.
- Combine ROUGE scores with custom evaluation frameworks like Promptfoo to systematically measure and improve summary quality over time.