Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG Techniques
Summarization is one of the most powerful and practical applications of large language models like Claude. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a business analyst reviewing quarterly reports, the ability to condense lengthy documents into actionable insights saves time and improves decision-making.
This guide walks you through the entire summarization workflow using Claude—from basic prompt crafting to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. We'll also cover how to evaluate and iteratively improve your summaries using automated metrics like ROUGE scores and tools like Promptfoo.
Why Summarization Is Hard (and Why Claude Excels)
Summarization evaluation is notoriously subjective. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Different readers value different aspects: some want bullet-point brevity, others need narrative coherence, and legal professionals demand factual precision.
Claude's strength lies in its ability to follow nuanced instructions, handle long contexts (up to 200K tokens), and produce structured outputs. With the right prompts, you can tailor summaries to specific domains, audiences, and formats.
Setting Up Your Environment
Before diving in, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
Promptfoo is a Node.js tool, so install it separately:
npm install -g promptfoo
You'll also need a valid Anthropic API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."
Data Preparation: Extracting Text from PDFs
Most real-world documents come as PDFs. Here's a Python function to extract and clean text:
import pypdf

def extract_text_from_pdf(pdf_path):
    """Extract raw text from every page of a PDF."""
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for image-only pages
        text += page.extract_text() or ""
    return text

# Example usage
text = extract_text_from_pdf("sublease_agreement.pdf")
For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system. If you have your own document, simply replace the file path.
Basic Summarization with Claude
Let's start with a simple summarization function:
import anthropic

client = anthropic.Anthropic()

def summarize_basic(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide a concise summary of the following document.",
        messages=[
            {"role": "user", "content": f"Please summarize this document:\n\n{text}"}
        ],
    )
    return response.content[0].text
This works, but it's naive. Claude doesn't know what kind of summary you want, what to emphasize, or what to omit. Let's improve it.
Few-Shot Summarization
A better approach is to provide examples of good summaries within the prompt. This technique, known as few-shot prompting, shows Claude the length, tone, and level of detail you expect:
def summarize_few_shot(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document summarizer.",
        messages=[
            # One worked example establishes the expected style and length
            {"role": "user", "content": "Summarize this contract clause:\n\nThe Lessee shall pay all utilities including electricity, water, gas, and internet directly to the respective service providers on or before the 15th of each month."},
            {"role": "assistant", "content": "The lessee is responsible for paying electricity, water, gas, and internet bills directly to providers by the 15th of each month."},
            {"role": "user", "content": f"Now summarize this document:\n\n{text}"},
        ],
    )
    return response.content[0].text
Advanced Techniques
Guided Summarization
Instead of a generic summary, guide Claude to extract specific information. This is especially useful for legal documents where you need to capture parties, dates, obligations, and termination clauses:
def guided_summarize(text):
    prompt = f"""Please analyze this legal document and provide:

- Parties Involved: List all named parties.
- Effective Date: When does the agreement take effect?
- Key Obligations: Bullet list of each party's main responsibilities.
- Termination Conditions: How can the agreement be terminated?
- Financial Terms: Any fees, deposits, or payment schedules.
- Risk Factors: Any clauses that could pose legal or financial risk.

Document:
{text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system="You are a legal analyst. Extract structured metadata from contracts.",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
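A quick usage sketch, chaining the PDF helper from earlier (the file name is the same sample sublease agreement; substitute your own):

text = extract_text_from_pdf("sublease_agreement.pdf")
print(guided_summarize(text))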
Domain-Specific Guided Summarization
For specialized fields like law, medicine, or finance, provide domain-specific instructions:
def legal_summarize(text):
    prompt = f"""As a legal expert, summarize this contract with attention to:

- Governing Law: Which jurisdiction's laws apply?
- Dispute Resolution: Arbitration, mediation, or litigation?
- Indemnification: Who indemnifies whom and under what conditions?
- Confidentiality: Any non-disclosure obligations?
- Assignment: Can rights be transferred?

Document:
{text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system="You are a senior contracts attorney reviewing agreements.",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
Meta-Summarization: Handling Long Documents
When documents exceed Claude's context window (or your budget), break them into chunks, summarize each chunk, then summarize the summaries:
def chunk_and_summarize(text, chunk_size=10000):
    # Naive fixed-size character chunking
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_basic(chunk, max_tokens=300)
        chunk_summaries.append(summary)
    # Now summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_basic(combined, max_tokens=500)
    return final_summary
This meta-summarization approach preserves the broad context of the entire document while working within token limits, though fine-grained details can be compressed away at each pass.
Summary Indexed Documents: An Advanced RAG Approach
For even larger document collections, combine summarization with Retrieval-Augmented Generation (RAG):
- Chunk each document into segments.
- Summarize each chunk using Claude.
- Index both the original text and the summaries in a vector database.
- Retrieve relevant summaries first, then fetch full text for detailed answers.
# Sketch of RAG summary indexing (Chroma's default embedding model handles
# vectorization; swap in sentence-transformers or another embedder if needed)
import chromadb

# Step 1: Chunk and summarize
chunk_size = 2000
document_chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
chunk_summaries = [summarize_basic(chunk) for chunk in document_chunks]

# Step 2: Index the original text with its summary attached as metadata
collection = chromadb.Client().create_collection("documents")
for i, (chunk, summary) in enumerate(zip(document_chunks, chunk_summaries)):
    collection.add(
        documents=[chunk],
        metadatas=[{"summary": summary}],  # one metadata dict per document
        ids=[f"chunk_{i}"],
    )

# Step 3: Retrieve the most relevant chunks
query = "What are the termination conditions?"
results = collection.query(query_texts=[query], n_results=3)
# Use Claude to synthesize an answer from the retrieved chunks (see below)
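The synthesis step could look like the following minimal sketch; the prompt wording and the separator used to join excerpts are illustrative assumptions, not a fixed API:

def synthesize_answer(query, results):
    # Chroma returns results["documents"] as one list of matches per query
    context = "\n\n---\n\n".join(results["documents"][0])
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        system="Answer questions using only the provided document excerpts.",
        messages=[{"role": "user", "content": f"Excerpts:\n\n{context}\n\nQuestion: {query}"}],
    )
    return response.content[0].text

print(synthesize_answer(query, results))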
Best Practices for Summarization RAG
- Dual indexing: Store both the summary and the original text. The summary helps with retrieval relevance; the original text ensures factual accuracy.
- Overlapping chunks: Use 10-20% overlap between chunks to avoid cutting off important context (a sketch follows this list).
- Metadata enrichment: Tag chunks with document type, date, parties involved, etc.
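A minimal sketch of overlap-aware chunking, assuming character-based chunks and a 10% default overlap (the helper name and defaults are illustrative; production pipelines often split on sentence or section boundaries instead):

def chunk_with_overlap(text, chunk_size=2000, overlap_ratio=0.1):
    step = int(chunk_size * (1 - overlap_ratio))  # advance 90% of a chunk at a time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final chunk already reaches the end of the document
    return chunks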
Evaluating Summary Quality
Automated evaluation is essential for iterative improvement. Two common methods are:
ROUGE Scores
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares generated summaries against reference summaries:
from rouge_score import rouge_scorer

# reference_summary: a human-written gold summary; generated_summary: Claude's output
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference_summary, generated_summary)

print(f"ROUGE-1: {scores['rouge1'].fmeasure:.3f}")
print(f"ROUGE-2: {scores['rouge2'].fmeasure:.3f}")
print(f"ROUGE-L: {scores['rougeL'].fmeasure:.3f}")
Limitations: ROUGE measures n-gram overlap, not semantic quality. A summary can score high on ROUGE but be incoherent or factually wrong.
Promptfoo for Custom Evaluation
Promptfoo lets you define custom evaluation criteria, including LLM-as-judge rubrics, and run them from the command line:

promptfoo eval --config summarization_config.yaml
Example configuration:
# summarization_config.yaml
prompts:
  - "Summarize this document: {{document}}"

providers:
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      document: "..."
    assert:
      - type: llm-rubric
        value: "The summary must include all named parties and the effective date."
      - type: cost
        threshold: 0.01
Iterative Improvement
Improving summarization is a cycle:
- Generate a summary with your current prompt.
- Evaluate using both automated metrics and human review.
- Analyze failures: Is it missing key info? Too verbose? Factually wrong?
- Refine your prompt: Add examples, tighten instructions, specify format.
- Repeat.
Refinements that tend to pay off include (an example prompt follows this list):
- Add constraints: "Use exactly 3 bullet points. No more than 100 words."
- Specify audience: "Summarize for a non-technical executive."
- Request verification: "After summarizing, list any claims that you are uncertain about."
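Putting those refinements together, a revised prompt might look like this sketch; the exact wording is illustrative, not a prescribed template:

refined_prompt = """Summarize this sublease agreement for a non-technical executive.

Constraints:
- Use exactly 3 bullet points, no more than 100 words total.
- Focus on financial terms and termination conditions.

After the summary, list any claims you are uncertain about under "Uncertain:".

Document:
{document}"""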
Conclusion and Best Practices
Summarization with Claude is both an art and a science. Here are the key takeaways:
- Start simple, then iterate: A basic prompt gets you 80% of the way. Refine based on evaluation.
- Use guided prompts for structured extraction, especially in legal or technical domains.
- Chunk and meta-summarize for long documents.
- Combine summarization with RAG for large document collections.
- Evaluate rigorously: Use ROUGE as a baseline, but supplement it with LLM-as-judge or human review.
- Tailor to your audience: A summary for a lawyer differs from one for a business executive.
Key Takeaways
- Guided prompts outperform generic ones: Specify exactly what information to extract (parties, dates, obligations) for structured, actionable summaries.
- Meta-summarization handles long documents: Chunk the text, summarize each chunk, then summarize the summaries to preserve context beyond token limits.
- RAG with summary indexing boosts retrieval: Index both summaries and original text to improve relevance and factual accuracy in large document collections.
- Evaluate with multiple methods: Combine ROUGE scores for baseline measurement with Promptfoo's LLM-as-judge for semantic and factual quality checks.
- Iterate relentlessly: Small prompt refinements—adding examples, specifying audience, requesting verification—dramatically improve summary quality over time.