Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn how to summarize complex documents using Claude AI. This guide covers prompt engineering, metadata extraction, handling long texts, ROUGE evaluation, and RAG-based summarization.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a business analyst reviewing quarterly reports, the ability to distill lengthy documents into concise, actionable summaries is invaluable.
This guide is a practical walkthrough of how to use Claude for document summarization. We'll start with the basics and progressively build up to advanced techniques, including guided summarization, metadata extraction, handling documents beyond token limits, and even a Retrieval-Augmented Generation (RAG) approach. We'll also cover how to evaluate your summaries using both automated metrics and custom evaluation frameworks.
By the end, you'll have a complete toolkit for building robust summarization workflows with Claude.
Why Summarization is Hard (and Why Claude Excels)
Summarization is notoriously difficult to evaluate. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Different readers value different things: a lawyer needs precise legal language, while a business executive wants the bottom line. Traditional metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measure n-gram overlap with reference summaries but fail to capture coherence, factual accuracy, or relevance.
Claude excels here because of its strong instruction-following capabilities and large context window (up to 200K tokens). This allows you to:
- Provide detailed instructions about what to include or exclude
- Process entire documents in a single pass
- Extract structured metadata alongside free-form summaries
Getting Started: Setup and Data Preparation
First, let's set up our environment. You'll need an Anthropic API key and a few Python packages.
# Install required packages (Promptfoo is a Node.js tool, installed separately: npm install -g promptfoo)
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

import anthropic
from pypdf import PdfReader
import pandas as pd

# Initialize the Claude client
client = anthropic.Anthropic(api_key="YOUR_API_KEY")
Extracting Text from PDFs
For this guide, we'll use a publicly available legal document—a Sublease Agreement from the SEC's EDGAR database. Here's how to extract text from a PDF:
def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""  # extract_text() can return None for image-only pages
    return text

# Load your document
text = extract_text_from_pdf("sublease_agreement.pdf")
If you don't have a PDF, just use a text blob:
text = "Your document text here..."
Basic Summarization with Claude
Let's start with a simple summarization function. This is the foundation we'll build upon.
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        system="You are an expert document summarizer. Create a concise, accurate summary that captures the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ],
    )
    return response.content[0].text

summary = summarize_text(text)
print(summary)
This works, but it's basic. The summary will be generic and may miss important details specific to your use case. Let's improve it.
Advanced Techniques for Better Summaries
1. Guided Summarization
Instead of a generic request, guide Claude with specific instructions. This is where prompt engineering shines.
def guided_summarize(text, focus_areas=None, output_format="paragraph"):
    if focus_areas is None:
        focus_areas = ["key terms", "obligations", "dates", "parties"]
    focus_list = "\n".join(f"- {area}" for area in focus_areas)
    prompt = f"""Please summarize the following legal document. Focus specifically on:
{focus_list}

Output format: {output_format}

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        system="You are a legal document analyst. Provide precise, structured summaries.",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
2. Domain-Specific Guided Summarization
For legal documents, we can go further by extracting structured metadata alongside the summary.
def legal_document_summary(text):
    prompt = f"""Analyze this legal document and provide:
- SUMMARY: A 3-4 sentence overview
- PARTIES: List all named parties and their roles
- KEY DATES: All important dates (effective date, termination, renewal, etc.)
- OBLIGATIONS: Key obligations for each party
- RISK FACTORS: Any unusual or potentially unfavorable terms
- TERMINATION: Conditions for termination

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1000,
        system="You are a senior legal analyst. Extract all relevant information with precision.",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
3. Handling Long Documents with Meta-Summarization
What if your document exceeds Claude's context window? Use a chunk-and-summarize approach, then summarize the summaries.
def chunk_text(text, chunk_size=50000):
    """Split text into chunks of roughly chunk_size words each."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

def meta_summarize(text, max_tokens=1000):
    # Step 1: Chunk the document
    chunks = chunk_text(text)

    # Step 2: Summarize each chunk
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(f"Section {i+1}: {summary}")

    # Step 3: Summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_prompt = f"""Combine these section summaries into a coherent overall summary of the document.
Ensure no key information is lost.

{combined}
"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        system="You are an expert at synthesizing information from multiple sources.",
        messages=[{"role": "user", "content": final_prompt}],
    )
    return response.content[0].text
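Before wiring this into an API-calling pipeline, it's worth sanity-checking the chunking step on its own: no words should be lost, and no chunk should exceed the limit. A minimal, API-free check (restating the chunk_text logic above with a small chunk size):

```python
def chunk_text(text, chunk_size=50000):
    """Split text into chunks of roughly chunk_size words each."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

# Use a small chunk size so the behavior is easy to inspect
sample = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(sample, chunk_size=100)

assert len(chunks) == 3                                  # 100 + 100 + 50 words
assert all(len(c.split()) <= 100 for c in chunks)        # no chunk over the limit
assert " ".join(chunks).split() == sample.split()        # nothing dropped or reordered
print([len(c.split()) for c in chunks])  # → [100, 100, 50]
```

Note that splitting on whitespace counts words, not tokens; if you need precise token budgets, count tokens with your tokenizer of choice instead.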
Advanced RAG Approach: Summary-Indexed Documents
For very large document collections, consider a RAG approach where you index document summaries rather than raw text. This is more efficient and often produces better results.
# Pseudocode for Summary-Indexed RAG
class SummaryRAG:
    def __init__(self):
        self.summaries = []
        self.documents = []

    def add_document(self, doc_id, text):
        summary = summarize_text(text, max_tokens=200)
        self.summaries.append({"id": doc_id, "summary": summary})
        self.documents.append({"id": doc_id, "text": text})

    def query(self, question, top_k=3):
        # Find relevant summaries using embedding similarity
        relevant_summaries = self.search_summaries(question, top_k)
        # Retrieve full text for relevant documents
        context = "\n\n".join(
            self.get_document(s["id"]) for s in relevant_summaries
        )
        # Generate answer using Claude
        return self.generate_answer(question, context)
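The pseudocode leaves `search_summaries` undefined. One lightweight way to fill it in, without an embedding service, is TF-IDF cosine similarity over the stored summaries (a sketch using scikit-learn, which the setup step installs; a production system would more likely use dense embeddings, and the example documents here are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def search_summaries(summaries, question, top_k=3):
    """Rank stored summaries by TF-IDF cosine similarity to the question."""
    texts = [s["summary"] for s in summaries]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(texts + [question])
    # The last row is the question; compare it against every summary row
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sims.argsort()[::-1][:top_k]
    return [summaries[i] for i in ranked]

summaries = [
    {"id": "doc1", "summary": "A sublease agreement covering rent, termination, and renewal."},
    {"id": "doc2", "summary": "Quarterly earnings report with revenue and margin figures."},
    {"id": "doc3", "summary": "Employment contract describing salary and benefits."},
]
top = search_summaries(summaries, "What are the rent and termination terms?", top_k=1)
print(top[0]["id"])  # → doc1
```

Because summaries are much shorter than the raw documents, the index stays small and retrieval stays fast, while the final answer is still generated over the full retrieved text.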
Best Practices for Summarization RAG
- Summarize at multiple granularities: Create both short (1-2 sentence) and detailed (paragraph) summaries
- Include metadata: Always tag summaries with document source, date, and type
- Use hierarchical indexing: For very long documents, create section-level summaries
- Validate summaries: Periodically check that summaries accurately represent source documents
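The first two practices above combine naturally in a small record structure: each indexed entry carries both a short and a detailed summary plus source metadata. A minimal sketch (field names and example values are illustrative):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SummaryRecord:
    """An index entry holding summaries at two granularities plus metadata."""
    doc_id: str
    short_summary: str       # 1-2 sentences, for fast scanning and retrieval
    detailed_summary: str    # paragraph-level, for answer generation
    source: str
    doc_type: str
    indexed_on: date = field(default_factory=date.today)

record = SummaryRecord(
    doc_id="sublease-001",
    short_summary="Sublease agreement between two parties; fixed-term lease of premises.",
    detailed_summary="The sublessor grants the sublessee use of the premises under the terms of the master lease...",
    source="SEC EDGAR",
    doc_type="lease",
)
print(record.doc_id, record.doc_type)  # → sublease-001 lease
```

Retrieval can then match against the short summaries while answer generation reads the detailed ones, and the metadata fields support filtering by source, date, or document type.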
Evaluating Summary Quality
Evaluation is critical. Here are three approaches:
1. ROUGE Scores
ROUGE measures n-gram overlap between generated and reference summaries.
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference_summary, generated_summary)
print(f"ROUGE-1: {scores['rouge1'].fmeasure:.3f}")
print(f"ROUGE-2: {scores['rouge2'].fmeasure:.3f}")
print(f"ROUGE-L: {scores['rougeL'].fmeasure:.3f}")
2. Custom Evaluation with Promptfoo
Promptfoo allows you to define custom evaluation criteria. For example:

# promptfoo config
prompts:
  - "Summarize this legal document: {{document}}"
tests:
  - vars:
      document: "file://sublease_agreement.pdf"
    assert:
      - type: contains-all
        value:
          - "effective date"
          - "termination"
          - "rent"
      - type: llm-rubric
        value: "Does the summary accurately capture all key obligations of both parties?"
3. Human Evaluation Checklist
For production systems, use a structured checklist:
- ✅ Factual accuracy (no hallucinations)
- ✅ Completeness (covers all key points)
- ✅ Conciseness (no unnecessary detail)
- ✅ Coherence (flows logically)
- ✅ Relevance (matches the intended use case)
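To make reviewer judgments comparable across summaries, the checklist can be encoded as a small scoring helper that rejects incomplete reviews (a minimal sketch; the 1-5 scale and criterion names mirror the list above):

```python
CRITERIA = ["factual_accuracy", "completeness", "conciseness", "coherence", "relevance"]

def score_summary(ratings):
    """Average 1-5 reviewer ratings over the checklist; reject missing criteria."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)

ratings = {"factual_accuracy": 5, "completeness": 4, "conciseness": 4,
           "coherence": 5, "relevance": 5}
print(score_summary(ratings))  # → 4.6
```

Storing these scores alongside the prompt version that produced each summary makes it easy to see whether a prompt change actually helped.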
Iterative Improvement Process
- Generate baseline summaries using the basic approach
- Evaluate using ROUGE and custom criteria
- Identify weaknesses (e.g., missing dates, unclear obligations)
- Refine prompts to address specific gaps
- Re-evaluate and compare scores
- Repeat until quality meets your threshold
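Steps 2 and 5 can be automated with a small comparison loop. The sketch below scores candidate summaries from two hypothetical prompt versions against a reference using a simplified ROUGE-1-style F-measure (unigram overlap, no stemming), implemented inline so it runs without the rouge-score package; in practice you would use rouge_scorer as shown earlier:

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """Unigram-overlap F-measure (a simplified ROUGE-1, no stemming)."""
    ref, cand = Counter(reference.lower().split()), Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the sublease runs for two years with monthly rent due"
candidates = {
    "prompt_v1": "the agreement covers a property",
    "prompt_v2": "the sublease runs two years and rent is due monthly",
}
scores = {name: rouge1_f(reference, text) for name, text in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # → prompt_v2
```

Keeping a fixed reference set and re-running this loop after each prompt revision turns the iterative process above into a measurable comparison rather than a gut call.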
Conclusion and Best Practices
Here are the key takeaways for building robust summarization systems with Claude:
- Prompt engineering is everything: Be specific about what you want. Guide Claude with examples and structured output formats.
- Use metadata extraction: Don't just summarize—extract structured data like dates, parties, and obligations.
- Handle long documents strategically: Chunk and meta-summarize, or use a RAG approach with summary indexing.
- Evaluate rigorously: Combine automated metrics (ROUGE) with custom evaluation and human review.
- Iterate: Summarization is rarely perfect on the first try. Use evaluation results to refine your prompts and approach.
Key Takeaways
- Start with guided prompts: Generic summaries are rarely useful. Provide specific instructions about what to include, exclude, and how to format the output.
- Extract metadata alongside summaries: For legal or technical documents, structured extraction (dates, parties, obligations) adds enormous value beyond free-form summaries.
- Use meta-summarization for long documents: Chunk the document, summarize each chunk, then summarize the summaries. This preserves information while staying within token limits.
- Evaluate with multiple methods: Combine ROUGE scores, custom evaluation frameworks like Promptfoo, and human review to ensure summary quality.
- Iterate based on evaluation: Use evaluation results to refine prompts and techniques. Small prompt changes can yield significant improvements.