Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn how to summarize long documents with Claude AI, including legal texts. Covers prompt engineering, metadata extraction, handling token limits, ROUGE evaluation, and iterative improvement.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager reviewing customer feedback, the ability to distill lengthy documents into concise, accurate summaries saves time and unlocks insights.
This guide walks you through the complete workflow of document summarization using Claude—from a simple one-shot prompt to advanced techniques like guided summarization, meta-summarization, and Retrieval-Augmented Generation (RAG). We'll also cover how to evaluate and iteratively improve your summaries.
Why Claude for Summarization?
Claude excels at summarization because of its large context window (up to 200K tokens), nuanced understanding of language, and ability to follow complex instructions. It can handle entire books, legal contracts, or technical reports in a single pass, making it ideal for real-world summarization tasks.
Getting Started: Setup and Data Preparation
First, install the required Python packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn sentence-transformers
Note that Promptfoo is a Node.js tool rather than a Python package; install it separately with npm install -g promptfoo.
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="your-api-key-here"
Preparing Your Document
For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system. You can also use any PDF or text blob.
import pypdf
def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text
document_text = extract_text_from_pdf("sublease_agreement.pdf")
If you're working with a plain text string, simply define:
document_text = "Your long text here..."
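Claude's context window is large but finite, so it's worth estimating a document's size before sending it. The sketch below uses the rough rule of thumb that English text averages about four characters per token; it is a heuristic, not Claude's actual tokenizer, and the estimate_tokens helper name is our own:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: English text averages ~4 characters per token."""
    return len(text) // 4

document_text = "Your long text here..." * 1000
approx = estimate_tokens(document_text)
if approx > 150_000:
    print(f"~{approx} tokens: consider chunking before summarizing")
else:
    print(f"~{approx} tokens: fits in a single request")
```

If the estimate approaches the context limit, switch to the chunking techniques covered later in this guide.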
Basic Summarization with Claude
Let's start with a simple summarization function. This is the foundation we'll build upon.
import anthropic
client = anthropic.Anthropic()
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide a concise summary of the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ]
    )
    return response.content[0].text
summary = summarize_text(document_text)
print(summary)
This works, but it's basic. Notice we're already using the system prompt to set Claude's role. As we progress, we'll add more structure.
Advanced Techniques
1. Guided Summarization
Instead of a generic summary, guide Claude to extract specific information. This is especially useful for legal documents where you need to find obligations, dates, and parties.
def guided_summarize(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        system="You are a legal document analyst. Extract structured information.",
        messages=[
            {"role": "user", "content": f"""
Analyze this legal document and provide:
- Parties involved
- Effective date and term
- Key obligations of each party
- Termination conditions
- Payment terms
- Any unusual clauses

Document:
{text}
"""}
        ]
    )
    return response.content[0].text
2. Domain-Specific Guided Summarization
Tailor the prompt to your domain. For medical documents, ask for diagnoses and treatments. For financial reports, ask for revenue and risk factors.
def financial_summarize(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        system="You are a financial analyst. Extract key financial metrics and risks.",
        messages=[
            {"role": "user", "content": f"""
Extract from this financial document:
- Revenue and profit figures
- Key growth drivers
- Risk factors mentioned
- Forward-looking statements
- Management's outlook

Document:
{text}
"""}
        ]
    )
    return response.content[0].text
3. Meta-Summarization (Summarizing the Summary)
For extremely long documents, you can use a hierarchical approach: summarize sections, then summarize those summaries.
def chunk_and_summarize(text, chunk_size=5000):
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
    # Now summarize the summaries
    combined_summaries = "\n\n".join(chunk_summaries)
    final_summary = summarize_text(combined_summaries, max_tokens=500)
    return final_summary
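Fixed-size character slicing can cut a sentence in half. The next section's snippet calls a chunk_text helper that isn't defined there; a minimal sketch that respects paragraph boundaries instead might look like this (the greedy packing strategy is one reasonable choice, not the only one):

```python
def chunk_text(text, chunk_size=2000):
    """Greedily pack whole paragraphs into chunks of at most chunk_size characters.

    A single paragraph longer than chunk_size still becomes its own
    (oversized) chunk rather than being split mid-sentence.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # +2 accounts for the "\n\n" separator restored between paragraphs.
        if current and len(current) + len(para) + 2 > chunk_size:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because chunks end on paragraph boundaries, each one reads as a coherent unit, which tends to produce cleaner per-chunk summaries.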
Summary Indexed Documents: An Advanced RAG Approach
For massive document collections, combine summarization with RAG. Instead of indexing raw chunks, index summaries of those chunks. This improves retrieval quality because summaries are denser and more relevant.
from sentence_transformers import SentenceTransformer
import numpy as np

# Create summary index
chunks = chunk_text(document_text, chunk_size=2000)
chunk_summaries = [summarize_text(chunk, max_tokens=100) for chunk in chunks]

# Embed summaries for retrieval (normalized so dot product = cosine similarity)
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunk_summaries, normalize_embeddings=True)

def query_summary_index(query, top_k=3):
    query_embedding = model.encode([query], normalize_embeddings=True)
    similarities = np.dot(embeddings, query_embedding.T).flatten()
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    # Return original chunks corresponding to top summaries
    return [chunks[i] for i in top_indices]
Best Practices for Summarization RAG
- Chunk strategically: Align chunks with natural document boundaries (sections, paragraphs).
- Summarize before indexing: Summary vectors are more informative than raw text vectors.
- Include metadata: Store document name, date, and section with each summary.
- Use hybrid search: Combine semantic similarity with keyword matching for better recall.
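As a toy illustration of the last two points, the sketch below attaches metadata to each indexed summary and blends a keyword-overlap score with a semantic score. All names here are illustrative, and the semantic score is stubbed out; in a real system it would be cosine similarity over the sentence-transformer embeddings shown above:

```python
import re

def keyword_score(query, text):
    """Fraction of query words that appear in the text (simple keyword recall)."""
    q_words = set(re.findall(r"\w+", query.lower()))
    t_words = set(re.findall(r"\w+", text.lower()))
    return len(q_words & t_words) / len(q_words) if q_words else 0.0

# Each entry carries the summary plus retrieval metadata.
index = [
    {"summary": "Tenant pays monthly rent of $5,000 due on the first.",
     "doc": "sublease.pdf", "section": "Payment Terms"},
    {"summary": "Either party may terminate with 60 days written notice.",
     "doc": "sublease.pdf", "section": "Termination"},
]

def hybrid_search(query, index, semantic_score, alpha=0.5):
    """Blend semantic and keyword scores; return index entries best-first."""
    scored = [
        (alpha * semantic_score(query, e["summary"])
         + (1 - alpha) * keyword_score(query, e["summary"]), e)
        for e in index
    ]
    return [e for _, e in sorted(scored, key=lambda p: p[0], reverse=True)]

# With no embedding model loaded, stub the semantic component to zero.
results = hybrid_search("termination notice period", index, lambda q, t: 0.0)
```

Carrying the doc and section fields through retrieval lets you cite the source of each retrieved chunk in the final answer.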
Evaluating Summary Quality
Evaluation is the hardest part of summarization. Here are two practical methods:
1. ROUGE Scores
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares your summary to a reference summary.
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement is between Party A and Party B..."
candidate = "This agreement involves Party A and Party B..."
scores = scorer.score(reference, candidate)
print(f"ROUGE-1: {scores['rouge1'].fmeasure:.2f}")
print(f"ROUGE-2: {scores['rouge2'].fmeasure:.2f}")
print(f"ROUGE-L: {scores['rougeL'].fmeasure:.2f}")
2. Promptfoo for Automated Evaluation
Promptfoo allows you to define custom evaluation criteria and run them automatically with the promptfoo eval command. Here's an example config:
{
  "prompts": ["Summarize this document: {{document}}"],
  "providers": ["anthropic:claude-3-sonnet-20240229"],
  "tests": [
    {
      "vars": {
        "document": "..."
      },
      "assert": [
        {
          "type": "llm-rubric",
          "value": "Does the summary include all key parties and dates?"
        },
        {
          "type": "contains-all",
          "value": ["Party A", "Party B", "effective date"]
        }
      ]
    }
  ]
}
Iterative Improvement
Summarization is rarely perfect on the first try. Use this feedback loop:
- Generate a summary
- Evaluate using ROUGE and/or human review
- Identify gaps: Missing information? Too verbose? Factual errors?
- Refine the prompt: Add instructions, examples, or constraints
- Repeat
# Version 1: Too verbose
prompt_v1 = "Summarize this document."

# Version 2: Add length constraint
prompt_v2 = "Summarize this document in 3-5 bullet points."

# Version 3: Add structure and examples
prompt_v3 = """
Summarize this legal document in exactly 5 bullet points:
- Parties involved
- Key dates
- Financial terms
- Obligations
- Termination conditions
Example output:
- Parties: Acme Corp (Landlord) and Beta Inc (Tenant)
- Dates: Effective Jan 1, 2024, Term 12 months
...
"""
Conclusion and Best Practices
- Start simple, then iterate: Begin with a basic prompt and refine based on evaluation.
- Use guided prompts for structured output: Especially for legal, financial, or medical documents.
- Handle long documents with chunking and meta-summarization: Don't exceed Claude's context window.
- Evaluate systematically: Combine automated metrics (ROUGE) with custom criteria (Promptfoo).
- Consider RAG for large collections: Index summaries, not raw text, for better retrieval.
Key Takeaways
- Guided prompts outperform generic ones: Specify exactly what information you need (parties, dates, obligations) for structured, actionable summaries.
- Chunking + meta-summarization handles any document length: Break long texts into sections, summarize each, then summarize the summaries.
- Summary-indexed RAG improves retrieval: Indexing summaries instead of raw chunks yields denser, more relevant search results.
- Evaluate with both ROUGE and custom criteria: ROUGE measures overlap; custom checks (via Promptfoo) catch factual accuracy and completeness.
- Iterate relentlessly: Small prompt refinements—adding examples, constraints, or structure—dramatically improve summary quality.