Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn how to summarize long documents with Claude AI, including prompt engineering, metadata extraction, handling token limits, and evaluating summary quality using ROUGE scores and Promptfoo.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager analyzing customer feedback, the ability to condense lengthy documents into clear, actionable summaries can transform your workflow.
This guide walks you through the full spectrum of summarization techniques using Claude — from simple one-shot prompts to advanced Retrieval-Augmented Generation (RAG) approaches. We'll focus on practical, actionable methods you can implement today.
Why Summarization Is Hard (And Why Claude Excels)
Evaluating summary quality is notoriously subjective. Unlike a math problem with a single correct answer, a "good" summary depends on your audience, use case, and desired level of detail. Traditional metrics like ROUGE scores measure word overlap but miss nuance, coherence, and factual accuracy.
Claude's strengths — long context windows, nuanced understanding, and instruction-following — make it particularly well-suited for summarization. But even with a powerful model, your approach matters.
Getting Started: Setup and Data Preparation
First, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
Promptfoo is a Node.js CLI, so install it separately:
npm install -g promptfoo
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="your-api-key-here"
Preparing Your Document
For this guide, we'll use a publicly available Sublease Agreement from the SEC website. You can also use any PDF or raw text. Here's how to extract text from a PDF:
import pypdf

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            # extract_text() can return None for image-only pages
            text += page.extract_text() or ""
    return text

# Load your document
text = extract_text_from_pdf("sublease_agreement.pdf")
If you're working with plain text, just assign it directly:
text = "Your document text here..."
Basic Summarization: Your First Claude Summary
Let's start simple. Here's a basic summarization function using the Claude API:
import anthropic

client = anthropic.Anthropic()

def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_text(text)
print(summary)
This works, but it's basic. Notice we're using the user role with a simple instruction. As we progress, we'll refine this approach significantly.
Multi-Shot Summarization: Providing Examples
One powerful technique is to provide Claude with examples of good summaries. This is known as few-shot (or multi-shot) prompting:
def summarize_with_examples(text):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": """Here is an example of a good summary:

Document: "The quick brown fox jumps over the lazy dog. The dog was sleeping under a tree. The fox was looking for food."
Summary: "A fox searching for food jumps over a sleeping dog under a tree."

Now summarize this document:"""
            },
            {
                "role": "assistant",
                "content": "Understood. I will follow that style."
            },
            {
                "role": "user",
                "content": text
            }
        ]
    )
    return response.content[0].text
The key insight here is using the assistant role to acknowledge the instruction before receiving the actual document. This helps Claude "get in the right mindset" before processing your content.
Advanced Techniques: Guided and Domain-Specific Summarization
Guided Summarization
Instead of a generic "summarize this," guide Claude with specific instructions:
def guided_summarize(text):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=800,
        messages=[
            {
                "role": "user",
                "content": f"""Please summarize the following legal document. Focus on:
- Key parties involved
- Effective dates and duration
- Financial terms (rent, deposits, fees)
- Termination conditions
- Notable obligations or restrictions

Format your summary as bullet points under each heading.

Document:
{text}"""
            }
        ]
    )
    return response.content[0].text
Domain-Specific Guided Summarization
For legal documents specifically, you can create a structured extraction:
def legal_document_summary(text):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        messages=[
            {
                "role": "user",
                "content": f"""You are a legal document analyst. Extract the following metadata and create a structured summary:

METADATA:
- Document Type:
- Parties:
- Date Signed:
- Effective Date:
- Term/Duration:

SUMMARY SECTIONS:
- Executive Summary (3-5 sentences)
- Key Financial Terms
- Rights and Obligations
- Termination Clauses
- Risk Factors

Document:
{text}"""
            }
        ]
    )
    return response.content[0].text
Meta-Summarization: Handling Long Documents
When documents exceed Claude's context window (or your budget), use a chunk-and-summarize approach:
def chunk_text(text, chunk_size=4000):
    """Split text into chunks of roughly chunk_size words (not tokens)."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = ' '.join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text):
    chunks = chunk_text(text)
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
    # Now summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_text(combined, max_tokens=500)
    return final_summary
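Before running a chunked pass over a large document, it helps to estimate the cost up front: one summarization call per chunk, plus one final call over the combined summaries. A quick sketch (the function name is illustrative, using the 4,000-word chunk size from above):

```python
import math

def estimate_api_calls(word_count, chunk_size=4000):
    """One call per chunk, plus one final call over the combined summaries."""
    num_chunks = math.ceil(word_count / chunk_size)
    # A document that fits in a single chunk needs no second pass
    return num_chunks if num_chunks == 1 else num_chunks + 1

# A 20,000-word document with 4,000-word chunks: 5 chunk calls + 1 final
print(estimate_api_calls(20000))  # → 6
```

Multiplying by your expected tokens per call gives a rough budget for the whole run.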
Summary Indexed Documents: An Advanced RAG Approach
For truly large document collections, combine summarization with RAG. The idea is to create a summary index — a searchable database of document summaries:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def create_summary_index(documents):
    """Create a searchable index of document summaries."""
    summaries = []
    for doc in documents:
        summary = summarize_text(doc, max_tokens=200)
        summaries.append(summary)
    # Create TF-IDF vectors for retrieval
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(summaries)
    return summaries, vectorizer, tfidf_matrix

def query_summary_index(query, summaries, vectorizer, tfidf_matrix, top_k=3):
    """Retrieve most relevant summaries for a query."""
    query_vec = vectorizer.transform([query])
    similarities = cosine_similarity(query_vec, tfidf_matrix)[0]
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    results = []
    for idx in top_indices:
        results.append({
            "summary": summaries[idx],
            "relevance": similarities[idx]
        })
    return results
Best Practices for Summarization RAG
- Chunk strategically: Align chunks with document structure (paragraphs, sections)
- Preserve metadata: Include document title, date, and source in each chunk
- Use overlapping chunks: 10-20% overlap prevents information loss at boundaries
- Cache summaries: Store generated summaries to avoid redundant API calls
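The overlapping-chunks point above can be sketched as a word-based chunker with a configurable overlap (the function name and 15% default are illustrative choices, not from a particular library):

```python
def chunk_with_overlap(text, chunk_size=4000, overlap_ratio=0.15):
    """Split text into word-based chunks where each chunk repeats the
    last ~15% of the previous one, so facts that span a chunk boundary
    appear intact in at least one chunk."""
    words = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: 10 words, chunks of 4 with 25% overlap (step of 3)
demo = chunk_with_overlap("a b c d e f g h i j", chunk_size=4, overlap_ratio=0.25)
print(demo)  # → ['a b c d', 'd e f g', 'g h i j']
```

Caching, meanwhile, can be as simple as keying stored summaries by a hash of the chunk text, so re-running a pipeline only pays for chunks that changed.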
Evaluating Summary Quality
Automated evaluation helps you iterate quickly. Here's how to use ROUGE scores:
from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
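For intuition about what ROUGE-1 actually measures, here is a rough pure-Python approximation: unigram-overlap F1, without the stemming and tokenization rules the rouge-score package applies. It is a sanity-check sketch, not a replacement for the library:

```python
from collections import Counter

def rouge1_f1(reference, generated):
    """Approximate ROUGE-1 F1: harmonic mean of unigram precision
    and recall between reference and generated text (no stemming)."""
    ref_counts = Counter(reference.lower().split())
    gen_counts = Counter(generated.lower().split())
    overlap = sum((ref_counts & gen_counts).values())  # clipped counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat sat"))  # ≈ 0.667
```

Note that a summary can score well on this metric while paraphrasing badly, which is exactly why word-overlap metrics should be paired with human review.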
For more nuanced evaluation, use Promptfoo to create custom evaluation suites:
# promptfooconfig.yaml
prompts:
  - "Summarize this document: {{document}}"
providers:
  - anthropic:messages:claude-3-opus-20240229
tests:
  - vars:
      document: "Your test document here"
    assert:
      - type: contains-all
        value: ["key term 1", "key term 2"]
      - type: javascript
        value: output.length <= 500
Iterative Improvement: A Practical Workflow
- Baseline: Start with a basic prompt, generate summaries
- Evaluate: Use ROUGE scores and manual spot-checks
- Identify gaps: Where does the summary miss key information?
- Refine prompts: Add specific instructions for missed areas
- Test edge cases: Try with different document types and lengths
- Automate: Create a test suite with Promptfoo for regression testing
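Steps 2 and 6 can start small: before reaching for a full Promptfoo suite, a plain assertion helper (hypothetical names, mirroring Promptfoo's contains-all check) catches regressions in a notebook or CI script:

```python
def spot_check(summary, required_terms, max_chars=2000):
    """Return a list of failures: missing key terms or an over-length summary."""
    failures = []
    lowered = summary.lower()
    for term in required_terms:
        if term.lower() not in lowered:
            failures.append(f"missing term: {term}")
    if len(summary) > max_chars:
        failures.append(f"summary too long: {len(summary)} > {max_chars} chars")
    return failures

issues = spot_check(
    "Tenant agrees to pay $2,000 monthly rent; lease runs 12 months.",
    required_terms=["rent", "lease", "deposit"],
)
print(issues)  # → ['missing term: deposit']
```

An empty return list means the summary passed; anything else tells you exactly which prompt refinement to try next.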
Conclusion and Best Practices
- Be specific in your prompts: Generic "summarize this" yields generic results. Specify format, length, and focus areas.
- Use the assistant role: Acknowledge instructions before providing content to improve output quality.
- Chunk strategically: For long documents, use overlapping chunks and meta-summarization.
- Evaluate systematically: Combine automated metrics (ROUGE) with human review.
- Iterate: Summarization is rarely perfect on the first try. Build a feedback loop.
- Consider your audience: A summary for a legal expert differs from one for a general reader.
Key Takeaways
- Start with guided prompts: Specify exactly what information you need extracted rather than asking for a generic summary
- Use multi-shot prompting with assistant role: Providing examples and using the assistant role improves output consistency
- Chunk and meta-summarize for long documents: Break documents into overlapping chunks, summarize each, then summarize the summaries
- Combine summarization with RAG: Create summary indexes for large document collections to enable fast, relevant retrieval
- Evaluate with multiple methods: Use ROUGE scores for automated checks and Promptfoo for custom evaluation suites