Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn practical techniques for summarizing long documents with Claude AI, including prompt engineering, metadata extraction, handling token limits, and evaluating summary quality using ROUGE scores and Promptfoo.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager parsing customer feedback, the ability to condense lengthy documents into actionable insights saves time and improves decision-making.
Claude excels at summarization tasks thanks to its large context window, nuanced understanding of language, and ability to follow complex instructions. In this guide, you'll learn practical, actionable techniques for summarizing documents with Claude—from simple prompts to advanced retrieval-augmented generation (RAG) approaches.
We'll focus on real-world challenges: handling long documents, extracting structured metadata, evaluating summary quality, and iteratively improving your results. Code examples are provided in Python, but the concepts apply to any programming language.
Why Summarization Is Hard (and Why Claude Helps)
Summarization is harder than it looks. Evaluating summary quality is notoriously subjective: what one reader considers a perfect summary, another may find lacking. Traditional metrics like ROUGE scores measure word overlap with reference summaries but miss nuances like coherence, factual accuracy, and relevance to the reader's goals. On top of that, long documents can exceed a model's context window, and generic prompts tend to produce generic output.
Claude helps overcome these challenges by:
- Understanding context and intent beyond simple keyword matching
- Following detailed instructions about summary structure and focus
- Handling documents up to 200K tokens in a single call
- Generating summaries in specific formats (bullet points, structured data, executive briefs)
Setup and Data Preparation
First, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
Promptfoo, used later for evaluation, is a Node.js tool run via npx, so it doesn't need a pip install.
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="your-api-key-here"
Extracting Text from Documents
For this guide, we'll use a publicly available Sublease Agreement from the SEC website. Here's how to extract text from a PDF:
import pypdf

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text

# Load your document
text = extract_text_from_pdf("sublease_agreement.pdf")
If you're working with plain text, simply define text = "your content here".
Basic Summarization with Claude
Let's start with a simple summarization function. This foundational approach uses Claude's instruction-following capabilities effectively:
import anthropic

client = anthropic.Anthropic()

def summarize_document(text, max_tokens=1000):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert document summarizer. Create concise, accurate summaries that capture key information.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_document(text)
print(summary)
This works well for short documents, but for longer texts you'll need more sophisticated approaches.
Advanced Techniques for Better Summaries
1. Guided Summarization
Instead of a generic "summarize this" prompt, guide Claude with specific instructions about what to extract:
def guided_summarize(text, focus_areas):
    prompt = f"""Please analyze the following legal document and provide:

- A one-paragraph executive summary
- Key parties involved and their roles
- Important dates and deadlines
- Financial terms and obligations
- Termination conditions

Focus particularly on: {', '.join(focus_areas)}

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Example usage
focus = ["indemnification clauses", "payment terms", "termination rights"]
summary = guided_summarize(text, focus)
2. Domain-Specific Guided Summarization
For legal documents, you can create specialized prompts that extract structured metadata:
def legal_document_summary(text):
    prompt = f"""You are a legal document analyst. Extract the following information from this contract:

- Document Type (e.g., Sublease, NDA, Service Agreement)
- Effective Date
- Parties Involved
- Key Obligations (list up to 5)
- Financial Terms (amounts, payment schedule, late fees)
- Termination Clause Summary
- Governing Law
- Any Unusual or High-Risk Clauses

Format your response as structured markdown with clear headings.

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
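A quick usage sketch, reusing the sublease text extracted earlier (the output structure is whatever Claude produces under the markdown instruction above):

# Example usage (assumes `text` holds the sublease agreement from earlier)
legal_summary = legal_document_summary(text)
print(legal_summary)

# Optionally keep the structured summary next to the source file
with open("sublease_agreement_summary.md", "w") as f:
    f.write(legal_summary)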
3. Meta-Summarization for Long Documents
When documents exceed Claude's context window, use a chunk-and-summarize approach:
def chunk_text(text, chunk_size=50000):
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - 5000):  # 5000 word overlap
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text):
    # Step 1: Summarize each chunk
    chunks = chunk_text(text)
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_document(chunk, max_tokens=500)
        chunk_summaries.append(summary)
        print(f"Processed chunk {i+1}/{len(chunks)}")

    # Step 2: Summarize the summaries
    combined = "\n\n---\n\n".join(chunk_summaries)
    final_summary = summarize_document(
        f"Combine these section summaries into a coherent overall summary:\n\n{combined}",
        max_tokens=1000
    )
    return final_summary

final = meta_summarize(text)
Summary-Indexed RAG: An Advanced Approach
For large document collections, combine summarization with retrieval-augmented generation (RAG):
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class SummaryIndex:
    def __init__(self, documents):
        self.documents = documents
        self.summaries = []
        self.vectorizer = TfidfVectorizer(stop_words='english')

    def build_index(self):
        # Generate summaries for each document
        for doc in self.documents:
            summary = summarize_document(doc, max_tokens=200)
            self.summaries.append(summary)

        # Create TF-IDF matrix from summaries
        self.tfidf_matrix = self.vectorizer.fit_transform(self.summaries)

    def query(self, question, top_k=3):
        # Vectorize the question
        question_vec = self.vectorizer.transform([question])

        # Find most relevant summaries
        similarities = cosine_similarity(question_vec, self.tfidf_matrix)[0]
        top_indices = np.argsort(similarities)[-top_k:][::-1]

        # Return relevant documents
        results = []
        for idx in top_indices:
            results.append({
                'summary': self.summaries[idx],
                'document': self.documents[idx],
                'relevance': similarities[idx]
            })
        return results

# Usage
index = SummaryIndex([doc1, doc2, doc3])
index.build_index()
results = index.query("What are the termination conditions?")
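Retrieval is only half of the picture. Here's a minimal sketch of the generation step, passing the retrieved documents back to Claude to answer the question; the function name and prompt wording are illustrative, not a fixed API:

def answer_with_retrieval(index, question, top_k=3):
    # Retrieve the most relevant documents via their summaries
    results = index.query(question, top_k=top_k)
    context = "\n\n---\n\n".join(r['document'] for r in results)

    # Ask Claude to answer using only the retrieved context
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Answer the question using only the documents below.\n\nDocuments:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

answer = answer_with_retrieval(index, "What are the termination conditions?")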
Best Practices for Summarization RAG
- Summarize before indexing: Store summaries alongside full documents for faster retrieval
- Use hierarchical summaries: Create section-level summaries for very long documents
- Include metadata: Tag summaries with document type, date, and key entities (see the sketch after this list)
- Update incrementally: Re-summarize only changed documents, not the entire corpus
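To make the metadata and incremental-update points concrete, here's a small sketch; the field names and hashing scheme are just one reasonable choice, not a required format:

import hashlib

def build_entry(doc_id, doc_type, text):
    """Store a summary alongside simple metadata tags (illustrative fields)."""
    return {
        "doc_id": doc_id,
        "doc_type": doc_type,
        "content_hash": hashlib.sha256(text.encode()).hexdigest(),  # used to detect changes
        "summary": summarize_document(text, max_tokens=200),
    }

def needs_update(entry, text):
    # Re-summarize only when the document content has actually changed
    return entry["content_hash"] != hashlib.sha256(text.encode()).hexdigest()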
Evaluating Summary Quality
Automated evaluation helps you iterate and improve. Here's how to use ROUGE scores:
from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
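A quick usage sketch, assuming you have a human-written reference summary saved to a file for comparison:

# Example: compare a generated summary against a human-written reference
# (reference_summary.txt is assumed to exist for this illustration)
with open("reference_summary.txt") as f:
    reference = f.read()

generated = summarize_document(text, max_tokens=300)
scores = evaluate_summary(reference, generated)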
For more nuanced evaluation, use Promptfoo to create custom test cases:
# Install Promptfoo
npx promptfoo@latest init

# Create evaluation config
cat > promptfooconfig.yaml << EOF
prompts:
  - "Summarize this document: {{document}}"
providers:
  - id: anthropic:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: file://path/to/test_doc.txt
    assert:
      - type: contains-any
        value: ["key term 1", "key term 2"]
      - type: latency
        threshold: 5000
EOF

# Run evaluation
npx promptfoo@latest eval
Iterative Improvement Strategy
Follow this cycle to refine your summarization:
- Baseline: Start with a simple prompt and evaluate
- Analyze failures: Where does the summary miss important details?
- Refine prompts: Add specific instructions, examples, or constraints
- Test edge cases: Try different document types and lengths
- Automate evaluation: Use ROUGE + custom checks in CI/CD (see the sketch below)
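As a sketch of that last step, here's a minimal pytest-style regression check that fails when ROUGE-1 against a reference drops below a threshold; the fixture paths, threshold, and key term are assumptions for this example:

from rouge_score import rouge_scorer

def test_summary_regression():
    # Assumed fixture files; adjust paths and threshold for your own pipeline
    with open("fixtures/reference_summary.txt") as f:
        reference = f.read()
    with open("fixtures/test_doc.txt") as f:
        document = f.read()

    generated = summarize_document(document, max_tokens=300)
    scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
    score = scorer.score(reference, generated)['rouge1'].fmeasure

    # ROUGE threshold plus a custom content check
    assert score >= 0.35, f"ROUGE-1 dropped to {score:.2f}"
    assert "termination" in generated.lower()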
Conclusion and Best Practices
Summarization with Claude is both powerful and flexible. Here are key takeaways:
- Be specific in your prompts: Tell Claude exactly what you want (structure, length, focus areas)
- Use guided summarization for structured output: Extract metadata alongside narrative summaries
- Chunk long documents strategically: Overlap chunks to maintain context
- Evaluate both automatically and manually: ROUGE scores catch some issues, but human review catches nuance
- Iterate based on your use case: The perfect summary for a legal team differs from one for executives
Key Takeaways
- Prompt specificity matters: Generic "summarize this" prompts produce generic results. Guide Claude with detailed instructions about structure, focus areas, and output format.
- Handle long documents with chunking and meta-summarization: Break documents into overlapping chunks, summarize each, then summarize the summaries for coherent long-document understanding.
- Combine summarization with RAG for scalable knowledge retrieval: Index document summaries for fast, relevant retrieval across large document collections.
- Evaluate summaries using multiple methods: Use ROUGE scores for automated checks and Promptfoo for custom assertions, but always supplement with human review for nuanced quality.
- Iterate systematically: Start simple, identify failure modes, refine prompts, and automate evaluation to continuously improve your summarization pipeline.