Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
This guide teaches you to summarize documents with Claude, from basic prompts to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. You'll learn to extract metadata, handle long documents, and evaluate summary quality using ROUGE scores and Promptfoo.
In today's information-dense world, the ability to distill lengthy documents into concise, actionable summaries is invaluable. Whether you're a legal professional parsing contracts, a researcher reviewing papers, or a business analyst synthesizing reports, Claude's summarization capabilities can dramatically reduce your cognitive load.
This guide walks you through practical, battle-tested techniques for summarizing documents with Claude. We'll start with the basics and progressively build toward advanced methods, including handling documents that exceed token limits and implementing Retrieval-Augmented Generation (RAG) for large-scale summarization.
Why Summarization Is Hard (and Why Claude Excels)
Summarization is deceptively difficult. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Quality depends on context, audience, and purpose. A summary for a legal team differs vastly from one for executives.
Claude excels here because:
- Long context window: Handles documents up to 200K tokens
- Nuanced understanding: Grasps legal jargon, technical terms, and domain-specific language
- Controllable output: You can guide the style, length, and focus of summaries
Setting Up Your Environment
Before diving in, let's set up the necessary tools. You'll need:
```bash
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
npm install -g promptfoo  # Promptfoo is a Node.js tool, installed via npm
```
And an Anthropic API key:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")  # or set the ANTHROPIC_API_KEY env var
```
Data Preparation: Extracting Text from PDFs
Most real-world documents come as PDFs. Here's a robust function to extract and clean text:
```python
import re

import pypdf


def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            # extract_text() can return None for image-only pages
            page_text = page.extract_text()
            if page_text:
                text += page_text
    return text


def clean_text(text):
    # Collapse excessive whitespace
    text = re.sub(r'\s+', ' ', text)
    # Remove non-ASCII characters if needed
    text = text.encode('ascii', 'ignore').decode()
    return text.strip()


# Example usage
text = extract_text_from_pdf("sublease_agreement.pdf")
text = clean_text(text)
```
For quick testing, you can also paste text directly:
```python
text = """Your document text here..."""
```
Basic Summarization: Your First Claude Prompt
Let's start with a straightforward summarization call:
```python
def summarize_with_claude(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": f"Please provide a concise summary of the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text


summary = summarize_with_claude(text)
print(summary)
```
Important: Notice we're using the user role with a clear instruction. This is the foundation. But for production, you'll want more control.
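One common way to get that control is the Messages API's `system` parameter, which separates standing instructions (role, audience, tone) from the document itself. A minimal sketch, where the helper name and the `audience` parameter are illustrative choices, not part of the API:

```python
def build_summary_request(text, audience="executives", max_tokens=500):
    """Assemble kwargs for client.messages.create with a system prompt."""
    return {
        "model": "claude-3-sonnet-20240229",
        "max_tokens": max_tokens,
        # The system prompt pins down role, audience, and tone so every
        # call behaves consistently across documents.
        "system": (
            f"You are a document analyst. Summarize for {audience}. "
            "Be concise, factual, and neutral in tone."
        ),
        "messages": [
            {"role": "user", "content": f"Summarize this document:\n\n{text}"}
        ],
    }

# response = client.messages.create(**build_summary_request(text))
```

Keeping standing instructions in the system prompt also makes it easy to swap audiences without rewriting the user message.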
Multi-Shot Summarization: Providing Examples
Claude performs better when you show it what "good" looks like. This is called few-shot prompting:
```python
def summarize_with_examples(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": """I will show you a document and an example of a good summary. Then I'll ask you to summarize a new document.

Example Document:
[Short example document]

Example Good Summary:
[Corresponding summary]

Now summarize this document:
""" + text
            }
        ]
    )
    return response.content[0].text
```
This technique dramatically improves consistency, especially for domain-specific content.
Advanced Techniques: Guided and Domain-Specific Summarization
Guided Summarization
Instead of a generic "summarize this," guide Claude with specific instructions:
```python
def guided_summarize(text, instructions):
    prompt = f"""Summarize the following document according to these instructions:

INSTRUCTIONS:
{instructions}

DOCUMENT:
{text}

SUMMARY:"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
Example: Legal document focus
```python
instructions = """
- Identify all parties involved
- List key dates and deadlines
- Highlight termination clauses
- Note any financial obligations
- Keep under 300 words
"""

summary = guided_summarize(text, instructions)
```
Domain-Specific Guided Summarization
For legal documents, you can add domain knowledge:
```python
def legal_summarize(text):
    prompt = f"""You are a legal document analyst. Summarize this contract with:

- Parties and their roles
- Key obligations for each party
- Termination conditions
- Liability and indemnification clauses
- Governing law and jurisdiction
- Any unusual or high-risk provisions

Document:
{text}

Legal Summary:"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
Meta-Summarization: Handling Long Documents
When documents exceed Claude's context window, use a chunk-and-merge strategy:
```python
def chunk_text(text, chunk_size=50000):
    # chunk_size is measured in words, not tokens
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append(' '.join(words[i:i + chunk_size]))
    return chunks


def meta_summarize(text):
    chunks = chunk_text(text)

    # Step 1: Summarize each chunk independently
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_with_claude(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Processed chunk {i + 1}/{len(chunks)}")

    # Step 2: Combine chunk summaries
    combined = "\n\n".join(chunk_summaries)

    # Step 3: Final summary of summaries
    final_summary = summarize_with_claude(
        f"Combine these section summaries into a coherent overall summary:\n\n{combined}",
        max_tokens=800
    )
    return final_summary
```
This hierarchical approach preserves context while staying within token limits.
Summary Indexed Documents: An Advanced RAG Approach
For massive document collections, use a RAG (Retrieval-Augmented Generation) approach where you index summaries:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


class SummaryIndex:
    def __init__(self):
        self.documents = []
        self.summaries = []
        self.vectorizer = TfidfVectorizer()

    def add_document(self, doc_id, text):
        summary = summarize_with_claude(text, max_tokens=200)
        self.documents.append({"id": doc_id, "text": text})
        self.summaries.append({"id": doc_id, "summary": summary})

    def search(self, query, top_k=3):
        # Vectorize the stored summaries together with the query
        summary_texts = [s["summary"] for s in self.summaries]
        vectors = self.vectorizer.fit_transform(summary_texts + [query])

        # Rank summaries by cosine similarity to the query
        similarities = cosine_similarity(vectors[-1:], vectors[:-1])[0]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.summaries[i] for i in top_indices]

    def query(self, question):
        relevant = self.search(question)
        context = "\n\n".join([r["summary"] for r in relevant])
        response = client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"Based on these document summaries, answer: {question}\n\nContext:\n{context}"
            }]
        )
        return response.content[0].text
```
Best Practices for Summarization RAG
- Chunk strategically: Split documents at natural boundaries (paragraphs, sections)
- Index summaries, not raw text: Summaries are more searchable
- Use hybrid search: Combine semantic and keyword-based retrieval
- Cache summaries: Avoid regenerating summaries for the same documents
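The first practice, splitting at natural boundaries, can be sketched as a greedy packer that never cuts a paragraph in half. This is a sketch; `max_chars` stands in for whatever budget suits your model and pricing:

```python
def chunk_by_paragraphs(text, max_chars=8000):
    """Greedily pack whole paragraphs into chunks of at most max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk rather than split a paragraph mid-way
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Compared with the word-count chunker above, paragraph-aware chunks keep each summarization call working on self-contained prose, which tends to produce more coherent chunk summaries.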
Evaluating Summary Quality
Evaluation is critical but challenging. Here's a practical approach:
ROUGE Score Evaluation
```python
from rouge_score import rouge_scorer


def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
```
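To make the metric concrete: ROUGE-1 F-measure is just unigram-overlap precision and recall combined. A from-scratch illustration (not a replacement for the `rouge-score` library, which also handles stemming and ROUGE-2/L):

```python
from collections import Counter


def rouge1_f(reference, generated):
    # Count overlapping unigrams between the two token multisets
    ref = Counter(reference.lower().split())
    gen = Counter(generated.lower().split())
    overlap = sum((ref & gen).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f("the tenant pays rent monthly",
               "the tenant pays rent every month"))  # → 0.727... (8/11)
```

Here 4 of 5 reference words are recovered (recall 0.8) out of 6 generated words (precision 0.667), giving F = 8/11. This is why ROUGE alone is insufficient: a summary can score well on overlap while still misstating a fact.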
Custom Evaluation with Promptfoo
Promptfoo lets you define custom evaluation criteria in a YAML config. The length and fact checks below are written as inline `python` assertions, which receive the model's response in the `output` variable:

```yaml
# promptfooconfig.yaml
prompts:
  - "Summarize this document: {{text}}"
  - "Provide a concise summary focusing on key points: {{text}}"
tests:
  - vars:
      text: "Your test document here"
    assert:
      - type: contains-all
        value: ["parties", "obligations", "termination"]
      # Keep summaries under 500 characters
      - type: python
        value: "len(output) <= 500"
      # Custom check for factual accuracy
      - type: python
        value: "'parties' in output.lower()"
```
Iterative Improvement: A Practical Workflow
1. Baseline: Start with a simple prompt
2. Evaluate: Use ROUGE scores and human review
3. Identify gaps: Is it missing key information? Too verbose?
4. Refine prompt: Add instructions, examples, or constraints
5. Re-evaluate: Compare against the baseline
6. Repeat: Until quality meets your threshold
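The workflow above can be sketched as a loop. Here `generate_summary` and `score` are stand-ins for whatever model call and metric (e.g. ROUGE-L against a reference, or a human rating) you plug in:

```python
def refine_until(prompts, generate_summary, score, threshold=0.5):
    """Try prompt variants in order; return the first (prompt, summary)
    whose score clears the threshold, else the best pair seen."""
    best = (float("-inf"), None, None)
    for prompt in prompts:
        summary = generate_summary(prompt)
        s = score(summary)
        if s >= threshold:
            return prompt, summary
        # Track the best-scoring variant as a fallback
        if s > best[0]:
            best = (s, prompt, summary)
    return best[1], best[2]
```

Ordering the variants from simplest to most elaborate mirrors the "baseline first" principle: you stop as soon as a cheap prompt is good enough.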
Conclusion and Best Practices
- Start simple, then layer: Begin with basic prompts, then add guidance
- Use examples: Few-shot prompting dramatically improves consistency
- Handle long documents: Use chunk-and-merge or RAG approaches
- Evaluate systematically: Combine automated metrics with human review
- Iterate: Summarization is rarely perfect on the first try
- Domain-specific prompts: Tailor instructions to your content type
- Monitor token usage: Long documents can be expensive; optimize chunk sizes
Key Takeaways
- Guided prompts outperform generic ones: Specific instructions yield summaries that match your needs
- Chunk-and-merge handles any document length: Meta-summarization preserves context across large documents
- RAG with summary indexing scales to document collections: Search summaries, not raw text
- Evaluation requires multiple metrics: Combine ROUGE scores with custom checks for factual accuracy and completeness
- Iterative refinement is essential: Treat summarization as an evolving process, not a one-shot task