Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Summarization is one of the most powerful—and most requested—capabilities in the Claude AI ecosystem. Whether you're a legal professional drowning in contracts, a researcher scanning dozens of papers, or a product manager synthesizing customer feedback, the ability to condense lengthy documents into concise, actionable summaries is invaluable.
In this guide, we'll walk through a complete summarization workflow using Claude, starting with basic prompts and progressing to advanced techniques like guided summarization, meta-summarization, and Retrieval-Augmented Generation (RAG) for indexed documents. We'll also cover how to evaluate and iteratively improve your summaries.
Why Summarization is Hard (and Why Claude Excels)
Summarization is notoriously difficult to evaluate. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Different readers value different things: a lawyer needs precise legal language preserved, while an executive wants high-level business impact. Claude's ability to follow nuanced instructions and adapt its output to specific contexts makes it ideal for this challenge.
Setting Up Your Environment
Before diving in, let's get your environment ready. You'll need:
- An Anthropic API key
- Python 3.8+
- The following packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn promptfoo
Initialize your Claude client:
import anthropic

# If api_key is omitted, the client reads the ANTHROPIC_API_KEY
# environment variable, which keeps keys out of source code.
client = anthropic.Anthropic(api_key="your-api-key")
Data Preparation: From PDF to Clean Text
Most real-world documents come as PDFs. Here's how to extract and clean text for Claude:
import re
import pypdf

def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text

def clean_text(text):
    # Remove excessive whitespace
    text = re.sub(r'\s+', ' ', text)
    return text.strip()
# Example usage
raw_text = extract_text_from_pdf("sublease_agreement.pdf")
cleaned_text = clean_text(raw_text)  # avoid shadowing the clean_text function
For testing, you can also just define a string directly:
text = "Your document text here..."
Basic Summarization with Claude
Let's start simple. Here's a basic summarization function:
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide concise, accurate summaries.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text
summary = summarize_text(cleaned_text)
print(summary)
This works, but it has limits: Claude's context window tops out at 200K tokens, so for truly long documents you'll need a chunking strategy.
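As a rough pre-flight check, you can estimate whether a document will fit before sending it. The four-characters-per-token figure below is a heuristic for English prose, not Claude's actual tokenizer, so treat the result as an approximation:

def estimate_tokens(text):
    # Heuristic: ~4 characters per token for English text
    # (an approximation, not the real tokenizer)
    return len(text) // 4

CONTEXT_LIMIT = 200_000

if estimate_tokens(cleaned_text) > CONTEXT_LIMIT:
    print("Document likely exceeds the context window; chunk it first.")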
Multi-Shot Summarization for Long Documents
When a document exceeds Claude's context window, break it into chunks, summarize each, then summarize the summaries:
def chunk_text(text, chunk_size=50000):
    # chunk_size is measured in words, not tokens; 50,000 words stays
    # comfortably under the 200K-token context window
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def multi_shot_summarize(text):
    chunks = chunk_text(text)
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Chunk {i+1}/{len(chunks)} summarized")
    # Summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_text(combined, max_tokens=1000)
    return final_summary
Advanced Techniques
Guided Summarization
Instead of generic summaries, guide Claude to extract specific information:
def guided_summarize(text, instructions):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        system="You are a precise document analyst.",
        messages=[
            {
                "role": "user",
                "content": f"""Analyze this document and provide:

{instructions}

Document:
{text}"""
            }
        ]
    )
    return response.content[0].text
# Example for legal documents
instructions = """
- Parties involved
- Effective date and duration
- Key obligations of each party
- Termination conditions
- Financial terms (amounts, payment schedules)
- Any unusual clauses or red flags
"""
legal_summary = guided_summarize(cleaned_text, instructions)
Domain-Specific Guided Summarization
For legal documents, create a specialized prompt:
def legal_document_summarize(text):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1500,
        system="You are an expert legal document analyst. Focus on precision and legal accuracy.",
        messages=[
            {
                "role": "user",
                "content": f"""Summarize this legal document with specific attention to:
- Contract type and governing law
- All parties and their roles
- Key dates and deadlines
- Financial obligations and payment terms
- Liability and indemnification clauses
- Termination and renewal provisions
- Any unusual or potentially problematic clauses

Document:
{text}"""
            }
        ]
    )
    return response.content[0].text
Meta-Summarization: Including Document Context
For even better results, include metadata about the document itself:
def meta_summarize(text, metadata):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        messages=[
            {
                "role": "user",
                "content": f"""Document Metadata:
- Title: {metadata.get('title', 'Unknown')}
- Author: {metadata.get('author', 'Unknown')}
- Date: {metadata.get('date', 'Unknown')}
- Document Type: {metadata.get('type', 'Unknown')}
- Page Count: {metadata.get('pages', 'Unknown')}

Please provide a comprehensive summary of this document, noting how the metadata context influences the interpretation:

{text}"""
            }
        ]
    )
    return response.content[0].text
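In practice, some of this metadata can come straight from the PDF. Here's a short sketch using pypdf's metadata attribute (which may be None, with individual fields missing); the document type is filled in by hand:

reader = pypdf.PdfReader("sublease_agreement.pdf")
info = reader.metadata  # may be None, and fields may be missing

metadata = {
    "title": (info.title if info else None) or "Unknown",
    "author": (info.author if info else None) or "Unknown",
    "date": "Unknown",  # creation-date formats vary by PDF producer
    "type": "Sublease agreement",  # supplied manually
    "pages": len(reader.pages),
}

summary = meta_summarize(cleaned_text, metadata)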
Summary-Indexed Documents: An Advanced RAG Approach
For large document collections, combine summarization with RAG:
- Chunk and summarize each section of every document
- Index the summaries in a vector database
- Retrieve relevant summaries based on user queries
- Generate the final answer using the retrieved context
The example below uses TF-IDF vectors and cosine similarity as a lightweight stand-in for a vector database:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def create_summary_index(documents):
    """Create a searchable index of document summaries."""
    summaries = []
    for doc in documents:
        summary = summarize_text(doc, max_tokens=200)
        summaries.append(summary)
    vectorizer = TfidfVectorizer()
    summary_vectors = vectorizer.fit_transform(summaries)
    return summaries, vectorizer, summary_vectors

def rag_summarize(query, summaries, vectorizer, summary_vectors, top_k=3):
    """Retrieve the most relevant summaries and answer the query."""
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, summary_vectors)[0]
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    context = "\n\n".join([summaries[i] for i in top_indices])
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"Based on these document summaries, answer: {query}\n\nContext:\n{context}"
            }
        ]
    )
    return response.content[0].text
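A quick usage sketch; the filenames here are hypothetical placeholders for your own collection:

# Hypothetical document collection
docs = [extract_text_from_pdf(p)
        for p in ["lease_2022.pdf", "lease_2023.pdf", "amendment_2024.pdf"]]

summaries, vectorizer, summary_vectors = create_summary_index(docs)
answer = rag_summarize("What are the termination conditions?",
                       summaries, vectorizer, summary_vectors)
print(answer)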
Best Practices for Summarization RAG
- Chunk size matters: 2000-5000 tokens per chunk works well
- Overlap chunks: 10-20% overlap prevents information loss at boundaries (see the chunker sketch after this list)
- Hierarchical summaries: Summarize chunks, then sections, then entire documents
- Metadata preservation: Always tag summaries with source document, page, and section
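Here's a minimal sketch of an overlapping chunker, splitting on words as chunk_text does above; the 15% default overlap is simply the midpoint of the 10-20% range:

def chunk_text_overlapping(text, chunk_size=3000, overlap_ratio=0.15):
    # chunk_size is in words; consecutive chunks share overlap_ratio
    # of their length so boundary sentences appear in both chunks
    words = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(' '.join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break
    return chunks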
Evaluating Summary Quality
Automated evaluation is crucial for iteration. ROUGE compares a generated summary against a human-written reference summary; here's how to compute it:
from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
For more nuanced evaluation, use Promptfoo for custom metrics:
npx promptfoo eval --config promptfooconfig.yaml
Create a config file that tests multiple prompts against your reference summaries.
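A minimal promptfooconfig.yaml might look like the sketch below. The provider ID, file:// variable loading, and llm-rubric assertion follow promptfoo's documented conventions, but verify the exact syntax against the version you install:

# Sketch only; check keys against the promptfoo docs for your version
prompts:
  - "Summarize this document concisely:\n\n{{document}}"
  - "Summarize this document, preserving all dates and amounts:\n\n{{document}}"

providers:
  - anthropic:messages:claude-3-opus-20240229

tests:
  - vars:
      document: file://sublease_agreement.txt
    assert:
      - type: llm-rubric
        value: Mentions the parties, key dates, and financial terms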
Iterative Improvement
- Baseline: Start with basic summarization
- Evaluate: Use ROUGE and human review
- Identify gaps: Is it missing key information? Too verbose? Inaccurate?
- Refine prompts: Add specific instructions for problem areas
- Re-evaluate: Compare scores and iterate
def summarize_with_prompt(text, prompt):
    # Summarize using a caller-supplied instruction prompt
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[{"role": "user", "content": f"{prompt}\n\n{text}"}]
    )
    return response.content[0].text

def iterative_improve(text, reference_summary, iterations=3):
    prompt = "Summarize this document concisely."
    best_score = 0
    best_summary = None
    for i in range(iterations):
        # Refine the prompt after the first pass based on observed gaps
        if i == 1:
            prompt += " Ensure all key dates and amounts are included."
        summary = summarize_with_prompt(text, prompt)
        scores = evaluate_summary(reference_summary, summary)
        avg_score = (scores['rouge1'].fmeasure + scores['rouge2'].fmeasure
                     + scores['rougeL'].fmeasure) / 3
        if avg_score > best_score:
            best_score = avg_score
            best_summary = summary
    return best_summary
Conclusion and Best Practices
- Start simple, iterate fast: Basic summarization works surprisingly well. Add complexity only when needed.
- Use guided prompts: Tell Claude exactly what information you need.
- Handle long documents with chunking: Multi-shot summarization prevents information loss.
- Evaluate systematically: Combine ROUGE scores with human review.
- Leverage RAG for document collections: Index summaries for fast, relevant retrieval.
- Domain-specific prompts matter: Legal, medical, and technical documents each need tailored instructions.
Key Takeaways
- Claude excels at summarization when given clear, structured prompts—guide it with specific instructions rather than vague requests
- For long documents, use multi-shot summarization with chunking and hierarchical summarization to maintain context and accuracy
- Evaluate systematically using ROUGE scores and tools like Promptfoo, but always complement with human review for nuanced quality assessment
- Advanced RAG approaches with summary-indexed documents enable powerful query-based retrieval across large document collections
- Iterative improvement through prompt refinement and evaluation cycles consistently yields better summaries than one-shot attempts