Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn to summarize legal documents with Claude API. Covers prompt engineering, metadata extraction, long-document handling, ROUGE evaluation, and iterative improvement techniques.
This guide teaches you how to use Claude for document summarization, from basic prompts to advanced techniques like guided summarization, meta-summarization, and summary-indexed RAG. You'll learn to evaluate summaries using ROUGE scores and Promptfoo, and iteratively improve your results.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager reviewing customer feedback, the ability to condense lengthy documents into actionable insights saves time and improves decision-making.
Claude excels at summarization thanks to its large context window, nuanced language understanding, and strong instruction-following capabilities. In this guide, we'll walk through a complete workflow—from basic summarization to advanced techniques like guided summarization, meta-summarization, and summary-indexed RAG (Retrieval-Augmented Generation). We'll also cover evaluation methods so you can measure and improve your results.
Why Summarization Is Hard (and Why Claude Helps)
Evaluating summary quality is notoriously subjective. Different readers value different things: some want bullet-point brevity, others need narrative flow. Traditional metrics like ROUGE scores measure word overlap with a reference summary, but they miss coherence, factual accuracy, and relevance. Claude's ability to follow detailed instructions and handle long documents makes it ideal for this task, but you still need a thoughtful approach to prompts and evaluation.
Setting Up Your Environment
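First, install the required Python packages. The scikit-learn package is published on PyPI as scikit-learn, not sklearn. Promptfoo is run later through npx, which requires Node.js, so it is not part of the pip command:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn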
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="your-api-key-here"
Data Preparation: Extracting Text from PDFs
Legal documents often come as PDFs. Here's a Python function to extract their text:
import pypdf

def extract_text_from_pdf(pdf_path):
    """Return the text of every page, joined with newlines."""
    reader = pypdf.PdfReader(pdf_path)
    # Image-only pages may yield no text, so fall back to an empty string
    pages = [page.extract_text() or "" for page in reader.pages]
    return "\n".join(pages)

# Example usage
document_text = extract_text_from_pdf("sublease_agreement.pdf")
If you're working with plain text, simply assign it to a variable:
document_text = "Your long document text here..."
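Text pulled out of a PDF usually carries layout debris such as repeated spaces, stray blank lines, and words hyphenated across line breaks. The following is a minimal cleanup sketch, not part of pypdf; the function name clean_extracted_text and the sample string are assumptions made for illustration.

```python
import re

def clean_extracted_text(text):
    """Normalize whitespace commonly left behind by PDF text extraction."""
    text = text.replace("\r\n", "\n")
    text = re.sub(r"-\n(?=[a-z])", "", text)   # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)        # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)       # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)     # keep at most one blank line
    return text.strip()

# Hand-written sample standing in for raw extraction output
raw = "Sublease   agree-\nment\n\n\n\nThis  Sublease is made \n between the parties."
cleaned = clean_extracted_text(raw)
print(cleaned)
```

The hyphen rule is deliberately naive: it would also merge a genuinely hyphenated term such as "non-compete" if the line happened to break at the hyphen, so review the output on a few real documents before relying on it.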
Basic Summarization with Claude
Let's start with a simple summarization function:
import anthropic
client = anthropic.Anthropic()
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": f"Please provide a concise summary of the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_text(document_text)
print(summary)
This works, but it's basic. Notice we're already using Claude's instruction-following ability by specifying "concise summary." As we progress, we'll add structure, constraints, and domain-specific guidance.
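One thing the basic function does not surface is truncation. A Messages API response carries a stop_reason field, and the value "max_tokens" means the output hit the token limit before Claude finished. The sketch below shows one way to flag that case; the helper name summary_or_warning and the stand-in response object are assumptions so the example runs without a network call.

```python
from types import SimpleNamespace

def summary_or_warning(response):
    """Return the summary text, flagging output that was cut off by max_tokens."""
    text = response.content[0].text
    if response.stop_reason == "max_tokens":
        text += "\n[Summary truncated: raise max_tokens or ask for a shorter summary.]"
    return text

# Stand-in for a real API response, shaped like the fields used above
fake_response = SimpleNamespace(
    content=[SimpleNamespace(text="The sublease runs for")],
    stop_reason="max_tokens",
)
flagged = summary_or_warning(fake_response)
print(flagged)
```

In summarize_text, the same check could replace the final return line by passing the real response object to this helper.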
Multi-Shot Summarization: Handling Long Documents
When documents exceed Claude's context window (or your desired chunk size), you need to summarize in parts and then combine. This is called multi-shot summarization:
def chunk_text(text, chunk_size=4000):
    # chunk_size counts whitespace-separated words, not tokens
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks

def multi_shot_summarize(text, chunk_size=4000):
    chunks = chunk_text(text, chunk_size)
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Chunk {i+1}/{len(chunks)} summarized")
    # Combine chunk summaries into a final summary
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_text(combined, max_tokens=500)
    return final_summary

final = multi_shot_summarize(document_text)
print(final)
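For very long inputs, the combined chunk summaries can themselves exceed the chunk size. One possible extension, sketched here under assumptions, repeats the chunk-and-summarize pass until the text fits in a single chunk. The names hierarchical_summarize and fake_summarize are hypothetical; the summarizer is passed in as a function so the sketch runs without calling the API.

```python
def split_words(text, chunk_size):
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def hierarchical_summarize(text, summarize, chunk_size=4000, max_rounds=5):
    """Summarize chunk by chunk, repeating until the text fits in one chunk."""
    for _ in range(max_rounds):  # guard against a summarizer that never shrinks the text
        if len(text.split()) <= chunk_size:
            break
        partials = [summarize(chunk) for chunk in split_words(text, chunk_size)]
        text = "\n\n".join(partials)
    return summarize(text)

# Stand-in summarizer for illustration: keeps the first 5 words of its input
def fake_summarize(chunk):
    return " ".join(chunk.split()[:5])

long_text = " ".join(f"w{i}" for i in range(100))
result = hierarchical_summarize(long_text, fake_summarize, chunk_size=20)
print(result)
```

With the real summarizer, the call would look like hierarchical_summarize(document_text, summarize_text). Each extra round compresses the text further, so fewer rounds generally preserve more detail.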
Advanced Techniques
Guided Summarization
Instead of a generic summary, guide Claude to extract specific information. This is especially useful for legal documents:
def guided_summarize(text):
    prompt = f"""Please analyze the following legal document and provide:
- Parties Involved: List all named parties and their roles.
- Key Dates: Effective date, termination date, renewal dates.
- Obligations: Key obligations for each party.
- Financial Terms: Rent, deposits, fees, penalties.
- Termination Conditions: How and when the agreement can be terminated.
- Risk Factors: Any clauses that could pose legal or financial risk.

Document:
{text}

Format your response as a structured report with clear headings."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
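If the guided summary needs to feed downstream code rather than a human reader, free-form headings are awkward to parse. One option, sketched here as an assumption rather than the method used above, is to ask for each section inside a named XML-style tag and extract the tags with a regular expression. The tag names, both helper functions, and the sample reply are hypothetical.

```python
import re

# Hypothetical tag names mirroring the six sections of the guided prompt
SECTIONS = ["parties", "key_dates", "obligations", "financial_terms",
            "termination_conditions", "risk_factors"]

def build_tagged_prompt(text):
    """Build a prompt that requests one tagged block per section."""
    tag_list = "\n".join(f"<{name}>...</{name}>" for name in SECTIONS)
    return (
        "Analyze the legal document inside the <document> tags.\n"
        "Reply with one block per section, using exactly these tags:\n"
        f"{tag_list}\n\n"
        f"<document>\n{text}\n</document>"
    )

def parse_tagged_summary(reply):
    """Map each section name to its tagged content, or None if the tag is missing."""
    result = {}
    for name in SECTIONS:
        match = re.search(rf"<{name}>(.*?)</{name}>", reply, re.DOTALL)
        result[name] = match.group(1).strip() if match else None
    return result

# Hand-written sample reply for illustration, not real model output
sample_reply = (
    "<parties>Company A (sublessor), Company B (sublessee)</parties>\n"
    "<key_dates>Not stated</key_dates>"
)
parsed = parse_tagged_summary(sample_reply)
print(parsed["parties"])
```

Returning None for a missing tag makes it easy to detect incomplete replies and retry or flag them for review.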
Domain-Specific Guided Summarization
For legal documents, you can add domain-specific instructions:
def legal_summarize(text):
    prompt = f"""You are a legal document analyst. Summarize this agreement for a non-lawyer business stakeholder. Focus on:
- Business Impact: What does this mean for the company?
- Hidden Liabilities: Indemnification, limitation of liability, governing law.
- Action Items: What must each party do and by when?
- Red Flags: Unusual or aggressive clauses.

Use plain language. Avoid legalese.

Document:
{text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
Meta-Summarization: Including Document Context
Sometimes you need to summarize a document while preserving its structure and context. Meta-summarization creates a summary that references the original document's sections:
def meta_summarize(text):
    prompt = f"""Summarize the following document. For each major section, provide:
- Section Title (from the original document)
- Key Points (3-5 bullet points)
- Page/Paragraph Reference (approximate location in the original)

Then provide an overall executive summary at the top.

Document:
{text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
Summary-Indexed Documents: An Advanced RAG Approach
For very large document collections, you can create a summary index—a searchable database of document summaries. This enables fast retrieval and question-answering across thousands of documents.
def create_summary_index(documents):
    index = []
    for doc_id, doc_text in enumerate(documents):
        summary = summarize_text(doc_text, max_tokens=200)
        index.append({
            "doc_id": doc_id,
            "summary": summary,
            "full_text": doc_text
        })
    return index

def query_summary_index(query, index, top_k=3):
    # Simple keyword matching (in production, use embeddings)
    scored = []
    for entry in index:
        score = sum(1 for word in query.lower().split() if word in entry["summary"].lower())
        scored.append((score, entry))
    # Sort on the score alone; comparing the dict entries on a tie would raise TypeError
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]
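A step up from counting keyword hits, still short of real embeddings, is to rank summaries by cosine similarity between word-count vectors. The sketch below is a pure-Python stand-in under that assumption: it only matches exact words and captures no semantic similarity. The names tokenize, cosine_similarity, and rank_by_similarity, and the toy index, are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_by_similarity(query, index, top_k=3):
    """Return the top_k index entries whose summaries best match the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(entry["summary"]))), entry)
        for entry in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]

# Toy index with hand-written summaries, for illustration only
toy_index = [
    {"doc_id": 0, "summary": "Sublease agreement covering rent and security deposit."},
    {"doc_id": 1, "summary": "Employment contract with a non-compete clause."},
]
top_ids = [e["doc_id"] for e in rank_by_similarity("security deposit terms", toy_index, top_k=1)]
print(top_ids)
```

Swapping the Counter vectors for vectors from an embedding model keeps the same ranking structure while adding semantic matching.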
Best Practices for Summarization RAG
- Chunk strategically: Split documents at natural boundaries (sections, paragraphs) rather than arbitrary token counts.
- Store metadata: Include document title, date, author, and source URL alongside each summary.
- Use embeddings: For production, use vector embeddings from a dedicated embedding model for semantic search. The Claude API generates text and does not return embeddings itself.
- Iterate on chunk size: Test different chunk sizes (500–2000 words) to find the sweet spot for your use case.
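The advice to chunk at natural boundaries can be sketched as a function that groups whole paragraphs until a word budget is reached. This is a minimal illustration that assumes paragraphs are separated by blank lines, which is not guaranteed for PDF-extracted text; the name chunk_by_paragraph and the sample string are hypothetical.

```python
def chunk_by_paragraph(text, max_words=1000):
    """Group whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and current_len + n > max_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

sample = "one two three\n\nfour five\n\nsix seven eight nine"
para_chunks = chunk_by_paragraph(sample, max_words=5)
print(para_chunks)
# ['one two three\n\nfour five', 'six seven eight nine']
```

Because paragraphs are never split, chunk sizes vary; max_words acts as a soft ceiling rather than an exact size.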
Evaluating Summary Quality
Evaluation is critical. Here's how to use ROUGE scores and Promptfoo:
ROUGE Score Example
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement transfers rights from Company A to Company B."
hypothesis = "Company B receives rights under the sublease from Company A."
scores = scorer.score(reference, hypothesis)
print(scores)
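To make the metric less of a black box, the following sketch computes ROUGE-1 by hand for the same sentence pair. It is a simplified illustration that skips stemming, so it is not a drop-in replacement for the rouge_score package; the function name rouge1 is an assumption.

```python
import re
from collections import Counter

def rouge1(reference, hypothesis):
    """Unigram-overlap precision, recall, and F1, without stemming."""
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    hyp = Counter(re.findall(r"[a-z0-9]+", hypothesis.lower()))
    overlap = sum((ref & hyp).values())  # clipped count of shared unigrams
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

reference = "The sublease agreement transfers rights from Company A to Company B."
hypothesis = "Company B receives rights under the sublease from Company A."
precision, recall, f1 = rouge1(reference, hypothesis)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.727 f1=0.762
```

Eight of the hypothesis's ten words also appear among the reference's eleven, giving the scores above. The two sentences say nearly the same thing yet score well below 1.0, which illustrates why word overlap alone is a weak proxy for summary quality.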
Using Promptfoo for Custom Evaluation
Promptfoo allows you to define custom evaluation criteria. Create a configuration file:
# promptfooconfig.yaml
prompts:
  - "Summarize: {{document}}"
providers:
  - id: anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: "Your test document here..."
    assert:
      - type: contains
        value: "key term"
      - type: python
        value: "len(output.split()) < 200"
Run evaluation:
npx promptfoo eval
Iterative Improvement
Summarization is rarely perfect on the first try. Use this feedback loop:
- Generate a summary with your current prompt.
- Evaluate using automated metrics (ROUGE) and manual review.
- Identify gaps: Is the summary missing key information? Too verbose? Inaccurate?
- Refine the prompt: Add constraints (e.g., "max 100 words"), specify format (bullets vs. paragraphs), or add domain context.
- Repeat until quality meets your threshold.
# Version 1: Too verbose
prompt_v1 = "Summarize this document."
# Version 2: Add structure and constraints
prompt_v2 = """Summarize this document in exactly 3 paragraphs:
- Paragraph 1: What is the document about?
- Paragraph 2: Key parties and their obligations
- Paragraph 3: Important dates and financial terms
Keep each paragraph under 100 words."""
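The feedback loop can be partly automated by scoring each prompt version against a reference summary and ranking the results. The sketch below assumes a summarizer passed in as a function and uses a crude unigram-recall score; pick_best_prompt, unigram_recall, the stand-in summarizer, and the sample strings are all hypothetical, and the stand-in's outputs are invented purely to exercise the loop.

```python
def unigram_recall(reference, candidate):
    """Fraction of distinct reference words that also appear in the candidate."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 0.0

def pick_best_prompt(prompts, document, reference, summarize, score=unigram_recall):
    """Score every prompt version and return (score, name) pairs, best first."""
    results = []
    for name, prompt in prompts.items():
        summary = summarize(prompt, document)
        results.append((score(reference, summary), name))
    results.sort(reverse=True)
    return results

# Stand-in summarizer: pretends a prompt mentioning parties surfaces them
def fake_summarize(prompt, document):
    if "parties" in prompt.lower():
        return "Company A subleases to Company B"
    return "A legal document"

prompts = {
    "v1": "Summarize this document.",
    "v2": "Summarize the key parties and their obligations.",
}
reference = "Company A subleases office space to Company B"
ranking = pick_best_prompt(prompts, "(document text)", reference, fake_summarize)
print(ranking[0][1])
```

In practice the summarizer would wrap a Claude call, and the automated score should be paired with manual review, since overlap metrics miss coherence and factual accuracy.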
Conclusion and Best Practices
Summarization with Claude is both powerful and flexible. These practices carry across document types:
- Start simple, then add structure: Begin with a basic prompt and layer in guidance as needed.
- Use domain-specific instructions: Legal, medical, or technical documents benefit from specialized prompts.
- Handle long documents with chunking: Multi-shot summarization lets you process texts that do not fit in a single request.
- Evaluate rigorously: Combine automated metrics (ROUGE) with custom evaluation tools (Promptfoo) and human review.
- Iterate: Prompt engineering is an iterative process. Test, measure, and refine.
Key Takeaways
- Claude's instruction-following ability makes it ideal for structured summarization—you can guide it to extract specific metadata, risk factors, or action items from any document.
- Multi-shot summarization (chunking + combining) handles documents beyond the context window. Details that span chunk boundaries can be lost along the way, so spot-check the combined summary against the source.
- Domain-specific prompts dramatically improve summary relevance—especially for legal, medical, or technical content where terminology matters.
- Evaluation is essential and multi-faceted—use ROUGE for word-overlap metrics, Promptfoo for custom assertions, and always include human review for subjective quality.
- Summary-indexed RAG unlocks search across large document collections—combine chunked summaries with vector embeddings for fast, semantic retrieval.