Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
This guide teaches you to summarize long documents with Claude using basic prompts, guided extraction, meta-summarization, and RAG. You'll also learn to evaluate summary quality with ROUGE scores and Promptfoo.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a lawyer reviewing contracts, a researcher scanning papers, or a business analyst processing reports, the ability to condense lengthy documents into clear, actionable summaries saves time and improves decision-making.
This guide walks you through the full spectrum of summarization techniques using Claude, from a simple one-shot prompt to advanced Retrieval-Augmented Generation (RAG) with summary-indexed documents. We'll use a real-world legal document—a sublease agreement from the SEC—as our running example, because legal texts are notoriously dense and benefit enormously from intelligent summarization.
By the end, you'll have a practical toolkit you can apply immediately to your own documents, along with methods to evaluate and iteratively improve your summaries.
Why Summarization Is Hard (and Why Claude Excels)
Summarization evaluation is famously subjective. Unlike classification or translation, there's rarely a single "correct" summary. Different readers want different levels of detail, emphasis, and tone. Traditional metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measure n-gram overlap with reference summaries, but they miss coherence, factual accuracy, and relevance.
Claude's strength lies in its ability to follow nuanced instructions, handle long contexts, and produce structured outputs. Combined with careful prompt engineering, you can tailor summaries to specific audiences and use cases.
Setting Up Your Environment
First, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

Note that Promptfoo is a Node.js tool, so it is installed separately with npm install -g promptfoo (or run ad hoc via npx promptfoo).
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."
Data Preparation: Extracting Text from PDFs
Before summarizing, you need clean text. Here's a Python function to extract text from a PDF:
import pypdf

def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for image-only pages
        text += page.extract_text() or ""
    return text
# Example: extract from a sublease agreement
text = extract_text_from_pdf("sublease_agreement.pdf")
If you don't have a PDF, you can just define text = "your long document here...".
Basic Summarization: The Foundation
Let's start with a simple summarization function. Even this basic approach uses an important Claude feature: a system prompt that sets the model's role and task.
import anthropic

client = anthropic.Anthropic()

def summarize_basic(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert legal summarizer. Summarize the following document concisely, capturing all key terms, obligations, and dates.",
        messages=[
            {"role": "user", "content": f"Please summarize this document:\n\n{text}"}
        ],
    )
    return response.content[0].text

summary = summarize_basic(text)
print(summary)
This works, but it's limited. The summary may miss critical details or include irrelevant ones. Let's improve it.
Multi-Shot Basic Summarization
Instead of a single prompt, you can use a multi-shot approach where you ask Claude to produce multiple summaries and then combine them. This is especially useful for very long documents.
def summarize_multishot(text, chunk_size=3000):
    # Split text into fixed-size chunks
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Summarize each chunk
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_basic(chunk, max_tokens=200)
        chunk_summaries.append(summary)

    # Combine chunk summaries into a final summary
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_basic(combined, max_tokens=500)
    return final_summary
This technique helps when the document exceeds Claude's context window, but it can lose cross-chunk context.
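One way to reduce that loss of cross-chunk context is to let adjacent chunks overlap, so a sentence cut at one boundary appears whole in the next chunk. Here is a minimal sketch of such a splitter; the function name and the 300-character default overlap are illustrative choices, not part of the API above:

```python
def chunk_with_overlap(text, chunk_size=3000, overlap=300):
    """Split text into chunks where each chunk repeats the last
    `overlap` characters of the previous one, so content cut at a
    boundary still appears intact in the following chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks
```

You could drop this in as a replacement for the list comprehension in summarize_multishot; a 10-20% overlap (as recommended later in this guide) costs proportionally more tokens but preserves sentences that straddle chunk boundaries.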
Advanced Techniques
Guided Summarization
Instead of a generic "summarize this," guide Claude with a structured prompt. This yields more consistent, useful results.
def summarize_guided(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document analyst. Extract the following information in a structured format.",
        messages=[
            {"role": "user", "content": f"""
Please analyze this sublease agreement and provide:
- PARTIES: Who are the parties involved?
- KEY DATES: Start date, end date, renewal options
- FINANCIAL TERMS: Rent amount, payment schedule, security deposit
- OBLIGATIONS: Key responsibilities of each party
- TERMINATION: Conditions for early termination
- RISKS: Any unusual or high-risk clauses

Document:
{text}
"""}
        ],
    )
    return response.content[0].text
Domain-Specific Guided Summarization
For legal documents, you can go even deeper. Add domain-specific instructions:
def summarize_legal(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system="You are a senior contract attorney. Analyze this document with attention to liability, indemnification, and jurisdictional clauses.",
        messages=[
            {"role": "user", "content": f"""
Provide a legal summary covering:
- Governing law and jurisdiction
- Indemnification and hold harmless clauses
- Limitation of liability
- Dispute resolution (arbitration vs. litigation)
- Force majeure
- Assignment and subletting restrictions
- Default and remedies

Document:
{text}
"""}
        ],
    )
    return response.content[0].text
Meta-Summarization: Capturing the Context of the Entire Document
When a document is too long for a single prompt, you can use a hierarchical approach:
- Split the document into sections.
- Summarize each section.
- Summarize the section summaries.
def meta_summarize(text, section_size=4000):
    # Step 1: Split into sections
    sections = [text[i:i + section_size] for i in range(0, len(text), section_size)]

    # Step 2: Summarize each section
    section_summaries = []
    for i, section in enumerate(sections):
        prompt = f"Summarize section {i + 1} of {len(sections)} of this legal document. Focus on key terms and obligations."
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": f"{prompt}\n\n{section}"}],
        )
        section_summaries.append(response.content[0].text)

    # Step 3: Summarize the summaries
    combined = "\n\n".join(f"Section {i + 1}: {s}" for i, s in enumerate(section_summaries))
    final_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=600,
        messages=[{"role": "user", "content": f"Combine these section summaries into a coherent overall summary of the document:\n\n{combined}"}],
    )
    return final_response.content[0].text
Summary Indexed Documents: An Advanced RAG Approach
For truly large document collections, combine summarization with Retrieval-Augmented Generation (RAG). The idea: pre-summarize each document, index the summaries, and then retrieve relevant summaries at query time.
# Pseudocode for summary-indexed RAG: pre-summarize every document once
document_summaries = {}
for doc_id, doc_text in document_collection.items():
    document_summaries[doc_id] = meta_summarize(doc_text)
At query time:
1. Embed the user's question
2. Find the most relevant document summaries via cosine similarity
3. Feed the top-k summaries + original documents into Claude for final answer
Best Practices for Summarization RAG
- Chunk wisely: Overlap chunks by 10-20% to avoid cutting off context mid-sentence.
- Metadata matters: Include document title, date, and source in the summary for traceability.
- Hierarchical retrieval: First retrieve summaries, then drill into full documents only when needed.
- Update summaries: If documents change, regenerate summaries rather than patching.
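The metadata practice above can be as simple as prefixing each stored summary with a traceability header. A minimal sketch, where the helper name and bracketed header format are illustrative assumptions:

```python
def with_metadata(summary, title, date, source):
    """Prefix a summary with document metadata so retrieved
    summaries can always be traced back to their source."""
    header = f"[title: {title} | date: {date} | source: {source}]"
    return header + "\n" + summary
```

Storing summaries in this form means any summary surfaced by retrieval carries its provenance with it, which also gives Claude citable context at answer time.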
Evaluating Summary Quality
You can't improve what you don't measure. Here are two practical evaluation methods:
ROUGE Scores
ROUGE measures n-gram overlap between your summary and a reference summary. While imperfect, it's a useful baseline.
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement between Party A and Party B..."
candidate = "Party A and Party B entered into a sublease..."
scores = scorer.score(reference, candidate)
print(scores)
Promptfoo Custom Evaluation
Promptfoo allows you to define custom evaluation criteria. For example, you can check that the summary includes all key dates, parties, and financial terms using regex or LLM-as-judge assertions:

# promptfoo config.yaml
prompts:
  - "Summarize this legal document: {{document}}"

tests:
  - vars:
      document: "..."
    assert:
      - type: contains-all
        value: ["party", "date", "rent"]
      - type: llm-rubric
        value: "Does the summary accurately capture all financial obligations?"
Iterative Improvement
Summarization is rarely perfect on the first try. Use this feedback loop:
- Generate a summary using your current prompt.
- Evaluate using ROUGE, Promptfoo, or manual review.
- Identify gaps: What did the summary miss? What did it include that's irrelevant?
- Refine the prompt: Add instructions for the missing elements, remove ambiguity.
- Repeat.
For example, after a round of evaluation you might tighten the system prompt to:

system="You are an expert legal summarizer. Always include: parties, dates, financial terms, obligations, and termination conditions."
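A minimal version of this feedback loop can even be automated with simple keyword checks. The checklist and the refinement rule below are illustrative assumptions (a real pipeline might use ROUGE or LLM-as-judge instead), but they show the shape of the loop:

```python
REQUIRED_TERMS = ["parties", "rent", "termination"]  # illustrative checklist

def find_gaps(summary, required_terms=REQUIRED_TERMS):
    """Return the required terms the summary fails to mention."""
    lowered = summary.lower()
    return [t for t in required_terms if t not in lowered]

def refine_prompt(base_system, gaps):
    """Append an explicit instruction for each missing element."""
    if not gaps:
        return base_system
    return base_system + " Be sure to explicitly cover: " + ", ".join(gaps) + "."
```

Run the summary through find_gaps after each generation; if anything is missing, regenerate with the refined system prompt and evaluate again.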
Conclusion and Best Practices
Summarization with Claude is both an art and a science. Here are the key principles to keep in mind:
- Start simple, then iterate: A basic summary is better than none. Improve based on evaluation.
- Guide, don't just ask: Use structured prompts to extract exactly what you need.
- Handle long documents hierarchically: Meta-summarization and RAG scale to any document size.
- Evaluate systematically: Use ROUGE for baseline, but supplement with task-specific checks.
- Tailor to your domain: Legal, medical, and technical documents each need specialized prompts.
Key Takeaways
- Use guided prompts with structured fields (parties, dates, obligations) to get consistent, actionable summaries from Claude.
- For documents exceeding token limits, apply multi-shot or meta-summarization by chunking, summarizing each chunk, then summarizing the summaries.
- Combine summarization with RAG to build scalable document retrieval systems that return both summaries and source texts.
- Evaluate summaries with ROUGE scores for baseline quality, and use Promptfoo for custom, task-specific assertions.
- Iterate on your prompts based on evaluation results—small tweaks to system instructions can dramatically improve output quality.