Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
This guide teaches you to summarize long documents with Claude using basic prompts, guided extraction, meta-summarization, and RAG. You'll also learn to evaluate summary quality with ROUGE scores and Promptfoo.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a lawyer reviewing contracts, a researcher scanning papers, or a business analyst processing reports, the ability to condense lengthy documents into clear, actionable summaries saves time and improves decision-making.
This guide walks you through the full spectrum of summarization techniques using Claude, from a simple one-shot prompt to advanced Retrieval-Augmented Generation (RAG) with summary-indexed documents. We'll use a real-world legal document—a sublease agreement from the SEC—as our running example, because legal texts are notoriously dense and benefit enormously from intelligent summarization.
By the end, you'll have a practical toolkit you can apply immediately to your own documents, along with methods to evaluate and iteratively improve your summaries.
Why Summarization Is Hard (and Why Claude Excels)
Summarization evaluation is famously subjective. Unlike classification or translation, there's rarely a single "correct" summary. Different readers want different levels of detail, emphasis, and tone. Traditional metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measure n-gram overlap with reference summaries, but they miss coherence, factual accuracy, and relevance.
Claude's strength lies in its ability to follow nuanced instructions, handle long contexts, and produce structured outputs. Combined with careful prompt engineering, you can tailor summaries to specific audiences and use cases.
Setting Up Your Environment
First, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

Note that Promptfoo is a Node.js tool, so it is installed separately with npm install -g promptfoo (or run ad hoc via npx promptfoo).
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."
Data Preparation: Extracting Text from PDFs
Before summarizing, you need clean text. Here's a Python function to extract text from a PDF:
import pypdf

def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for image-only pages
        text += page.extract_text() or ""
    return text
# Example: extract from a sublease agreement
text = extract_text_from_pdf("sublease_agreement.pdf")
If you don't have a PDF, you can just define text = "your long document here...".
Basic Summarization: The Foundation
Let's start with a simple summarization function. Even this basic approach uses an important Claude feature: a system prompt that sets the model's role and task.
import anthropic

client = anthropic.Anthropic()

def summarize_basic(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert legal summarizer. Summarize the following document concisely, capturing all key terms, obligations, and dates.",
        messages=[
            {"role": "user", "content": f"Please summarize this document:\n\n{text}"}
        ],
    )
    return response.content[0].text

summary = summarize_basic(text)
print(summary)
This works, but it's limited. The summary may miss critical details or include irrelevant ones. Let's improve it.
Multi-Shot Basic Summarization
Instead of a single prompt, you can use a multi-shot approach where you ask Claude to produce multiple summaries and then combine them. This is especially useful for very long documents.
def summarize_multishot(text, chunk_size=3000):
    # Split text into fixed-size chunks
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Summarize each chunk
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_basic(chunk, max_tokens=200)
        chunk_summaries.append(summary)

    # Combine chunk summaries into a final summary
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_basic(combined, max_tokens=500)
    return final_summary
This technique helps when the document exceeds Claude's context window, but it can lose cross-chunk context.
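One way to reduce that loss of cross-chunk context is to let adjacent chunks overlap, so a sentence cut at one boundary appears whole in the next chunk. Here is a minimal sketch of such a splitter; the function name and the 300-character default overlap are illustrative choices, not part of the API above:

```python
def chunk_with_overlap(text, chunk_size=3000, overlap=300):
    """Split text into chunks where each chunk repeats the last
    `overlap` characters of the previous one, so content cut at a
    boundary still appears intact in the following chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks
```

You could drop this in as a replacement for the list comprehension in summarize_multishot; a 10-20% overlap (as recommended later in this guide) costs proportionally more tokens but preserves sentences that straddle chunk boundaries.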
Advanced Techniques
Guided Summarization
Instead of a generic "summarize this," guide Claude with a structured prompt. This yields more consistent, useful results.
def summarize_guided(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document analyst. Extract the following information in a structured format.",
        messages=[
            {"role": "user", "content": f"""
Please analyze this sublease agreement and provide:
- PARTIES: Who are the parties involved?
- KEY DATES: Start date, end date, renewal options
- FINANCIAL TERMS: Rent amount, payment schedule, security deposit
- OBLIGATIONS: Key responsibilities of each party
- TERMINATION: Conditions for early termination
- RISKS: Any unusual or high-risk clauses

Document:
{text}
"""}
        ],
    )
    return response.content[0].text
Domain-Specific Guided Summarization
For legal documents, you can go even deeper. Add domain-specific instructions:
def summarize_legal(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system="You are a senior contract attorney. Analyze this document with attention to liability, indemnification, and jurisdictional clauses.",
        messages=[
            {"role": "user", "content": f"""
Provide a legal summary covering:
- Governing law and jurisdiction
- Indemnification and hold harmless clauses
- Limitation of liability
- Dispute resolution (arbitration vs. litigation)
- Force majeure
- Assignment and subletting restrictions
- Default and remedies

Document:
{text}
"""}
        ],
    )
    return response.content[0].text
Meta-Summarization: Capturing the Context of the Entire Document
When a document is too long for a single prompt, you can use a hierarchical approach:
- Split the document into sections.
- Summarize each section.
- Summarize the section summaries.
def meta_summarize(text, section_size=4000):
    # Step 1: Split into sections
    sections = [text[i:i + section_size] for i in range(0, len(text), section_size)]

    # Step 2: Summarize each section
    section_summaries = []
    for i, section in enumerate(sections):
        prompt = f"Summarize section {i + 1} of {len(sections)} of this legal document. Focus on key terms and obligations."
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": f"{prompt}\n\n{section}"}],
        )
        section_summaries.append(response.content[0].text)

    # Step 3: Summarize the summaries
    combined = "\n\n".join(f"Section {i + 1}: {s}" for i, s in enumerate(section_summaries))
    final_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=600,
        messages=[{"role": "user", "content": f"Combine these section summaries into a coherent overall summary of the document:\n\n{combined}"}],
    )
    return final_response.content[0].text
Summary Indexed Documents: An Advanced RAG Approach
For truly large document collections, combine summarization with Retrieval-Augmented Generation (RAG). The idea: pre-summarize each document, index the summaries, and then retrieve relevant summaries at query time.
# Pseudocode for summary-indexed RAG: pre-summarize every document once
document_summaries = {}
for doc_id, doc_text in document_collection.items():
    document_summaries[doc_id] = meta_summarize(doc_text)
At query time:
1. Embed the user's question
2. Find the most relevant document summaries via cosine similarity
3. Feed the top-k summaries + original documents into Claude for final answer
Best Practices for Summarization RAG
- Chunk wisely: Overlap chunks by 10-20% to avoid cutting off context mid-sentence.
- Metadata matters: Include document title, date, and source in the summary for traceability.
- Hierarchical retrieval: First retrieve summaries, then drill into full documents only when needed.
- Update summaries: If documents change, regenerate summaries rather than patching.
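The metadata practice above can be as simple as prefixing each stored summary with a traceability header. A minimal sketch, where the helper name and bracketed header format are illustrative assumptions:

```python
def with_metadata(summary, title, date, source):
    """Prefix a summary with document metadata so retrieved
    summaries can always be traced back to their source."""
    header = f"[title: {title} | date: {date} | source: {source}]"
    return header + "\n" + summary
```

Storing summaries in this form means any summary surfaced by retrieval carries its provenance with it, which also gives Claude citable context at answer time.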
Evaluating Summary Quality
You can't improve what you don't measure. Here are two practical evaluation methods:
ROUGE Scores
ROUGE measures n-gram overlap between your summary and a reference summary. While imperfect, it's a useful baseline.
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement between Party A and Party B..."
candidate = "Party A and Party B entered into a sublease..."
scores = scorer.score(reference, candidate)
print(scores)
Promptfoo Custom Evaluation
Promptfoo allows you to define custom evaluation criteria. For example, you can check that the summary includes all key dates, parties, and financial terms using regex or LLM-as-judge assertions:

# promptfoo config.yaml
prompts:
  - "Summarize this legal document: {{document}}"

tests:
  - vars:
      document: "..."
    assert:
      - type: contains-all
        value: ["party", "date", "rent"]
      - type: llm-rubric
        value: "Does the summary accurately capture all financial obligations?"
Iterative Improvement
Summarization is rarely perfect on the first try. Use this feedback loop:
- Generate a summary using your current prompt.
- Evaluate using ROUGE, Promptfoo, or manual review.
- Identify gaps: What did the summary miss? What did it include that's irrelevant?
- Refine the prompt: Add instructions for the missing elements, remove ambiguity.
- Repeat.
For example, after a round of evaluation you might tighten the system prompt to:

system="You are an expert legal summarizer. Always include: parties, dates, financial terms, obligations, and termination conditions."
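A minimal version of this feedback loop can even be automated with simple keyword checks. The checklist and the refinement rule below are illustrative assumptions (a real pipeline might use ROUGE or LLM-as-judge instead), but they show the shape of the loop:

```python
REQUIRED_TERMS = ["parties", "rent", "termination"]  # illustrative checklist

def find_gaps(summary, required_terms=REQUIRED_TERMS):
    """Return the required terms the summary fails to mention."""
    lowered = summary.lower()
    return [t for t in required_terms if t not in lowered]

def refine_prompt(base_system, gaps):
    """Append an explicit instruction for each missing element."""
    if not gaps:
        return base_system
    return base_system + " Be sure to explicitly cover: " + ", ".join(gaps) + "."
```

Run the summary through find_gaps after each generation; if anything is missing, regenerate with the refined system prompt and evaluate again.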
Conclusion and Best Practices
Summarization with Claude is both an art and a science. Here are the key principles to keep in mind:
- Start simple, then iterate: A basic summary is better than none. Improve based on evaluation.
- Guide, don't just ask: Use structured prompts to extract exactly what you need.
- Handle long documents hierarchically: Meta-summarization and RAG scale to any document size.
- Evaluate systematically: Use ROUGE for baseline, but supplement with task-specific checks.
- Tailor to your domain: Legal, medical, and technical documents each need specialized prompts.
Key Takeaways
- Use guided prompts with structured fields (parties, dates, obligations) to get consistent, actionable summaries from Claude.
- For documents exceeding token limits, apply multi-shot or meta-summarization by chunking, summarizing each chunk, then summarizing the summaries.
- Combine summarization with RAG to build scalable document retrieval systems that return both summaries and source texts.
- Evaluate summaries with ROUGE scores for baseline quality, and use Promptfoo for custom, task-specific assertions.
- Iterate on your prompts based on evaluation results—small tweaks to system instructions can dramatically improve output quality.