Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn how to summarize long documents with Claude AI, including legal texts. Covers prompt engineering, metadata extraction, handling token limits, ROUGE evaluation, and iterative improvement.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager reviewing customer feedback, the ability to distill lengthy documents into concise, accurate summaries saves time and unlocks insights.
This guide walks you through the complete workflow of document summarization using Claude—from a simple one-shot prompt to advanced techniques like guided summarization, meta-summarization, and Retrieval-Augmented Generation (RAG). We'll also cover how to evaluate and iteratively improve your summaries.
Why Claude for Summarization?
Claude excels at summarization because of its large context window (up to 200K tokens), nuanced understanding of language, and ability to follow complex instructions. It can handle entire books, legal contracts, or technical reports in a single pass, making it ideal for real-world summarization tasks.
Getting Started: Setup and Data Preparation
First, install the required Python packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn sentence-transformers
Note that Promptfoo is a Node.js tool rather than a Python package; install it separately with npm install -g promptfoo.
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="your-api-key-here"
Preparing Your Document
For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system. You can also use any PDF or text blob.
import pypdf
def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text
document_text = extract_text_from_pdf("sublease_agreement.pdf")
If you're working with a plain text string, simply define:
document_text = "Your long text here..."
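Claude's context window is large but finite, so it's worth estimating a document's size before sending it. The sketch below uses the rough rule of thumb that English text averages about four characters per token; it is a heuristic, not Claude's actual tokenizer, and the estimate_tokens helper name is our own:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: English text averages ~4 characters per token."""
    return len(text) // 4

document_text = "Your long text here..." * 1000
approx = estimate_tokens(document_text)
if approx > 150_000:
    print(f"~{approx} tokens: consider chunking before summarizing")
else:
    print(f"~{approx} tokens: fits in a single request")
```

If the estimate approaches the context limit, switch to the chunking techniques covered later in this guide.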
Basic Summarization with Claude
Let's start with a simple summarization function. This is the foundation we'll build upon.
import anthropic
client = anthropic.Anthropic()
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide a concise summary of the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ]
    )
    return response.content[0].text
summary = summarize_text(document_text)
print(summary)
This works, but it's basic. Notice we're already using the system prompt to set Claude's role. As we progress, we'll add more structure.
Advanced Techniques
1. Guided Summarization
Instead of a generic summary, guide Claude to extract specific information. This is especially useful for legal documents where you need to find obligations, dates, and parties.
def guided_summarize(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        system="You are a legal document analyst. Extract structured information.",
        messages=[
            {"role": "user", "content": f"""
Analyze this legal document and provide:
- Parties involved
- Effective date and term
- Key obligations of each party
- Termination conditions
- Payment terms
- Any unusual clauses

Document:
{text}
"""}
        ]
    )
    return response.content[0].text
2. Domain-Specific Guided Summarization
Tailor the prompt to your domain. For medical documents, ask for diagnoses and treatments. For financial reports, ask for revenue and risk factors.
def financial_summarize(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        system="You are a financial analyst. Extract key financial metrics and risks.",
        messages=[
            {"role": "user", "content": f"""
Extract from this financial document:
- Revenue and profit figures
- Key growth drivers
- Risk factors mentioned
- Forward-looking statements
- Management's outlook

Document:
{text}
"""}
        ]
    )
    return response.content[0].text
3. Meta-Summarization (Summarizing the Summary)
For extremely long documents, you can use a hierarchical approach: summarize sections, then summarize those summaries.
def chunk_and_summarize(text, chunk_size=5000):
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
    # Now summarize the summaries
    combined_summaries = "\n\n".join(chunk_summaries)
    final_summary = summarize_text(combined_summaries, max_tokens=500)
    return final_summary
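Fixed-size character slicing can cut a sentence in half. The next section's snippet calls a chunk_text helper that isn't defined there; a minimal sketch that respects paragraph boundaries instead might look like this (the greedy packing strategy is one reasonable choice, not the only one):

```python
def chunk_text(text, chunk_size=2000):
    """Greedily pack whole paragraphs into chunks of at most chunk_size characters.

    A single paragraph longer than chunk_size still becomes its own
    (oversized) chunk rather than being split mid-sentence.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # +2 accounts for the "\n\n" separator restored between paragraphs.
        if current and len(current) + len(para) + 2 > chunk_size:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because chunks end on paragraph boundaries, each one reads as a coherent unit, which tends to produce cleaner per-chunk summaries.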
Summary Indexed Documents: An Advanced RAG Approach
For massive document collections, combine summarization with RAG. Instead of indexing raw chunks, index summaries of those chunks. This improves retrieval quality because summaries are denser and more relevant.
from sentence_transformers import SentenceTransformer
import numpy as np

# Create summary index
chunks = chunk_text(document_text, chunk_size=2000)
chunk_summaries = [summarize_text(chunk, max_tokens=100) for chunk in chunks]

# Embed summaries for retrieval (normalized so dot product = cosine similarity)
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunk_summaries, normalize_embeddings=True)

def query_summary_index(query, top_k=3):
    query_embedding = model.encode([query], normalize_embeddings=True)
    similarities = np.dot(embeddings, query_embedding.T).flatten()
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    # Return original chunks corresponding to top summaries
    return [chunks[i] for i in top_indices]
Best Practices for Summarization RAG
- Chunk strategically: Align chunks with natural document boundaries (sections, paragraphs).
- Summarize before indexing: Summary vectors are more informative than raw text vectors.
- Include metadata: Store document name, date, and section with each summary.
- Use hybrid search: Combine semantic similarity with keyword matching for better recall.
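As a toy illustration of the last two points, the sketch below attaches metadata to each indexed summary and blends a keyword-overlap score with a semantic score. All names here are illustrative, and the semantic score is stubbed out; in a real system it would be cosine similarity over the sentence-transformer embeddings shown above:

```python
import re

def keyword_score(query, text):
    """Fraction of query words that appear in the text (simple keyword recall)."""
    q_words = set(re.findall(r"\w+", query.lower()))
    t_words = set(re.findall(r"\w+", text.lower()))
    return len(q_words & t_words) / len(q_words) if q_words else 0.0

# Each entry carries the summary plus retrieval metadata.
index = [
    {"summary": "Tenant pays monthly rent of $5,000 due on the first.",
     "doc": "sublease.pdf", "section": "Payment Terms"},
    {"summary": "Either party may terminate with 60 days written notice.",
     "doc": "sublease.pdf", "section": "Termination"},
]

def hybrid_search(query, index, semantic_score, alpha=0.5):
    """Blend semantic and keyword scores; return index entries best-first."""
    scored = [
        (alpha * semantic_score(query, e["summary"])
         + (1 - alpha) * keyword_score(query, e["summary"]), e)
        for e in index
    ]
    return [e for _, e in sorted(scored, key=lambda p: p[0], reverse=True)]

# With no embedding model loaded, stub the semantic component to zero.
results = hybrid_search("termination notice period", index, lambda q, t: 0.0)
```

Carrying the doc and section fields through retrieval lets you cite the source of each retrieved chunk in the final answer.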
Evaluating Summary Quality
Evaluation is the hardest part of summarization. Here are two practical methods:
1. ROUGE Scores
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares your summary to a reference summary.
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement is between Party A and Party B..."
candidate = "This agreement involves Party A and Party B..."
scores = scorer.score(reference, candidate)
print(f"ROUGE-1: {scores['rouge1'].fmeasure:.2f}")
print(f"ROUGE-2: {scores['rouge2'].fmeasure:.2f}")
print(f"ROUGE-L: {scores['rougeL'].fmeasure:.2f}")
2. Promptfoo for Automated Evaluation
Promptfoo allows you to define custom evaluation criteria and run them automatically with the promptfoo eval command. Here's an example config:
{
  "prompts": ["Summarize this document: {{document}}"],
  "providers": ["anthropic:claude-3-sonnet-20240229"],
  "tests": [
    {
      "vars": {
        "document": "..."
      },
      "assert": [
        {
          "type": "llm-rubric",
          "value": "Does the summary include all key parties and dates?"
        },
        {
          "type": "contains-all",
          "value": ["Party A", "Party B", "effective date"]
        }
      ]
    }
  ]
}
Iterative Improvement
Summarization is rarely perfect on the first try. Use this feedback loop:
- Generate a summary
- Evaluate using ROUGE and/or human review
- Identify gaps: Missing information? Too verbose? Factual errors?
- Refine the prompt: Add instructions, examples, or constraints
- Repeat
# Version 1: Too verbose
prompt_v1 = "Summarize this document."

# Version 2: Add length constraint
prompt_v2 = "Summarize this document in 3-5 bullet points."

# Version 3: Add structure and examples
prompt_v3 = """
Summarize this legal document in exactly 5 bullet points:
- Parties involved
- Key dates
- Financial terms
- Obligations
- Termination conditions
Example output:
- Parties: Acme Corp (Landlord) and Beta Inc (Tenant)
- Dates: Effective Jan 1, 2024, Term 12 months
...
"""
Conclusion and Best Practices
- Start simple, then iterate: Begin with a basic prompt and refine based on evaluation.
- Use guided prompts for structured output: Especially for legal, financial, or medical documents.
- Handle long documents with chunking and meta-summarization: Don't exceed Claude's context window.
- Evaluate systematically: Combine automated metrics (ROUGE) with custom criteria (Promptfoo).
- Consider RAG for large collections: Index summaries, not raw text, for better retrieval.
Key Takeaways
- Guided prompts outperform generic ones: Specify exactly what information you need (parties, dates, obligations) for structured, actionable summaries.
- Chunking + meta-summarization handles any document length: Break long texts into sections, summarize each, then summarize the summaries.
- Summary-indexed RAG improves retrieval: Indexing summaries instead of raw chunks yields denser, more relevant search results.
- Evaluate with both ROUGE and custom criteria: ROUGE measures overlap; custom checks (via Promptfoo) catch factual accuracy and completeness.
- Iterate relentlessly: Small prompt refinements—adding examples, constraints, or structure—dramatically improve summary quality.