Guide | Beginner | Best Practices | 2026-05-15

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn to summarize legal documents with the Claude API. Covers prompt engineering, metadata extraction, long-document handling, ROUGE evaluation, and iterative improvement techniques.

Quick Answer

This guide teaches you how to use Claude for document summarization, from basic prompts to advanced techniques like guided summarization, meta-summarization, and summary-indexed RAG. You'll learn to evaluate summaries using ROUGE scores and Promptfoo, and iteratively improve your results.

summarization, prompt-engineering, evaluation, legal-documents, RAG

Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager reviewing customer feedback, the ability to condense lengthy documents into actionable insights saves time and improves decision-making.

Claude excels at summarization thanks to its large context window, nuanced language understanding, and strong instruction-following capabilities. In this guide, we'll walk through a complete workflow—from basic summarization to advanced techniques like guided summarization, meta-summarization, and summary-indexed RAG (Retrieval-Augmented Generation). We'll also cover evaluation methods so you can measure and improve your results.

Why Summarization Is Hard (and Why Claude Helps)

Evaluating summary quality is notoriously subjective. Different readers value different things: some want bullet-point brevity, others need narrative flow. Traditional metrics like ROUGE scores measure word overlap with a reference summary, but they miss coherence, factual accuracy, and relevance. Claude's ability to follow detailed instructions and handle long documents makes it ideal for this task, but you still need a thoughtful approach to prompts and evaluation.

Setting Up Your Environment

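First, install the required Python packages. Note that scikit-learn is published on PyPI as scikit-learn, not sklearn:

pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

Promptfoo is run through npx later in this guide, which requires Node.js.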

You'll also need a Claude API key. Set it as an environment variable:

export ANTHROPIC_API_KEY="your-api-key-here"

Data Preparation: Extracting Text from PDFs

Legal documents often come as PDFs. Here's a Python function to extract clean text:

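import pypdf

def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, joined with newlines."""
    reader = pypdf.PdfReader(pdf_path)
    pages = []
    for page in reader.pages:
        # extract_text() can come back empty for scanned or image-only pages
        pages.append(page.extract_text() or "")
    # Join with newlines so the last word of one page does not fuse with the next
    return "\n".join(pages)

# Example usage
document_text = extract_text_from_pdf("sublease_agreement.pdf")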

If you're working with plain text, simply assign it to a variable:

document_text = "Your long document text here..."
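Text extracted from PDFs often carries layout debris, such as words hyphenated across line breaks and long runs of blank lines. Here is a minimal cleanup sketch, assuming those two artifacts are the main problem. `clean_extracted_text` is a hypothetical helper, not part of pypdf, and its rules will likely need tuning for your documents:

```python
import re

def clean_extracted_text(text):
    """Normalize whitespace in text pulled from a PDF (illustrative rules only)."""
    # Rejoin words split by a hyphen at a line break: "agree-\nment" -> "agreement"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Drop a space sitting directly before or after a newline
    text = re.sub(r" ?\n ?", "\n", text)
    # Collapse three or more newlines into one blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = "This sub-\nlease   agreement\n\n\n\nis made on  the date below. "
print(clean_extracted_text(sample))
```

One caveat: the hyphen rule also merges genuine compounds that happen to break at a line end (for example "non-" followed by "compete"), so review the output on a few real documents before relying on it.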

Basic Summarization with Claude

Let's start with a simple summarization function:

import anthropic
client = anthropic.Anthropic()
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": f"Please provide a concise summary of the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text
summary = summarize_text(document_text)
print(summary)

This works, but it's basic. Notice we're already using Claude's instruction-following ability by specifying "concise summary." As we progress, we'll add structure, constraints, and domain-specific guidance.
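One way to start adding structure is to keep the instructions visibly separate from the document text. The sketch below wraps the document in XML-style tags and adds a word limit. `build_summary_prompt`, the `<document>` tag name, and the default limit are illustrative choices made here, not anything the API requires:

```python
def build_summary_prompt(text, max_words=150):
    """Build a summarization prompt that keeps instructions and document apart."""
    return (
        f"Summarize the document inside the <document> tags in at most {max_words} words. "
        "Use only information stated in the document.\n\n"
        f"<document>\n{text}\n</document>"
    )

prompt = build_summary_prompt("Tenant shall pay rent monthly.", max_words=50)
print(prompt)
```

The returned string can be passed as the content of the user message in summarize_text in place of the inline f-string.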

Multi-Shot Summarization: Handling Long Documents

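When a document exceeds Claude's context window (or your desired chunk size), summarize it in parts and then summarize the combined partial summaries. This guide calls the approach multi-shot summarization; it is also commonly described as chunked or map-reduce summarization, and "multi-shot" elsewhere usually refers to prompts that include several worked examples: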

def chunk_text(text, chunk_size=4000):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks
def multi_shot_summarize(text, chunk_size=4000):
    chunks = chunk_text(text, chunk_size)
    chunk_summaries = []
    
    for i, chunk in enumerate(chunks):
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Chunk {i+1}/{len(chunks)} summarized")
    
    # Combine chunk summaries into a final summary
    combined = " ".join(chunk_summaries)
    final_summary = summarize_text(combined, max_tokens=500)
    return final_summary
final = multi_shot_summarize(document_text)
print(final)
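Fixed-size chunks can split a clause across two chunks so that neither partial summary captures it. One common mitigation is to let consecutive chunks overlap. Below is a minimal sketch, assuming word-based chunks like those in chunk_text. `chunk_text_with_overlap` and its `overlap` parameter are hypothetical names introduced here:

```python
def chunk_text_with_overlap(text, chunk_size=4000, overlap=200):
    """Split text into word-based chunks where each chunk repeats the last
    `overlap` words of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end, so no chunk is a pure repeat
        if start + chunk_size >= len(words):
            break
    return chunks

demo = " ".join(f"w{i}" for i in range(10))
print(chunk_text_with_overlap(demo, chunk_size=4, overlap=1))
```

The overlap trades extra tokens per request for a better chance that boundary-spanning content appears whole in at least one chunk.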

Advanced Techniques

Guided Summarization

Instead of a generic summary, guide Claude to extract specific information. This is especially useful for legal documents:

def guided_summarize(text):
    prompt = f"""Please analyze the following legal document and provide:
Parties Involved: List all named parties and their roles.
Key Dates: Effective date, termination date, renewal dates.
Obligations: Key obligations for each party.
Financial Terms: Rent, deposits, fees, penalties.
Termination Conditions: How and when the agreement can be terminated.
Risk Factors: Any clauses that could pose legal or financial risk.

Document:
{text}
Format your response as a structured report with clear headings."""
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
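When the list of fields changes from one document type to another, it can help to generate the prompt from data rather than editing the string by hand. This is a sketch under that assumption; `build_guided_prompt` and the example field list are hypothetical:

```python
def build_guided_prompt(text, fields):
    """Build a guided-summarization prompt from (name, description) pairs."""
    numbered = "\n".join(
        f"{i}. {name}: {description}"
        for i, (name, description) in enumerate(fields, start=1)
    )
    return (
        "Please analyze the following legal document and provide:\n"
        f"{numbered}\n\n"
        f"Document:\n{text}\n\n"
        "Format your response as a structured report with clear headings."
    )

lease_fields = [
    ("Parties Involved", "List all named parties and their roles."),
    ("Key Dates", "Effective date, termination date, renewal dates."),
]
guided_prompt = build_guided_prompt("Sample lease text.", lease_fields)
print(guided_prompt)
```

Numbering the fields in the prompt also makes it easier to spot which item a response skipped.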

Domain-Specific Guided Summarization

For legal documents, you can add domain-specific instructions:

def legal_summarize(text):
    prompt = f"""You are a legal document analyst. Summarize this agreement for a non-lawyer business stakeholder. Focus on:
Business Impact: What does this mean for the company?
Hidden Liabilities: Indemnification, limitation of liability, governing law.
Action Items: What must each party do and by when?
Red Flags: Unusual or aggressive clauses.

Use plain language. Avoid legalese.
Document:
{text}"""
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Meta-Summarization: Including Document Context

Sometimes you need to summarize a document while preserving its structure and context. Meta-summarization creates a summary that references the original document's sections:

def meta_summarize(text):
    prompt = f"""Summarize the following document. For each major section, provide:
Section Title (from the original document)
Key Points (3-5 bullet points)
Page/Paragraph Reference (approximate location in the original)

Then provide an overall executive summary at the top.
Document:
{text}"""
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Summary-Indexed Documents: An Advanced RAG Approach

For very large document collections, you can create a summary index—a searchable database of document summaries. This enables fast retrieval and question-answering across thousands of documents.

def create_summary_index(documents):
    index = []
    for doc_id, doc_text in enumerate(documents):
        summary = summarize_text(doc_text, max_tokens=200)
        index.append({
            "doc_id": doc_id,
            "summary": summary,
            "full_text": doc_text
        })
    return index
def query_summary_index(query, index, top_k=3):
    # Simple keyword matching (in production, use embeddings)
    scored = []
    for entry in index:
        score = sum(1 for word in query.lower().split() if word in entry["summary"].lower())
        scored.append((score, entry))
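    # Sort on the score only; on ties, comparing the dict entries would raise TypeError
    scored.sort(key=lambda pair: pair[0], reverse=True)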
    return [entry for _, entry in scored[:top_k]]
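The comment in query_summary_index points to embeddings for production use. With embeddings, the ranking step typically becomes a cosine-similarity comparison between a query vector and one vector per summary. The sketch below shows only that ranking step. `toy_embed` is a stand-in that counts words so the example runs without any external service; a real system would replace it with vectors from an embedding model. All three function names are hypothetical:

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, index, top_k=3):
    """Return the top_k index entries whose summaries are closest to the query."""
    query_vec = toy_embed(query)
    scored = [
        (cosine_similarity(query_vec, toy_embed(entry["summary"])), entry)
        for entry in index
    ]
    # Sort on the similarity only, so ties never compare the dict entries
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]

demo_index = [
    {"doc_id": 0, "summary": "sublease agreement for office space"},
    {"doc_id": 1, "summary": "employment contract with bonus terms"},
]
print(rank_by_similarity("office sublease", demo_index, top_k=1))
```

In practice the summary vectors would be computed once when the index is built and stored alongside each entry, rather than recomputed per query as this sketch does.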

Best Practices for Summarization RAG

Chunk strategically: Split documents at natural boundaries (sections, paragraphs) rather than arbitrary token counts.
Store metadata: Include document title, date, author, and source URL alongside each summary.
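Use embeddings: For production, use vector embeddings from a dedicated embedding model for semantic search. The Claude API itself does not provide an embeddings endpoint, so the vectors have to come from a separate model or service.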
Iterate on chunk size: Test different chunk sizes (500–2000 words) to find the sweet spot for your use case.
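As one illustration of the first point, the sketch below packs whole paragraphs into chunks instead of cutting at a fixed word offset. It assumes paragraphs are separated by blank lines; `chunk_by_paragraph` and `max_words` are hypothetical names:

```python
def chunk_by_paragraph(text, max_words=1000):
    """Group blank-line-separated paragraphs into chunks of at most max_words
    words. A single paragraph longer than max_words becomes its own chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for paragraph in paragraphs:
        n_words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the limit
        if current and current_len + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(paragraph)
        current_len += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

doc = "one two three\n\nfour five\n\nsix seven eight nine"
print(chunk_by_paragraph(doc, max_words=5))
```

Because chunks end at paragraph boundaries, their sizes vary; the `max_words` value is the knob to adjust when testing different chunk sizes.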

Evaluating Summary Quality

Evaluation is critical. Here's how to use ROUGE scores and Promptfoo:

ROUGE Score Example

from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement transfers rights from Company A to Company B."
hypothesis = "Company B receives rights under the sublease from Company A."
scores = scorer.score(reference, hypothesis)
print(scores)
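To make the numbers less opaque, here is a hand-rolled version of the unigram case. It is a simplified sketch of ROUGE-1 (lowercasing and whitespace tokenization only, no stemming), so its values will not necessarily match the rouge_score output above. `simple_rouge1` is a hypothetical helper:

```python
from collections import Counter

def simple_rouge1(reference, hypothesis):
    """Unigram-overlap precision, recall, and F1 between two strings."""
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((ref_counts & hyp_counts).values())
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

result = simple_rouge1("the cat sat on the mat", "the cat lay on the rug")
print(result)
```

Two summaries with the same meaning but different wording score low under this measure, which is the limitation noted earlier: overlap metrics say nothing about factual accuracy or coherence.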

Using Promptfoo for Custom Evaluation

Promptfoo allows you to define custom evaluation criteria. Create a configuration file:

# promptfooconfig.yaml
prompts:
  - "Summarize: {{document}}"
providers:
  - id: anthropic:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: "Your test document here..."
    assert:
      - type: contains
        value: "key term"
      - type: python
        value: "len(output.split()) < 200"

Run evaluation:

npx promptfoo eval

Iterative Improvement

Summarization is rarely perfect on the first try. Use this feedback loop:

Generate a summary with your current prompt.
Evaluate using automated metrics (ROUGE) and manual review.
Identify gaps: Is the summary missing key information? Too verbose? Inaccurate?
Refine the prompt: Add constraints (e.g., "max 100 words"), specify format (bullets vs. paragraphs), or add domain context.
Repeat until quality meets your threshold.

Example refinement:

# Version 1: Unconstrained prompt; the output tends to be too verbose
prompt_v1 = "Summarize this document."

# Version 2: Add structure and constraints
prompt_v2 = """Summarize this document in exactly 3 paragraphs:
Paragraph 1: What is the document about?
Paragraph 2: Key parties and their obligations
Paragraph 3: Important dates and financial terms

Keep each paragraph under 100 words."""
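Constraints like these are easy to check automatically before any manual review. Here is a small sketch, assuming paragraphs in the model output are separated by blank lines; `check_summary_format` is a hypothetical helper:

```python
def check_summary_format(summary, n_paragraphs=3, max_words_per_paragraph=100):
    """Return a list of constraint violations; an empty list means the format passes."""
    problems = []
    paragraphs = [p for p in summary.split("\n\n") if p.strip()]
    if len(paragraphs) != n_paragraphs:
        problems.append(f"expected {n_paragraphs} paragraphs, found {len(paragraphs)}")
    for i, paragraph in enumerate(paragraphs, start=1):
        n_words = len(paragraph.split())
        # "Under 100 words" means 100 or more is a violation
        if n_words >= max_words_per_paragraph:
            problems.append(f"paragraph {i} has {n_words} words")
    return problems

good = "About the lease.\n\nParties and duties.\n\nDates and rent."
print(check_summary_format(good))
print(check_summary_format("Only one paragraph."))
```

A check like this only covers form. Whether the three paragraphs actually contain the right parties, dates, and amounts still needs a reference comparison or human review.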

Conclusion and Best Practices

Summarization with Claude is both powerful and flexible. Here are the key takeaways:

Start simple, then add structure: Begin with a basic prompt and layer in guidance as needed.
Use domain-specific instructions: Legal, medical, or technical documents benefit from specialized prompts.
Handle long documents with chunking: Multi-shot summarization lets you process texts that do not fit in a single request, at the cost of some cross-chunk context.
Evaluate rigorously: Combine automated metrics (ROUGE) with custom evaluation tools (Promptfoo) and human review.
Iterate: Prompt engineering is an iterative process. Test, measure, and refine.

By applying these techniques, you can build robust summarization workflows that save time, reduce information overload, and deliver actionable insights.

Key Takeaways

Claude's instruction-following ability makes it ideal for structured summarization—you can guide it to extract specific metadata, risk factors, or action items from any document.
Multi-shot summarization (chunking + combining) enables handling of documents beyond the context window, though details that span chunk boundaries can be lost, so spot-check the combined summary against the source.
Domain-specific prompts dramatically improve summary relevance—especially for legal, medical, or technical content where terminology matters.
Evaluation is essential and multi-faceted—use ROUGE for word-overlap metrics, Promptfoo for custom assertions, and always include human review for subjective quality.
Summary-indexed RAG unlocks search across large document collections—combine chunked summaries with vector embeddings for fast, semantic retrieval.