Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn how to summarize complex documents using Claude AI. This guide covers prompt engineering, metadata extraction, handling long texts, ROUGE evaluation, and RAG-based summarization.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a business analyst reviewing quarterly reports, the ability to distill lengthy documents into concise, actionable summaries is invaluable.
This guide is a practical walkthrough of how to use Claude for document summarization. We'll start with the basics and progressively build up to advanced techniques, including guided summarization, metadata extraction, handling documents beyond token limits, and even a Retrieval-Augmented Generation (RAG) approach. We'll also cover how to evaluate your summaries using both automated metrics and custom evaluation frameworks.
By the end, you'll have a complete toolkit for building robust summarization workflows with Claude.
Why Summarization is Hard (and Why Claude Excels)
Summarization is notoriously difficult to evaluate. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Different readers value different things: a lawyer needs precise legal language, while a business executive wants the bottom line. Traditional metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measure n-gram overlap with reference summaries but fail to capture coherence, factual accuracy, or relevance.
Claude excels here because of its strong instruction-following capabilities and large context window (up to 200K tokens). This allows you to:
- Provide detailed instructions about what to include or exclude
- Process entire documents in a single pass
- Extract structured metadata alongside free-form summaries
Getting Started: Setup and Data Preparation
First, let's set up our environment. You'll need an Anthropic API key and a few Python packages.
# Install required packages (Promptfoo is a Node.js tool, installed separately: npm install -g promptfoo)
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

import anthropic
from pypdf import PdfReader
import pandas as pd

# Initialize the Claude client
client = anthropic.Anthropic(api_key="YOUR_API_KEY")
Extracting Text from PDFs
For this guide, we'll use a publicly available legal document—a Sublease Agreement from the SEC's EDGAR database. Here's how to extract text from a PDF:
def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""  # extract_text() can return None for image-only pages
    return text

# Load your document
text = extract_text_from_pdf("sublease_agreement.pdf")
If you don't have a PDF, just use a text blob:
text = "Your document text here..."
Basic Summarization with Claude
Let's start with a simple summarization function. This is the foundation we'll build upon.
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        system="You are an expert document summarizer. Create a concise, accurate summary that captures the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ],
    )
    return response.content[0].text

summary = summarize_text(text)
print(summary)
This works, but it's basic. The summary will be generic and may miss important details specific to your use case. Let's improve it.
Advanced Techniques for Better Summaries
1. Guided Summarization
Instead of a generic request, guide Claude with specific instructions. This is where prompt engineering shines.
def guided_summarize(text, focus_areas=None, output_format="paragraph"):
    if focus_areas is None:
        focus_areas = ["key terms", "obligations", "dates", "parties"]
    focus_list = "\n".join(f"- {area}" for area in focus_areas)
    prompt = f"""Please summarize the following legal document. Focus specifically on:
{focus_list}

Output format: {output_format}

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        system="You are a legal document analyst. Provide precise, structured summaries.",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
2. Domain-Specific Guided Summarization
For legal documents, we can go further by extracting structured metadata alongside the summary.
def legal_document_summary(text):
    prompt = f"""Analyze this legal document and provide:
- SUMMARY: A 3-4 sentence overview
- PARTIES: List all named parties and their roles
- KEY DATES: All important dates (effective date, termination, renewal, etc.)
- OBLIGATIONS: Key obligations for each party
- RISK FACTORS: Any unusual or potentially unfavorable terms
- TERMINATION: Conditions for termination

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1000,
        system="You are a senior legal analyst. Extract all relevant information with precision.",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
3. Handling Long Documents with Meta-Summarization
What if your document exceeds Claude's context window? Use a chunk-and-summarize approach, then summarize the summaries.
def chunk_text(text, chunk_size=50000):
    """Split text into chunks of roughly chunk_size words each."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

def meta_summarize(text, max_tokens=1000):
    # Step 1: Chunk the document
    chunks = chunk_text(text)

    # Step 2: Summarize each chunk
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(f"Section {i+1}: {summary}")

    # Step 3: Summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_prompt = f"""Combine these section summaries into a coherent overall summary of the document.
Ensure no key information is lost.

{combined}
"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        system="You are an expert at synthesizing information from multiple sources.",
        messages=[{"role": "user", "content": final_prompt}],
    )
    return response.content[0].text
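Before wiring this into an API-calling pipeline, it's worth sanity-checking the chunking step on its own: no words should be lost, and no chunk should exceed the limit. A minimal, API-free check (restating the chunk_text logic above with a small chunk size):

```python
def chunk_text(text, chunk_size=50000):
    """Split text into chunks of roughly chunk_size words each."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

# Use a small chunk size so the behavior is easy to inspect
sample = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(sample, chunk_size=100)

assert len(chunks) == 3                                  # 100 + 100 + 50 words
assert all(len(c.split()) <= 100 for c in chunks)        # no chunk over the limit
assert " ".join(chunks).split() == sample.split()        # nothing dropped or reordered
print([len(c.split()) for c in chunks])  # → [100, 100, 50]
```

Note that splitting on whitespace counts words, not tokens; if you need precise token budgets, count tokens with your tokenizer of choice instead.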
Advanced RAG Approach: Summary-Indexed Documents
For very large document collections, consider a RAG approach where you index document summaries rather than raw text. This is more efficient and often produces better results.
# Pseudocode for Summary-Indexed RAG
class SummaryRAG:
    def __init__(self):
        self.summaries = []
        self.documents = []

    def add_document(self, doc_id, text):
        summary = summarize_text(text, max_tokens=200)
        self.summaries.append({"id": doc_id, "summary": summary})
        self.documents.append({"id": doc_id, "text": text})

    def query(self, question, top_k=3):
        # Find relevant summaries using embedding similarity
        relevant_summaries = self.search_summaries(question, top_k)
        # Retrieve full text for relevant documents
        context = "\n\n".join(
            self.get_document(s["id"]) for s in relevant_summaries
        )
        # Generate answer using Claude
        return self.generate_answer(question, context)
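The pseudocode leaves `search_summaries` undefined. One lightweight way to fill it in, without an embedding service, is TF-IDF cosine similarity over the stored summaries (a sketch using scikit-learn, which the setup step installs; a production system would more likely use dense embeddings, and the example documents here are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def search_summaries(summaries, question, top_k=3):
    """Rank stored summaries by TF-IDF cosine similarity to the question."""
    texts = [s["summary"] for s in summaries]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(texts + [question])
    # The last row is the question; compare it against every summary row
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sims.argsort()[::-1][:top_k]
    return [summaries[i] for i in ranked]

summaries = [
    {"id": "doc1", "summary": "A sublease agreement covering rent, termination, and renewal."},
    {"id": "doc2", "summary": "Quarterly earnings report with revenue and margin figures."},
    {"id": "doc3", "summary": "Employment contract describing salary and benefits."},
]
top = search_summaries(summaries, "What are the rent and termination terms?", top_k=1)
print(top[0]["id"])  # → doc1
```

Because summaries are much shorter than the raw documents, the index stays small and retrieval stays fast, while the final answer is still generated over the full retrieved text.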
Best Practices for Summarization RAG
- Summarize at multiple granularities: Create both short (1-2 sentence) and detailed (paragraph) summaries
- Include metadata: Always tag summaries with document source, date, and type
- Use hierarchical indexing: For very long documents, create section-level summaries
- Validate summaries: Periodically check that summaries accurately represent source documents
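The first two practices above combine naturally in a small record structure: each indexed entry carries both a short and a detailed summary plus source metadata. A minimal sketch (field names and example values are illustrative):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SummaryRecord:
    """An index entry holding summaries at two granularities plus metadata."""
    doc_id: str
    short_summary: str       # 1-2 sentences, for fast scanning and retrieval
    detailed_summary: str    # paragraph-level, for answer generation
    source: str
    doc_type: str
    indexed_on: date = field(default_factory=date.today)

record = SummaryRecord(
    doc_id="sublease-001",
    short_summary="Sublease agreement between two parties; fixed-term lease of premises.",
    detailed_summary="The sublessor grants the sublessee use of the premises under the terms of the master lease...",
    source="SEC EDGAR",
    doc_type="lease",
)
print(record.doc_id, record.doc_type)  # → sublease-001 lease
```

Retrieval can then match against the short summaries while answer generation reads the detailed ones, and the metadata fields support filtering by source, date, or document type.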
Evaluating Summary Quality
Evaluation is critical. Here are three approaches:
1. ROUGE Scores
ROUGE measures n-gram overlap between generated and reference summaries.
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference_summary, generated_summary)
print(f"ROUGE-1: {scores['rouge1'].fmeasure:.3f}")
print(f"ROUGE-2: {scores['rouge2'].fmeasure:.3f}")
print(f"ROUGE-L: {scores['rougeL'].fmeasure:.3f}")
2. Custom Evaluation with Promptfoo
Promptfoo allows you to define custom evaluation criteria. For example:

# promptfoo config
prompts:
  - "Summarize this legal document: {{document}}"
tests:
  - vars:
      document: "file://sublease_agreement.pdf"
    assert:
      - type: contains-all
        value:
          - "effective date"
          - "termination"
          - "rent"
      - type: llm-rubric
        value: "Does the summary accurately capture all key obligations of both parties?"
3. Human Evaluation Checklist
For production systems, use a structured checklist:
- ✅ Factual accuracy (no hallucinations)
- ✅ Completeness (covers all key points)
- ✅ Conciseness (no unnecessary detail)
- ✅ Coherence (flows logically)
- ✅ Relevance (matches the intended use case)
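To make reviewer judgments comparable across summaries, the checklist can be encoded as a small scoring helper that rejects incomplete reviews (a minimal sketch; the 1-5 scale and criterion names mirror the list above):

```python
CRITERIA = ["factual_accuracy", "completeness", "conciseness", "coherence", "relevance"]

def score_summary(ratings):
    """Average 1-5 reviewer ratings over the checklist; reject missing criteria."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)

ratings = {"factual_accuracy": 5, "completeness": 4, "conciseness": 4,
           "coherence": 5, "relevance": 5}
print(score_summary(ratings))  # → 4.6
```

Storing these scores alongside the prompt version that produced each summary makes it easy to see whether a prompt change actually helped.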
Iterative Improvement Process
- Generate baseline summaries using the basic approach
- Evaluate using ROUGE and custom criteria
- Identify weaknesses (e.g., missing dates, unclear obligations)
- Refine prompts to address specific gaps
- Re-evaluate and compare scores
- Repeat until quality meets your threshold
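Steps 2 and 5 can be automated with a small comparison loop. The sketch below scores candidate summaries from two hypothetical prompt versions against a reference using a simplified ROUGE-1-style F-measure (unigram overlap, no stemming), implemented inline so it runs without the rouge-score package; in practice you would use rouge_scorer as shown earlier:

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """Unigram-overlap F-measure (a simplified ROUGE-1, no stemming)."""
    ref, cand = Counter(reference.lower().split()), Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the sublease runs for two years with monthly rent due"
candidates = {
    "prompt_v1": "the agreement covers a property",
    "prompt_v2": "the sublease runs two years and rent is due monthly",
}
scores = {name: rouge1_f(reference, text) for name, text in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # → prompt_v2
```

Keeping a fixed reference set and re-running this loop after each prompt revision turns the iterative process above into a measurable comparison rather than a gut call.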
Conclusion and Best Practices
Here are the key takeaways for building robust summarization systems with Claude:
- Prompt engineering is everything: Be specific about what you want. Guide Claude with examples and structured output formats.
- Use metadata extraction: Don't just summarize—extract structured data like dates, parties, and obligations.
- Handle long documents strategically: Chunk and meta-summarize, or use a RAG approach with summary indexing.
- Evaluate rigorously: Combine automated metrics (ROUGE) with custom evaluation and human review.
- Iterate: Summarization is rarely perfect on the first try. Use evaluation results to refine your prompts and approach.
Key Takeaways
- Start with guided prompts: Generic summaries are rarely useful. Provide specific instructions about what to include, exclude, and how to format the output.
- Extract metadata alongside summaries: For legal or technical documents, structured extraction (dates, parties, obligations) adds enormous value beyond free-form summaries.
- Use meta-summarization for long documents: Chunk the document, summarize each chunk, then summarize the summaries. This preserves information while staying within token limits.
- Evaluate with multiple methods: Combine ROUGE scores, custom evaluation frameworks like Promptfoo, and human review to ensure summary quality.
- Iterate based on evaluation: Use evaluation results to refine prompts and techniques. Small prompt changes can yield significant improvements.