Guide | 2026-05-06

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn how to summarize complex documents using Claude AI. This practical guide covers prompt engineering, metadata extraction, handling long texts, ROUGE evaluation, and iterative improvement techniques.

Quick Answer

This guide teaches you how to use Claude for effective document summarization, from crafting basic prompts to advanced techniques like guided summarization, meta-summarization, and summary-indexed RAG. You'll also learn how to evaluate summary quality using ROUGE scores and Promptfoo.

Claude Summarization, Prompt Engineering, RAG, Document Analysis, Evaluation Metrics

Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a business analyst processing reports, Claude can help you extract the essence from lengthy documents in seconds.

In this guide, we'll walk through a complete workflow for summarizing documents using Claude. We'll start with basic techniques and progressively build up to advanced strategies like guided summarization, meta-summarization, and summary-indexed RAG (Retrieval-Augmented Generation). We'll also cover how to evaluate and iteratively improve your summaries.

Why Summarization with Claude?

Claude excels at summarization because of its large context window (up to 200K tokens) and its ability to understand nuance, tone, and domain-specific terminology. Unlike simple extractive methods, Claude can generate abstractive summaries that rephrase and synthesize information, making them more readable and actionable.

However, effective summarization isn't just about throwing text at the model. The quality of your output depends heavily on your prompt design, how you handle long documents, and how you evaluate results.

Getting Started: Setup and Data Preparation

Before we dive into summarization techniques, let's set up our environment. You'll need:

An Anthropic API key
Python 3.8+
The following packages: anthropic, pypdf, pandas, numpy, rouge-score, nltk, promptfoo
Node.js, because the Promptfoo command used later in this guide runs through npx

Installing Dependencies

pip install anthropic pypdf pandas numpy rouge-score nltk promptfoo

Preparing Your Document

For this guide, we'll use a publicly available Sublease Agreement from the SEC website. However, you can use any PDF or text document.

Here's a Python function to extract text from a PDF:

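import pypdf

def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF file."""
    pages = []
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        for page in reader.pages:
            # Image-only pages may yield no text, so fall back to an empty string
            pages.append(page.extract_text() or "")
    # Join pages with a newline so words at page boundaries do not run together
    return "\n".join(pages)

# Usage
document_text = extract_text_from_pdf("sublease_agreement.pdf")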

If you're working with plain text, simply define:

document_text = "Your text content here..."
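Text extracted from PDFs often carries irregular spacing and stray line endings. The following is a minimal cleanup sketch, not part of the original workflow: `clean_extracted_text` is a hypothetical helper name, and the rules it applies are assumptions you may need to adjust for your documents. It preserves blank-line paragraph breaks, which the chunking function later in this guide relies on.

```python
import re

def clean_extracted_text(text: str) -> str:
    """Normalize whitespace in extracted text (hypothetical helper)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop single spaces that sit directly before or after a newline.
    text = re.sub(r" ?\n ?", "\n", text)
    # Collapse three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = "SUBLEASE   AGREEMENT \r\n\r\n\r\n\r\nThis  Sublease\tis made."
cleaned = clean_extracted_text(sample)
print(cleaned)
```

You could apply it right after extraction, for example `document_text = clean_extracted_text(document_text)`.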

Basic Summarization

Let's start with a simple summarization function. Even this basic approach uses two important features of the Messages API: a system prompt that sets Claude's role, and max_tokens, which caps the length of the summary.

import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")

def basic_summarize(text, max_tokens=500):
    """Generate a basic summary of the provided text."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,  # caps the length of the summary
        system="You are an expert summarizer. Provide a concise, accurate summary of the following document.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    # The reply is a list of content blocks; the first one holds the text
    return response.content[0].text

summary = basic_summarize(document_text)
print(summary)

This works, but it's quite basic. The summary might miss important details or include irrelevant information. Let's improve it.
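One possible refinement is to prefill the start of the assistant turn and set a stop sequence, so the reply contains only the summary. The sketch below only builds the request arguments and makes no API call. `build_summary_request`, the XML-style tag names, and the model ID are assumptions for illustration, and prefilling may not be available on every model.

```python
def build_summary_request(text: str, max_tokens: int = 500) -> dict:
    """Build keyword arguments for client.messages.create (hypothetical helper)."""
    return {
        # Assumption: substitute a model ID you have access to.
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": max_tokens,
        "system": "You are an expert summarizer.",
        "messages": [
            {
                "role": "user",
                "content": (
                    "Summarize the following document inside <summary> tags.\n\n"
                    f"<document>\n{text}\n</document>"
                ),
            },
            # Prefilled assistant turn: the reply continues from this opening tag.
            {"role": "assistant", "content": "<summary>"},
        ],
        # Generation stops as soon as the model emits the closing tag.
        "stop_sequences": ["</summary>"],
    }

request = build_summary_request("Sample sublease text.")
# With a configured client:
#   response = client.messages.create(**request)
#   summary = response.content[0].text.strip()
```

Because the stop sequence itself is not included in the output, the returned text should be the summary body without the surrounding tags.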

Multi-Shot Basic Summarization

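A simple improvement is to ask Claude for several summaries, each from a different angle, and then combine them. This can help capture different aspects of the document. Note that "multi-shot" here means multiple passes over the document, not multi-shot prompting with worked examples.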

def multi_shot_summarize(text, num_shots=3):
    """Generate multiple summaries from different angles and combine them."""
    # Each API call is independent, so name the angle explicitly; otherwise
    # Claude cannot know which angles earlier calls already covered.
    # These angles are examples; adjust them to your document type.
    angles = ["parties and obligations", "dates and financial terms", "risks and liabilities"]
    summaries = []
    for i in range(num_shots):
        angle = angles[i % len(angles)]
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            system="You are an expert summarizer.",
            messages=[
                {
                    "role": "user",
                    "content": f"Summarize this document, focusing on {angle}:\n\n{text}"
                }
            ]
        )
        summaries.append(response.content[0].text)

    # Combine summaries
    combined = "\n\n".join(summaries)
    final_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        system="Synthesize the following summaries into one coherent summary.",
        messages=[
            {
                "role": "user",
                "content": f"Combine these summaries into one:\n\n{combined}"
            }
        ]
    )
    return final_response.content[0].text

Advanced Techniques

Guided Summarization

Instead of a generic "summarize this" prompt, guide Claude with specific instructions about what to include and how to structure the output.

def guided_summarize(text, aspects=None):
    """Generate a guided summary focusing on specific aspects."""
    if aspects is None:
        aspects = ["key parties", "main obligations", "dates and deadlines", "financial terms"]
    
    prompt = f"""Please summarize the following document with specific attention to:
{', '.join(aspects)}
Structure your summary as follows:
Executive Summary (2-3 sentences)
Key Parties Involved
Main Obligations
Important Dates and Deadlines
Financial Terms
Risks and Liabilities

Document:
{text}
"""
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document analyst. Provide structured, accurate summaries.",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ]
    )
    return response.content[0].text
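In the function above, the requested section headings are fixed even when you pass custom aspects. One way to keep the two in step is to derive the headings from the aspects list. This is a sketch under that assumption: `build_guided_prompt` is a hypothetical helper, and it makes no API call.

```python
from typing import List, Optional

def build_guided_prompt(text: str, aspects: Optional[List[str]] = None) -> str:
    """Build a guided prompt whose section list mirrors the requested aspects."""
    if aspects is None:
        aspects = ["key parties", "main obligations", "dates and deadlines", "financial terms"]
    # Start with the executive summary, then add one numbered heading per aspect.
    sections = ["1. Executive Summary (2-3 sentences)"]
    for number, aspect in enumerate(aspects, start=2):
        sections.append(f"{number}. {aspect.title()}")
    section_list = "\n".join(sections)
    return (
        "Please summarize the following document with specific attention to: "
        f"{', '.join(aspects)}.\n\n"
        f"Structure your summary as follows:\n{section_list}\n\n"
        f"Document:\n{text}"
    )

prompt = build_guided_prompt("Sample text.", aspects=["termination clauses", "governing law"])
print(prompt)
```

The resulting string can be passed as the user message content in place of the hard-coded prompt.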

Domain-Specific Guided Summarization

For legal documents, you can add domain-specific instructions:

def legal_summarize(text):
    """Generate a legal-specific summary."""
    prompt = f"""You are a legal document summarizer. Analyze this contract and provide:
Contract Type and Parties: What type of agreement is this? Who are the parties?
Effective Date and Term: When does it start? How long does it last?
Key Obligations: What must each party do?
Payment Terms: What are the financial arrangements?
Termination Clauses: How can the agreement be ended?
Liability and Indemnification: Who is responsible for what?
Governing Law: Which jurisdiction applies?

Document:
{text}
"""
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system="You are an expert legal document analyst. Provide accurate, structured summaries.",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ]
    )
    return response.content[0].text

Meta-Summarization: Handling Long Documents

When documents exceed Claude's context window, you can use a chunk-and-summarize approach:

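def chunk_text(text, chunk_size=50000):
    """Split text into chunks of approximately chunk_size characters."""
    chunks = []
    current_chunk = ""
    for paragraph in text.split('\n\n'):
        if len(current_chunk) + len(paragraph) < chunk_size:
            current_chunk += paragraph + '\n\n'
        else:
            # Skip the append when nothing has accumulated yet, so an
            # oversized first paragraph does not produce an empty chunk
            if current_chunk.strip():
                chunks.append(current_chunk.strip())
            # A paragraph longer than chunk_size still becomes one oversized chunk
            current_chunk = paragraph + '\n\n'
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks
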
def meta_summarize(text):
    """Summarize long documents using chunking and meta-summarization."""
    chunks = chunk_text(text)
    chunk_summaries = []
    
    for i, chunk in enumerate(chunks):
        summary = basic_summarize(chunk, max_tokens=300)
        chunk_summaries.append(f"Section {i+1}: {summary}")
    
    # Combine chunk summaries
    combined = "\n\n".join(chunk_summaries)
    
    # Final meta-summary
    final_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are an expert synthesizer. Create a coherent summary from these section summaries.",
        messages=[
            {
                "role": "user",
                "content": f"Synthesize these section summaries into one coherent document summary:\n\n{combined}"
            }
        ]
    )
    return final_response.content[0].text
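Before chunking, it can help to check whether the document would fit in a single request at all. The sketch below uses the rough 4-characters-per-token ratio implied elsewhere in this guide and the 200K-token window quoted earlier. Both constants are approximations, and `estimate_tokens` and `needs_chunking` are hypothetical helper names; an exact count would need a real tokenizer.

```python
CHARS_PER_TOKEN = 4              # rough heuristic, not an exact tokenizer
CONTEXT_WINDOW_TOKENS = 200_000  # context window figure quoted in this guide

def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN

def needs_chunking(text: str, reserved_tokens: int = 2_000) -> bool:
    """Return True if the text likely exceeds the window.

    reserved_tokens leaves room for the prompt wording and the reply.
    """
    return estimate_tokens(text) > CONTEXT_WINDOW_TOKENS - reserved_tokens

short_doc = "a" * 50_000
long_doc = "a" * 1_000_000
print(estimate_tokens(short_doc), needs_chunking(short_doc))
print(estimate_tokens(long_doc), needs_chunking(long_doc))
```

A caller could then choose between the two paths, for example `meta_summarize(text) if needs_chunking(text) else basic_summarize(text)`.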

Summary-Indexed Documents: An Advanced RAG Approach

For even better results with long documents, you can create a summary-indexed RAG system. This involves:

Chunking the document
Generating summaries for each chunk
Using those summaries as an index for retrieval
Only retrieving relevant chunks for the final summary

from typing import List, Dict
import numpy as np
def create_summary_index(text: str, chunk_size: int = 50000) -> List[Dict]:
    """Create a summary index from document chunks."""
    chunks = chunk_text(text, chunk_size)
    index = []
    
    for i, chunk in enumerate(chunks):
        chunk_summary = basic_summarize(chunk, max_tokens=100)
        index.append({
            "chunk_id": i,
            "summary": chunk_summary,
            "text": chunk
        })
    
    return index
def retrieve_relevant_chunks(index: List[Dict], query: str, top_k: int = 3) -> List[Dict]:
    """Retrieve the most relevant chunks based on summary similarity."""
    # In a production system, you'd use embeddings here
    # For simplicity, we'll use keyword matching
    query_terms = set(query.lower().split())
    scores = []
    
    for entry in index:
        summary_terms = set(entry["summary"].lower().split())
        overlap = len(query_terms & summary_terms)
        scores.append(overlap)
    
    top_indices = np.argsort(scores)[-top_k:][::-1]
    return [index[i] for i in top_indices]
def rag_summarize(text: str, query: str = "full document summary") -> str:
    """Generate a summary using RAG approach."""
    index = create_summary_index(text)
    relevant_chunks = retrieve_relevant_chunks(index, query)
    
    combined_text = "\n\n".join([chunk["text"] for chunk in relevant_chunks])
    return guided_summarize(combined_text)
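The keyword-overlap scoring above ignores how often a term appears and how long each summary is. As a step toward embedding-based retrieval, the sketch below ranks index entries by cosine similarity over word-count vectors. This is a stand-in for illustration, not an embedding model: `term_counts` and `rank_by_similarity` are hypothetical names, and a real embedding model would replace `term_counts`.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def term_counts(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(index: List[Dict], query: str, top_k: int = 3) -> List[Dict]:
    """Return the top_k index entries whose summaries best match the query."""
    query_vec = term_counts(query)
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vec, term_counts(entry["summary"])),
        reverse=True,
    )
    return ranked[:top_k]

# Made-up index entries in the same shape create_summary_index produces.
demo_index = [
    {"chunk_id": 0, "summary": "Rent payment schedule and security deposit.", "text": "(chunk text)"},
    {"chunk_id": 1, "summary": "Insurance and indemnification duties.", "text": "(chunk text)"},
    {"chunk_id": 2, "summary": "Termination rights and notice periods.", "text": "(chunk text)"},
]
top = rank_by_similarity(demo_index, "When is the rent payment due?", top_k=1)
print(top[0]["chunk_id"])  # → 0
```

Unlike the set-based overlap, this also strips punctuation, so "deposit." and "deposit" count as the same term.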

Best Practices for Summarization RAG

Chunk size matters: 50,000 characters (roughly 12,500 tokens at about 4 characters per token) is a reasonable starting point; tune it for your documents
Overlap chunks: Include 10-20% overlap to avoid cutting off important context
Use embeddings: For better retrieval, use an embedding model such as sentence transformers or Voyage AI (Anthropic does not offer its own embedding model)
Cache summaries: Store chunk summaries to avoid regenerating them
Hierarchical indexing: For very long documents, create a hierarchy of summaries
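Two of the practices above, overlapping chunks and cached summaries, can be sketched in a few lines. This is one possible approach under stated assumptions: `chunk_with_overlap` and `cached_summary` are hypothetical helpers, the splitter cuts on character offsets rather than paragraph boundaries, and the cache is an in-memory dictionary that is lost when the process exits.

```python
import hashlib
from typing import Callable, Dict, List

def chunk_with_overlap(text: str, chunk_size: int = 50000, overlap: float = 0.1) -> List[str]:
    """Split text into character windows that share a fraction of their length."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in the range [0, 1)")
    # Each window starts `step` characters after the previous one.
    step = max(1, round(chunk_size * (1 - overlap)))
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

_summary_cache: Dict[str, str] = {}

def cached_summary(chunk: str, summarize: Callable[[str], str]) -> str:
    """Summarize a chunk once and reuse the result for identical chunk text."""
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    if key not in _summary_cache:
        _summary_cache[key] = summarize(chunk)
    return _summary_cache[key]

windows = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=0.5)
print(windows)  # → ['abcd', 'cdef', 'efgh', 'ghij']
```

In practice, `summarize` would be a function such as `basic_summarize`, and a persistent store could replace the dictionary.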

Evaluating Summary Quality

Evaluation is crucial but challenging. Here are three approaches:

1. ROUGE Scores

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares your summary to a reference summary:

from typing import Dict
from rouge_score import rouge_scorer

def calculate_rouge(reference: str, generated: str) -> Dict:
    """Calculate ROUGE scores."""
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    return {
        'rouge1': scores['rouge1'].fmeasure,
        'rouge2': scores['rouge2'].fmeasure,
        'rougeL': scores['rougeL'].fmeasure
    }
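To see what the ROUGE-1 number measures, it helps to compute it by hand. The sketch below is a simplified version that splits on whitespace and applies no stemming, so its values can differ from those of the rouge-score package. `rouge1_f1` is a hypothetical name.

```python
from collections import Counter

def rouge1_f1(reference: str, generated: str) -> float:
    """Unigram-overlap F1 between a reference and a generated summary."""
    ref_counts = Counter(reference.lower().split())
    gen_counts = Counter(generated.lower().split())
    # Counter intersection keeps the minimum count per word (clipped overlap).
    overlap = sum((ref_counts & gen_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())  # share of generated words that match
    recall = overlap / sum(ref_counts.values())     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)

reference = "the tenant pays rent monthly"
generated = "the tenant pays monthly rent on time"
score = rouge1_f1(reference, generated)
print(round(score, 4))  # → 0.8333
```

Here all 5 reference words appear among the 7 generated words, so recall is 1, precision is 5/7, and F1 is 10/12. Word order does not affect ROUGE-1, which is why ROUGE-2 and ROUGE-L are usually reported alongside it.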

2. Claude as an Evaluator

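Use Claude itself to evaluate summary quality. The example below passes only the first 5,000 characters of the original to keep the prompt short, so for longer documents the completeness score reflects only that excerpt: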

def evaluate_summary(original: str, summary: str) -> str:
    """Use Claude to evaluate summary quality."""
    prompt = f"""Evaluate this summary on a scale of 1-5 for:
Accuracy: Is it free of factual errors?
Completeness: Does it cover all key points?
Conciseness: Is it appropriately brief?
Coherence: Is it well-structured and readable?

Original document:
{original[:5000]}...
Summary:
{summary}
Provide scores and brief justifications."""
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
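The evaluator returns free text, which is hard to track across runs. One option is to pull the numeric scores out with a regular expression. The sketch below assumes the reply contains lines such as "Accuracy: 4", which is not guaranteed; the sample string is made up for illustration and is not real model output. `parse_scores` is a hypothetical helper.

```python
import re
from typing import Dict

CRITERIA = ("Accuracy", "Completeness", "Conciseness", "Coherence")

def parse_scores(evaluation: str) -> Dict[str, int]:
    """Extract 1-5 scores per criterion; criteria that are not found are omitted."""
    scores = {}
    for criterion in CRITERIA:
        # Matches forms like "Accuracy: 4", "Accuracy - 4", or "Accuracy: 4/5".
        match = re.search(rf"{criterion}\s*[:\-]\s*([1-5])", evaluation, flags=re.IGNORECASE)
        if match:
            scores[criterion] = int(match.group(1))
    return scores

# Made-up evaluator reply, used only to exercise the parser.
sample_evaluation = (
    "Accuracy: 4/5 - no factual errors found.\n"
    "Completeness: 3 - omits the renewal option.\n"
    "Conciseness: 5\n"
    "Coherence: 4/5"
)
parsed = parse_scores(sample_evaluation)
print(parsed)
```

Asking Claude to reply in a fixed format makes this kind of parsing more reliable.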

3. Promptfoo for Automated Evaluation

Promptfoo provides a framework for systematic evaluation:

npx promptfoo eval --config config.yaml
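The command above expects a configuration file. A minimal config.yaml might look like the sketch below. The prompt wording, the model ID, the input file name, and the rubric text are all placeholders chosen for illustration, and the Anthropic provider is assumed to read the API key from the ANTHROPIC_API_KEY environment variable.

```yaml
# config.yaml (illustrative sketch; all values are placeholders)
description: Sublease summary evaluation
prompts:
  - "Summarize the following document in under 200 words:\n\n{{document}}"
providers:
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      # Assumption: a plain-text copy of the document sits next to this file
      document: file://sublease_agreement.txt
    assert:
      - type: llm-rubric
        value: Names the parties, the term, and the rent amount without factual errors.
```

Each entry under tests pairs input variables with assertions, so you can add one entry per document type you want to cover.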

Iterative Improvement

Improving summarization is an iterative process:

Start simple: Use basic summarization
Evaluate: Identify weaknesses (missing details, factual errors, poor structure)
Refine prompts: Add specific instructions based on evaluation
Test edge cases: Try different document types and lengths
Automate evaluation: Set up continuous evaluation with Promptfoo
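The loop above can be sketched as a small harness that runs each prompt variant, scores the result, and reports the best one. Everything here is a stand-in for illustration: `pick_best_prompt` is a hypothetical helper, and the stub summarizer and term-coverage scorer replace a real Claude call and a real metric such as ROUGE or Claude-as-judge.

```python
from typing import Callable, Dict

def pick_best_prompt(
    prompts: Dict[str, str],
    summarize: Callable[[str], str],
    score: Callable[[str], float],
) -> str:
    """Return the name of the prompt variant whose summary scores highest."""
    results = {name: score(summarize(prompt)) for name, prompt in prompts.items()}
    # Print variants from best to worst so regressions are easy to spot.
    for name, value in sorted(results.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {value:.3f}")
    return max(results, key=results.get)

# Stub summarizer: canned replies instead of an API call.
canned = {
    "Summarize this.": "A sublease.",
    "Summarize this, naming parties and rent.": "A sublease between two parties with monthly rent.",
}
def fake_summarize(prompt: str) -> str:
    return canned[prompt]

# Stub scorer: fraction of required terms that appear in the summary.
required_terms = {"parties", "rent", "sublease"}
def term_coverage(summary: str) -> float:
    words = set(summary.lower().replace(".", "").split())
    return len(words & required_terms) / len(required_terms)

best = pick_best_prompt(
    {"basic": "Summarize this.", "guided": "Summarize this, naming parties and rent."},
    fake_summarize,
    term_coverage,
)
print(best)  # → guided
```

Passing the summarizer and scorer in as functions means the loop stays the same when you swap the stubs for real calls.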

Conclusion and Best Practices

Here are the key takeaways for effective summarization with Claude:

Be specific in your prompts: Tell Claude exactly what you want, including structure and focus areas
Use guided summarization for complex documents: Structured prompts yield better results
Chunk long documents: Use meta-summarization for texts beyond the context window
Implement RAG for large-scale summarization: Summary-indexed retrieval improves relevance
Evaluate systematically: Combine automated metrics with human review
Iterate: Continuously refine your prompts based on evaluation results

Key Takeaways

Prompt specificity is crucial: The more guidance you give Claude about what to include and how to structure the output, the better your summaries will be
Handle long documents strategically: Use chunking and meta-summarization to process texts that exceed context windows
Domain-specific prompts improve accuracy: Tailoring prompts for legal, medical, or technical content yields more relevant summaries
Evaluation requires multiple approaches: Combine ROUGE scores, Claude-as-judge, and human review for comprehensive quality assessment
RAG enhances summarization: Summary-indexed retrieval allows you to focus on the most relevant portions of long documents