Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Summarization is one of the most powerful—and most requested—capabilities in the Claude AI ecosystem. Whether you're a legal professional drowning in contracts, a researcher scanning dozens of papers, or a product manager synthesizing customer feedback, the ability to condense lengthy documents into concise, actionable summaries is invaluable.
In this guide, we'll walk through a complete summarization workflow using Claude, starting with basic prompts and progressing to advanced techniques like guided summarization, meta-summarization, and Retrieval-Augmented Generation (RAG) for indexed documents. We'll also cover how to evaluate and iteratively improve your summaries.
Why Summarization is Hard (and Why Claude Excels)
Summarization is notoriously difficult to evaluate. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Different readers value different things: a lawyer needs precise legal language preserved, while an executive wants high-level business impact. Claude's ability to follow nuanced instructions and adapt its output to specific contexts makes it ideal for this challenge.
Setting Up Your Environment
Before diving in, let's get your environment ready. You'll need:
- An Anthropic API key
- Python 3.8+
- The following packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn promptfoo
Initialize your Claude client:
import anthropic

# If api_key is omitted, the client reads the ANTHROPIC_API_KEY
# environment variable, which keeps keys out of source code.
client = anthropic.Anthropic(api_key="your-api-key")
Data Preparation: From PDF to Clean Text
Most real-world documents come as PDFs. Here's how to extract and clean text for Claude:
import re
import pypdf

def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text

def clean_text(text):
    # Remove excessive whitespace
    text = re.sub(r'\s+', ' ', text)
    return text.strip()
# Example usage
raw_text = extract_text_from_pdf("sublease_agreement.pdf")
cleaned_text = clean_text(raw_text)  # avoid shadowing the clean_text function
For testing, you can also just define a string directly:
text = "Your document text here..."
Basic Summarization with Claude
Let's start simple. Here's a basic summarization function:
def summarize_text(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide concise, accurate summaries.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text
summary = summarize_text(cleaned_text)
print(summary)
This works, but it has limits: Claude's context window tops out at 200K tokens, so for truly long documents you'll need a chunking strategy.
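As a rough pre-flight check, you can estimate whether a document will fit before sending it. The four-characters-per-token figure below is a heuristic for English prose, not Claude's actual tokenizer, so treat the result as an approximation:

def estimate_tokens(text):
    # Heuristic: ~4 characters per token for English text
    # (an approximation, not the real tokenizer)
    return len(text) // 4

CONTEXT_LIMIT = 200_000

if estimate_tokens(cleaned_text) > CONTEXT_LIMIT:
    print("Document likely exceeds the context window; chunk it first.")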
Multi-Shot Summarization for Long Documents
When a document exceeds Claude's context window, break it into chunks, summarize each, then summarize the summaries:
def chunk_text(text, chunk_size=50000):
    # chunk_size is measured in words, not tokens; 50,000 words stays
    # comfortably under the 200K-token context window
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def multi_shot_summarize(text):
    chunks = chunk_text(text)
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_text(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Chunk {i+1}/{len(chunks)} summarized")
    # Summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_text(combined, max_tokens=1000)
    return final_summary
Advanced Techniques
Guided Summarization
Instead of generic summaries, guide Claude to extract specific information:
def guided_summarize(text, instructions):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        system="You are a precise document analyst.",
        messages=[
            {
                "role": "user",
                "content": f"""Analyze this document and provide:

{instructions}

Document:
{text}"""
            }
        ]
    )
    return response.content[0].text
# Example for legal documents
instructions = """
- Parties involved
- Effective date and duration
- Key obligations of each party
- Termination conditions
- Financial terms (amounts, payment schedules)
- Any unusual clauses or red flags
"""
legal_summary = guided_summarize(cleaned_text, instructions)
Domain-Specific Guided Summarization
For legal documents, create a specialized prompt:
def legal_document_summarize(text):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1500,
        system="You are an expert legal document analyst. Focus on precision and legal accuracy.",
        messages=[
            {
                "role": "user",
                "content": f"""Summarize this legal document with specific attention to:
- Contract type and governing law
- All parties and their roles
- Key dates and deadlines
- Financial obligations and payment terms
- Liability and indemnification clauses
- Termination and renewal provisions
- Any unusual or potentially problematic clauses

Document:
{text}"""
            }
        ]
    )
    return response.content[0].text
Meta-Summarization: Including Document Context
For even better results, include metadata about the document itself:
def meta_summarize(text, metadata):
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        messages=[
            {
                "role": "user",
                "content": f"""Document Metadata:
- Title: {metadata.get('title', 'Unknown')}
- Author: {metadata.get('author', 'Unknown')}
- Date: {metadata.get('date', 'Unknown')}
- Document Type: {metadata.get('type', 'Unknown')}
- Page Count: {metadata.get('pages', 'Unknown')}

Please provide a comprehensive summary of this document, noting how the metadata context influences the interpretation:

{text}"""
            }
        ]
    )
    return response.content[0].text
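In practice, some of this metadata can come straight from the PDF. Here's a short sketch using pypdf's metadata attribute (which may be None, with individual fields missing); the document type is filled in by hand:

reader = pypdf.PdfReader("sublease_agreement.pdf")
info = reader.metadata  # may be None, and fields may be missing

metadata = {
    "title": (info.title if info else None) or "Unknown",
    "author": (info.author if info else None) or "Unknown",
    "date": "Unknown",  # creation-date formats vary by PDF producer
    "type": "Sublease agreement",  # supplied manually
    "pages": len(reader.pages),
}

summary = meta_summarize(cleaned_text, metadata)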
Summary-Indexed Documents: An Advanced RAG Approach
For large document collections, combine summarization with RAG:
- Chunk and summarize each section of every document
- Index the summaries in a vector database
- Retrieve relevant summaries based on user queries
- Generate the final answer using the retrieved context
The example below uses TF-IDF vectors and cosine similarity as a lightweight stand-in for a vector database:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def create_summary_index(documents):
    """Create a searchable index of document summaries."""
    summaries = []
    for doc in documents:
        summary = summarize_text(doc, max_tokens=200)
        summaries.append(summary)
    vectorizer = TfidfVectorizer()
    summary_vectors = vectorizer.fit_transform(summaries)
    return summaries, vectorizer, summary_vectors

def rag_summarize(query, summaries, vectorizer, summary_vectors, top_k=3):
    """Retrieve the most relevant summaries and answer the query."""
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, summary_vectors)[0]
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    context = "\n\n".join([summaries[i] for i in top_indices])
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"Based on these document summaries, answer: {query}\n\nContext:\n{context}"
            }
        ]
    )
    return response.content[0].text
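A quick usage sketch; the filenames here are hypothetical placeholders for your own collection:

# Hypothetical document collection
docs = [extract_text_from_pdf(p)
        for p in ["lease_2022.pdf", "lease_2023.pdf", "amendment_2024.pdf"]]

summaries, vectorizer, summary_vectors = create_summary_index(docs)
answer = rag_summarize("What are the termination conditions?",
                       summaries, vectorizer, summary_vectors)
print(answer)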
Best Practices for Summarization RAG
- Chunk size matters: 2000-5000 tokens per chunk works well
- Overlap chunks: 10-20% overlap prevents information loss at boundaries (see the chunker sketch after this list)
- Hierarchical summaries: Summarize chunks, then sections, then entire documents
- Metadata preservation: Always tag summaries with source document, page, and section
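Here's a minimal sketch of an overlapping chunker, splitting on words as chunk_text does above; the 15% default overlap is simply the midpoint of the 10-20% range:

def chunk_text_overlapping(text, chunk_size=3000, overlap_ratio=0.15):
    # chunk_size is in words; consecutive chunks share overlap_ratio
    # of their length so boundary sentences appear in both chunks
    words = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(' '.join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break
    return chunks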
Evaluating Summary Quality
Automated evaluation is crucial for iteration. ROUGE compares a generated summary against a human-written reference summary; here's how to compute it:
from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
For more nuanced evaluation, use Promptfoo for custom metrics:
npx promptfoo eval --config promptfooconfig.yaml
Create a config file that tests multiple prompts against your reference summaries.
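A minimal promptfooconfig.yaml might look like the sketch below. The provider ID, file:// variable loading, and llm-rubric assertion follow promptfoo's documented conventions, but verify the exact syntax against the version you install:

# Sketch only; check keys against the promptfoo docs for your version
prompts:
  - "Summarize this document concisely:\n\n{{document}}"
  - "Summarize this document, preserving all dates and amounts:\n\n{{document}}"

providers:
  - anthropic:messages:claude-3-opus-20240229

tests:
  - vars:
      document: file://sublease_agreement.txt
    assert:
      - type: llm-rubric
        value: Mentions the parties, key dates, and financial terms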
Iterative Improvement
- Baseline: Start with basic summarization
- Evaluate: Use ROUGE and human review
- Identify gaps: Is it missing key information? Too verbose? Inaccurate?
- Refine prompts: Add specific instructions for problem areas
- Re-evaluate: Compare scores and iterate
def summarize_with_prompt(text, prompt):
    # Summarize using a caller-supplied instruction prompt
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[{"role": "user", "content": f"{prompt}\n\n{text}"}]
    )
    return response.content[0].text

def iterative_improve(text, reference_summary, iterations=3):
    prompt = "Summarize this document concisely."
    best_score = 0
    best_summary = None
    for i in range(iterations):
        # Refine the prompt after the first pass based on observed gaps
        if i == 1:
            prompt += " Ensure all key dates and amounts are included."
        summary = summarize_with_prompt(text, prompt)
        scores = evaluate_summary(reference_summary, summary)
        avg_score = (scores['rouge1'].fmeasure + scores['rouge2'].fmeasure
                     + scores['rougeL'].fmeasure) / 3
        if avg_score > best_score:
            best_score = avg_score
            best_summary = summary
    return best_summary
Conclusion and Best Practices
- Start simple, iterate fast: Basic summarization works surprisingly well. Add complexity only when needed.
- Use guided prompts: Tell Claude exactly what information you need.
- Handle long documents with chunking: Multi-shot summarization prevents information loss.
- Evaluate systematically: Combine ROUGE scores with human review.
- Leverage RAG for document collections: Index summaries for fast, relevant retrieval.
- Domain-specific prompts matter: Legal, medical, and technical documents each need tailored instructions.
Key Takeaways
- Claude excels at summarization when given clear, structured prompts—guide it with specific instructions rather than vague requests
- For long documents, use multi-shot summarization with chunking and hierarchical summarization to maintain context and accuracy
- Evaluate systematically using ROUGE scores and tools like Promptfoo, but always complement with human review for nuanced quality assessment
- Advanced RAG approaches with summary-indexed documents enable powerful query-based retrieval across large document collections
- Iterative improvement through prompt refinement and evaluation cycles consistently yields better summaries than one-shot attempts