Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
Learn practical techniques for summarizing long documents with Claude AI, including prompt engineering, metadata extraction, handling token limits, and evaluating summary quality using ROUGE scores and Promptfoo.
Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager parsing customer feedback, the ability to condense lengthy documents into actionable insights saves time and improves decision-making.
Claude excels at summarization tasks thanks to its large context window, nuanced understanding of language, and ability to follow complex instructions. In this guide, you'll learn practical, actionable techniques for summarizing documents with Claude—from simple prompts to advanced retrieval-augmented generation (RAG) approaches.
We'll focus on real-world challenges: handling long documents, extracting structured metadata, evaluating summary quality, and iteratively improving your results. Code examples are provided in Python, but the concepts apply to any programming language.
Why Summarization Is Hard (and Why Claude Helps)
Summarization is harder than it looks. Evaluating summary quality is notoriously subjective: what one reader considers a perfect summary, another may find lacking. Traditional metrics like ROUGE scores measure word overlap with reference summaries but miss nuances like coherence, factual accuracy, and relevance to the reader's goals. On top of that, long documents can exceed a model's context window, and generic prompts tend to produce generic output.
Claude helps overcome these challenges by:
- Understanding context and intent beyond simple keyword matching
- Following detailed instructions about summary structure and focus
- Handling documents up to 200K tokens in a single call
- Generating summaries in specific formats (bullet points, structured data, executive briefs)
Setup and Data Preparation
First, install the required packages:
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
Promptfoo, used later for evaluation, is a Node.js tool run via npx, so it doesn't need a pip install.
You'll also need a Claude API key. Set it as an environment variable:
export ANTHROPIC_API_KEY="your-api-key-here"
Extracting Text from Documents
For this guide, we'll use a publicly available Sublease Agreement from the SEC website. Here's how to extract text from a PDF:
import pypdf

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text

# Load your document
text = extract_text_from_pdf("sublease_agreement.pdf")
If you're working with plain text, simply define text = "your content here".
Basic Summarization with Claude
Let's start with a simple summarization function. This foundational approach uses Claude's instruction-following capabilities effectively:
import anthropic

client = anthropic.Anthropic()

def summarize_document(text, max_tokens=1000):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert document summarizer. Create concise, accurate summaries that capture key information.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_document(text)
print(summary)
This works well for short documents, but for longer texts you'll need more sophisticated approaches.
Advanced Techniques for Better Summaries
1. Guided Summarization
Instead of a generic "summarize this" prompt, guide Claude with specific instructions about what to extract:
def guided_summarize(text, focus_areas):
    prompt = f"""Please analyze the following legal document and provide:

- A one-paragraph executive summary
- Key parties involved and their roles
- Important dates and deadlines
- Financial terms and obligations
- Termination conditions

Focus particularly on: {', '.join(focus_areas)}

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Example usage
focus = ["indemnification clauses", "payment terms", "termination rights"]
summary = guided_summarize(text, focus)
2. Domain-Specific Guided Summarization
For legal documents, you can create specialized prompts that extract structured metadata:
def legal_document_summary(text):
    prompt = f"""You are a legal document analyst. Extract the following information from this contract:

- Document Type (e.g., Sublease, NDA, Service Agreement)
- Effective Date
- Parties Involved
- Key Obligations (list up to 5)
- Financial Terms (amounts, payment schedule, late fees)
- Termination Clause Summary
- Governing Law
- Any Unusual or High-Risk Clauses

Format your response as structured markdown with clear headings.

Document:
{text}
"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
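A quick usage sketch, reusing the sublease text extracted earlier (the output structure is whatever Claude produces under the markdown instruction above):

# Example usage (assumes `text` holds the sublease agreement from earlier)
legal_summary = legal_document_summary(text)
print(legal_summary)

# Optionally keep the structured summary next to the source file
with open("sublease_agreement_summary.md", "w") as f:
    f.write(legal_summary)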
3. Meta-Summarization for Long Documents
When documents exceed Claude's context window, use a chunk-and-summarize approach:
def chunk_text(text, chunk_size=50000):
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - 5000):  # 5000 word overlap
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text):
    # Step 1: Summarize each chunk
    chunks = chunk_text(text)
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_document(chunk, max_tokens=500)
        chunk_summaries.append(summary)
        print(f"Processed chunk {i+1}/{len(chunks)}")

    # Step 2: Summarize the summaries
    combined = "\n\n---\n\n".join(chunk_summaries)
    final_summary = summarize_document(
        f"Combine these section summaries into a coherent overall summary:\n\n{combined}",
        max_tokens=1000
    )
    return final_summary

final = meta_summarize(text)
Summary-Indexed RAG: An Advanced Approach
For large document collections, combine summarization with retrieval-augmented generation (RAG):
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class SummaryIndex:
    def __init__(self, documents):
        self.documents = documents
        self.summaries = []
        self.vectorizer = TfidfVectorizer(stop_words='english')

    def build_index(self):
        # Generate summaries for each document
        for doc in self.documents:
            summary = summarize_document(doc, max_tokens=200)
            self.summaries.append(summary)

        # Create TF-IDF matrix from summaries
        self.tfidf_matrix = self.vectorizer.fit_transform(self.summaries)

    def query(self, question, top_k=3):
        # Vectorize the question
        question_vec = self.vectorizer.transform([question])

        # Find most relevant summaries
        similarities = cosine_similarity(question_vec, self.tfidf_matrix)[0]
        top_indices = np.argsort(similarities)[-top_k:][::-1]

        # Return relevant documents
        results = []
        for idx in top_indices:
            results.append({
                'summary': self.summaries[idx],
                'document': self.documents[idx],
                'relevance': similarities[idx]
            })
        return results

# Usage
index = SummaryIndex([doc1, doc2, doc3])
index.build_index()
results = index.query("What are the termination conditions?")
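Retrieval is only half of the picture. Here's a minimal sketch of the generation step, passing the retrieved documents back to Claude to answer the question; the function name and prompt wording are illustrative, not a fixed API:

def answer_with_retrieval(index, question, top_k=3):
    # Retrieve the most relevant documents via their summaries
    results = index.query(question, top_k=top_k)
    context = "\n\n---\n\n".join(r['document'] for r in results)

    # Ask Claude to answer using only the retrieved context
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Answer the question using only the documents below.\n\nDocuments:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

answer = answer_with_retrieval(index, "What are the termination conditions?")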
Best Practices for Summarization RAG
- Summarize before indexing: Store summaries alongside full documents for faster retrieval
- Use hierarchical summaries: Create section-level summaries for very long documents
- Include metadata: Tag summaries with document type, date, and key entities (see the sketch after this list)
- Update incrementally: Re-summarize only changed documents, not the entire corpus
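To make the metadata and incremental-update points concrete, here's a small sketch; the field names and hashing scheme are just one reasonable choice, not a required format:

import hashlib

def build_entry(doc_id, doc_type, text):
    """Store a summary alongside simple metadata tags (illustrative fields)."""
    return {
        "doc_id": doc_id,
        "doc_type": doc_type,
        "content_hash": hashlib.sha256(text.encode()).hexdigest(),  # used to detect changes
        "summary": summarize_document(text, max_tokens=200),
    }

def needs_update(entry, text):
    # Re-summarize only when the document content has actually changed
    return entry["content_hash"] != hashlib.sha256(text.encode()).hexdigest()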
Evaluating Summary Quality
Automated evaluation helps you iterate and improve. Here's how to use ROUGE scores:
from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
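A quick usage sketch, assuming you have a human-written reference summary saved to a file for comparison:

# Example: compare a generated summary against a human-written reference
# (reference_summary.txt is assumed to exist for this illustration)
with open("reference_summary.txt") as f:
    reference = f.read()

generated = summarize_document(text, max_tokens=300)
scores = evaluate_summary(reference, generated)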
For more nuanced evaluation, use Promptfoo to create custom test cases:
# Install Promptfoo
npx promptfoo@latest init

# Create evaluation config
cat > promptfooconfig.yaml << EOF
prompts:
  - "Summarize this document: {{document}}"
providers:
  - id: anthropic:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: file://path/to/test_doc.txt
    assert:
      - type: contains-any
        value: ["key term 1", "key term 2"]
      - type: latency
        threshold: 5000
EOF

# Run evaluation
npx promptfoo@latest eval
Iterative Improvement Strategy
Follow this cycle to refine your summarization:
- Baseline: Start with a simple prompt and evaluate
- Analyze failures: Where does the summary miss important details?
- Refine prompts: Add specific instructions, examples, or constraints
- Test edge cases: Try different document types and lengths
- Automate evaluation: Use ROUGE + custom checks in CI/CD (see the sketch below)
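As a sketch of that last step, here's a minimal pytest-style regression check that fails when ROUGE-1 against a reference drops below a threshold; the fixture paths, threshold, and key term are assumptions for this example:

from rouge_score import rouge_scorer

def test_summary_regression():
    # Assumed fixture files; adjust paths and threshold for your own pipeline
    with open("fixtures/reference_summary.txt") as f:
        reference = f.read()
    with open("fixtures/test_doc.txt") as f:
        document = f.read()

    generated = summarize_document(document, max_tokens=300)
    scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
    score = scorer.score(reference, generated)['rouge1'].fmeasure

    # ROUGE threshold plus a custom content check
    assert score >= 0.35, f"ROUGE-1 dropped to {score:.2f}"
    assert "termination" in generated.lower()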
Conclusion and Best Practices
Summarization with Claude is both powerful and flexible. Here are key takeaways:
- Be specific in your prompts: Tell Claude exactly what you want (structure, length, focus areas)
- Use guided summarization for structured output: Extract metadata alongside narrative summaries
- Chunk long documents strategically: Overlap chunks to maintain context
- Evaluate both automatically and manually: ROUGE scores catch some issues, but human review catches nuance
- Iterate based on your use case: The perfect summary for a legal team differs from one for executives
Key Takeaways
- Prompt specificity matters: Generic "summarize this" prompts produce generic results. Guide Claude with detailed instructions about structure, focus areas, and output format.
- Handle long documents with chunking and meta-summarization: Break documents into overlapping chunks, summarize each, then summarize the summaries for coherent long-document understanding.
- Combine summarization with RAG for scalable knowledge retrieval: Index document summaries for fast, relevant retrieval across large document collections.
- Evaluate summaries using multiple methods: Use ROUGE scores for automated checks and Promptfoo for custom assertions, but always supplement with human review for nuanced quality.
- Iterate systematically: Start simple, identify failure modes, refine prompts, and automate evaluation to continuously improve your summarization pipeline.