Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG
This guide teaches you to summarize documents with Claude, from basic prompts to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. You'll learn to extract metadata, handle long documents, and evaluate summary quality using ROUGE scores and Promptfoo.
In today's information-dense world, the ability to distill lengthy documents into concise, actionable summaries is invaluable. Whether you're a legal professional parsing contracts, a researcher reviewing papers, or a business analyst synthesizing reports, Claude's summarization capabilities can dramatically reduce your cognitive load.
This guide walks you through practical, battle-tested techniques for summarizing documents with Claude. We'll start with the basics and progressively build toward advanced methods, including handling documents that exceed token limits and implementing Retrieval-Augmented Generation (RAG) for large-scale summarization.
Why Summarization Is Hard (and Why Claude Excels)
Summarization is deceptively difficult. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Quality depends on context, audience, and purpose. A summary for a legal team differs vastly from one for executives.
Claude excels here because:
- Long context window: Handles documents up to 200K tokens
- Nuanced understanding: Grasps legal jargon, technical terms, and domain-specific language
- Controllable output: You can guide the style, length, and focus of summaries
Setting Up Your Environment
Before diving in, let's set up the necessary tools. You'll need:
```bash
pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn
npm install -g promptfoo  # Promptfoo is a Node.js tool, installed via npm
```
And an Anthropic API key:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")  # or set the ANTHROPIC_API_KEY env var
```
Data Preparation: Extracting Text from PDFs
Most real-world documents come as PDFs. Here's a robust function to extract and clean text:
```python
import re

import pypdf


def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            # extract_text() can return None for image-only pages
            page_text = page.extract_text()
            if page_text:
                text += page_text
    return text


def clean_text(text):
    # Collapse excessive whitespace
    text = re.sub(r'\s+', ' ', text)
    # Remove non-ASCII characters if needed
    text = text.encode('ascii', 'ignore').decode()
    return text.strip()


# Example usage
text = extract_text_from_pdf("sublease_agreement.pdf")
text = clean_text(text)
```
For quick testing, you can also paste text directly:
```python
text = """Your document text here..."""
```
Basic Summarization: Your First Claude Prompt
Let's start with a straightforward summarization call:
```python
def summarize_with_claude(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": f"Please provide a concise summary of the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text


summary = summarize_with_claude(text)
print(summary)
```
Important: Notice we're using the user role with a clear instruction. This is the foundation. But for production, you'll want more control.
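One common way to get that control is the Messages API's `system` parameter, which separates standing instructions (role, audience, tone) from the document itself. A minimal sketch, where the helper name and the `audience` parameter are illustrative choices, not part of the API:

```python
def build_summary_request(text, audience="executives", max_tokens=500):
    """Assemble kwargs for client.messages.create with a system prompt."""
    return {
        "model": "claude-3-sonnet-20240229",
        "max_tokens": max_tokens,
        # The system prompt pins down role, audience, and tone so every
        # call behaves consistently across documents.
        "system": (
            f"You are a document analyst. Summarize for {audience}. "
            "Be concise, factual, and neutral in tone."
        ),
        "messages": [
            {"role": "user", "content": f"Summarize this document:\n\n{text}"}
        ],
    }

# response = client.messages.create(**build_summary_request(text))
```

Keeping standing instructions in the system prompt also makes it easy to swap audiences without rewriting the user message.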
Multi-Shot Summarization: Providing Examples
Claude performs better when you show it what "good" looks like. This is called few-shot prompting:
```python
def summarize_with_examples(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": """I will show you a document and an example of a good summary. Then I'll ask you to summarize a new document.

Example Document:
[Short example document]

Example Good Summary:
[Corresponding summary]

Now summarize this document:
""" + text
            }
        ]
    )
    return response.content[0].text
```
This technique dramatically improves consistency, especially for domain-specific content.
Advanced Techniques: Guided and Domain-Specific Summarization
Guided Summarization
Instead of a generic "summarize this," guide Claude with specific instructions:
```python
def guided_summarize(text, instructions):
    prompt = f"""Summarize the following document according to these instructions:

INSTRUCTIONS:
{instructions}

DOCUMENT:
{text}

SUMMARY:"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
Example: Legal document focus
```python
instructions = """
- Identify all parties involved
- List key dates and deadlines
- Highlight termination clauses
- Note any financial obligations
- Keep under 300 words
"""

summary = guided_summarize(text, instructions)
```
Domain-Specific Guided Summarization
For legal documents, you can add domain knowledge:
```python
def legal_summarize(text):
    prompt = f"""You are a legal document analyst. Summarize this contract with:

- Parties and their roles
- Key obligations for each party
- Termination conditions
- Liability and indemnification clauses
- Governing law and jurisdiction
- Any unusual or high-risk provisions

Document:
{text}

Legal Summary:"""
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
Meta-Summarization: Handling Long Documents
When documents exceed Claude's context window, use a chunk-and-merge strategy:
```python
def chunk_text(text, chunk_size=50000):
    # chunk_size is measured in words, not tokens
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append(' '.join(words[i:i + chunk_size]))
    return chunks


def meta_summarize(text):
    chunks = chunk_text(text)

    # Step 1: Summarize each chunk independently
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_with_claude(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Processed chunk {i + 1}/{len(chunks)}")

    # Step 2: Combine chunk summaries
    combined = "\n\n".join(chunk_summaries)

    # Step 3: Final summary of summaries
    final_summary = summarize_with_claude(
        f"Combine these section summaries into a coherent overall summary:\n\n{combined}",
        max_tokens=800
    )
    return final_summary
```
This hierarchical approach preserves context while staying within token limits.
Summary Indexed Documents: An Advanced RAG Approach
For massive document collections, use a RAG (Retrieval-Augmented Generation) approach where you index summaries:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


class SummaryIndex:
    def __init__(self):
        self.documents = []
        self.summaries = []
        self.vectorizer = TfidfVectorizer()

    def add_document(self, doc_id, text):
        summary = summarize_with_claude(text, max_tokens=200)
        self.documents.append({"id": doc_id, "text": text})
        self.summaries.append({"id": doc_id, "summary": summary})

    def search(self, query, top_k=3):
        # Vectorize the stored summaries together with the query
        summary_texts = [s["summary"] for s in self.summaries]
        vectors = self.vectorizer.fit_transform(summary_texts + [query])

        # Rank summaries by cosine similarity to the query
        similarities = cosine_similarity(vectors[-1:], vectors[:-1])[0]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.summaries[i] for i in top_indices]

    def query(self, question):
        relevant = self.search(question)
        context = "\n\n".join([r["summary"] for r in relevant])
        response = client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"Based on these document summaries, answer: {question}\n\nContext:\n{context}"
            }]
        )
        return response.content[0].text
```
Best Practices for Summarization RAG
- Chunk strategically: Split documents at natural boundaries (paragraphs, sections)
- Index summaries, not raw text: Summaries are more searchable
- Use hybrid search: Combine semantic and keyword-based retrieval
- Cache summaries: Avoid regenerating summaries for the same documents
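The first practice, splitting at natural boundaries, can be sketched as a greedy packer that never cuts a paragraph in half. This is a sketch; `max_chars` stands in for whatever budget suits your model and pricing:

```python
def chunk_by_paragraphs(text, max_chars=8000):
    """Greedily pack whole paragraphs into chunks of at most max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk rather than split a paragraph mid-way
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Compared with the word-count chunker above, paragraph-aware chunks keep each summarization call working on self-contained prose, which tends to produce more coherent chunk summaries.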
Evaluating Summary Quality
Evaluation is critical but challenging. Here's a practical approach:
ROUGE Score Evaluation
```python
from rouge_score import rouge_scorer


def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
```
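To make the metric concrete: ROUGE-1 F-measure is just unigram-overlap precision and recall combined. A from-scratch illustration (not a replacement for the `rouge-score` library, which also handles stemming and ROUGE-2/L):

```python
from collections import Counter


def rouge1_f(reference, generated):
    # Count overlapping unigrams between the two token multisets
    ref = Counter(reference.lower().split())
    gen = Counter(generated.lower().split())
    overlap = sum((ref & gen).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f("the tenant pays rent monthly",
               "the tenant pays rent every month"))  # → 0.727... (8/11)
```

Here 4 of 5 reference words are recovered (recall 0.8) out of 6 generated words (precision 0.667), giving F = 8/11. This is why ROUGE alone is insufficient: a summary can score well on overlap while still misstating a fact.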
Custom Evaluation with Promptfoo
Promptfoo lets you define custom evaluation criteria in a YAML config. The length and fact checks below are written as inline `python` assertions, which receive the model's response in the `output` variable:

```yaml
# promptfooconfig.yaml
prompts:
  - "Summarize this document: {{text}}"
  - "Provide a concise summary focusing on key points: {{text}}"
tests:
  - vars:
      text: "Your test document here"
    assert:
      - type: contains-all
        value: ["parties", "obligations", "termination"]
      # Keep summaries under 500 characters
      - type: python
        value: "len(output) <= 500"
      # Custom check for factual accuracy
      - type: python
        value: "'parties' in output.lower()"
```
Iterative Improvement: A Practical Workflow
1. Baseline: Start with a simple prompt
2. Evaluate: Use ROUGE scores and human review
3. Identify gaps: Is it missing key information? Too verbose?
4. Refine prompt: Add instructions, examples, or constraints
5. Re-evaluate: Compare against the baseline
6. Repeat: Until quality meets your threshold
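The workflow above can be sketched as a loop. Here `generate_summary` and `score` are stand-ins for whatever model call and metric (e.g. ROUGE-L against a reference, or a human rating) you plug in:

```python
def refine_until(prompts, generate_summary, score, threshold=0.5):
    """Try prompt variants in order; return the first (prompt, summary)
    whose score clears the threshold, else the best pair seen."""
    best = (float("-inf"), None, None)
    for prompt in prompts:
        summary = generate_summary(prompt)
        s = score(summary)
        if s >= threshold:
            return prompt, summary
        # Track the best-scoring variant as a fallback
        if s > best[0]:
            best = (s, prompt, summary)
    return best[1], best[2]
```

Ordering the variants from simplest to most elaborate mirrors the "baseline first" principle: you stop as soon as a cheap prompt is good enough.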
Conclusion and Best Practices
- Start simple, then layer: Begin with basic prompts, then add guidance
- Use examples: Few-shot prompting dramatically improves consistency
- Handle long documents: Use chunk-and-merge or RAG approaches
- Evaluate systematically: Combine automated metrics with human review
- Iterate: Summarization is rarely perfect on the first try
- Domain-specific prompts: Tailor instructions to your content type
- Monitor token usage: Long documents can be expensive; optimize chunk sizes
Key Takeaways
- Guided prompts outperform generic ones: Specific instructions yield summaries that match your needs
- Chunk-and-merge handles any document length: Meta-summarization preserves context across large documents
- RAG with summary indexing scales to document collections: Search summaries, not raw text
- Evaluation requires multiple metrics: Combine ROUGE scores with custom checks for factual accuracy and completeness
- Iterative refinement is essential: Treat summarization as an evolving process, not a one-shot task