BeClaude Guide
2026-04-25

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG Techniques

Learn how to summarize long documents with Claude AI, including prompt engineering, metadata extraction, handling token limits, and evaluating summary quality with ROUGE scores.

Quick Answer

This guide teaches you how to use Claude for effective document summarization, covering basic prompts, multi-shot techniques, guided summarization for legal docs, meta-summarization for long texts, and RAG-based approaches. You'll also learn to evaluate and iteratively improve summary quality.

Tags: Claude summarization, prompt engineering, legal document summarization, ROUGE evaluation, RAG summarization


Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher synthesizing papers, or a product manager reviewing customer feedback, the ability to condense lengthy documents into actionable insights is invaluable.

Claude excels at summarization tasks thanks to its large context window and nuanced understanding of language. This guide walks you through a complete workflow—from basic summarization to advanced Retrieval-Augmented Generation (RAG) techniques—using real code examples and evaluation strategies.

Why Summarization Is Hard (and Why Claude Helps)

Evaluating summary quality is notoriously subjective. What one reader considers a perfect summary, another may find too vague or too detailed. Traditional metrics like ROUGE scores measure word overlap but miss coherence, factual accuracy, and relevance. Claude's ability to follow detailed instructions and maintain context over long documents makes it uniquely suited to this challenge.

Setting Up Your Environment

First, install the required packages:

pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

Promptfoo, used later for evaluation, is a Node.js CLI rather than a Python package; install it separately:

npm install -g promptfoo

You'll also need a Claude API key. Set it as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

Data Preparation: Extracting Text from PDFs

Before summarizing, you need clean text. Here's a Python function to extract text from PDFs using pypdf:

import pypdf

def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract text from a PDF file."""
    text = ""
    with open(pdf_path, "rb") as file:
        reader = pypdf.PdfReader(file)
        for page in reader.pages:
            text += page.extract_text()
    return text

# Example usage
document_text = extract_text_from_pdf("sublease_agreement.pdf")

For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system, but you can substitute any PDF or plain-text file of your own.
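PDF extraction often leaves ragged whitespace behind (stray tabs, spaces hugging line breaks, stacked blank lines). A minimal cleanup sketch, assuming whitespace normalization is all you need (the function name is ours, not part of pypdf):

```python
import re

def clean_extracted_text(text: str) -> str:
    """Normalize whitespace artifacts left over from PDF extraction."""
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces/tabs
    text = re.sub(r" ?\n ?", "\n", text)   # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text) # cap consecutive blank lines
    return text.strip()
```

Run it on the extractor's output before summarizing; cleaner input generally means fewer tokens spent on layout noise.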

Basic Summarization with Claude

Let's start with a simple summarization function. This uses the Messages API with a system prompt and user message:

import anthropic

client = anthropic.Anthropic()

def summarize_text(text: str, max_tokens: int = 500) -> str:
    """Basic summarization using Claude."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Provide a concise summary of the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ]
    )
    return response.content[0].text

summary = summarize_text(document_text)
print(summary)

This works, but it's basic. The summary will be generic and may miss important details specific to your use case.

Multi-Shot Basic Summarization

A simple improvement is to use a multi-shot approach—asking Claude to first extract key points, then synthesize them into a summary:

def multi_shot_summarize(text: str) -> str:
    """Two-step summarization: extract then synthesize."""
    # Step 1: Extract key points
    extraction = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[
            {"role": "user", "content": f"Extract the 10 most important points from this document:\n\n{text}"}
        ]
    )
    key_points = extraction.content[0].text
    
    # Step 2: Synthesize into a summary
    synthesis = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": f"Based on these key points, write a coherent summary:\n\n{key_points}"}
        ]
    )
    return synthesis.content[0].text

This two-step process often produces more structured and comprehensive summaries.

Advanced Techniques

Guided Summarization

Instead of a generic prompt, guide Claude with specific instructions about what to include:

def guided_summarize(text: str, focus_areas: list[str]) -> str:
    """Summarize with specific focus areas."""
    focus_str = "\n".join([f"- {area}" for area in focus_areas])
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document analyst. Provide structured summaries.",
        messages=[
            {"role": "user", "content": f"""
Summarize this legal document with emphasis on:
{focus_str}

Document: {text}

Provide your summary in the following format:

- Document Type and Parties
- Key Dates and Deadlines
- Financial Obligations
- Termination Conditions
- Risk Factors
"""}
        ]
    )
    return response.content[0].text

# Example for a sublease agreement
summary = guided_summarize(
    document_text,
    focus_areas=["Rent and payment terms", "Maintenance responsibilities", "Early termination clauses"]
)

Domain-Specific Guided Summarization

For legal documents, you can extract specific metadata fields:

import json

def extract_legal_metadata(text: str) -> dict:
    """Extract structured metadata from a legal document."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": f"""
Extract the following fields from this legal document. Return only the JSON object, with no other text:
- document_type
- parties_involved (list of names)
- effective_date
- expiration_date
- governing_law_state
- total_pages

Document: {text}
"""}
        ]
    )
    # Parse the response so the return value matches the dict annotation.
    # json.loads will raise if the model wraps the JSON in extra prose.
    return json.loads(response.content[0].text)

Meta-Summarization: Handling Long Documents

When a document exceeds Claude's context window (or your token budget), use a chunk-and-summarize approach:

def chunk_text(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into chunks of approximately chunk_size words."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks
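Note that chunk_size here counts words, not tokens. When you need to respect a token budget, a crude heuristic (our assumption, not Claude's real tokenizer; use the API's token-counting support when exact numbers matter) is roughly 1.3 tokens per English word:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~1.3 tokens per English word (heuristic only)."""
    return int(len(text.split()) * 1.3)
```

By this estimate, a 3000-word chunk lands near 3900 tokens, so you can sanity-check chunk sizes against your budget before making API calls.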

def meta_summarize(text: str) -> str:
    """Summarize a long document by chunking, summarizing each chunk,
    then summarizing the summaries."""
    chunks = chunk_text(text)

    # Step 1: Summarize each chunk
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            messages=[
                {"role": "user", "content": f"Summarize chunk {i+1} of {len(chunks)}:\n\n{chunk}"}
            ]
        )
        chunk_summaries.append(summary.content[0].text)

    # Step 2: Summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": f"Synthesize these chunk summaries into a single coherent summary:\n\n{combined}"}
        ]
    )
    return final_summary.content[0].text

Summary Indexed Documents: An Advanced RAG Approach

For very large document collections, use a RAG (Retrieval-Augmented Generation) approach where you pre-summarize chunks and index them for retrieval:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def build_summary_index(documents: list[str]) -> dict:
    """Build a searchable index of document summaries."""
    # Summarize each document
    summaries = []
    for doc in documents:
        summary = summarize_text(doc, max_tokens=200)
        summaries.append(summary)

    # Create TF-IDF vectors for the summaries
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(summaries)

    return {
        "summaries": summaries,
        "vectorizer": vectorizer,
        "tfidf_matrix": tfidf_matrix,
    }

def query_summary_index(query: str, index: dict, top_k: int = 3) -> list[str]:
    """Retrieve the most relevant summaries for a query."""
    query_vec = index["vectorizer"].transform([query])
    similarities = cosine_similarity(query_vec, index["tfidf_matrix"]).flatten()
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [index["summaries"][i] for i in top_indices]

Best Practices for Summarization RAG

  • Chunk strategically: Align chunk boundaries with document sections (paragraphs, clauses).
  • Include metadata: Store document title, date, and source alongside each summary.
  • Use hybrid search: Combine semantic search (embeddings) with keyword search (BM25) for better retrieval.
  • Re-rank results: After retrieval, use Claude to re-rank summaries by relevance to the query.
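The first bullet (aligning chunk boundaries with document structure) can be sketched by packing whole paragraphs into chunks instead of cutting at a fixed word offset. The function name and word limit below are illustrative:

```python
def chunk_by_paragraphs(text: str, max_words: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks so no chunk splits a paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the limit
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

For legal documents you might split on clause numbering instead of blank lines, but the packing logic stays the same.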

Evaluating Summary Quality

Automated evaluation helps you iterate quickly. Here's how to compute ROUGE scores:

from rouge_score import rouge_scorer

def evaluate_summary(reference: str, generated: str) -> dict:
    """Compute ROUGE-1, ROUGE-2, and ROUGE-L F1 scores."""
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    return {
        "rouge1_f1": scores['rouge1'].fmeasure,
        "rouge2_f1": scores['rouge2'].fmeasure,
        "rougeL_f1": scores['rougeL'].fmeasure,
    }
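To make the metric less of a black box, here is a from-scratch ROUGE-1 F1 (unigram overlap with clipped counts, no stemming, so it will differ slightly from rouge_score's stemmed output):

```python
from collections import Counter

def rouge1_f1(reference: str, generated: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    gen = Counter(generated.lower().split())
    overlap = sum((ref & gen).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

This is exactly why the metric misses coherence and factuality: a summary that shuffles the reference's words scores a perfect 1.0.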

For more nuanced evaluation, use Promptfoo to define custom test cases:

# promptfoo_config.yaml
prompts:
  - "Summarize this document: {{document}}"
  - "Provide a bullet-point summary of key facts from: {{document}}"

tests:
  - vars:
      document: "file://sublease_agreement.txt"
    assert:
      - type: contains-all
        value: ["rent", "term", "termination"]
      - type: latency
        threshold: 5000

Iterative Improvement

To systematically improve your summarization pipeline:

  • Create a test set: Collect 10-20 documents with human-written reference summaries.
  • Baseline: Run your current prompt and compute ROUGE scores.
  • Experiment: Modify prompts (add examples, change structure, adjust focus areas).
  • Compare: Use Promptfoo to run A/B tests between prompt variants.
  • Analyze failures: Look at low-scoring summaries to identify patterns (missing details, hallucinations, poor structure).
  • Refine: Update prompts based on failure analysis and repeat.
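The baseline/experiment/compare loop ultimately reduces to aggregating per-document scores for each prompt variant and keeping the winner. A minimal sketch, assuming you already have a list of ROUGE-L F1 scores per variant (the function name is illustrative):

```python
def pick_best_variant(scores: dict[str, list[float]]) -> tuple[str, float]:
    """Return the prompt variant with the highest mean score, plus that mean."""
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    best = max(means, key=means.get)
    return best, means[best]
```

Feed it the output of evaluate_summary across your test set to collapse each A/B comparison into a single number per variant, then dig into the low scorers by hand.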

Conclusion and Best Practices

Summarization with Claude is both powerful and flexible. Here are the key principles to keep in mind:

  • Be specific in your prompts: Generic prompts yield generic summaries. Specify format, length, focus areas, and audience.
  • Use structured outputs: Request JSON or markdown tables for metadata extraction.
  • Chunk wisely: For long documents, chunk at natural boundaries and use meta-summarization.
  • Evaluate systematically: Combine automated metrics (ROUGE) with custom assertions (Promptfoo) and human review.
  • Iterate: Summarization is rarely perfect on the first try. Build a feedback loop.

Key Takeaways

  • Claude's large context window and instruction-following ability make it ideal for summarization, especially for complex documents like legal contracts.
  • Guided prompts with specific focus areas and output formats produce significantly better summaries than open-ended requests.
  • For documents exceeding context limits, use a chunk-and-meta-summarize strategy to preserve information across the entire text.
  • RAG-based summarization with pre-indexed summaries enables fast retrieval from large document collections.
  • Combine ROUGE scores with custom evaluation frameworks like Promptfoo to systematically measure and improve summary quality over time.