Claude Guide
2026-05-05

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn how to summarize complex legal documents using Claude AI. This guide covers prompt engineering, metadata extraction, handling long texts, ROUGE evaluation, and iterative improvement strategies.

Quick Answer

This guide teaches you how to use Claude for summarizing legal documents, including crafting effective prompts, handling long texts beyond token limits, extracting metadata, evaluating summary quality with ROUGE scores, and building a RAG-based summarization system.

Tags: Claude API · Summarization · Prompt Engineering · RAG · Legal Documents


Summarization is one of the most powerful and practical applications of large language models. Whether you're a lawyer reviewing contracts, a researcher digesting papers, or a business analyst processing reports, the ability to condense lengthy documents into clear, actionable summaries saves hours of manual work.

Claude excels at summarization tasks thanks to its large context window (up to 200K tokens), nuanced language understanding, and strong instruction-following capabilities. In this guide, we'll walk through a complete workflow—from basic summarization to advanced Retrieval-Augmented Generation (RAG) approaches—using real legal documents as our test case.

Why Legal Documents?

Legal documents are an ideal stress test for summarization. They contain:

  • Dense, technical language
  • Critical fine print that must be preserved
  • Complex cross-references
  • High stakes for accuracy
If you can summarize a sublease agreement reliably, you can summarize almost anything.

Setting Up Your Environment

First, install the required packages:

pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn promptfoo

You'll also need a Claude API key. Set it as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

Step 1: Data Preparation

Before summarizing, you need to extract clean text from your source documents. For PDFs, use pypdf:

import pypdf

def extract_text_from_pdf(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text

Example: SEC sublease agreement

text = extract_text_from_pdf("sublease_agreement.pdf")

If you're working with plain text, simply define:

text = """Your document text here..."""

Step 2: Basic Summarization

Let's start with a simple summarization function. Even this basic approach uses important Claude features like the system prompt and the max_tokens parameter:

import anthropic

client = anthropic.Anthropic()

def summarize_basic(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert legal document summarizer. Create a concise summary that captures all key points.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_basic(text)
print(summary)

This works, but it has limitations. The summary might miss critical details or include irrelevant information. Let's improve it.

Step 3: Multi-Shot Summarization

A single prompt often produces inconsistent results. Multi-shot summarization provides Claude with examples of good summaries:

def summarize_multishot(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are an expert legal document summarizer.",
        messages=[
            {
                "role": "user",
                "content": "Summarize this employment contract:\n\n[Example contract text]"
            },
            {
                "role": "assistant",
                "content": "[Example good summary]"
            },
            {
                "role": "user",
                "content": f"Now summarize this document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

Step 4: Guided Summarization with Metadata Extraction

For legal documents, you often need specific information. Guided summarization uses structured prompts to extract both a narrative summary and key metadata:

def guided_summarize(text):
    prompt = f"""Analyze the following legal document and provide:

1. EXECUTIVE SUMMARY (3-5 sentences)
2. KEY PARTIES: List all parties involved
3. EFFECTIVE DATE: When does this agreement take effect?
4. TERM: How long is the agreement valid?
5. KEY OBLIGATIONS: Bullet list of main responsibilities for each party
6. FINANCIAL TERMS: Payment amounts, schedules, and conditions
7. TERMINATION CLAUSES: How can either party end this agreement?
8. RISK FACTORS: Any unusual or high-risk provisions

Document:
{text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Step 5: Handling Long Documents (Beyond Token Limits)

When documents exceed Claude's context window, use a chunk-and-summarize approach:

def chunk_text(text, chunk_size=50000):
    """Split text into chunks of roughly chunk_size words each."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks

def hierarchical_summarize(text):
    chunks = chunk_text(text)
    # First pass: summarize each chunk
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_basic(chunk, max_tokens=300)
        chunk_summaries.append(summary)
    # Second pass: summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_basic(combined, max_tokens=800)
    return final_summary

Step 6: Advanced RAG Approach with Summary Indexing

For large document collections, build a RAG system where each document is summarized and indexed:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_summary_index(documents):
    """Create a searchable index of document summaries."""
    summaries = []
    for doc in documents:
        summary = summarize_basic(doc, max_tokens=200)
        summaries.append(summary)
    vectorizer = TfidfVectorizer()
    summary_vectors = vectorizer.fit_transform(summaries)
    return summaries, vectorizer, summary_vectors

def query_summary_index(query, summaries, vectorizer, summary_vectors, top_k=3):
    """Find most relevant documents for a query."""
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, summary_vectors)[0]
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    results = []
    for idx in top_indices:
        results.append({
            "summary": summaries[idx],
            "relevance": similarities[idx]
        })
    return results

Best Practices for Summarization RAG

  • Summarize first, then index: Always create concise summaries before vectorizing. Raw documents introduce noise.
  • Use hierarchical summaries: For very long documents, create section-level summaries before document-level summaries.
  • Include metadata: Tag summaries with document type, date, and source for better filtering.
  • Chunk strategically: Break documents at natural boundaries (sections, paragraphs) rather than arbitrary token counts.
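The "chunk strategically" advice above can be sketched in plain Python: a greedy chunker that packs whole paragraphs into each chunk rather than cutting at arbitrary offsets. The `max_chars` budget here is a hypothetical stand-in for a real token budget, which you would compute with a tokenizer.

```python
def chunk_by_paragraphs(text, max_chars=2000):
    """Greedy chunker: packs whole paragraphs into chunks no larger than
    max_chars characters (a rough stand-in for a token budget)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because each chunk ends at a paragraph boundary, no sentence is ever split mid-thought, which keeps the per-chunk summaries coherent.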

Step 7: Evaluating Summary Quality

Evaluation is the hardest part of summarization. Here's a practical approach using ROUGE scores:

from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores
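To build intuition for what these numbers mean, here is a toy re-implementation of ROUGE-1 F1: the F-measure of clipped unigram overlap between the reference and the generated summary. The real rouge_score library additionally handles stemming, tokenization, and the ROUGE-2/ROUGE-L variants, so treat this only as an illustration.

```python
from collections import Counter

def rouge1_f1(reference, generated):
    """Toy ROUGE-1 F1: F-measure of clipped unigram overlap.
    Illustration only; use the rouge_score library in practice."""
    ref_counts = Counter(reference.lower().split())
    gen_counts = Counter(generated.lower().split())
    # Clip each word's count by its count in the reference
    overlap = sum(min(ref_counts[w], gen_counts[w]) for w in gen_counts)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

An identical summary scores 1.0; a summary that covers half the reference's words with perfect precision scores about 0.67. This is why ROUGE rewards lexical overlap but cannot detect paraphrases or factual errors, motivating the LLM-as-judge check below.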

For more nuanced evaluation, use Claude itself as a judge:

def evaluate_with_claude(summary, original_text):
    prompt = f"""Evaluate this summary on a scale of 1-10 for:

1. Factual Accuracy: Does it contain any errors?
2. Completeness: Does it cover all key points?
3. Conciseness: Is it appropriately brief?
4. Clarity: Is it easy to understand?

Original text: {original_text[:1000]}...

Summary: {summary}

Provide scores and brief justification."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Step 8: Iterative Improvement

Use evaluation results to refine your prompts systematically:

  • Start simple: Get a baseline with basic summarization.
  • Identify gaps: Where does the summary fail? Missing details? Inaccurate information?
  • Add constraints: "Ensure all monetary amounts are included." or "List all dates mentioned."
  • Provide examples: Show Claude what a good summary looks like.
  • Test edge cases: Try different document types and lengths.
Here's a simple loop that automates this feedback cycle:

def iterative_improve(text, iterations=3):
    """Iteratively improve summarization based on feedback."""
    current_prompt = "Summarize this document concisely."
    
    for i in range(iterations):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=800,
            messages=[
                {"role": "user", "content": f"{current_prompt}\n\n{text}"}
            ]
        )
        summary = response.content[0].text
        
        # Get feedback
        feedback = evaluate_with_claude(summary, text)
        
        # Refine prompt based on feedback
        current_prompt = f"{current_prompt}\n\nImprove based on this feedback: {feedback}"
    
    return summary

Conclusion and Best Practices

Summarization with Claude is both an art and a science. Here are the key principles to follow:

  • Be specific in your prompts: Tell Claude exactly what you want—don't leave it guessing.
  • Use structured outputs: Request bullet points, tables, or specific sections for complex documents.
  • Chunk intelligently: For long documents, break at natural boundaries and use hierarchical summarization.
  • Evaluate rigorously: Combine automated metrics (ROUGE) with human or AI-based quality checks.
  • Iterate: No prompt is perfect on the first try. Use feedback loops to refine.
  • Consider your audience: A summary for a lawyer differs from one for a business executive.

Key Takeaways

  • Claude's large context window and instruction-following make it ideal for document summarization, especially for complex domains like legal contracts.
  • Guided summarization with structured prompts extracts both narrative summaries and specific metadata, making outputs more actionable.
  • For documents exceeding token limits, use a hierarchical chunk-and-summarize approach that first summarizes sections, then combines those summaries.
  • Build RAG systems by indexing document summaries rather than raw text for faster, more accurate retrieval.
  • Evaluate summaries using a combination of ROUGE scores and LLM-as-judge feedback, then iteratively refine your prompts based on identified gaps.