Claude Guide
2026-05-05

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn how to summarize legal documents and long texts using Claude AI. Covers prompt engineering, metadata extraction, ROUGE evaluation, and iterative improvement techniques.

Quick Answer

This guide teaches you how to use Claude for effective document summarization, from crafting basic prompts to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. You'll also learn how to evaluate summary quality using ROUGE scores and Promptfoo.

Tags: summarization, prompt engineering, legal documents, ROUGE evaluation, RAG


Summarization is one of the most practical applications of large language models. Whether you're a lawyer reviewing contracts, a researcher digesting papers, or a business analyst processing reports, the ability to condense lengthy documents into clear, actionable summaries saves time and improves decision-making.

Claude excels at summarization tasks thanks to its large context window and nuanced understanding of language. In this guide, we'll walk through a complete workflow—from basic summarization to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. We'll also cover how to evaluate and iteratively improve your summaries using automated metrics.

Why Summarization Is Hard (and Why Claude Helps)

Evaluating summary quality is notoriously subjective. Different readers value different aspects: some want bullet-point brevity, others need narrative flow. Traditional metrics like ROUGE scores measure word overlap with reference summaries but miss coherence, factual accuracy, and relevance.
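To see concretely what ROUGE misses, here is a minimal sketch of the unigram-overlap F1 at the heart of ROUGE-1 (the helper name `rouge1_f1` is ours, and unlike the real `rouge-score` package it skips stemming and proper tokenization). A scrambled summary with the same words scores a perfect 1.0 despite being unreadable:

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    # Unigram-overlap F1: the core idea behind ROUGE-1.
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())  # shared word occurrences
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the tenant must pay rent by the first of each month"
scrambled = "month each of first the by rent pay must tenant the"
print(rouge1_f1(reference, scrambled))  # 1.0 — identical words, zero coherence
```

This is exactly why coherence and factual accuracy need separate checks.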

Claude's strength lies in its ability to follow detailed instructions and adapt its output to your specific needs. By combining careful prompt engineering with systematic evaluation, you can build a summarization pipeline that consistently delivers high-quality results.

Setup and Data Preparation

First, install the required packages:

pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn promptfoo

You'll also need a Claude API key. Set it as an environment variable:

export ANTHROPIC_API_KEY="your-api-key-here"

Extracting Text from PDFs

For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system. Here's how to extract text from a PDF:

import pypdf

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text

# Load your document

text = extract_text_from_pdf("sublease_agreement.pdf")

If you don't have a PDF, you can simply define text = "your text here" and skip the extraction step.

Basic Summarization with Claude

Let's start with a simple summarization function. Even this basic approach uses important Claude features like the assistant role and stop sequences.

import anthropic

client = anthropic.Anthropic()

def summarize_basic(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert summarizer. Create a concise summary that captures the key points.",
        messages=[
            {"role": "user", "content": f"Please summarize the following document:\n\n{text}"}
        ]
    )
    return response.content[0].text

summary = summarize_basic(text)
print(summary)

This works, but it's generic. For legal documents, we need more structure.

Multi-Shot Basic Summarization

A simple improvement is to provide Claude with examples of good summaries. This technique, known as few-shot prompting, helps the model understand your expectations:

def summarize_multishot(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {"role": "user", "content": "Summarize this contract clause: 'Party A shall indemnify Party B against all losses arising from breach of confidentiality.'"},
            {"role": "assistant", "content": "Indemnification: Party A must cover all losses Party B incurs if Party A breaches confidentiality."},
            {"role": "user", "content": f"Now summarize this document:\n\n{text}"}
        ]
    )
    return response.content[0].text

Advanced Techniques

Guided Summarization

Instead of a free-form summary, guide Claude to extract specific sections. This is especially useful for legal documents where you need structured output:

def guided_summarize(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        system="You are a legal document analyst. Extract the following sections from the document.",
        messages=[
            {"role": "user", "content": f"""Analyze this legal document and provide:
  • PARTIES: Who are the parties involved?
  • EFFECTIVE DATE: When does the agreement take effect?
  • KEY OBLIGATIONS: What are the main responsibilities of each party?
  • TERMINATION: How can the agreement be terminated?
  • RISKS: What are the potential risks or liabilities?
Document:
{text}"""}
        ]
    )
    return response.content[0].text

Domain-Specific Guided Summarization

For specialized fields like law or medicine, you can tailor the extraction criteria:

def legal_summarize(text):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system="You are a senior contract attorney. Extract and explain key legal provisions.",
        messages=[
            {"role": "user", "content": f"""Extract the following from this legal document:
  • Governing Law: Which jurisdiction's laws apply?
  • Dispute Resolution: Arbitration, mediation, or litigation?
  • Confidentiality: What information is protected?
  • Indemnification: Who indemnifies whom, and for what?
  • Assignment: Can rights be transferred?
Document:
{text}"""}
        ]
    )
    return response.content[0].text

Meta-Summarization: Preserving Context Across the Entire Document

When dealing with very long documents, you can use a recursive approach: summarize chunks, then summarize the summaries. This technique, called meta-summarization, preserves context across the entire document:

def chunk_text(text, chunk_size=3000):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = ' '.join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text):
    chunks = chunk_text(text)
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_basic(chunk, max_tokens=200)
        chunk_summaries.append(summary)
    # Now summarize the summaries
    combined = "\n\n".join(chunk_summaries)
    final_summary = summarize_basic(combined, max_tokens=500)
    return final_summary

Summary Indexed Documents: An Advanced RAG Approach

For even larger document collections, combine summarization with Retrieval-Augmented Generation (RAG). The idea is to create a summary index that allows you to quickly find relevant sections:

def create_summary_index(documents):
    index = {}
    for doc_id, doc_text in documents.items():
        summary = summarize_basic(doc_text, max_tokens=200)
        index[doc_id] = {
            "summary": summary,
            "full_text": doc_text
        }
    return index

def query_summary_index(query, index, top_k=3):
    # Simple keyword matching (in practice, use embeddings)
    scores = {}
    for doc_id, entry in index.items():
        if query.lower() in entry["summary"].lower():
            scores[doc_id] = entry["summary"].lower().count(query.lower())
    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]
    return [index[doc_id]["full_text"] for doc_id, _ in ranked]
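As the comment notes, keyword matching is a placeholder. One step up, short of dense embeddings, is TF-IDF cosine similarity using scikit-learn (already in our install list). This sketch is ours (the name `query_summary_index_tfidf` is not from any library), and a production system would likely use a proper embedding model instead:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def query_summary_index_tfidf(query, index, top_k=3):
    # Rank summaries by TF-IDF cosine similarity instead of raw keyword counts.
    doc_ids = list(index.keys())
    summaries = [index[d]["summary"] for d in doc_ids]
    # Fit on summaries plus the query so they share one vocabulary.
    matrix = TfidfVectorizer().fit_transform(summaries + [query])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(doc_ids, sims), key=lambda x: x[1], reverse=True)[:top_k]
    return [index[d]["full_text"] for d, _ in ranked]
```

Unlike substring matching, this tolerates word-order differences and weights rare, discriminative terms more heavily.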

Best Practices for Summarization RAG

  • Chunk strategically: Split documents at natural boundaries (paragraphs, sections) rather than arbitrary token counts.
  • Use embeddings: Replace keyword matching with semantic search using embeddings from models like text-embedding-3-small.
  • Include metadata: Store document title, date, and source alongside summaries for better filtering.
  • Iterate on chunk size: Test different chunk sizes (500, 1000, 2000 tokens) to find what works best for your domain.
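The first bullet can be made concrete. Here is a minimal sketch of paragraph-boundary chunking (the helper name `chunk_by_paragraphs` and the character-based budget are our simplifications; a production version would count tokens rather than characters):

```python
def chunk_by_paragraphs(text, max_chars=3000):
    # Split on blank lines, then pack whole paragraphs into chunks under max_chars.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # +2 accounts for the "\n\n" separator we re-insert between paragraphs.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because splits only happen between paragraphs, no sentence is ever cut mid-thought, which noticeably improves chunk-level summaries.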

Evaluations

Evaluating summary quality is critical. Here's how to use ROUGE scores:

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

def evaluate_summary(reference, generated):
    scores = scorer.score(reference, generated)
    return {
        'rouge1_f1': scores['rouge1'].fmeasure,
        'rouge2_f1': scores['rouge2'].fmeasure,
        'rougeL_f1': scores['rougeL'].fmeasure
    }

Example

reference = "The agreement is between Party A and Party B, effective January 1, 2024."
generated = "Party A and Party B signed an agreement effective Jan 1, 2024."
print(evaluate_summary(reference, generated))

For more nuanced evaluation, use Promptfoo to run custom tests that check for factual accuracy, completeness, and adherence to format.
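As a sketch of what such a test might look like (the config keys follow Promptfoo's YAML format as we understand it; the file names, prompt text, and assertion values here are illustrative, not from any real project):

```yaml
# promptfooconfig.yaml — illustrative checks for a summarization prompt
prompts:
  - "Summarize this legal document, always including the effective date:\n\n{{document}}"
providers:
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: file://sublease_agreement.txt
    assert:
      - type: icontains
        value: "effective date"
      - type: llm-rubric
        value: "The summary names the parties and states their key obligations."
```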

Iterative Improvement

Summarization is rarely perfect on the first try. Here's a systematic approach to improvement:

  • Baseline: Run your initial prompt and collect summaries.
  • Evaluate: Use ROUGE scores and manual review to identify weaknesses.
  • Hypothesize: Are summaries too long? Missing key details? Too generic?
  • Refine prompts: Adjust instructions based on findings. For example:
    - "Keep summaries under 100 words."
    - "Always include the effective date."
    - "Use bullet points for obligations."
  • Test: Run the new prompt on the same documents.
  • Compare: Did scores improve? Did manual reviewers prefer the new version?
  • Repeat: Continue until quality meets your threshold.
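The Compare step can be sketched as a small helper (the name `compare_prompt_versions` is ours) that averages the per-metric F1 scores returned by `evaluate_summary` across a document set and reports version B's improvement over version A:

```python
def compare_prompt_versions(results_a, results_b):
    # results_a / results_b: lists of per-document score dicts
    # (e.g. {'rouge1_f1': 0.42, ...}) from two prompt versions.
    deltas = {}
    for metric in results_a[0]:
        mean_a = sum(r[metric] for r in results_a) / len(results_a)
        mean_b = sum(r[metric] for r in results_b) / len(results_b)
        deltas[metric] = mean_b - mean_a  # positive means version B improved
    return deltas
```

A positive delta on every metric, confirmed by manual review, is a good signal to promote the new prompt.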

Conclusion and Best Practices

Summarization with Claude is powerful but requires thoughtful implementation. Here are our top recommendations:

  • Start simple, then layer complexity: Begin with basic prompts and add structure as needed.
  • Use guided extraction for structured documents: Legal, medical, and financial documents benefit from explicit section extraction.
  • Leverage meta-summarization for long documents: Chunk and summarize recursively to maintain context.
  • Evaluate systematically: Combine automated metrics (ROUGE) with human review.
  • Iterate based on feedback: Treat your prompt as a living document that improves over time.

By following these techniques, you can build a robust summarization pipeline that handles everything from short emails to multi-page contracts.

Key Takeaways

  • Claude's guided summarization allows you to extract structured metadata (parties, dates, obligations) from legal documents with high accuracy.
  • Meta-summarization enables handling of documents beyond Claude's context window by chunking and recursively summarizing.
  • ROUGE scores and Promptfoo provide automated evaluation, but always supplement with human review for nuanced quality.
  • Iterative prompt refinement is essential—treat your summarization prompt as a living artifact that improves through systematic testing.
  • RAG-based summary indexing scales summarization to large document collections, enabling fast retrieval of relevant sections.