Guide | Beginner | Best Practices | 2026-05-12

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn how to summarize long documents using the Claude API. Covers prompt engineering, metadata extraction, handling token limits, ROUGE evaluation, and iterative improvement.

Quick Answer

This guide teaches you to summarize long documents with Claude, including basic prompts, guided summarization, meta-summarization for token limits, and evaluation using ROUGE scores and Promptfoo.

summarization, prompt engineering, RAG, evaluation, legal documents


Summarization is one of the most practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher scanning dozens of papers, or a product manager trying to distill customer feedback, Claude can help you extract the signal from the noise.

This guide walks you through a complete summarization workflow using the Claude API. We'll start with a simple prompt, then layer in advanced techniques like guided summarization, meta-summarization for long documents, and a summary-indexed RAG approach. Along the way, we'll cover how to evaluate summary quality using both automated metrics (ROUGE) and custom evaluation frameworks like Promptfoo.

Why Summarization Is Hard (and Why Claude Excels)

Summarization evaluation is notoriously subjective. Two human readers can disagree on what constitutes a "good" summary. Traditional metrics like ROUGE measure n-gram overlap but miss coherence, factual accuracy, and relevance. Claude's strength lies in its ability to follow nuanced instructions, maintain context over long passages, and generate summaries that are both concise and faithful to the source.

Setup: Installing Dependencies

Before you start, install the required packages:

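pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

Promptfoo is distributed as a Node.js command-line tool, so it is run with npx (for example, npx promptfoo@latest eval) rather than installed through pip here.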

You'll also need a valid Anthropic API key. Set it as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

Data Preparation: Extracting Text from PDFs

For this guide, we'll use a publicly available Sublease Agreement from the SEC's EDGAR system. If you have your own PDF, you can adapt the file path.

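import pypdf
def extract_text_from_pdf(pdf_path: str) -> str:
    reader = pypdf.PdfReader(pdf_path)
    # extract_text() may return an empty string for pages without a text layer (e.g., scanned images)
    pages = [page.extract_text() or "" for page in reader.pages]
    # Join pages with a newline so words at page boundaries don't run together
    return "\n".join(pages)
text = extract_text_from_pdf("sublease_agreement.pdf")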

If you prefer to work with a plain text blob, simply assign text = "...".

Basic Summarization: Your First Prompt

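Let's start with a simple summarization function. Even this basic version uses two useful Claude API features: prefilling the assistant turn, so the reply starts inside a <summary> tag, and a stop sequence, so generation ends as soon as that tag closes.

import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
def summarize_basic(text: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system="You are an expert summarizer. Provide a concise summary of the following document.",
        messages=[
            {"role": "user", "content": f"Please summarize this document inside <summary></summary> tags:\n\n{text}"},
            # Prefill the assistant turn so Claude begins directly with the summary
            {"role": "assistant", "content": "<summary>"},
        ],
        # Stop at the closing tag; the stop sequence itself is not included in the output
        stop_sequences=["</summary>"],
    )
    return response.content[0].text.strip()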

This works, but it's naive. The summary might miss key details or include irrelevant information. Let's improve it.

Multi-Step Basic Summarization

Instead of a single prompt, you can chain two calls: first ask Claude to outline the document's key sections, then have it summarize each section against that outline and consolidate the results into one summary.

def summarize_multishot(text: str) -> str:
    # Step 1: ask Claude for an outline of the document's main sections
    sections_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=512,
        messages=[
            {"role": "user", "content": f"List the main sections of this document:\n\n{text}"}
        ]
    )
    sections = sections_response.content[0].text
    # Step 2: summarize each section, then consolidate.
    # The document is sent again, because the outline alone contains too little to summarize from.
    summary_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": (
                f"Here is an outline of the document's main sections:\n{sections}\n\n"
                "Summarize each section in one or two sentences, then finish with a concise "
                f"summary of the entire document.\n\nDocument:\n{text}"
            )}
        ]
    )
    return summary_response.content[0].text

Advanced Techniques

Guided Summarization

Instead of a generic summary, guide Claude to extract specific information. This is especially useful for legal documents where you need to capture parties, dates, obligations, and termination clauses.

import json
def guided_summarize(text: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": f"""Extract the following metadata from this legal document. Return only a JSON object with these keys:
- parties_involved: list of all named parties
- effective_date: date the agreement takes effect
- termination_conditions: how the agreement can be terminated
- key_obligations: list of major obligations for each party
- governing_law: which jurisdiction's law applies

Document:\n{text}"""},
            # Prefill "{" so the reply starts as raw JSON, with no preamble or code fence
            {"role": "assistant", "content": "{"},
        ]
    )
    # The prefilled "{" is not repeated in the response, so add it back before parsing
    return json.loads("{" + response.content[0].text)
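Once the reply has been parsed into a dictionary (for example with json.loads), a quick completeness check catches fields Claude left out or returned in an unexpected shape. The sketch below is illustrative: check_metadata and REQUIRED_FIELDS are hypothetical names, and the field list simply mirrors the keys requested in the prompt above.

```python
# Hypothetical list of fields, mirroring the keys requested in the guided prompt
REQUIRED_FIELDS = (
    "parties_involved",
    "effective_date",
    "termination_conditions",
    "key_obligations",
    "governing_law",
)


def check_metadata(data: dict) -> list[str]:
    """Return a list of problems found in the extracted metadata (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        # Treat absent, empty-string, and empty-list values as missing
        if value is None or value == "" or value == []:
            problems.append(f"missing or empty: {field}")
    # The prompt asks for a list of parties, so flag any other type
    parties = data.get("parties_involved")
    if parties is not None and not isinstance(parties, list):
        problems.append("parties_involved should be a list")
    return problems
```

An empty result means every requested field came back; otherwise you can re-prompt Claude for the specific fields that are missing.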

Domain-Specific Guided Summarization

For legal documents, you can add domain-specific instructions. For example, ask Claude to highlight unusual clauses, indemnification terms, or non-compete restrictions.

def legal_summarize(text: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": f"""You are a legal document analyst. Summarize this agreement with focus on:
- Risk factors (unusual clauses, penalties, auto-renewal)
- Financial terms (rent, fees, deposits)
- Termination rights
- Dispute resolution

Document:\n{text}"""}
        ]
    )
    return response.content[0].text

Meta-Summarization: Handling Long Documents

Claude has a large context window (200K tokens), but some documents, such as multi-year contracts or regulatory filings, can still exceed it. One solution is meta-summarization: split the document into chunks, summarize each chunk, then summarize the summaries.

def chunk_text(text: str, chunk_size: int = 50000) -> list[str]:
    # chunk_size counts words, not tokens; keep each chunk comfortably inside the context window
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks
def meta_summarize(text: str) -> str:
    chunks = chunk_text(text)
    # Summarize each chunk independently with the basic summarizer
    chunk_summaries = []
    for chunk in chunks:
        summary = summarize_basic(chunk)
        chunk_summaries.append(summary)

    # Summarize the summaries into a single executive summary
    combined = "\n\n".join(chunk_summaries)
    final_summary = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": f"Combine these section summaries into one coherent executive summary:\n\n{combined}"}
        ]
    )
    return final_summary.content[0].text
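The meta_summarize function combines only once, so if the joined chunk summaries are themselves too long, the final call can still fail. A minimal recursive variant, sketched below under the assumption that each summary is much shorter than the chunk it came from, keeps re-chunking until the text fits. The summarize argument is any string-to-string function (for example summarize_basic); recursive_meta_summarize and max_depth are hypothetical names introduced here.

```python
from typing import Callable


def recursive_meta_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_words: int = 50_000,
    max_depth: int = 5,
) -> str:
    """Chunk, summarize, and repeat on the joined summaries until they fit in one chunk."""
    words = text.split()
    # Base case: the text fits in a single chunk, so summarize it directly
    if len(words) <= chunk_words:
        return summarize(text)
    # Guard against summaries that do not shrink, which would otherwise recurse forever
    if max_depth == 0:
        raise RuntimeError("Summaries are not shrinking enough; try a shorter summary length")
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    combined = "\n\n".join(summarize(chunk) for chunk in chunks)
    # The joined summaries may still exceed one chunk, so recurse on them
    return recursive_meta_summarize(combined, summarize, chunk_words, max_depth - 1)
```

Passing the summarizer in as a parameter also makes the recursion easy to test with a stub before spending API calls.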

Summary-Indexed Documents: An Advanced RAG Approach

For very large document collections, you can build a summary-indexed RAG system. Instead of indexing raw chunks, you index summaries of each document. When a user asks a question, you retrieve the most relevant document summaries and then use Claude to answer based on the full text of those documents.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
def build_summary_index(documents: list[str]) -> dict:
    """Map each document ID to its summary (used for retrieval) and full text (used for answering)."""
    index = {}
    for doc_id, doc_text in enumerate(documents):
        summary = summarize_basic(doc_text)
        index[doc_id] = {
            "summary": summary,
            "full_text": doc_text
        }
    return index
def retrieve_relevant_documents(query: str, index: dict, top_k: int = 3) -> list[str]:
    entries = list(index.values())
    summaries = [entry["summary"] for entry in entries]
    # Fit TF-IDF on the summaries plus the query so both share one vocabulary
    vectorizer = TfidfVectorizer().fit(summaries + [query])
    query_vec = vectorizer.transform([query])
    summary_vecs = vectorizer.transform(summaries)
    similarities = cosine_similarity(query_vec, summary_vecs).flatten()
    # Indices of the top_k most similar summaries, best match first
    top_indices = similarities.argsort()[-top_k:][::-1]
    return [entries[i]["full_text"] for i in top_indices]
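The retrieval function returns full document texts, but the answering step itself is not shown. The following is one way to sketch it, assuming client is an anthropic.Anthropic instance passed in by the caller; answer_from_documents is a hypothetical helper, and the XML-style document tags are a prompt-formatting choice rather than a requirement.

```python
def answer_from_documents(client, query: str, documents: list[str],
                          model: str = "claude-3-5-sonnet-20241022") -> str:
    """Answer a question from the full text of the retrieved documents.

    `client` is assumed to be an anthropic.Anthropic instance; it is passed in
    rather than created here so the function can be tested with a stub.
    """
    # Wrap each document in numbered tags so Claude can tell them apart
    context = "\n\n".join(
        f'<document index="{i}">\n{doc}\n</document>'
        for i, doc in enumerate(documents, start=1)
    )
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                f"{context}\n\nAnswer the question using only the documents above. "
                f"If they do not contain the answer, say so.\n\nQuestion: {query}"
            ),
        }],
    )
    return response.content[0].text
```

For example, answer_from_documents(client, query, retrieve_relevant_documents(query, index)) chains retrieval and answering.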

Best Practices for Summarization RAG

Summary granularity: Summarize at the document level, not the chunk level, for better retrieval relevance.
Hybrid search: Combine summary similarity with keyword matching for robust retrieval.
Re-ranking: After retrieval, use Claude to re-rank results based on the specific query.
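Hybrid search can be sketched without extra dependencies by blending two signals: similarity between the query and each summary, and how many query terms appear verbatim in the full text. In the sketch below, a plain term-frequency cosine stands in for the TF-IDF or embedding similarity you would likely use in practice, and hybrid_rank, alpha, and the equal default weighting are assumptions to tune, not recommended values.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; deliberately simple (no stemming or stopwords)
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_rank(query: str, entries: list[dict], alpha: float = 0.5, top_k: int = 3) -> list[int]:
    """Rank entries by alpha * summary similarity + (1 - alpha) * keyword coverage.

    Each entry is a dict with "summary" and "full_text" keys.
    """
    q_tokens = _tokens(query)
    q_vec, q_terms = Counter(q_tokens), set(q_tokens)
    scored = []
    for i, entry in enumerate(entries):
        # Similarity signal: term-frequency cosine between the query and the summary
        sim = _cosine(q_vec, Counter(_tokens(entry["summary"])))
        # Keyword signal: fraction of query terms found anywhere in the full text
        full_terms = set(_tokens(entry["full_text"]))
        coverage = len(q_terms & full_terms) / len(q_terms) if q_terms else 0.0
        scored.append((alpha * sim + (1 - alpha) * coverage, i))
    # Highest combined score first; ties keep the lower index first
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]
```

The entries argument has the same shape as the values of the summary index, so list(index.values()) can be passed directly, and the returned indices point back into that list.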

Evaluations: Measuring Summary Quality

ROUGE Scores

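ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures n-gram overlap between the generated summary and a reference summary: ROUGE-1 and ROUGE-2 count shared unigrams and bigrams, while ROUGE-L scores the longest common subsequence.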

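from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference = "The sublease agreement transfers rights from Party A to Party B..."
generated = "Party A subleases property to Party B..."
# score(target, prediction): the reference summary comes first
scores = scorer.score(reference, generated)
for name, score in scores.items():
    # Each entry holds precision, recall, and fmeasure
    print(f"{name}: P={score.precision:.3f} R={score.recall:.3f} F1={score.fmeasure:.3f}")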
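To see what the library is computing, ROUGE-N can be written out in a few lines: count the n-grams shared by the reference and the candidate (clipped, so a repeated word is credited at most as often as it appears in both), then divide by the reference count for recall and by the candidate count for precision. This sketch skips stemming and uses a simpler tokenizer than rouge_score, so its numbers can differ slightly from the library's; ROUGE-L is not covered. The function names are illustrative.

```python
import re
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap (no stemming)."""
    ref = ngrams(re.findall(r"[a-z0-9]+", reference.lower()), n)
    cand = ngrams(re.findall(r"[a-z0-9]+", candidate.lower()), n)
    # Counter & keeps the minimum count of each n-gram, which clips repeated words
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, comparing "the cat sat on the mat" with "the cat is on the mat" gives about 0.833 for ROUGE-1 (5 of 6 unigrams overlap) and 0.6 for ROUGE-2 (3 of 5 bigrams overlap).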

Custom Evaluation with Promptfoo

Promptfoo allows you to define custom evaluation criteria. For example, you can check that the summary includes all named parties, mentions the effective date, and does not hallucinate terms.

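# promptfooconfig.yaml (run with: npx promptfoo@latest eval)
prompts:
  - "Summarize this document:\n\n{{document}}"
providers:
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: file://sublease_agreement.txt
    assert:
      # contains-all requires every value, unlike a regex alternation that matches either
      - type: contains-all
        value: ["Party A", "Party B"]
      - type: icontains
        value: "effective date"
      - type: llm-rubric
        value: "Every statement in the summary is supported by this source document: {{document}}"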

Iterative Improvement

Summarization is rarely perfect on the first try. Use this feedback loop:

Generate a summary using your current prompt.
Evaluate using ROUGE, regex checks, or a human reviewer.
Identify gaps: Is the summary missing key terms? Too verbose? Hallucinating?
Refine the prompt: Add instructions to fix specific issues (e.g., "Always include the effective date" or "Use bullet points for obligations").
Repeat until quality meets your threshold.
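The loop above can be partly automated when your checks are programmatic. The sketch below assumes a generate(prompt, document) function that calls Claude and returns summary text; refine_until_passing and the check format (a test function paired with the instruction to add when it fails) are hypothetical, and a human review step would still sit outside this loop.

```python
from typing import Callable


def refine_until_passing(
    document: str,
    base_prompt: str,
    generate: Callable[[str, str], str],
    checks: dict[str, tuple[Callable[[str], bool], str]],
    max_rounds: int = 3,
) -> tuple[str, str]:
    """Generate, evaluate, and add fix-up instructions until the checks pass.

    checks maps a name to (test_function, instruction_to_add_if_it_fails).
    Returns the last summary and the prompt that produced it.
    """
    prompt = base_prompt
    summary = generate(prompt, document)
    for _ in range(max_rounds):
        failing = [instr for test, instr in checks.values() if not test(summary)]
        # Only add instructions the prompt does not already contain
        new_instructions = [i for i in failing if i not in prompt]
        if not new_instructions:
            break  # all checks pass, or there is nothing new to try
        prompt += "\n" + "\n".join(new_instructions)
        summary = generate(prompt, document)
    return summary, prompt
```

Because each failing check maps to a fixed instruction, this only covers issues you can both detect and describe in advance, such as a missing effective date.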

Conclusion and Best Practices

Start simple, then iterate: A basic prompt often works well. Add complexity only when needed.
Use guided summarization for structured output: JSON extraction makes downstream processing easier.
Chunk and meta-summarize for long documents: Don't rely on the context window alone.
Evaluate with multiple methods: Combine ROUGE with custom checks and human review.
Tailor to your domain: Legal, medical, and technical documents each benefit from domain-specific instructions.

Key Takeaways

Guided prompts outperform generic ones: Specify exactly what information you need (parties, dates, obligations) for better results.
Meta-summarization handles documents longer than the context window: Chunk, summarize, then summarize the summaries.
Summary-indexed RAG improves retrieval: Index document summaries, not raw chunks, for faster and more relevant search.
Evaluate with ROUGE and custom checks: Automated metrics catch n-gram overlap; custom checks catch hallucinations and missing details.
Iterate on your prompts: Small tweaks—like adding "use bullet points" or "include all named entities"—can dramatically improve quality.