BeClaude Guide · 2026-04-30

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn how to summarize complex documents using Claude AI. This guide covers prompt engineering, metadata extraction, handling long texts, and evaluation techniques using ROUGE scores and Promptfoo.

Quick Answer

This guide teaches you to summarize documents with Claude, from basic prompts to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. You'll learn to extract metadata, handle long documents, and evaluate summary quality using ROUGE scores and Promptfoo.

Claude Summarization · Prompt Engineering · Legal Document Analysis · RAG · Evaluation Metrics


In today's information-dense world, the ability to distill lengthy documents into concise, actionable summaries is invaluable. Whether you're a legal professional parsing contracts, a researcher reviewing papers, or a business analyst synthesizing reports, Claude's summarization capabilities can dramatically reduce your cognitive load.

This guide walks you through practical, battle-tested techniques for summarizing documents with Claude. We'll start with the basics and progressively build toward advanced methods, including handling documents that exceed token limits and implementing Retrieval-Augmented Generation (RAG) for large-scale summarization.

Why Summarization Is Hard (and Why Claude Excels)

Summarization is deceptively difficult. Unlike classification or extraction tasks, there's rarely a single "correct" summary. Quality depends on context, audience, and purpose. A summary for a legal team differs vastly from one for executives.

Claude excels here because:

  • Long context window: Handles documents up to 200K tokens
  • Nuanced understanding: Grasps legal jargon, technical terms, and domain-specific language
  • Controllable output: You can guide the style, length, and focus of summaries

Setting Up Your Environment

Before diving in, let's set up the necessary tools. You'll need:

pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn promptfoo

And an Anthropic API key:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

Data Preparation: Extracting Text from PDFs

Most real-world documents come as PDFs. Here's a robust function to extract and clean text:

import re
import pypdf

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text

def clean_text(text):
    # Remove excessive whitespace
    text = re.sub(r'\s+', ' ', text)
    # Remove non-ASCII characters if needed
    text = text.encode('ascii', 'ignore').decode()
    return text.strip()

# Example usage
text = extract_text_from_pdf("sublease_agreement.pdf")
text = clean_text(text)

For quick testing, you can also paste text directly:

text = """Your document text here..."""

Basic Summarization: Your First Claude Prompt

Let's start with a straightforward summarization call:

def summarize_with_claude(text, max_tokens=500):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": f"Please provide a concise summary of the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_with_claude(text)
print(summary)

Important: Notice we're using the user role with a clear instruction. This is the foundation. But for production, you'll want more control.

Multi-Shot Summarization: Providing Examples

Claude performs better when you show it what "good" looks like. This is called few-shot prompting:

def summarize_with_examples(text):
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": """I will show you a document and an example of a good summary. Then I'll ask you to summarize a new document.

Example Document: [Short example document]

Example Good Summary: [Corresponding summary]

Now summarize this document:

""" + text } ] ) return response.content[0].text

This technique dramatically improves consistency, especially for domain-specific content.
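To make the template above concrete, here's a minimal sketch that fills in the placeholders with an invented one-shot pair (the sample lease document and its summary are hypothetical, purely for illustration; substitute a real document/summary from your own domain):

```python
def build_few_shot_prompt(text):
    # Hypothetical example pair; replace with a real document
    # and summary from your own domain.
    example_doc = (
        "ACME Corp leases office space at 123 Main St from Landlord LLC "
        "for 24 months at $2,000/month, with a 60-day termination notice."
    )
    example_summary = (
        "24-month office lease between ACME Corp (tenant) and Landlord LLC, "
        "$2,000/month rent, terminable with 60 days' notice."
    )
    return (
        "I will show you a document and an example of a good summary. "
        "Then I'll ask you to summarize a new document.\n\n"
        f"Example Document: {example_doc}\n\n"
        f"Example Good Summary: {example_summary}\n\n"
        f"Now summarize this document:\n\n{text}"
    )
```

The returned string is passed as the user message content, exactly as in summarize_with_examples above; the example pair teaches Claude the length and style you expect.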

Advanced Techniques: Guided and Domain-Specific Summarization

Guided Summarization

Instead of a generic "summarize this," guide Claude with specific instructions:

def guided_summarize(text, instructions):
    prompt = f"""Summarize the following document according to these instructions:

INSTRUCTIONS: {instructions}

DOCUMENT: {text}

SUMMARY:""" response = client.messages.create( model="claude-3-sonnet-20240229", max_tokens=800, messages=[{"role": "user", "content": prompt}] ) return response.content[0].text

Example: Legal document focus

instructions = """
  • Identify all parties involved
  • List key dates and deadlines
  • Highlight termination clauses
  • Note any financial obligations
  • Keep under 300 words
"""

summary = guided_summarize(text, instructions)

Domain-Specific Guided Summarization

For legal documents, you can add domain knowledge:

def legal_summarize(text):
    prompt = f"""You are a legal document analyst. Summarize this contract with:
  • Parties and their roles
  • Key obligations for each party
  • Termination conditions
  • Liability and indemnification clauses
  • Governing law and jurisdiction
  • Any unusual or high-risk provisions
Document: {text}

Legal Summary:""" response = client.messages.create( model="claude-3-sonnet-20240229", max_tokens=1000, messages=[{"role": "user", "content": prompt}] ) return response.content[0].text

Meta-Summarization: Handling Long Documents

When documents exceed Claude's context window, use a chunk-and-merge strategy:

def chunk_text(text, chunk_size=50000):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunk = ' '.join(words[i:i+chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text):
    chunks = chunk_text(text)
    # Step 1: Summarize each chunk
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_with_claude(chunk, max_tokens=300)
        chunk_summaries.append(summary)
        print(f"Processed chunk {i+1}/{len(chunks)}")
    # Step 2: Combine chunk summaries
    combined = "\n\n".join(chunk_summaries)
    # Step 3: Final summary of summaries
    final_summary = summarize_with_claude(
        f"Combine these section summaries into a coherent overall summary:\n\n{combined}",
        max_tokens=800
    )
    return final_summary

This hierarchical approach preserves context while staying within token limits.

Summary Indexed Documents: An Advanced RAG Approach

For massive document collections, use a RAG (Retrieval-Augmented Generation) approach where you index summaries:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

class SummaryIndex:
    def __init__(self):
        self.documents = []
        self.summaries = []
        self.vectorizer = TfidfVectorizer()

    def add_document(self, doc_id, text):
        summary = summarize_with_claude(text, max_tokens=200)
        self.documents.append({"id": doc_id, "text": text})
        self.summaries.append({"id": doc_id, "summary": summary})

    def search(self, query, top_k=3):
        # Vectorize summaries
        summary_texts = [s["summary"] for s in self.summaries]
        vectors = self.vectorizer.fit_transform(summary_texts + [query])
        # Find most similar
        similarities = cosine_similarity(vectors[-1:], vectors[:-1])[0]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.summaries[i] for i in top_indices]

    def query(self, question):
        relevant = self.search(question)
        context = "\n\n".join([r["summary"] for r in relevant])
        response = client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"Based on these document summaries, answer: {question}\n\nContext:\n{context}"
            }]
        )
        return response.content[0].text

Best Practices for Summarization RAG

  • Chunk strategically: Split documents at natural boundaries (paragraphs, sections)
  • Index summaries, not raw text: Summaries are more searchable
  • Use hybrid search: Combine semantic and keyword-based retrieval
  • Cache summaries: Avoid regenerating summaries for the same documents
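The "chunk strategically" and "cache summaries" advice can be sketched as follows. This is a minimal in-memory version, assuming paragraph breaks are marked by blank lines and using a dict keyed by a content hash as the cache (swap in a persistent store for production):

```python
import hashlib

def chunk_by_paragraphs(text, max_chars=8000):
    # Split at blank-line paragraph boundaries, packing paragraphs
    # into chunks up to max_chars so no paragraph is cut in half.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

_summary_cache = {}

def cached_summarize(text, summarize_fn):
    # Key the cache on a hash of the exact text so repeated
    # documents never trigger a second API call.
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _summary_cache:
        _summary_cache[key] = summarize_fn(text)
    return _summary_cache[key]
```

Here summarize_fn would be the summarize_with_claude helper defined earlier; caching matters because regenerating summaries is the dominant cost when re-indexing a collection.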

Evaluating Summary Quality

Evaluation is critical but challenging. Here's a practical approach:

ROUGE Score Evaluation

from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores

Custom Evaluation with Promptfoo

Promptfoo allows you to define custom evaluation criteria:
# promptfoo config
prompts:
  - "Summarize this document: {{text}}"
  - "Provide a concise summary focusing on key points: {{text}}"

tests:
  - vars:
      text: "Your test document here"
    assert:
      - type: contains-all
        value: ["parties", "obligations", "termination"]
      - type: max-length
        value: 500
      - type: python
        value: |
          # Custom check for factual accuracy
          "parties" in output.lower()

Iterative Improvement: A Practical Workflow

  • Baseline: Start with a simple prompt
  • Evaluate: Use ROUGE scores and human review
  • Identify gaps: Is it missing key information? Too verbose?
  • Refine prompt: Add instructions, examples, or constraints
  • Re-evaluate: Compare against baseline
  • Repeat: Until quality meets your threshold
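The loop above can be sketched as a tiny harness. A real score_fn would wrap the ROUGE evaluation shown earlier (plus human review); here a stand-in scoring function keeps the control flow visible (the function names and threshold are illustrative assumptions, not part of any library):

```python
def iterate_prompts(prompt_variants, generate_fn, score_fn, threshold=0.5):
    # Try each prompt variant in order, keeping the best-scoring one;
    # stop early once a variant clears the quality threshold.
    best_prompt, best_score = None, float("-inf")
    for prompt in prompt_variants:
        summary = generate_fn(prompt)
        score = score_fn(summary)
        if score > best_score:
            best_prompt, best_score = prompt, score
        if score >= threshold:
            break
    return best_prompt, best_score
```

In practice generate_fn would call Claude with the candidate prompt and your test document, and the surviving prompt becomes the new baseline for the next round.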

Conclusion and Best Practices

  • Start simple, then layer: Begin with basic prompts, then add guidance
  • Use examples: Few-shot prompting dramatically improves consistency
  • Handle long documents: Use chunk-and-merge or RAG approaches
  • Evaluate systematically: Combine automated metrics with human review
  • Iterate: Summarization is rarely perfect on the first try
  • Domain-specific prompts: Tailor instructions to your content type
  • Monitor token usage: Long documents can be expensive; optimize chunk sizes

Key Takeaways

  • Guided prompts outperform generic ones: Specific instructions yield summaries that match your needs
  • Chunk-and-merge handles any document length: Meta-summarization preserves context across large documents
  • RAG with summary indexing scales to document collections: Search summaries, not raw text
  • Evaluation requires multiple metrics: Combine ROUGE scores with custom checks for factual accuracy and completeness
  • Iterative refinement is essential: Treat summarization as an evolving process, not a one-shot task