BeClaude Guide · 2026-05-06

Mastering Document Summarization with Claude: From Basic Prompts to Advanced RAG

Learn practical techniques for summarizing long documents with Claude AI, including prompt engineering, metadata extraction, handling token limits, and evaluating summary quality using ROUGE scores and Promptfoo.

Quick Answer

Learn how to use Claude for document summarization, from basic prompts to advanced techniques like guided summarization, meta-summarization, and RAG-based indexing. Includes practical code examples and evaluation methods.

Claude Summarization · Prompt Engineering · RAG · Legal Document Analysis · Evaluation Metrics

Summarization is one of the most powerful and practical applications of large language models. Whether you're a legal professional drowning in contracts, a researcher sifting through papers, or a product manager parsing customer feedback, the ability to condense lengthy documents into actionable insights saves time and improves decision-making.

Claude excels at summarization tasks thanks to its large context window, nuanced understanding of language, and ability to follow complex instructions. In this guide, you'll learn practical, actionable techniques for summarizing documents with Claude—from simple prompts to advanced retrieval-augmented generation (RAG) approaches.

We'll focus on real-world challenges: handling long documents, extracting structured metadata, evaluating summary quality, and iteratively improving your results. Code examples are provided in Python, but the concepts apply to any programming language.

Why Summarization Is Hard (and Why Claude Helps)

Evaluating summary quality is notoriously subjective. What one reader considers a perfect summary, another may find lacking. Traditional metrics like ROUGE scores measure word overlap with reference summaries but miss nuances like coherence, factual accuracy, and relevance to the reader's goals.

Claude helps overcome these challenges by:

  • Understanding context and intent beyond simple keyword matching
  • Following detailed instructions about summary structure and focus
  • Handling documents of 100K+ tokens in a single call (Claude 3.5 Sonnet's context window is 200K tokens)
  • Generating summaries in specific formats (bullet points, structured data, executive briefs)

Setup and Data Preparation

First, install the required packages:

pip install anthropic pypdf pandas matplotlib scikit-learn numpy rouge-score nltk seaborn

(Promptfoo is a Node.js tool; we'll run it with npx later rather than installing it from pip.)

You'll also need a Claude API key. Set it as an environment variable:

export ANTHROPIC_API_KEY="your-api-key-here"

Extracting Text from Documents

For this guide, we'll use a publicly available Sublease Agreement from the SEC website. Here's how to extract text from a PDF:

import pypdf

def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = pypdf.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
        return text

# Load your document
text = extract_text_from_pdf("sublease_agreement.pdf")

If you're working with plain text, simply define text = "your content here".

Basic Summarization with Claude

Let's start with a simple summarization function. This foundational approach uses Claude's instruction-following capabilities effectively:

import anthropic

client = anthropic.Anthropic()

def summarize_document(text, max_tokens=1000):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=max_tokens,
        system="You are an expert document summarizer. Create concise, accurate summaries that capture key information.",
        messages=[
            {
                "role": "user",
                "content": f"Please summarize the following document:\n\n{text}"
            }
        ]
    )
    return response.content[0].text

summary = summarize_document(text)
print(summary)

This works well for short documents, but for longer texts you'll need more sophisticated approaches.
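One practical way to decide between these approaches is a rough token estimate before calling the API. The sketch below uses the common ~4-characters-per-token rule of thumb; the ratio, function names, and budget are illustrative assumptions, not Claude's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text.
    This is a heuristic for routing, not Claude's real tokenizer."""
    return max(1, len(text) // 4)

def choose_strategy(text: str, context_budget: int = 100_000) -> str:
    """Route a document to single-call or chunked summarization by size."""
    if estimate_tokens(text) <= context_budget:
        return "single-call"
    return "chunk-and-meta-summarize"
```

If the estimate is near the budget, err on the side of chunking; real token counts can exceed the heuristic on dense legal text.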

Advanced Techniques for Better Summaries

1. Guided Summarization

Instead of a generic "summarize this" prompt, guide Claude with specific instructions about what to extract:

def guided_summarize(text, focus_areas):
    prompt = f"""Please analyze the following legal document and provide:
- A one-paragraph executive summary
- Key parties involved and their roles
- Important dates and deadlines
- Financial terms and obligations
- Termination conditions

Focus particularly on: {', '.join(focus_areas)}

Document: {text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Example usage
focus = ["indemnification clauses", "payment terms", "termination rights"]
summary = guided_summarize(text, focus)

2. Domain-Specific Guided Summarization

For legal documents, you can create specialized prompts that extract structured metadata:

def legal_document_summary(text):
    prompt = f"""You are a legal document analyst. Extract the following information from this contract:
- Document Type (e.g., Sublease, NDA, Service Agreement)
- Effective Date
- Parties Involved
- Key Obligations (list up to 5)
- Financial Terms (amounts, payment schedule, late fees)
- Termination Clause Summary
- Governing Law
- Any Unusual or High-Risk Clauses

Format your response as structured markdown with clear headings.

Document: {text}"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
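Because the prompt requests structured markdown with clear headings, the response can be post-processed into a dictionary keyed by heading. This is a minimal sketch under the assumption that the model emits `#`-prefixed headings (production code should tolerate other formats); the function name is my own:

```python
import re

def parse_markdown_sections(markdown: str) -> dict:
    """Split a markdown string into {heading: body} pairs.
    Assumes headings are '#'-prefixed lines (any level)."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^#+\s+(.*)", line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}
```

From here the dictionary can be validated (e.g., require that "Governing Law" is present) or loaded into a DataFrame for a corpus of contracts.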

3. Meta-Summarization for Long Documents

When documents exceed Claude's context window, use a chunk-and-summarize approach:

def chunk_text(text, chunk_size=50000):
    """Split text into overlapping chunks (sizes are in words, not tokens)."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - 5000):  # 5000 word overlap
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def meta_summarize(text):
    # Step 1: Summarize each chunk
    chunks = chunk_text(text)
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        summary = summarize_document(chunk, max_tokens=500)
        chunk_summaries.append(summary)
        print(f"Processed chunk {i+1}/{len(chunks)}")

    # Step 2: Summarize the summaries
    combined = "\n\n---\n\n".join(chunk_summaries)
    final_summary = summarize_document(
        f"Combine these section summaries into a coherent overall summary:\n\n{combined}",
        max_tokens=1000
    )
    return final_summary

final = meta_summarize(text)

Summary-Indexed RAG: An Advanced Approach

For large document collections, combine summarization with retrieval-augmented generation (RAG):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class SummaryIndex:
    def __init__(self, documents):
        self.documents = documents
        self.summaries = []
        self.vectorizer = TfidfVectorizer(stop_words='english')

    def build_index(self):
        # Generate summaries for each document
        for doc in self.documents:
            summary = summarize_document(doc, max_tokens=200)
            self.summaries.append(summary)
        # Create TF-IDF matrix from summaries
        self.tfidf_matrix = self.vectorizer.fit_transform(self.summaries)

    def query(self, question, top_k=3):
        # Vectorize the question
        question_vec = self.vectorizer.transform([question])
        # Find most relevant summaries
        similarities = cosine_similarity(question_vec, self.tfidf_matrix)[0]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        # Return relevant documents
        results = []
        for idx in top_indices:
            results.append({
                'summary': self.summaries[idx],
                'document': self.documents[idx],
                'relevance': similarities[idx]
            })
        return results

# Usage
index = SummaryIndex([doc1, doc2, doc3])
index.build_index()
results = index.query("What are the termination conditions?")
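If you want to see the retrieval step without the scikit-learn dependency, the same ranking idea can be sketched with bag-of-words counts and a hand-rolled cosine similarity. This toy version (function names are illustrative) mirrors what `TfidfVectorizer` plus `cosine_similarity` do, minus the IDF weighting:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_summaries(question: str, summaries: list, top_k: int = 3) -> list:
    """Return indices of the top_k summaries most similar to the question."""
    q = Counter(question.lower().split())
    scores = [cosine(q, Counter(s.lower().split())) for s in summaries]
    return sorted(range(len(summaries)), key=lambda i: scores[i], reverse=True)[:top_k]
```

TF-IDF generally ranks better than raw counts because it downweights words that appear in every summary, which is why the full class uses `TfidfVectorizer`.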

Best Practices for Summarization RAG

  • Summarize before indexing: Store summaries alongside full documents for faster retrieval
  • Use hierarchical summaries: Create section-level summaries for very long documents
  • Include metadata: Tag summaries with document type, date, and key entities
  • Update incrementally: Re-summarize only changed documents, not the entire corpus
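The incremental-update practice above can be sketched with content hashes: re-summarize a document only when its hash changes. The cache layout and function names here are illustrative assumptions, not part of the SummaryIndex class:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def refresh_summaries(documents: dict, cache: dict, summarize) -> dict:
    """Re-summarize only documents whose content changed.
    cache maps doc_id -> (hash, summary); summarize is any callable text -> summary."""
    updated = {}
    for doc_id, text in documents.items():
        h = content_hash(text)
        if doc_id in cache and cache[doc_id][0] == h:
            updated[doc_id] = cache[doc_id]          # unchanged: reuse cached summary
        else:
            updated[doc_id] = (h, summarize(text))   # new or changed: re-summarize
    return updated
```

In practice, persist the cache (e.g., to disk or a database) so API calls are skipped across runs, then rebuild the TF-IDF index from the refreshed summaries.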

Evaluating Summary Quality

Automated evaluation helps you iterate and improve. Here's how to use ROUGE scores:

from rouge_score import rouge_scorer

def evaluate_summary(reference, generated):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, generated)
    print("ROUGE-1:", scores['rouge1'].fmeasure)
    print("ROUGE-2:", scores['rouge2'].fmeasure)
    print("ROUGE-L:", scores['rougeL'].fmeasure)
    return scores

For more nuanced evaluation, use Promptfoo to create custom test cases:

# Install Promptfoo
npx promptfoo@latest init

# Create evaluation config
cat > promptfooconfig.yaml << 'EOF'
prompts:
  - "Summarize this document: {{document}}"
providers:
  - id: anthropic:claude-3-5-sonnet-20241022
tests:
  - vars:
      document: "path/to/test_doc.txt"
    assert:
      - type: contains-any
        value: ["key term 1", "key term 2"]
      - type: latency
        threshold: 5000
EOF

# Run evaluation
npx promptfoo@latest eval

Iterative Improvement Strategy

Follow this cycle to refine your summarization:

  • Baseline: Start with a simple prompt and evaluate
  • Analyze failures: Where does the summary miss important details?
  • Refine prompts: Add specific instructions, examples, or constraints
  • Test edge cases: Try different document types and lengths
  • Automate evaluation: Use ROUGE + custom checks in CI/CD
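The automated-evaluation step can start as a simple required-terms check that runs in CI alongside ROUGE. A minimal sketch, where the threshold and term lists are illustrative:

```python
def check_summary(summary: str, required_terms: list, min_hits: float = 1.0) -> bool:
    """Pass if at least min_hits fraction of required terms appear (case-insensitive)."""
    if not required_terms:
        return True
    lowered = summary.lower()
    hits = sum(1 for term in required_terms if term.lower() in lowered)
    return hits / len(required_terms) >= min_hits
```

Wire this into your test suite with per-document term lists so a prompt regression that drops a key clause fails the build.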

Conclusion and Best Practices

Summarization with Claude is both powerful and flexible. Here are key takeaways:

  • Be specific in your prompts: Tell Claude exactly what you want (structure, length, focus areas)
  • Use guided summarization for structured output: Extract metadata alongside narrative summaries
  • Chunk long documents strategically: Overlap chunks to maintain context
  • Evaluate both automatically and manually: ROUGE scores catch some issues, but human review catches nuance
  • Iterate based on your use case: The perfect summary for a legal team differs from one for executives

Key Takeaways

  • Prompt specificity matters: Generic "summarize this" prompts produce generic results. Guide Claude with detailed instructions about structure, focus areas, and output format.
  • Handle long documents with chunking and meta-summarization: Break documents into overlapping chunks, summarize each, then summarize the summaries for coherent long-document understanding.
  • Combine summarization with RAG for scalable knowledge retrieval: Index document summaries for fast, relevant retrieval across large document collections.
  • Evaluate summaries using multiple methods: Use ROUGE scores for automated checks and Promptfoo for custom assertions, but always supplement with human review for nuanced quality.
  • Iterate systematically: Start simple, identify failure modes, refine prompts, and automate evaluation to continuously improve your summarization pipeline.