BeClaude
Guide | Intermediate | API | 2026-05-15

Claude Batch API: Guide to Cost-Effective Batch Processing

Learn how to use the Claude Batch API to process large volumes of requests at 50% lower cost. Complete guide to creating, monitoring, and retrieving batch jobs with the Anthropic API.

Quick Answer

Claude Batch API lets you submit large groups of requests asynchronously at 50% off standard API pricing. Submit a batch of Messages API requests, and results are ready within 24 hours. Use it for offline workloads like data classification, content moderation, bulk translation, and large-scale document analysis. Batch requests are not held to the per-minute limits of the standard API, but each batch has its own size cap, so very large jobs are split across several batches.

Tags: batch-api, pricing, cost-savings, api, processing

What is the Claude Batch API?

The Claude Batch API is an asynchronous processing endpoint that lets you submit large volumes of requests at 50% lower cost compared to standard API calls. Instead of waiting for each response in real-time, you submit a batch — results are processed and ready for retrieval within 24 hours.

This is ideal for workloads where immediate responses aren't necessary, such as backfill processing, periodic data analysis, content moderation pipelines, and large-scale data enrichment.

How Batch Processing Differs from Standard API

| Aspect | Standard API | Batch API |
| --- | --- | --- |
| Response time | Real-time (seconds) | Up to 24 hours |
| Cost | Full price | 50% discount |
| Rate limits | Standard limits apply | No per-minute limits |
| Use case | Interactive, user-facing | Offline, background processing |
| Concurrent requests | Limited by rate limits | Submit thousands at once |

Getting Started with Batch API

Prerequisites
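
The steps below appear to assume two things: the `anthropic` Python package is installed (`pip install anthropic`), and an API key is available to the client. When no key is passed explicitly, `anthropic.Anthropic()` reads it from the `ANTHROPIC_API_KEY` environment variable. A minimal preflight sketch follows; the helper name `missing_prerequisites` is hypothetical and not part of the SDK.

```python
import importlib.util
import os


def missing_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The SDK is published on PyPI under the name "anthropic".
    if importlib.util.find_spec("anthropic") is None:
        problems.append("anthropic package is not installed")
    # anthropic.Anthropic() falls back to this variable for the API key.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems
```

Calling `missing_prerequisites()` before Step 1 gives a quick list of anything that still needs to be set up.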

Step 1: Prepare Your Batch Requests

Each batch consists of a series of requests, where each request is a standard Messages API call identified by a unique custom_id:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

reviews = [
    "The product works well but shipping was slow",
    "Absolutely love it! Best purchase ever",
    "Terrible customer support, never buying again",
]

# Each request in the batch follows the Messages API format
requests = [
    {
        "custom_id": f"req-{i:03d}",  # req-001, req-002, req-003
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": f"Classify this review as positive, negative, or neutral: '{review}'",
                }
            ],
        },
    }
    for i, review in enumerate(reviews, start=1)
]
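
Because results are matched back to requests through `custom_id`, it can help to check a request list before submitting it. The sketch below is a local sanity check under two assumptions: every `custom_id` must be unique within a batch, and `model`, `max_tokens`, and `messages` are the required Messages API parameters. The helper name `validate_batch_requests` is hypothetical.

```python
from collections import Counter

REQUIRED_PARAMS = ("model", "max_tokens", "messages")


def validate_batch_requests(requests):
    """Return a list of problems found in a list of batch request dicts."""
    problems = []
    ids = [req.get("custom_id") for req in requests]
    # Duplicate or missing IDs make results impossible to match reliably.
    for custom_id, count in Counter(ids).items():
        if custom_id is None:
            problems.append(f"{count} request(s) have no custom_id")
        elif count > 1:
            problems.append(f"custom_id {custom_id!r} appears {count} times")
    # Each request needs the core Messages API parameters.
    for req in requests:
        params = req.get("params", {})
        missing = [key for key in REQUIRED_PARAMS if key not in params]
        if missing:
            problems.append(
                f"{req.get('custom_id')!r} is missing params: {', '.join(missing)}"
            )
    return problems
```

An empty return value means the list passed these local checks; the API may still reject a request for other reasons.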

Step 2: Submit the Batch

# Batches live under the Messages resource in the Python SDK
batch = client.messages.batches.create(requests=requests)

print(f"Batch ID: {batch.id}")
print(f"Total requests: {len(requests)}")

Step 3: Monitor Progress

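import time

def wait_for_batch(batch_id: str, poll_interval: int = 60):
    """Poll until the batch has ended, then return the final batch object."""
    while True:
        batch_status = client.messages.batches.retrieve(batch_id)
        counts = batch_status.request_counts
        total = (counts.processing + counts.succeeded + counts.errored
                 + counts.canceled + counts.expired)
        print(f"Status: {batch_status.processing_status}")
        print(f"Progress: {counts.succeeded}/{total}")
        # "ended" is the terminal status; per-request outcomes are in the results
        if batch_status.processing_status == "ended":
            return batch_status
        time.sleep(poll_interval)

batch_result = wait_for_batch(batch.id)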

Step 4: Retrieve Results

results = client.messages.batches.results(batch.id)

for result in results:
    custom_id = result.custom_id
    if result.result.type == "succeeded":
        content = result.result.message.content[0].text
        print(f"{custom_id}: {content}")
    else:
        # Other result types are "errored", "canceled", and "expired";
        # only errored results carry an error payload.
        error = getattr(result.result, "error", None)
        print(f"{custom_id}: Failed ({result.result.type}) - {error}")
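
Results should be matched to requests by `custom_id` rather than by position, since the order of results is not guaranteed to follow the order of submission. The sketch below indexes results into a dictionary. To stay runnable offline it uses `SimpleNamespace` stand-ins that mimic the result shape used in Step 4; `index_results` and `fake_result` are hypothetical names.

```python
from types import SimpleNamespace


def index_results(results):
    """Map custom_id -> response text, or None for non-succeeded results."""
    indexed = {}
    for entry in results:
        if entry.result.type == "succeeded":
            indexed[entry.custom_id] = entry.result.message.content[0].text
        else:
            indexed[entry.custom_id] = None
    return indexed


def fake_result(custom_id, result_type, text=None):
    """Hypothetical stand-in with the same attribute layout as a batch result."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)])
    return SimpleNamespace(
        custom_id=custom_id,
        result=SimpleNamespace(type=result_type, message=message),
    )


# Results arrive out of order here on purpose.
sample = [
    fake_result("req-002", "succeeded", "positive"),
    fake_result("req-001", "errored"),
]
labels = index_results(sample)
# Re-read the labels in the original submission order.
ordered = [labels[custom_id] for custom_id in ("req-001", "req-002")]
```

With real data, the iterable returned by the results call in Step 4 would take the place of `sample`.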

Cost Comparison

Here's what 50% savings look like in practice:

| Volume | Standard API (Sonnet 4) | Batch API (Sonnet 4) | Savings |
| --- | --- | --- | --- |
| 10,000 requests | ~$30 | ~$15 | $15 |
| 100,000 requests | ~$300 | ~$150 | $150 |
| 1,000,000 requests | ~$3,000 | ~$1,500 | $1,500 |

Calculate your exact costs with our Pricing Calculator and see the Claude API Pricing Guide for detailed model rates.
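
The table does not state the request size behind its figures, but its first row implies an average of about $0.003 per request ($30 / 10,000). Taking that implied figure as an assumption, the arithmetic can be sketched as follows; `batch_savings` is a hypothetical helper, and real costs depend on the input and output tokens of each request.

```python
def batch_savings(num_requests, cost_per_request, discount=0.5):
    """Return (standard_cost, batch_cost, savings) in dollars, rounded to cents."""
    standard = num_requests * cost_per_request
    batch = standard * (1 - discount)
    return round(standard, 2), round(batch, 2), round(standard - batch, 2)


# Assumption: average per-request cost implied by the first table row.
IMPLIED_COST_PER_REQUEST = 30 / 10_000

for volume in (10_000, 100_000, 1_000_000):
    standard, batch, saved = batch_savings(volume, IMPLIED_COST_PER_REQUEST)
    print(f"{volume:>9,} requests: ${standard:,.2f} standard, "
          f"${batch:,.2f} batch, ${saved:,.2f} saved")
```

Replacing `IMPLIED_COST_PER_REQUEST` with a figure measured on your own workload gives a more realistic estimate.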

Best Use Cases for Batch API

1. Large-Scale Content Moderation

Process millions of comments or posts for policy violations overnight:

# Submit 50,000 content items for moderation in a single batch.
# content_items is assumed to be a list of strings loaded elsewhere.
moderation_requests = []
for i, content in enumerate(content_items):
    moderation_requests.append({
        "custom_id": f"mod-{i:06d}",
        "params": {
            "model": "claude-haiku-4-20250514",
            "max_tokens": 128,
            "messages": [{
                "role": "user",
                "content": f"Flag this content if it violates policies:\n{content}"
            }]
        }
    })

batch = client.messages.batches.create(requests=moderation_requests)

2. Data Classification and Enrichment

Enrich your database with AI-generated metadata, categories, and insights.

3. Bulk Translation

Translate large document collections or datasets into multiple languages.
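
One way to structure a translation job is to create one request per document and target language, encoding both in the `custom_id` so each result can be routed back. This is a sketch only: the helper name, the ID scheme, and the prompt wording are assumptions, not a prescribed pattern.

```python
def build_translation_requests(documents, languages, model, max_tokens=1024):
    """Create one batch request per (document, language) pair."""
    requests = []
    for doc_index, text in enumerate(documents):
        for language in languages:
            requests.append({
                # The ID records which document and language a result belongs to.
                "custom_id": f"tr-{doc_index:06d}-{language}",
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{
                        "role": "user",
                        "content": f"Translate the following text into {language}:\n{text}",
                    }],
                },
            })
    return requests


translation_requests = build_translation_requests(
    ["Hello", "Goodbye"], ["es", "fr"], model="claude-sonnet-4-20250514"
)
```

The resulting list can be submitted the same way as the classification requests in Step 2.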

4. Document Analysis Pipeline

Process thousands of PDFs, invoices, or reports for data extraction. Use Claude Vision to extract data from document images; see our Claude Vision API Guide for document processing patterns.

5. Model Evaluation and Testing

Run your entire test suite against Claude to evaluate output quality and consistency.
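
Once a test-suite batch has finished, scoring can happen entirely offline. The sketch below assumes expected labels and model outputs are both held in dictionaries keyed by `custom_id`, and uses a simple case-insensitive exact match. The helper `score_batch` is hypothetical, and real evaluations often need a looser comparison.

```python
def score_batch(expected, actual):
    """Compare expected labels with model outputs, both keyed by custom_id."""
    mismatches = {}
    for custom_id, want in expected.items():
        got = actual.get(custom_id)
        # Normalize whitespace and case so "Positive\n" matches "positive".
        if got is None or got.strip().lower() != want.strip().lower():
            mismatches[custom_id] = (want, got)
    accuracy = 1 - len(mismatches) / len(expected) if expected else 0.0
    return accuracy, mismatches


expected = {"t1": "positive", "t2": "negative", "t3": "neutral", "t4": "positive"}
actual = {"t1": "Positive\n", "t2": "neutral", "t3": "neutral"}  # t4 failed
accuracy, mismatches = score_batch(expected, actual)
```

Requests with no output (failed or expired) count as mismatches here, which keeps the accuracy figure conservative.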

Batch Processing Strategies

Choosing the Right Model

| Workload | Recommended Model | Rationale |
| --- | --- | --- |
| Simple classification | Haiku | Fastest, cheapest ($0.125/M tokens batch) |
| Content moderation | Haiku or Sonnet 4 | Balance of speed and accuracy |
| Data extraction | Sonnet 4 | Strong structured output |
| Complex analysis | Opus 4.6 | Best reasoning ($7.5/M tokens batch) |

Handling Large Batches

For very large workloads, split the requests across several batches and poll them together:

import time

# Split into multiple batches of 10,000 requests each
BATCH_SIZE = 10000

# all_requests is the full list of request dicts built earlier
batches = []
for i in range(0, len(all_requests), BATCH_SIZE):
    chunk = all_requests[i:i + BATCH_SIZE]
    batch = client.messages.batches.create(requests=chunk)
    batches.append(batch.id)
    print(f"Created batch {batch.id} with {len(chunk)} requests")

# Monitor all batches until every one has ended
while batches:
    completed = []
    for batch_id in batches:
        status = client.messages.batches.retrieve(batch_id)
        if status.processing_status == "ended":
            completed.append(batch_id)
    for batch_id in completed:
        batches.remove(batch_id)
    if batches:
        print(f"Waiting for {len(batches)} batches...")
        time.sleep(120)
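
The fixed-size split above only counts requests. When individual requests are large (long documents, for example), payload size can become the binding limit first. The sketch below splits by both count and estimated serialized size. It assumes a 256 MB per-batch cap and estimates size from each request's JSON encoding, so treat the numbers as approximations. `chunk_requests` is a hypothetical helper.

```python
import json


def chunk_requests(requests, max_count=10_000, max_bytes=256 * 1024 * 1024):
    """Yield lists of requests that stay under both a count and a size cap."""
    chunk, chunk_bytes = [], 0
    for req in requests:
        # Estimated size: the UTF-8 length of the request serialized as JSON.
        size = len(json.dumps(req).encode("utf-8"))
        # Start a new chunk when adding this request would exceed either cap.
        if chunk and (len(chunk) >= max_count or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(req)
        chunk_bytes += size
    if chunk:
        yield chunk
```

A request larger than `max_bytes` on its own still ends up in a single-item chunk, so oversized requests surface as API errors rather than being silently dropped.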

Error Handling and Retries

def process_batch_results(batch_id: str, original_requests: list):
    """Collect failed requests from a finished batch and resubmit them."""
    results = client.messages.batches.results(batch_id)
    failed_requests = []

    for result in results:
        if result.result.type != "succeeded":
            failed_requests.append({
                "custom_id": result.custom_id,
                "type": result.result.type,
                # Only "errored" results carry an error payload
                "error": getattr(result.result, "error", None),
            })

    # Retry failed requests in a new batch
    if failed_requests:
        failed_ids = {f["custom_id"] for f in failed_requests}
        # original_requests holds the request dicts that were first submitted
        retry_params = [
            req for req in original_requests
            if req["custom_id"] in failed_ids
        ]
        retry_batch = client.messages.batches.create(requests=retry_params)
        print(f"Retrying {len(retry_params)} failed requests in batch {retry_batch.id}")

    return failed_requests
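
Retries are easier to reason about when the same logical request always carries the same `custom_id`, because a rerun can then be matched against earlier results. One possible approach, sketched here as an assumption rather than a documented pattern, derives the ID from a hash of the request parameters. `stable_custom_id` is a hypothetical helper.

```python
import hashlib
import json


def stable_custom_id(params, prefix="req"):
    """Derive a deterministic custom_id from the request parameters."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Truncated for readability; 32 hex characters keep collisions very unlikely.
    return f"{prefix}-{digest[:32]}"
```

Two requests with identical parameters map to the same ID, so deduplicate the request list before submitting to keep every `custom_id` in a batch unique.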

Limitations and Considerations

Processing Time

  • Batches have up to 24 hours to complete; requests still unprocessed at that point expire and need to be resubmitted
  • Smaller batches (< 1,000 requests) often complete in 1-6 hours
  • Processing time depends on total request volume and model complexity

Request Constraints

  • Each request in a batch takes the same parameters as a standard Messages API call
  • Per-request token limits still apply (same as the standard API)
  • A single batch is currently capped at 100,000 requests or 256 MB, whichever is reached first
  • Haiku is particularly cost-effective for batch processing at $0.125/M input tokens

When NOT to Use Batch API

  • User-facing applications requiring real-time responses
  • Interactive chat experiences
  • Time-sensitive operations
  • When you need immediate error feedback during development

Key Takeaways

  • 50% cost savings: Batch API is half the price of standard API calls
  • No per-minute rate limits: submit thousands of requests at once and split larger jobs across multiple batches
  • 24-hour processing window: results are available within 24 hours, often faster
  • Haiku for high volume: at batch pricing, Haiku costs just $0.125/M input tokens
  • Idempotent design: design batch jobs to handle retries gracefully

For more details on API pricing, see our Claude API Pricing Guide. Compare model capabilities on the Model Comparison page to choose the right model for your batch workload.