Guide | Intermediate | API | 2026-05-15

Claude Batch API: Guide to Cost-Effective Batch Processing

Learn how to use the Claude Batch API to process large volumes of requests at 50% lower cost. Complete guide to creating, monitoring, and retrieving batch jobs with the Anthropic API.

Quick Answer

The Claude Batch API lets you submit large groups of requests asynchronously at 50% off standard API pricing. You submit a batch of Messages API requests, and results are typically ready within 24 hours. Use it for offline workloads like data classification, content moderation, bulk translation, and large-scale document analysis. Batch requests do not count against the per-minute rate limits of the standard API, although limits on batch size and on queued requests still apply.


What is the Claude Batch API?

The Claude Batch API is an asynchronous processing endpoint that lets you submit large volumes of requests at 50% lower cost compared to standard API calls. Instead of waiting for each response in real time, you submit a batch and retrieve the results once processing finishes, typically within 24 hours.

This is ideal for workloads where immediate responses aren't necessary, such as backfill processing, periodic data analysis, content moderation pipelines, and large-scale data enrichment.

How Batch Processing Differs from Standard API

Aspect	Standard API	Batch API
Response time	Real-time (seconds)	Up to 24 hours
Cost	Full price	50% discount
Rate limits	Standard limits apply	No per-minute limits
Use case	Interactive, user-facing	Offline, background processing
Concurrent requests	Limited by rate limits	Submit thousands at once

Getting Started with Batch API

Prerequisites

An Anthropic API key with batch access enabled
The anthropic Python SDK (v0.39+)

Step 1: Prepare Your Batch Requests

Each batch is a list of requests. Each request pairs a unique custom_id, which you use later to match results to inputs, with the params of a standard Messages API call:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Each request in the batch follows the Messages API format
requests = [
    {
        "custom_id": "req-001",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'The product works well but shipping was slow'"}
            ]
        }
    },
    {
        "custom_id": "req-002",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'Absolutely love it! Best purchase ever'"}
            ]
        }
    },
    {
        "custom_id": "req-003",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'Terrible customer support, never buying again'"}
            ]
        }
    },
]
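Writing each request by hand does not scale past a handful of items. The sketch below builds the same list programmatically. The helper name build_classification_requests and the req-NNN numbering are illustrative assumptions, not part of the SDK; the default model string is simply copied from the example above.

```python
from typing import Dict, List


def build_classification_requests(
    reviews: List[str],
    model: str = "claude-sonnet-4-20250514",  # copied from the example above
    max_tokens: int = 256,
) -> List[Dict]:
    """Build one batch request per review (hypothetical helper, not an SDK call)."""
    requests = []
    for index, review in enumerate(reviews, start=1):
        prompt = f"Classify this review as positive, negative, or neutral: '{review}'"
        requests.append({
            # custom_id must be unique within a batch; a running number is enough here
            "custom_id": f"req-{index:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return requests


sample_requests = build_classification_requests([
    "The product works well but shipping was slow",
    "Absolutely love it! Best purchase ever",
])
print([request["custom_id"] for request in sample_requests])
```

The returned list has the same shape as the hand-written requests list, so it can be passed to the batch creation call in Step 2 unchanged.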

Step 2: Submit the Batch

batch = client.messages.batches.create(
    requests=requests
)
print(f"Batch ID: {batch.id}")
print(f"Total requests: {len(requests)}")

Step 3: Monitor Progress

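import time

def wait_for_batch(batch_id: str, poll_interval: int = 60):
    """Poll until the batch has finished processing, then return it."""
    while True:
        batch_status = client.messages.batches.retrieve(batch_id)
        counts = batch_status.request_counts
        # request_counts holds one counter per outcome; their sum is the batch size
        total = (counts.processing + counts.succeeded + counts.errored
                 + counts.canceled + counts.expired)

        print(f"Status: {batch_status.processing_status}")
        print(f"Progress: {counts.succeeded}/{total}")

        # "ended" means every request has succeeded, errored, been canceled, or expired
        if batch_status.processing_status == "ended":
            return batch_status

        time.sleep(poll_interval)

batch_result = wait_for_batch(batch.id)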
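A fixed polling interval is simple, but for jobs that may run for hours a growing interval with an overall timeout makes fewer calls. The following is a minimal sketch of that idea, not an SDK feature. The function name poll_until_ended, its parameters, and the injected fetch_status callable are all assumptions made for illustration; the "ended" status value is taken from the code above.

```python
import time
from typing import Callable


def poll_until_ended(
    fetch_status: Callable[[], str],
    initial_interval: float = 30.0,
    max_interval: float = 600.0,
    timeout: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once fetch_status() reports "ended", or False on timeout."""
    waited = 0.0
    interval = initial_interval
    while waited < timeout:
        if fetch_status() == "ended":
            return True
        sleep(interval)
        waited += interval
        # Double the wait after every check, up to max_interval
        interval = min(interval * 2, max_interval)
    return False


# Demonstration with a fake status source and a fake sleep, so nothing blocks
fake_statuses = iter(["in_progress", "in_progress", "ended"])
recorded_sleeps = []
finished = poll_until_ended(lambda: next(fake_statuses), sleep=recorded_sleeps.append)
print(finished, recorded_sleeps)
```

With the Anthropic Python SDK, fetch_status would typically be a lambda that returns client.messages.batches.retrieve(batch_id).processing_status.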

Step 4: Retrieve Results

results = client.messages.batches.results(batch.id)
for result in results:
    custom_id = result.custom_id

    if result.result.type == "succeeded":
        content = result.result.message.content[0].text
        print(f"{custom_id}: {content}")
    else:
        # Only "errored" results carry an error object; "canceled" and "expired" do not
        error = getattr(result.result, "error", None)
        print(f"{custom_id}: Failed ({result.result.type}) - {error}")
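Results are not guaranteed to come back in submission order, so it is usually safer to key them by custom_id than to rely on position. A minimal sketch of that follows. The helper index_results is hypothetical, and the demonstration uses stand-in objects that only mimic the attribute shape used in the loop above (custom_id, result.type, result.message.content[0].text) rather than real SDK objects.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, Tuple


def index_results(results: Iterable) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split results into {custom_id: text} and {custom_id: failure type}."""
    succeeded: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for entry in results:
        if entry.result.type == "succeeded":
            succeeded[entry.custom_id] = entry.result.message.content[0].text
        else:
            failed[entry.custom_id] = entry.result.type
    return succeeded, failed


def _fake_result(custom_id: str, result_type: str, text: str = ""):
    """Build a stand-in object with the same attribute shape as a batch result."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)]) if text else None
    result = SimpleNamespace(type=result_type, message=message)
    return SimpleNamespace(custom_id=custom_id, result=result)


# Deliberately out of order to show that lookup by custom_id does not care
demo_results = [_fake_result("req-002", "succeeded", "positive"), _fake_result("req-001", "errored")]
ok, bad = index_results(demo_results)
print(ok, bad)
```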

Cost Comparison

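Here's what 50% savings look like in practice. The figures are rough estimates; actual cost depends on the model and on the number of input and output tokens in each request: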

Volume	Standard API (Sonnet 4)	Batch API (Sonnet 4)	Savings
10,000 requests	~$30	~$15	$15
100,000 requests	~$300	~$150	$150
1,000,000 requests	~$3,000	~$1,500	$1,500
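The arithmetic behind a table like this can be sketched in a few lines. The function name estimate_batch_cost and all of its parameters are assumptions for illustration. The token counts and per-million-token prices in the example are placeholder values, chosen only so that the result lines up with the first row above; they are not quoted rates, so substitute the current figures from the pricing guide.

```python
from typing import Tuple


def estimate_batch_cost(
    num_requests: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,  # the 50% discount described in this guide
) -> Tuple[float, float]:
    """Return (standard_cost, batch_cost) in dollars for a workload."""
    cost_per_request_micro = (
        input_tokens_per_request * input_price_per_mtok
        + output_tokens_per_request * output_price_per_mtok
    )
    # Prices are per million tokens, hence the division by 1,000,000
    standard_cost = num_requests * cost_per_request_micro / 1_000_000
    batch_cost = standard_cost * (1 - batch_discount)
    return standard_cost, batch_cost


# Placeholder inputs: 500 input and 100 output tokens per request,
# priced at $3 and $15 per million tokens (assumed values, not quoted rates)
standard, batch = estimate_batch_cost(10_000, 500, 100, 3.0, 15.0)
print(f"standard=${standard:.2f} batch=${batch:.2f} savings=${standard - batch:.2f}")
```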

Calculate your exact costs with our Pricing Calculator and see the Claude API Pricing Guide for detailed model rates.

Best Use Cases for Batch API

1. Large-Scale Content Moderation

Process millions of comments or posts for policy violations overnight:

# Submit 50,000 content items for moderation in a single batch.
# content_items is assumed to be a list of strings loaded elsewhere.
moderation_requests = []
for i, content in enumerate(content_items):
    moderation_requests.append({
        "custom_id": f"mod-{i:06d}",
        "params": {
            "model": "claude-haiku-4-20250514",
            "max_tokens": 128,
            "messages": [{
                "role": "user",
                "content": f"Flag this content if it violates policies:\n{content}"
            }]
        }
    })
batch = client.messages.batches.create(requests=moderation_requests)

2. Data Classification and Enrichment

Enrich your database with AI-generated metadata, categories, and insights.

3. Bulk Translation

Translate large document collections or datasets into multiple languages.
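For translation, one request per document and target language is a natural fit, with both encoded in the custom_id so results can be routed back. The sketch below assumes a hypothetical helper build_translation_requests, a double-underscore separator, and a simple prompt wording; none of these come from the SDK. The model string in the demonstration is copied from the earlier examples in this guide.

```python
from itertools import product
from typing import Dict, List, Tuple

SEPARATOR = "__"  # assumed separator between document ID and language code


def build_translation_requests(
    documents: Dict[str, str],
    languages: List[str],
    model: str,
    max_tokens: int = 1024,
) -> List[Dict]:
    """Build one request for every (document, language) pair."""
    requests = []
    for (doc_id, text), language in product(documents.items(), languages):
        requests.append({
            "custom_id": f"{doc_id}{SEPARATOR}{language}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": f"Translate the following text into {language}:\n{text}",
                }],
            },
        })
    return requests


def parse_custom_id(custom_id: str) -> Tuple[str, str]:
    """Recover (doc_id, language) from a custom_id built above."""
    doc_id, language = custom_id.rsplit(SEPARATOR, 1)
    return doc_id, language


demo_docs = {"doc-001": "Hello, world.", "doc-002": "Good morning."}
translation_requests = build_translation_requests(
    demo_docs, ["fr", "de"], model="claude-sonnet-4-20250514"
)
print([request["custom_id"] for request in translation_requests])
```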

4. Document Analysis Pipeline

Process thousands of PDFs, invoices, or reports for data extraction. Use Claude Vision to extract data from document images; see our Claude Vision API Guide for document processing patterns.

5. Model Evaluation and Testing

Run your entire test suite against Claude to evaluate output quality and consistency.
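Once a batch of test cases has finished, scoring reduces to comparing each output with its expected label by custom_id. A minimal sketch, assuming outputs have already been collected into a dictionary of custom_id to response text; the function names and the normalization rules (trim, lowercase, drop a trailing period) are illustrative choices, not a standard.

```python
from typing import Dict, List


def normalize_label(text: str) -> str:
    """Trim whitespace, lowercase, and drop a trailing period."""
    return text.strip().lower().rstrip(".")


def score_outputs(outputs: Dict[str, str], expected: Dict[str, str]) -> Dict[str, object]:
    """Compare model outputs with expected labels, keyed by custom_id."""
    mismatched: List[str] = []
    missing: List[str] = []
    for custom_id, want in expected.items():
        got = outputs.get(custom_id)
        if got is None:
            missing.append(custom_id)  # request failed or was never returned
        elif normalize_label(got) != normalize_label(want):
            mismatched.append(custom_id)
    correct = len(expected) - len(mismatched) - len(missing)
    accuracy = correct / len(expected) if expected else 0.0
    return {"accuracy": accuracy, "mismatched": mismatched, "missing": missing}


expected_labels = {"req-001": "neutral", "req-002": "positive", "req-003": "negative"}
model_outputs = {"req-001": "Neutral.", "req-002": "positive", "req-003": "neutral"}
report = score_outputs(model_outputs, expected_labels)
print(report)
```

Exact-match scoring like this only suits tasks with a closed label set; free-form outputs need a different comparison.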

Batch Processing Strategies

Choosing the Right Model

Workload	Recommended Model	Rationale
Simple classification	Haiku	Fastest, cheapest ($0.125/M input tokens at batch pricing)
Content moderation	Haiku or Sonnet 4	Balance of speed and accuracy
Data extraction	Sonnet 4	Strong structured output
Complex analysis	Opus 4.6	Best reasoning ($7.50/M input tokens at batch pricing)

Handling Large Batches

For very large workloads:

import time

# Split into multiple batches of 10,000 requests each
BATCH_SIZE = 10000
batches = []
for i in range(0, len(all_requests), BATCH_SIZE):
    chunk = all_requests[i:i + BATCH_SIZE]
    batch = client.messages.batches.create(requests=chunk)
    batches.append(batch.id)
    print(f"Created batch {batch.id} with {len(chunk)} requests")

# Monitor all batches until every one has ended
while batches:
    # Keep only the batches that are still processing
    batches = [
        batch_id for batch_id in batches
        if client.messages.batches.retrieve(batch_id).processing_status != "ended"
    ]

    if batches:
        print(f"Waiting for {len(batches)} batches...")
        time.sleep(120)
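Splitting by request count alone can still produce an oversized payload when individual prompts are long. The sketch below chunks by both count and serialized size. The helper name chunk_requests is hypothetical, and the default limits are assumptions to be checked against the current API documentation rather than authoritative values.

```python
import json
from typing import Dict, Iterable, Iterator, List


def chunk_requests(
    requests: Iterable[Dict],
    max_requests: int = 10_000,           # matches BATCH_SIZE above
    max_bytes: int = 256 * 1024 * 1024,   # assumed payload ceiling; verify in the docs
) -> Iterator[List[Dict]]:
    """Yield lists of requests that respect both a count and a byte limit."""
    chunk: List[Dict] = []
    chunk_bytes = 0
    for request in requests:
        size = len(json.dumps(request).encode("utf-8"))
        # Close the current chunk if adding this request would break a limit
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)
        chunk_bytes += size
    if chunk:
        yield chunk


demo_requests = [{"custom_id": f"item-{i:03d}", "params": {}} for i in range(25)]
chunk_sizes = [len(chunk) for chunk in chunk_requests(demo_requests, max_requests=10)]
print(chunk_sizes)
```

A single request larger than max_bytes is still emitted in a chunk of its own, so the API, not this helper, is what would reject it.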

Error Handling and Retries

def process_batch_results(batch_id: str, original_requests: list):
    """Resubmit every request that did not succeed in the given batch.

    original_requests is the list of request dicts that was originally submitted.
    """
    failed_ids = set()

    for result in client.messages.batches.results(batch_id):
        if result.result.type != "succeeded":
            failed_ids.add(result.custom_id)
            print(f"{result.custom_id}: {result.result.type}")

    if not failed_ids:
        return None

    # Retry failed requests in a new batch
    retry_params = [
        req for req in original_requests
        if req["custom_id"] in failed_ids
    ]
    retry_batch = client.messages.batches.create(requests=retry_params)
    print(f"Retrying {len(retry_params)} failed requests in batch {retry_batch.id}")
    return retry_batch

Limitations and Considerations

Processing Time

Most batches complete within 24 hours
Smaller batches (< 1,000 requests) often complete in 1-6 hours
Processing time depends on total request volume and model complexity

Request Constraints

Each request in a batch uses the same parameters as a standard Messages API call
Per-request token limits still apply (same as the standard API)
Haiku is particularly cost-effective for batch processing at $0.125/M input tokens

When NOT to Use Batch API

User-facing applications requiring real-time responses
Interactive chat experiences
Time-sensitive operations
When you need immediate error feedback during development

Key Takeaways

50% cost savings: Batch API is half the price of standard API calls
No per-minute rate limits: Submit thousands of requests per batch, subject to batch size and queue limits
24-hour window: Results are typically available within 24 hours, often faster
Haiku for high volume: At batch pricing, Haiku costs just $0.125/M input tokens
Idempotent design: Design batch jobs to handle retries gracefully
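One way to make retries safe is to derive each custom_id from the request content, so that resubmitting the same work yields the same ID and completed items can be skipped. This is a sketch of that idea, not a prescribed pattern: the helper names, the job prefix, and the 16-character truncated SHA-256 digest are all assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Set


def stable_custom_id(params: Dict, prefix: str = "job") -> str:
    """Derive a deterministic ID from the request parameters."""
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return f"{prefix}-{hashlib.sha256(canonical).hexdigest()[:16]}"


def pending_requests(all_params: Iterable[Dict], completed_ids: Set[str]) -> List[Dict]:
    """Return requests whose IDs are neither completed nor duplicated."""
    seen = set(completed_ids)
    pending = []
    for params in all_params:
        custom_id = stable_custom_id(params)
        if custom_id in seen:
            continue  # already done, or a duplicate within this run
        seen.add(custom_id)
        pending.append({"custom_id": custom_id, "params": params})
    return pending


params_a = {"max_tokens": 128, "messages": [{"role": "user", "content": "first item"}]}
params_b = {"max_tokens": 128, "messages": [{"role": "user", "content": "second item"}]}

# Pretend params_a finished in an earlier batch; params_b appears twice by mistake
already_done = {stable_custom_id(params_a)}
todo = pending_requests([params_a, params_b, params_b], already_done)
print([request["custom_id"] for request in todo])
```

The set of completed IDs would come from the succeeded results of earlier batches, stored wherever the job keeps its state.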

For more details on API pricing, see our Claude API Pricing Guide. Compare model capabilities on the Model Comparison page to choose the right model for your batch workload.