Claude Batch API: Guide to Cost-Effective Batch Processing
The Claude Batch API lets you submit large groups of requests asynchronously at 50% off standard API pricing. Submit a batch of Message API requests, and results are processed within 24 hours. Use it for offline workloads like data classification, content moderation, bulk translation, and large-scale document analysis. Batch requests do not count against the standard per-minute Messages API rate limits, although batches have their own limits on batch size and on the number of requests waiting in the queue.
What is the Claude Batch API?
The Claude Batch API is an asynchronous processing endpoint that lets you submit large volumes of requests at 50% lower cost than standard API calls. Instead of waiting for each response in real time, you submit a batch, and results are processed and ready for retrieval within 24 hours.
This is ideal for workloads where immediate responses aren't necessary, such as backfill processing, periodic data analysis, content moderation pipelines, and large-scale data enrichment.
How Batch Processing Differs from Standard API
| Aspect | Standard API | Batch API |
|---|---|---|
| Response time | Real-time (seconds) | Up to 24 hours |
| Cost | Full price | 50% discount |
| Rate limits | Standard limits apply | Separate batch limits (batch size, queued requests) |
| Use case | Interactive, user-facing | Offline, background processing |
| Concurrent requests | Limited by rate limits | Submit thousands at once |
Getting Started with Batch API
Prerequisites
- An Anthropic API key with batch access enabled
- The anthropic Python SDK (v0.39+)
Step 1: Prepare Your Batch Requests
Each batch consists of a series of requests, where each request is a standard Message API call:
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

# Each request in the batch follows the Message API format.
# custom_id must be unique within the batch; it is how you match results to requests.
requests = [
    {
        "custom_id": "req-001",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'The product works well but shipping was slow'"}
            ]
        }
    },
    {
        "custom_id": "req-002",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'Absolutely love it! Best purchase ever'"}
            ]
        }
    },
    {
        "custom_id": "req-003",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'Terrible customer support, never buying again'"}
            ]
        }
    },
]
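Writing each request by hand does not scale past a few items. The sketch below shows one way to generate the same structure from a list of inputs. The helper name `build_classification_requests` and the `req-NNN` numbering scheme are illustrative assumptions, not part of the SDK.

```python
from typing import Iterable


def build_classification_requests(
    reviews: Iterable[str],
    model: str = "claude-sonnet-4-20250514",
    max_tokens: int = 256,
) -> list[dict]:
    """Build one batch request dict per review, each with a unique custom_id."""
    requests = []
    for index, review in enumerate(reviews, start=1):
        requests.append({
            "custom_id": f"req-{index:03d}",  # hypothetical ID scheme
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": (
                        "Classify this review as positive, negative, or neutral: "
                        f"'{review}'"
                    ),
                }],
            },
        })
    return requests


reviews = [
    "The product works well but shipping was slow",
    "Absolutely love it! Best purchase ever",
]
requests = build_classification_requests(reviews)
print([r["custom_id"] for r in requests])  # ['req-001', 'req-002']
```

The resulting list can be passed as the `requests` argument when submitting the batch.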
Step 2: Submit the Batch
# Batches are created through the Messages namespace of the SDK
batch = client.messages.batches.create(
    requests=requests
)
print(f"Batch ID: {batch.id}")
print(f"Total requests: {len(requests)}")
Step 3: Monitor Progress
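import time

def wait_for_batch(batch_id: str, poll_interval: int = 60):
    """Poll until the batch finishes processing, then return it."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        counts = batch.request_counts
        # Requests finished so far, regardless of outcome
        done = counts.succeeded + counts.errored + counts.canceled + counts.expired
        total = done + counts.processing
        print(f"Status: {batch.processing_status}")
        print(f"Progress: {done}/{total} ({counts.succeeded} succeeded)")
        # "ended" is the terminal status; per-request outcomes are in the results
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_interval)

batch_result = wait_for_batch(batch.id)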
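A polling loop with no upper bound can hang a job indefinitely if something goes wrong. The following is a minimal sketch of the same idea with a timeout, written so the status source is injected and the loop can be exercised without calling the API. The function name `poll_until_ended` and its parameters are assumptions made for illustration.

```python
import time
from typing import Callable


def poll_until_ended(
    get_status: Callable[[], str],
    poll_interval: float = 60.0,
    timeout: float = 25 * 60 * 60,  # assumed ceiling, slightly over 24 hours
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns "ended" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ended":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Batch still '{status}' after {timeout} seconds")
        sleep(poll_interval)


# Stand-in status source for demonstration. In real code get_status could be:
#   lambda: client.messages.batches.retrieve(batch_id).processing_status
fake_statuses = iter(["in_progress", "in_progress", "ended"])
final = poll_until_ended(
    lambda: next(fake_statuses), poll_interval=0, sleep=lambda seconds: None
)
print(final)  # ended
```

Injecting `get_status` and `sleep` keeps the loop testable and lets you swap in a longer interval for large batches.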
Step 4: Retrieve Results
# Stream results; match each one to its request by custom_id
for result in client.messages.batches.results(batch.id):
    custom_id = result.custom_id
    if result.result.type == "succeeded":
        content = result.result.message.content[0].text
        print(f"{custom_id}: {content}")
    elif result.result.type == "errored":
        print(f"{custom_id}: Failed - {result.result.error}")
    else:
        # "canceled" or "expired": no message and no error detail
        print(f"{custom_id}: {result.result.type}")
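Results may not come back in the order the requests were submitted, so it is safer to join them to your inputs by `custom_id` than by position. The sketch below shows one way to do that. It uses stand-in objects shaped like the result entries in Step 4 so it runs without an API call; `collect_labels`, `fake_success`, and the sample data are hypothetical.

```python
from types import SimpleNamespace


def collect_labels(results) -> tuple[dict, dict]:
    """Split results into {custom_id: text} and {custom_id: failure type}."""
    labels, failures = {}, {}
    for entry in results:
        if entry.result.type == "succeeded":
            labels[entry.custom_id] = entry.result.message.content[0].text.strip()
        else:
            failures[entry.custom_id] = entry.result.type
    return labels, failures


def fake_success(custom_id: str, text: str) -> SimpleNamespace:
    """Build a stand-in for a succeeded result entry (demo data only)."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)])
    return SimpleNamespace(
        custom_id=custom_id,
        result=SimpleNamespace(type="succeeded", message=message),
    )


# Deliberately out of order, with one request that did not complete
fake_results = [
    fake_success("req-002", "positive"),
    SimpleNamespace(custom_id="req-003", result=SimpleNamespace(type="expired")),
    fake_success("req-001", "neutral"),
]
labels, failures = collect_labels(fake_results)

inputs = {"req-001": "review A", "req-002": "review B", "req-003": "review C"}
for custom_id in inputs:
    outcome = labels.get(custom_id, f"no result ({failures.get(custom_id)})")
    print(custom_id, outcome)
```

Iterating over your own inputs, not over the results, also surfaces any request that is missing from the output entirely.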
Cost Comparison
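Here's what 50% savings look like in practice. The dollar amounts are illustrative, since actual cost depends on the number of input and output tokens per request: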
| Volume | Standard API (Sonnet 4) | Batch API (Sonnet 4) | Savings |
|---|---|---|---|
| 10,000 requests | ~$30 | ~$15 | $15 |
| 100,000 requests | ~$300 | ~$150 | $150 |
| 1,000,000 requests | ~$3,000 | ~$1,500 | $1,500 |
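To estimate costs for your own workload, the arithmetic behind a table like this can be sketched as follows. The token counts per request (500 input, 100 output) and the per-million-token prices ($3 input, $15 output) are assumptions chosen so the first row is reproduced; check current pricing and measure your own token usage before relying on the numbers.

```python
def batch_cost_comparison(
    num_requests: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,
) -> dict:
    """Estimate standard vs. batch cost in dollars for a uniform workload."""
    input_cost = (
        num_requests * input_tokens_per_request / 1_000_000 * input_price_per_mtok
    )
    output_cost = (
        num_requests * output_tokens_per_request / 1_000_000 * output_price_per_mtok
    )
    standard = input_cost + output_cost
    batch = standard * (1 - batch_discount)
    return {
        "standard": round(standard, 2),
        "batch": round(batch, 2),
        "savings": round(standard - batch, 2),
    }


# Assumed workload: 10,000 requests, 500 input and 100 output tokens each
estimate = batch_cost_comparison(10_000, 500, 100, 3.00, 15.00)
print(estimate)  # {'standard': 30.0, 'batch': 15.0, 'savings': 15.0}
```

Because the discount applies to both input and output tokens, savings scale linearly with volume, as the table shows.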
Best Use Cases for Batch API
1. Large-Scale Content Moderation
Process millions of comments or posts for policy violations overnight:
# Submit 50,000 content items for moderation in a single batch
# content_items: a list of strings to moderate, loaded elsewhere
moderation_requests = []
for i, content in enumerate(content_items):
    moderation_requests.append({
        "custom_id": f"mod-{i:06d}",
        "params": {
            "model": "claude-3-5-haiku-20241022",  # use a current Haiku model ID
            "max_tokens": 128,
            "messages": [{
                "role": "user",
                "content": f"Flag this content if it violates policies:\n{content}"
            }]
        }
    })
batch = client.messages.batches.create(requests=moderation_requests)
2. Data Classification and Enrichment
Enrich your database with AI-generated metadata, categories, and insights.
3. Bulk Translation
Translate large document collections or datasets into multiple languages.
4. Document Analysis Pipeline
Process thousands of PDFs, invoices, or reports for data extraction. Use Claude Vision to extract data from document images; see our Claude Vision API Guide for document processing patterns.
5. Model Evaluation and Testing
Run your entire test suite against Claude to evaluate output quality and consistency.
Batch Processing Strategies
Choosing the Right Model
| Workload | Recommended Model | Rationale |
|---|---|---|
| Simple classification | Haiku | Fastest, cheapest ($0.125/M input tokens at batch pricing) |
| Content moderation | Haiku or Sonnet 4 | Balance of speed and accuracy |
| Data extraction | Sonnet 4 | Strong structured output |
| Complex analysis | Opus 4.6 | Best reasoning ($7.5/M tokens batch) |
Handling Large Batches
For very large workloads:
import time

# all_requests: the full list of request dicts, built as in Step 1
# Split into multiple batches of 10,000 requests each
BATCH_SIZE = 10000
batches = []
for i in range(0, len(all_requests), BATCH_SIZE):
    chunk = all_requests[i:i + BATCH_SIZE]
    batch = client.messages.batches.create(requests=chunk)
    batches.append(batch.id)
    print(f"Created batch {batch.id} with {len(chunk)} requests")

# Monitor all batches until every one has ended
while batches:
    completed = []
    for batch_id in batches:
        status = client.messages.batches.retrieve(batch_id)
        if status.processing_status == "ended":
            completed.append(batch_id)
    for batch_id in completed:
        batches.remove(batch_id)
    if batches:
        print(f"Waiting for {len(batches)} batches...")
        time.sleep(120)
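Splitting by request count alone can still produce an oversized batch when individual requests carry long prompts, because batches are limited by total payload size as well as by request count. The sketch below chunks on both. The default thresholds and the helper name `chunk_requests` are assumptions; check the current documented limits before choosing values.

```python
import json
from typing import Iterator


def chunk_requests(
    requests: list[dict],
    max_requests: int = 10_000,          # assumed count threshold
    max_bytes: int = 200 * 1024 * 1024,  # assumed size threshold
) -> Iterator[list[dict]]:
    """Yield chunks that stay within both a request count and a byte budget."""
    chunk: list[dict] = []
    chunk_bytes = 0
    for request in requests:
        # Approximate the payload size by the request's JSON encoding
        size = len(json.dumps(request).encode("utf-8"))
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)
        chunk_bytes += size
    if chunk:
        yield chunk


demo = [{"custom_id": f"req-{i}", "params": {"max_tokens": 16}} for i in range(25)]
sizes = [len(c) for c in chunk_requests(demo, max_requests=10)]
print(sizes)  # [10, 10, 5]
```

A single request larger than `max_bytes` still gets its own chunk instead of being dropped, so oversized inputs surface as API errors and do not disappear silently.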
Error Handling and Retries
def process_batch_results(batch_id: str, original_requests: list):
    """Collect failed requests and resubmit them in a new batch."""
    failed_requests = []
    for result in client.messages.batches.results(batch_id):
        if result.result.type != "succeeded":
            failed_requests.append({
                "custom_id": result.custom_id,
                # Only "errored" results carry an error; other types get None
                "error": getattr(result.result, "error", None)
            })
    # Retry failed requests in a new batch
    if failed_requests:
        failed_ids = {f["custom_id"] for f in failed_requests}
        # original_requests: the request dicts submitted in the original batch
        retry_params = [
            req for req in original_requests
            if req["custom_id"] in failed_ids
        ]
        retry_batch = client.messages.batches.create(requests=retry_params)
        print(f"Retrying {len(retry_params)} failed requests in batch {retry_batch.id}")
    return failed_requests
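Retrying every non-succeeded request without limit can loop forever on requests that will never succeed, such as malformed ones. One possible refinement, sketched below, caps the number of attempts per `custom_id` and skips requests that were canceled on purpose. The names `select_retries` and `RETRYABLE_TYPES`, and the choice of which result types to retry, are assumptions for illustration; the stand-in result objects let the sketch run without an API call.

```python
from types import SimpleNamespace

# Assumption: canceled requests were stopped deliberately and are not retried
RETRYABLE_TYPES = {"errored", "expired"}


def select_retries(results, requests_by_id: dict, attempts: dict, max_attempts: int = 3):
    """Return (requests to resubmit, custom_ids that exhausted their attempts).

    attempts maps custom_id to submissions made so far and is updated in place.
    """
    retry, gave_up = [], []
    for entry in results:
        if entry.result.type not in RETRYABLE_TYPES:
            continue
        made = attempts.get(entry.custom_id, 1)  # first submission counts as 1
        if made >= max_attempts:
            gave_up.append(entry.custom_id)
        else:
            attempts[entry.custom_id] = made + 1
            retry.append(requests_by_id[entry.custom_id])
    return retry, gave_up


def fake(custom_id: str, result_type: str) -> SimpleNamespace:
    """Stand-in for a batch result entry (demo data only)."""
    return SimpleNamespace(custom_id=custom_id, result=SimpleNamespace(type=result_type))


requests_by_id = {cid: {"custom_id": cid, "params": {}} for cid in "abcd"}
attempts = {"c": 3}  # request "c" has already been submitted three times
results = [fake("a", "succeeded"), fake("b", "errored"),
           fake("c", "expired"), fake("d", "canceled")]

retry, gave_up = select_retries(results, requests_by_id, attempts)
print([r["custom_id"] for r in retry], gave_up)  # ['b'] ['c']
```

Inspecting the error detail on errored results before resubmitting is worthwhile, since a request that failed validation will fail again unchanged.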
Limitations and Considerations
Processing Time
- Batches are processed within 24 hours; requests that have not finished by then expire and are not run late
- Smaller batches (< 1,000 requests) often complete in 1-6 hours
- Processing time depends on total request volume and model complexity
Request Constraints
- Each request in a batch uses the same standard Message API parameters
- Maximum tokens per request still apply (same as standard API)
- Haiku is particularly cost-effective for batch processing at $0.125/M input tokens
When NOT to Use Batch API
- User-facing applications requiring real-time responses
- Interactive chat experiences
- Time-sensitive operations
- When you need immediate error feedback during development
Key Takeaways
- 50% cost savings: Batch API is half the price of standard API calls
- Separate rate limits: Batch requests do not count against standard Messages API limits, so you can submit thousands of requests at once, subject to per-batch size limits
- 24-hour processing window: Results are available within 24 hours, often faster; requests that are still unfinished at that point expire
- Haiku for high volume: At batch pricing, Haiku costs just $0.125/M input tokens
- Idempotent design: Design batch jobs to handle retries gracefully
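One way to approach the idempotent design point is to derive each `custom_id` from the request content, so resubmitting the same work yields the same ID and already-completed items can be skipped. This is a minimal sketch under that assumption; `stable_custom_id` and `skip_completed` are hypothetical helpers, and the ID is kept short and limited to letters, digits, and hyphens to stay within typical identifier constraints.

```python
import hashlib
import json


def stable_custom_id(params: dict, prefix: str = "job") -> str:
    """Derive a deterministic ID from the request params."""
    # sort_keys makes the encoding independent of dict key order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}-{digest}"


def skip_completed(requests: list[dict], completed_ids: set) -> list[dict]:
    """Drop requests whose custom_id already has a stored result."""
    return [r for r in requests if r["custom_id"] not in completed_ids]


params_a = {"model": "claude-sonnet-4-20250514", "max_tokens": 256,
            "messages": [{"role": "user", "content": "Classify: great product"}]}
# Same content, different key order
params_b = {"max_tokens": 256, "model": "claude-sonnet-4-20250514",
            "messages": [{"content": "Classify: great product", "role": "user"}]}

id_a = stable_custom_id(params_a)
id_b = stable_custom_id(params_b)
print(id_a == id_b, len(id_a))  # True 36

pending = skip_completed(
    [{"custom_id": id_a, "params": params_a}], completed_ids={id_a}
)
print(len(pending))  # 0
```

With content-derived IDs, a crashed or repeated job can be rerun safely: anything whose ID is already in your results store is filtered out before submission.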