BeClaude

claude-api-cost-optimization

GitHub · General · by Louishin

💰 Optimize your Claude API usage to save 50-90% on costs with batching techniques and efficient request management.

Community Plugin

Overview

Claude API Cost Optimization

Save 50-90% on Claude API costs with three officially verified techniques

Quick Reference

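| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | Up to 90% on cached input tokens | Repeated system prompts (>1K tokens) |
| Extended Thinking | No direct discount (thinking is billed as output tokens) | Complex reasoning tasks |
| Batch + Cache | ~95% on cached input tokens | Bulk tasks with shared context |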
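The savings above are multipliers on the base per-token price, and they stack. The sketch below shows where "~95%" comes from. It assumes the multipliers from Anthropic's published pricing (batch 0.5x, 5-minute cache write 1.25x, cache read 0.1x) and that they combine multiplicatively. `input_price_multiplier` is a hypothetical helper, not part of the SDK:

```python
from typing import Optional

# Multipliers on the base input-token price (assumed from Anthropic's published pricing)
BATCH_MULTIPLIER = 0.5          # Message Batches API
CACHE_WRITE_MULTIPLIER = 1.25   # 5-minute cache write
CACHE_READ_MULTIPLIER = 0.1     # cache hit


def input_price_multiplier(batch: bool, cache: Optional[str] = None) -> float:
    """Effective input-token multiplier; cache is None, "write", or "read"."""
    multiplier = BATCH_MULTIPLIER if batch else 1.0
    if cache == "write":
        multiplier *= CACHE_WRITE_MULTIPLIER
    elif cache == "read":
        multiplier *= CACHE_READ_MULTIPLIER
    return multiplier


# Batch + cache read: 0.5 * 0.1 = 0.05 of the base price, i.e. ~95% off cached input
print(f"{1 - input_price_multiplier(batch=True, cache='read'):.0%} off")
```

Output tokens only get the batch discount; caching never reduces output cost.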

1. Batch API (50% Off)

When to Use

  • Bulk translations
  • Daily content generation
  • Overnight report processing
  • NOT for real-time chat

Code Example

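```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results are available within 24h (usually <1h); poll until processing has ended
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    # Only "succeeded" results carry a message; others are errored, canceled, or expired
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    else:
        print(f"{result.custom_id}: {result.result.type}")
```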

Key Finding: Bigger Batches = Faster!

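| Batch Size | Time/Request |
|---|---|
| Large (294 requests) | 0.45 min |
| Small (10 requests) | 9.84 min |

That is roughly a 22x difference in time per request (9.84 ÷ 0.45 ≈ 22). Where possible, submit 100+ requests in one batch instead of many small batches.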
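One way to follow this advice is to build every request up front and submit as few, as large, batches as possible. A minimal sketch: `make_batch_requests` and `chunk` are hypothetical helpers, and the 100,000-requests-per-batch cap is taken from the Message Batches documentation (the separate 256 MB size cap is not checked here):

```python
from typing import Iterator

MAX_REQUESTS_PER_BATCH = 100_000  # documented per-batch request limit


def make_batch_requests(prompts: list[str], model: str = "claude-sonnet-4-5") -> list[dict]:
    """Turn plain prompts into Message Batches request dicts with unique custom_ids."""
    return [
        {
            "custom_id": f"task-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


def chunk(requests: list[dict], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[list[dict]]:
    """Split requests into consecutive slices of at most `size` requests each."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


requests = make_batch_requests([f"Translate sentence {n}" for n in range(1, 295)])
batches = list(chunk(requests))
print(len(requests), len(batches))  # all 294 requests fit in a single batch
```

Each slice would then be submitted with `client.messages.batches.create(requests=slice)`.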


2. Prompt Caching (90% Off)

When to Use

  • Long system prompts (>1K tokens)
  • Repeated instructions
  • RAG with large context

Code Example

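```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",  # must meet the minimum cacheable length
        "cache_control": {"type": "ephemeral"}  # Enable caching!
    }],
    messages=[{"role": "user", "content": "User question"}]
)
# First call: cache write, billed at +25% over the base input price
# Later calls within the TTL: cache read, billed at 10% of the base input price (-90%)
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```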
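For RAG, the large shared context can carry the cache breakpoint instead of the system prompt. A minimal sketch, assuming the retrieved context is byte-for-byte identical across calls (the cache only matches an exact prefix); `build_rag_messages` is a hypothetical helper:

```python
def build_rag_messages(context: str, question: str) -> list[dict]:
    """Put the shared context first and mark it cacheable; the question varies per call."""
    return [
        {
            "role": "user",
            "content": [
                # Everything up to and including this block is cached as a prefix
                {"type": "text", "text": context, "cache_control": {"type": "ephemeral"}},
                # The per-request question comes after the cache breakpoint
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_rag_messages("<large retrieved document>", "What does section 2 say?")
# Then: client.messages.create(model="claude-sonnet-4-5", max_tokens=1024, messages=messages)
```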

Cache Rules

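  • Minimum: 1,024 tokens (Sonnet); shorter prompts are not cached even if marked with `cache_control`
  • TTL: 5 minutes by default, refreshed each time the cached prefix is read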
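Because the first call pays 1.25x on the cached prefix and each later hit pays 0.1x, caching already pays off on the second call. A sketch of that arithmetic in units of the prefix's uncached price; `prefix_cost` is hypothetical and assumes every call lands within the TTL, ignoring the uncached part of each request and output tokens:

```python
def prefix_cost(calls: int, cached: bool) -> float:
    """Cost of sending the same prefix `calls` times, in multiples of its uncached price."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls)
    # One cache write (1.25x), then cache reads (0.1x) for every later call
    return 1.25 + 0.1 * (calls - 1)


for n in (1, 2, 10):
    print(n, prefix_cost(n, cached=False), round(prefix_cost(n, cached=True), 2))
```

With a single call, caching costs more (1.25 vs 1.0); at two calls it is 1.35 vs 2.0, and at ten about 2.15 vs 10.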

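3. Extended Thinking (No Direct Discount)

Extended thinking has no price discount: thinking tokens are billed as output tokens, so each request costs more. Any savings are indirect, for example when thinking lets a cheaper model handle a task you would otherwise send to a more expensive one.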

When to Use

  • Complex code architecture
  • Strategic planning
  • Mathematical reasoning

Code Example

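```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # minimum 1,024; thinking tokens are billed as output tokens
    },
    messages=[{"role": "user", "content": "Design architecture for..."}]
)
```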
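An extended-thinking response contains `thinking` blocks followed by `text` blocks. A minimal sketch for separating the reasoning from the final answer; `split_thinking` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins rather than a real API response:

```python
from types import SimpleNamespace


def split_thinking(content) -> tuple[str, str]:
    """Return (reasoning, answer) from a Messages API content list.

    redacted_thinking blocks (if any) are skipped.
    """
    reasoning = "".join(b.thinking for b in content if b.type == "thinking")
    answer = "".join(b.text for b in content if b.type == "text")
    return reasoning, answer


# Stand-in for response.content from client.messages.create(..., thinking=...)
fake_content = [
    SimpleNamespace(type="thinking", thinking="Compare monolith vs services..."),
    SimpleNamespace(type="text", text="Recommended architecture: ..."),
]
reasoning, answer = split_thinking(fake_content)
print(answer)
```

With a real response, call `split_thinking(response.content)`.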

Decision Flowchart

```text
Can wait 24h? → Yes → Batch API (50% off)
                 ↓ No
Repeated prompts >1K? → Yes → Prompt Caching (90% off)
                         ↓ No
Complex reasoning? → Yes → Extended Thinking
                      ↓ No
Use normal API
```
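The flowchart stops at the first "Yes", but the techniques can be combined (see Batch + Cache in the Quick Reference). A sketch that checks each branch independently instead; `choose_techniques` and its parameters are hypothetical, and the 1,024-token threshold is the Sonnet caching minimum from Cache Rules:

```python
def choose_techniques(can_wait_24h: bool, repeated_prompt_tokens: int,
                      complex_reasoning: bool) -> list[str]:
    """Return every technique from the flowchart that applies."""
    techniques = []
    if can_wait_24h:
        techniques.append("Batch API")
    if repeated_prompt_tokens >= 1024:  # minimum cacheable length for Sonnet
        techniques.append("Prompt Caching")
    if complex_reasoning:
        techniques.append("Extended Thinking")
    return techniques or ["Normal API"]


print(choose_techniques(can_wait_24h=True, repeated_prompt_tokens=5000, complex_reasoning=False))
```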

Official Docs


Made with 🐾 by [Washin Village](https://washinmura.jp) - Verified against official Anthropic documentation

Install & Usage

1. Create the skills directory: `mkdir -p .claude/skills`
2. Download the skill file: `mkdir -p .claude/skills && curl -o .claude/skills/claude-api-cost-optimization.md https://raw.githubusercontent.com/Louishin/claude-api-cost-optimization/main/SKILL.md`
3. Invoke in Claude Code: `/claude-api-cost-optimization`

Frequently Asked Questions

What is claude-api-cost-optimization?

💰 Optimize your Claude API usage to save 50-90% on costs with batching techniques and efficient request management.

How do I install claude-api-cost-optimization?

To install claude-api-cost-optimization, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /claude-api-cost-optimization.

What is claude-api-cost-optimization best for?

claude-api-cost-optimization is a community plugin in the General category, tagged api, and created by Louishin.