Mastering Claude’s Effort Parameter: Balance Speed, Cost, and Reasoning Depth
Learn how to use the effort parameter in the Claude API to control token spend, response thoroughness, and latency. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.7.
This guide explains Claude’s effort parameter, which lets you dial token spending from low (fast, cheap) to max (deepest reasoning). You’ll learn how to set effort levels, combine them with adaptive thinking, and choose the right level for your use case.
Introduction
When building applications with Claude, you often face a trade-off: thoroughness vs. speed and cost. Do you want Claude to think deeply and produce the most accurate answer, or do you need a quick response that keeps your API bills low?
Claude’s effort parameter gives you a single dial to control exactly that. Instead of switching between models or manually managing token budgets, you can now tell Claude how eager it should be about spending tokens—all within the same model.
This guide covers everything you need to know: what effort levels mean, how to use them in code, and practical recommendations for different workloads.
What Is the Effort Parameter?
The effort parameter controls Claude’s behavior across all tokens in a response, including text, tool calls, and extended thinking. On Claude Opus 4.6 and Sonnet 4.6 it replaces the older budget_tokens parameter, which is now deprecated on those models. Key properties:
- Works without enabling extended thinking
- Affects tool calls (e.g., fewer tool calls at lower effort)
- Combines seamlessly with adaptive thinking (`thinking: {type: "adaptive"}`)
- Available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6
Note: At `high` (default) and `max` effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems.
Effort Levels Explained
| Level | Description | Best For |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deep reasoning, complex analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min, millions of tokens); Opus 4.7 only |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced token savings | Agentic tasks needing speed/cost/performance balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
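The table can be encoded as a small lookup so that an unsupported combination is caught before a request is sent. The sketch below is illustrative only: `EFFORT_LEVELS`, `XHIGH_MODELS`, `validate_effort`, and the short model labels are hypothetical names, not part of the Anthropic SDK, and the only rule it encodes is the table's statement that xhigh is limited to Opus 4.7.

```python
# Effort levels from the table, ordered from least to most token spend.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Assumption: short labels for illustration, not real API model IDs.
XHIGH_MODELS = {"opus-4.7"}


def validate_effort(model_label: str, effort: str) -> str:
    """Return the effort level if it is valid for the model, else raise ValueError."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == "xhigh" and model_label not in XHIGH_MODELS:
        raise ValueError(f"xhigh is not available on {model_label!r}")
    return effort
```

Calling `validate_effort("sonnet-4.6", "xhigh")` raises a `ValueError`, while `validate_effort("opus-4.7", "xhigh")` returns `"xhigh"`.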
How to Use the Effort Parameter in Code
Python (Anthropic SDK)
```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # effort requires a supported model
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set effort level
    output_config={"effort": "medium"},
)

print(response.content[0].text)
```
TypeScript (Anthropic SDK)
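```typescript
import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // effort requires a supported model
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ],
  // Set effort level
  output_config: { effort: 'low' }
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}
```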
With Adaptive Thinking
For the best experience, combine effort with adaptive thinking:
```python
# Reuses the client created in the Python example above
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Let Claude decide when and how much to think
    thinking={"type": "adaptive"},
    # Set effort level
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Design a scalable microservices architecture for an e-commerce platform."}
    ],
)
```
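When thinking is enabled, the response may contain thinking blocks ahead of the answer, so reading only the first content block is not reliable. A minimal sketch of a safer reader follows; `extract_text` is a hypothetical helper, and `fake_response` is a hand-built stand-in for an API response so the snippet runs offline.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Join the text of every text block, skipping thinking and tool-use blocks."""
    return "".join(
        block.text for block in response.content if block.type == "text"
    )


# Stand-in for an API response so the sketch runs without a network call.
fake_response = SimpleNamespace(
    content=[
        SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
        SimpleNamespace(type="text", text="Start with an API gateway."),
    ]
)

print(extract_text(fake_response))
```

The same function can be passed a real response object, since it relies only on each block's `type` field and, for text blocks, its `text` field.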
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly when using this model.
- Medium (recommended default): Best balance of speed, cost, and performance. Ideal for agentic coding, tool-heavy workflows, and code generation.
- Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
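One way to make sure effort is never left at the default is to build every request through a single function. The sketch below is an assumption-laden example rather than an official pattern: `build_request` is a hypothetical helper, the model ID `claude-sonnet-4-6` is assumed, and it assumes effort is passed in the request's `output_config` field, so adjust it to however your SDK version accepts the parameter.

```python
def build_request(prompt: str, latency_sensitive: bool = False) -> dict:
    """Build keyword arguments for client.messages.create with effort always set."""
    # Medium is the recommended default; low is for latency-sensitive traffic.
    effort = "low" if latency_sensitive else "medium"
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},  # assumed request field
    }
```

The result can be unpacked into a call, for example `client.messages.create(**build_request("Summarize this ticket.", latency_sensitive=True))`.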
Practical Use Cases
1. High-Volume Customer Support Chat
Use `low` effort for simple FAQ-style queries. You’ll get fast responses and lower costs.
2. Complex Code Generation or Debugging
Use `high` or `max` effort. The extra token spend pays off in correctness and depth.
3. Multi-Step Agentic Workflows
Use `medium` effort for sub-agents that handle routine tasks, and `high` or `max` for the orchestrator that makes critical decisions.
4. Long-Running Research or Analysis
Use `xhigh` (Opus 4.7 only) for tasks that require millions of tokens and deep reasoning over 30+ minutes.
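In an agentic system, these choices can live in one table keyed by role, so each call picks up its effort level from configuration instead of scattered literals. The following is a rough sketch: the role names, `EFFORT_BY_ROLE`, and `effort_for` are hypothetical, and the orchestrator could just as well be set to max.

```python
# Effort per role, following the use cases above.
# Role names are hypothetical labels for this sketch.
EFFORT_BY_ROLE = {
    "faq_chat": "low",
    "subagent": "medium",
    "debugging": "high",
    "orchestrator": "high",
}


def effort_for(role: str) -> str:
    """Look up the effort level for a role, falling back to the default of high."""
    return EFFORT_BY_ROLE.get(role, "high")


plan = [("orchestrator", "Plan the refactor"), ("subagent", "Rename variables")]
for role, task in plan:
    print(f"{task}: effort={effort_for(role)}")
```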
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid defaulting to `high`.
- Combine with adaptive thinking for dynamic token allocation.
- Test with your workload – effort is a signal, not a hard limit. Run benchmarks to find the sweet spot.
- Use lower effort for sub-agents and higher effort for orchestrators in agentic systems.
- Monitor token usage – lower effort reduces tool call frequency, which can significantly cut costs.
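A simple way to benchmark is to send the same prompt at several effort levels and compare the output token counts reported in each response's `usage` field. The sketch below assumes effort is passed as `output_config={"effort": ...}` and that the model ID is `claude-sonnet-4-6`; `output_tokens_by_effort` is a hypothetical helper, and `FakeMessages` returns made-up placeholder counts (not measurements) so the example runs offline.

```python
from types import SimpleNamespace


def output_tokens_by_effort(client, prompt, levels=("low", "medium", "high")):
    """Send the same prompt at each effort level and record output token counts."""
    results = {}
    for level in levels:
        response = client.messages.create(
            model="claude-sonnet-4-6",  # assumed model ID
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
            output_config={"effort": level},  # assumed request field
        )
        results[level] = response.usage.output_tokens
    return results


class FakeMessages:
    """Offline stand-in; the counts are placeholders, not real measurements."""

    COUNTS = {"low": 120, "medium": 480, "high": 1500}

    def create(self, **kwargs):
        level = kwargs["output_config"]["effort"]
        usage = SimpleNamespace(output_tokens=self.COUNTS[level])
        return SimpleNamespace(usage=usage)


fake_client = SimpleNamespace(messages=FakeMessages())
print(output_tokens_by_effort(fake_client, "Summarize this ticket."))
```

Swapping `fake_client` for a real `anthropic.Anthropic()` client gives actual counts for your workload; because effort is a signal rather than a hard limit, expect the numbers to vary from run to run.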
Caveats
- Effort is not available on all models. Check the supported models list.
- `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6. Migrate to effort.
- At `low` effort, Claude may skip thinking entirely for simple requests, which could reduce quality on edge cases.
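For the `budget_tokens` migration, one possible approach is to rewrite stored request parameters in a single place. The sketch below is hypothetical: `migrate_to_effort` is not an SDK function, it assumes effort belongs in the request's `output_config` field, and it takes the target effort level as an explicit argument because this guide gives no mapping from a token budget to an effort level.

```python
import copy


def migrate_to_effort(params: dict, effort: str = "medium") -> dict:
    """Return a copy of request params with a budget_tokens thinking config
    replaced by adaptive thinking and an explicit effort level."""
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
    # Keep an effort level the caller already set; otherwise apply the argument.
    migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


old_params = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
}
print(migrate_to_effort(old_params, effort="high"))
```

The original dictionary is left untouched, which makes it easy to compare results from the old and new configurations side by side during testing.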
Key Takeaways
- The effort parameter lets you control token spending across text, tool calls, and thinking—all with a single model.
- Five levels are available: `low`, `medium`, `high`, `xhigh` (Opus 4.7 only), and `max`.
- Always set effort explicitly on Sonnet 4.6 to avoid unexpected latency.
- Combine with adaptive thinking for optimal results.
- Lower effort reduces tool call frequency, offering significant cost savings for high-volume or simple tasks.