Mastering Claude's Effort Parameter: Optimize Token Usage for Speed, Cost, and Capability
Learn how to control Claude's thinking depth with the effort parameter. Balance response thoroughness and token efficiency across models like Opus 4.6 and Sonnet 4.6.
This guide explains how to use Claude's effort parameter to control token spending, trading off between response thoroughness and efficiency. You'll learn effort levels, recommended settings for Sonnet 4.6, and how to combine effort with adaptive thinking.
Introduction
When building applications with Claude, you often face a trade-off: do you want the deepest possible reasoning, or do you need fast, cost-effective responses? Traditionally, you'd switch between models to balance these needs. But with the effort parameter, you can now control this behavior within a single model.
Effort lets you tell Claude how eager it should be about spending tokens when responding. Think of it as a dial: turn it up for maximum capability on complex problems, or turn it down for speed and savings on simpler tasks. This guide covers everything you need to know to use effort effectively.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that influences how thoroughly Claude processes your request. It affects all tokens in the response—including text, tool calls, and extended thinking (when enabled). This is a key advantage over older methods like budget_tokens, which only controlled thinking tokens.
- Works without enabling extended thinking
- Affects tool call frequency (lower effort = fewer tool calls)
- Single model, multiple behavior profiles
Supported Models
Effort is generally available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
On Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it's deprecated and will be removed in a future release.
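A migration from budget_tokens might look like the sketch below. It rests on two assumptions that the text above does not confirm: that the old request carries thinking={"type": "enabled", "budget_tokens": N}, and that effort is passed as output_config={"effort": ...}. The helper name migrate_from_budget_tokens is hypothetical. The caller chooses the effort level, because this guide gives no budget-to-effort conversion.

```python
import copy


def migrate_from_budget_tokens(request: dict, effort: str) -> dict:
    """Return a copy of `request` that uses effort instead of budget_tokens.

    Assumptions (not confirmed by the guide): the old request carries
    thinking={"type": "enabled", "budget_tokens": N}, and effort is passed
    as output_config={"effort": ...}.
    """
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed budget and switch to adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
    # Preserve any other output_config keys the caller already set.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated
```

The original dictionary is left untouched, so old and new request shapes can be compared side by side during a rollout.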
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simpler tasks, subagents, high-volume workloads |
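Because max and xhigh are limited to certain models, it can help to validate a level before sending a request. The sketch below encodes one reading of the table: low, medium, and high on every supported model, max on the four models listed, and xhigh on Opus 4.7 only. The model labels and the check_effort helper are hypothetical and are not API model IDs.

```python
# Short labels used only in this sketch; they are NOT API model IDs.
BASE_LEVELS = frozenset({"low", "medium", "high"})

SUPPORTED_LEVELS = {
    "mythos-preview": BASE_LEVELS | {"max"},
    "opus-4.7": BASE_LEVELS | {"xhigh", "max"},
    "opus-4.6": BASE_LEVELS | {"max"},
    "sonnet-4.6": BASE_LEVELS | {"max"},
    "opus-4.5": BASE_LEVELS,
}


def check_effort(model: str, effort: str) -> str:
    """Return `effort` unchanged if the table allows it for `model`, else raise ValueError."""
    allowed = SUPPORTED_LEVELS.get(model)
    if allowed is None:
        raise ValueError(f"unknown model label: {model!r}")
    if effort not in allowed:
        raise ValueError(
            f"{effort!r} is not listed for {model!r}; allowed: {sorted(allowed)}"
        )
    return effort
```

Failing early in application code gives a clearer error than a rejected API call, and keeps the model-to-level rules in one place.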
Recommended Settings for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly:
- Medium (recommended default): Best balance of speed, cost, and performance. Ideal for agentic coding, tool-heavy workflows, and code generation.
- Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases.
- High: For tasks requiring maximum capability.
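One way to make "always set effort explicitly" hard to forget is to route every Sonnet 4.6 call through a small lookup. This is a minimal sketch: the workload labels and the sonnet_effort helper are hypothetical names for the categories in the list above.

```python
# Workload labels are hypothetical names for the categories listed above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
    "max_capability": "high",
}


def sonnet_effort(workload: str) -> str:
    """Pick an explicit effort level; unknown workloads get the recommended default."""
    return SONNET_46_EFFORT.get(workload, "medium")
```

Falling back to medium rather than leaving the level unset means an unrecognized workload never silently runs at the high default.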
How to Use Effort in the API
Python Example
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    # Set the effort level for the whole response
    output_config={"effort": "medium"},
)

print(response.content[0].text)
TypeScript Example
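import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  // Set the effort level for the whole response
  output_config: { effort: 'medium' }
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}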
With Extended Thinking (Adaptive)
For the best experience, combine effort with adaptive thinking:
# Reuses the client created in the Python example above
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Let Claude decide how much to think for each request
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
    output_config={"effort": "high"},
)
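When thinking is enabled, the first content block of a response may be a thinking block rather than text, so reading response.content[0].text can fail. The helper below is a minimal sketch for collecting only the text blocks; text_from_response is a hypothetical name, and it assumes each block exposes a type attribute and text blocks expose text.

```python
def text_from_response(response) -> str:
    """Join the text of all text blocks, skipping thinking and tool-use blocks."""
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
```

Calling print(text_from_response(response)) then works whether or not Claude chose to think before answering.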
Practical Use Cases
1. Cost-Sensitive Production Apps
Use low effort for high-volume customer support chatbots where most queries are simple. You'll save tokens and reduce latency while maintaining acceptable quality.
2. Multi-Agent Systems
Assign different effort levels to different agents:
- Coordinator agent: low effort (fast routing decisions)
- Research agent: high effort (deep analysis)
- Code generation agent: medium effort (balanced)
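The per-agent assignment above could be kept in a single table that every agent consults when building its request. In this sketch the agent names and the build_request helper are hypothetical, the model ID is supplied by the caller, and passing effort as output_config={"effort": ...} is an assumption about the request shape.

```python
# Agent names are hypothetical; effort levels follow the list above.
AGENT_EFFORT = {
    "coordinator": "low",
    "research": "high",
    "codegen": "medium",
}


def build_request(agent: str, prompt: str, model: str, max_tokens: int = 4096) -> dict:
    """Build keyword arguments for client.messages.create(**kwargs) for one agent."""
    if agent not in AGENT_EFFORT:
        raise KeyError(f"no effort level configured for agent {agent!r}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumed request shape for the effort setting.
        "output_config": {"effort": AGENT_EFFORT[agent]},
    }
```

Keeping the mapping in one dictionary makes it easy to retune a single agent after measuring its token usage, without touching the call sites.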
3. Tiered User Experience
Offer users a choice:
- Quick mode: low effort (free tier)
- Balanced mode: medium effort (standard tier)
- Deep mode: high or max effort (premium tier)
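The tier-to-effort mapping above can be expressed as a small function. This is a sketch under stated assumptions: the tier names, the effort_for_tier helper, and the use_max flag are all hypothetical, and falling back to low for unknown tiers is a design choice rather than anything the guide prescribes.

```python
# Tier names are hypothetical; the mapping mirrors the list above.
TIER_EFFORT = {"free": "low", "standard": "medium", "premium": "high"}


def effort_for_tier(tier: str, use_max: bool = False) -> str:
    """Map a subscription tier to an effort level.

    `use_max` upgrades premium requests from "high" to "max" and is ignored
    for other tiers. Unknown tiers fall back to the cheapest level.
    """
    if tier == "premium" and use_max:
        return "max"
    return TIER_EFFORT.get(tier, "low")
```

Defaulting unknown tiers to low keeps a misconfigured account from consuming premium-level tokens.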
Best Practices
- Always set effort explicitly for Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token usage.
- Test different levels on your specific use case; the token savings vary by task complexity.
- Monitor token usage to quantify savings and adjust effort levels accordingly.
- Use max sparingly: it's designed for the most demanding tasks and will consume more tokens.
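To quantify savings, one option is to run the same prompts at two effort levels, record the output token count of each response (for example from response.usage.output_tokens), and compare the averages. The token_savings helper below is a hypothetical sketch of that comparison and makes no claim about what savings to expect.

```python
from statistics import mean


def token_savings(baseline_tokens: list, candidate_tokens: list) -> float:
    """Fractional reduction in mean output tokens of candidate vs. baseline.

    A result of 0.25 means the candidate effort level used 25% fewer output
    tokens on average; a negative result means it used more.
    """
    if not baseline_tokens or not candidate_tokens:
        raise ValueError("both samples must be non-empty")
    base = mean(baseline_tokens)
    if base == 0:
        raise ValueError("baseline mean is zero")
    return 1 - mean(candidate_tokens) / base
```

Token counts alone do not capture quality, so pair this measurement with an evaluation of the responses before lowering effort in production.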
Key Takeaways
- Effort controls token spending across all response types—text, tool calls, and thinking—giving you fine-grained control over cost and speed.
- Medium effort is the recommended default for Sonnet 4.6, balancing performance and efficiency for most applications.
- Combine effort with adaptive thinking for the best results, especially on complex tasks.
- Lower effort levels still allow deep thinking on hard problems—Claude adapts its behavior based on task difficulty.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, so migrate your code to use the new parameter.