Mastering Claude's Effort Parameter: Balance Token Efficiency and Response Quality
Learn how to control Claude's token spending with the effort parameter. Optimize for speed, cost, or deep reasoning across models like Opus 4.6 and Sonnet 4.6.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels from 'low' to 'max' to balance speed, cost, and reasoning depth, with practical API examples for Sonnet 4.6 and Opus 4.6.
When building applications with Claude, you often face a trade-off between response quality and token efficiency. Do you want Claude to think deeply and produce thorough answers, or do you need fast, cost-effective responses for high-volume tasks? The effort parameter gives you precise control over this balance—without switching models.
This guide explains what the effort parameter is, how it works, and how to use it effectively in your API calls. By the end, you'll know how to choose the right effort level for any task and optimize your Claude integration for speed, cost, or depth.
What Is the Effort Parameter?
The effort parameter controls how eager Claude is about spending tokens when responding to requests. It's a behavioral signal that influences all tokens in the response—including text, tool calls, and extended thinking. Unlike a strict token budget, effort adapts to the complexity of each request: Claude will still think deeply on hard problems, but it will use fewer tokens than it would at a higher effort level.
Key benefits:
- Works without enabling extended thinking
- Affects all token spend, including tool calls (fewer tool calls at lower effort)
- Single model handles multiple efficiency levels
Supported Models
The effort parameter is generally available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min) with million-token budgets |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
Note: max is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is available only on Opus 4.7.
Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
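The availability rules above can be encoded as a small lookup so that an unsupported level is caught before a request is sent. This is a minimal sketch: the helper name `check_effort` is hypothetical, the keys are display names rather than API model IDs, and it assumes every supported model accepts low, medium, and high. The table only restates what this guide lists.

```python
from typing import Optional

# Levels assumed to be accepted by every supported model.
BASE_LEVELS = {"low", "medium", "high"}

# Extra levels per model, keyed by display name (not API model IDs).
EXTRA_LEVELS = {
    "Mythos Preview": {"max"},
    "Opus 4.7": {"max", "xhigh"},
    "Opus 4.6": {"max"},
    "Sonnet 4.6": {"max"},
    "Opus 4.5": set(),
}

DEFAULT_EFFORT = "high"  # omitting the parameter behaves like "high"


def check_effort(model_name: str, effort: Optional[str] = None) -> str:
    """Return the effective effort level, or raise if the model lacks it."""
    if model_name not in EXTRA_LEVELS:
        raise ValueError(f"{model_name!r} is not listed as supporting effort")
    level = DEFAULT_EFFORT if effort is None else effort
    if level not in BASE_LEVELS | EXTRA_LEVELS[model_name]:
        raise ValueError(f"{level!r} is not available on {model_name}")
    return level
```

Calling `check_effort("Opus 4.7", "xhigh")` returns the level unchanged, while `check_effort("Sonnet 4.6", "xhigh")` raises a `ValueError` instead of letting the request fail later.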
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set the effort level:
- Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
- High effort: For tasks requiring deep reasoning or complex analysis.
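The Sonnet 4.6 guidance above could be captured in a small mapping so that each part of an application picks its level in one place. The workload labels and the function name `recommended_effort` are hypothetical and exist only for illustration; the levels themselves mirror the list above, with medium as the fallback because the guide recommends it as the default.

```python
# Hypothetical workload labels mapped to the levels recommended above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "deep_reasoning": "high",
    "complex_analysis": "high",
}


def recommended_effort(workload: str) -> str:
    """Return the suggested effort level, falling back to medium."""
    return SONNET_46_EFFORT.get(workload, "medium")
```

Keeping the mapping in data rather than in scattered string literals makes it easier to adjust levels after measuring latency and cost.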
How to Use the Effort Parameter in the API
Basic Usage (Python)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # effort requires a supported model such as Sonnet 4.6
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    # Set the effort level in the request body
    output_config={"effort": "medium"},
)

print(response.content[0].text)
Basic Usage (TypeScript)
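import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // effort requires a supported model such as Sonnet 4.6
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  // Set the effort level in the request body
  output_config: { effort: 'medium' }
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first?.type === 'text') console.log(first.text);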
Combining Effort with Extended Thinking
For maximum control, combine effort with adaptive thinking:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    # Adaptive thinking lets Claude decide when to think; it takes no budget_tokens
    thinking={"type": "adaptive"},
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Design a scalable microservices architecture for an e-commerce platform."}
    ],
    output_config={"effort": "max"},
)

# The response may start with a thinking block, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)
Using Effort with Tool Calls
Lower effort levels reduce the number of tool calls Claude makes, saving tokens:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[
        {
            "name": "search_web",
            "description": "Search the web for information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Find the latest news about AI regulation."}
    ],
    output_config={"effort": "low"},
)

# The reply may contain tool_use blocks instead of (or alongside) text
for block in response.content:
    if block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print(f"Tool requested: {block.name}({block.input})")
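Because lower effort is meant to reduce tool calls, it can help to count them per response. The sketch below separates text from tool requests in a response's content list; it relies only on the `type`, `text`, and `name` attributes of content blocks. The helper name `split_blocks` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real SDK blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Return (joined text, list of tool_use blocks) from a content list."""
    text = "".join(block.text for block in content if block.type == "text")
    tool_calls = [block for block in content if block.type == "tool_use"]
    return text, tool_calls


# Stand-in blocks shaped like SDK content blocks, for a quick local check.
sample = [
    SimpleNamespace(type="text", text="Searching for recent coverage."),
    SimpleNamespace(type="tool_use", name="search_web",
                    input={"query": "AI regulation news"}),
]

text, tool_calls = split_blocks(sample)
print(text)
print(len(tool_calls), "tool call(s):", [call.name for call in tool_calls])
```

With a real response, `split_blocks(response.content)` gives a tool-call count that can be compared across effort levels.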
Practical Scenarios
Scenario 1: High-Volume Customer Support Chat
For a chatbot handling simple FAQs, use low effort to minimize latency and cost:
output_config={"effort": "low"}
Scenario 2: Complex Code Generation
For generating production-ready code with multiple files, use high effort (default) or max:
output_config={"effort": "high"}
Scenario 3: Long-Running Agentic Tasks
For tasks that run over 30 minutes with million-token budgets, use xhigh (Opus 4.7 only):
output_config={"effort": "xhigh"}
Scenario 4: Balanced Subagent
For a subagent that needs reasonable quality but must stay fast, use medium:
output_config={"effort": "medium"}
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Start with medium for most applications, then adjust based on observed performance and cost.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience; Claude will automatically decide when to think.
- Monitor token usage across different effort levels to find the sweet spot for your use case.
- Use low effort for subagents and simple tasks where speed matters more than depth.
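To monitor token usage across effort levels, one option is to send the same prompt once per level and record the output tokens reported in `response.usage`. The sketch below is an assumption about how such a comparison might be structured, not a prescribed workflow: `compare_effort_levels` and `fake_send` are hypothetical names, and `send` is any caller-supplied function that issues the request at a given level and returns the response. The stand-in `fake_send` returns made-up numbers purely so the example runs offline.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Iterable


def compare_effort_levels(
    send: Callable[[str], object],
    levels: Iterable[str] = ("low", "medium", "high"),
) -> Dict[str, int]:
    """Call send(level) once per level and record output tokens used."""
    usage = {}
    for level in levels:
        response = send(level)
        usage[level] = response.usage.output_tokens
    return usage


def fake_send(level: str):
    """Offline stand-in for an API call; the token counts are invented."""
    invented = {"low": 120, "medium": 340, "high": 810}
    return SimpleNamespace(usage=SimpleNamespace(output_tokens=invented[level]))


report = compare_effort_levels(fake_send)
for level, tokens in report.items():
    print(f"{level}: {tokens} output tokens")
```

In practice `send` would wrap a real `client.messages.create(...)` call with a fixed prompt, and the comparison would be repeated over a representative set of prompts, since a single request says little about average behavior.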
Limitations and Considerations
- Effort is a behavioral signal, not a strict token budget. Claude may still think deeply on hard problems even at low effort.
- At lower effort levels, expect some reduction in capability, especially for complex reasoning.
- The xhigh level is only available on Claude Opus 4.7.
- For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter.
Key Takeaways
- The effort parameter gives you fine-grained control over Claude's token spending, trading off between response thoroughness and efficiency.
- Five levels are available: low, medium, high (default), xhigh (Opus 4.7), and max.
- Medium effort is recommended as a default for most applications, especially with Sonnet 4.6.
- Combine effort with adaptive thinking for optimal results—Claude decides when to think based on task complexity.
- Lower effort reduces all token spend, including tool calls, making it ideal for high-volume or latency-sensitive workloads.