Mastering Claude's Effort Parameter: Control Thinking Depth, Speed, and Cost
Learn how to use the effort parameter in Claude API to balance reasoning depth, token efficiency, and latency. Includes code examples and recommended settings for Opus and Sonnet models.
The effort parameter lets you control how thoroughly Claude thinks before responding, trading off between capability and token cost. Set it to 'low' for fast, cheap responses on simple tasks, or 'max' for deep reasoning on complex problems. Combine with adaptive thinking for best results.
Introduction
Every Claude user knows the dilemma: you want deep, thoughtful responses for complex tasks, but you don't want to burn through tokens (and money) when asking simple questions. The effort parameter solves this by giving you granular control over how "eager" Claude is about spending tokens—all with a single model, no switching required.
This guide explains exactly how effort works, when to use each level, and how to combine it with adaptive thinking for optimal results.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how thoroughly to reason before responding. Unlike a strict token budget, effort is a soft guide: at lower levels, Claude will still think hard on genuinely difficult problems, but it will think less than it would at higher levels for the same task.
Key advantages:
- Works without extended thinking – You can use effort even when thinking is disabled.
- Affects all tokens – Including tool calls, function arguments, and text responses. Lower effort means fewer tool calls, giving you broader cost control.
Supported Models
| Model | Effort Support | Notes |
|---|---|---|
| Claude Mythos Preview | ✅ All levels | Max effort available |
| Claude Opus 4.7 | ✅ All levels | Includes xhigh for long-horizon tasks |
| Claude Opus 4.6 | ✅ All levels | Replaces budget_tokens |
| Claude Sonnet 4.6 | ✅ All levels | Recommended to set explicitly |
| Claude Opus 4.5 | ✅ All levels | Basic support |
Important: For Opus 4.6 and Sonnet 4.6, `effort` replaces the deprecated `budget_tokens` parameter. While `budget_tokens` still works, it will be removed in a future release.
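If existing code still sends a fixed thinking budget, most of the migration is mechanical. The sketch below rewrites a request-kwargs dict that uses `thinking={"type": "enabled", "budget_tokens": N}` into one that uses adaptive thinking plus an explicit effort level. It assumes the request shape `thinking={"type": "adaptive"}` and `output_config={"effort": ...}` on Opus 4.6 and Sonnet 4.6. The helper name `migrate_thinking_kwargs` is hypothetical. The caller picks the effort level because there is no official mapping from a token budget to an effort level.

```python
import copy
from typing import Any

VALID_EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}


def migrate_thinking_kwargs(kwargs: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Return a copy of Messages API kwargs with budget_tokens replaced by effort.

    Assumed request shape (not verified against every SDK version):
    thinking={"type": "adaptive"} plus output_config={"effort": ...}.
    """
    if effort not in VALID_EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    migrated = copy.deepcopy(kwargs)  # never mutate the caller's dict
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed budget and let adaptive thinking size its own reasoning.
        migrated["thinking"] = {"type": "adaptive"}
        # Keep any other output_config keys and set the effort level.
        output_config = dict(migrated.get("output_config") or {})
        output_config["effort"] = effort
        migrated["output_config"] = output_config
    return migrated


legacy = {
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Design a distributed caching system."}],
}
print(migrate_thinking_kwargs(legacy, effort="medium")["output_config"])  # prints {'effort': 'medium'}
```

Requests without `budget_tokens` pass through unchanged, so the helper can be applied to every call site during a migration.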
Effort Levels Explained
| Level | Description | Best Use Case |
|---|---|---|
| `max` | Absolute maximum capability, no token constraints | Deepest reasoning, research-grade analysis |
| `xhigh` | Extended capability for long-horizon work (Opus 4.7 only) | Agentic/coding tasks over 30 minutes with million+ token budgets |
| `high` | Default behavior, excellent results | Complex reasoning, difficult coding, agentic tasks |
| `medium` | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| `low` | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
Omitting the `effort` parameter is equivalent to setting `effort: "high"`.
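Because omitting `effort` behaves like `"high"` and `xhigh` is listed only for Opus 4.7, a client can normalize the setting before sending a request. The sketch below is a hypothetical helper. `resolve_effort`, `spends_more`, and the display-name check are illustrative, not part of any SDK. It applies the default, rejects unknown levels, and orders the five levels so they can be compared.

```python
from typing import Optional

# Levels ordered from least to most token spending, following the table above.
EFFORT_ORDER = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"  # omitting effort is equivalent to "high"


def resolve_effort(effort: Optional[str], model_name: str) -> str:
    """Apply the default and reject levels the model does not list (hypothetical check)."""
    level = DEFAULT_EFFORT if effort is None else effort
    if level not in EFFORT_ORDER:
        raise ValueError(f"unknown effort level: {level!r}")
    # Per the support table, xhigh is only listed for Claude Opus 4.7.
    if level == "xhigh" and model_name != "Claude Opus 4.7":
        raise ValueError("xhigh is only listed for Claude Opus 4.7")
    return level


def spends_more(a: str, b: str) -> bool:
    """True if effort level a sits above level b in the ordering."""
    return EFFORT_ORDER.index(a) > EFFORT_ORDER.index(b)


print(resolve_effort(None, "Claude Sonnet 4.6"))  # prints high
print(spends_more("max", "medium"))  # prints True
```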
Recommended Settings for Sonnet 4.6
Sonnet 4.6 defaults to high effort, which may be overkill for many applications. Anthropic recommends explicitly setting effort to avoid unexpected latency:
- Medium effort (recommended default): Best balance for most apps—agentic coding, tool-heavy workflows, code generation.
- Low effort: High-volume or latency-sensitive workloads—chat, non-coding tasks where speed matters.
- High effort: Only for tasks requiring maximum reasoning depth.
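These recommendations translate directly into a lookup table. The sketch below is a hypothetical routing helper. The workload labels are illustrative names, not API values. Unlisted workloads fall back to `medium` because that is the recommended default for Sonnet 4.6.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
    "deep_reasoning": "high",
}


def sonnet_effort_for(workload: str) -> str:
    """Return the recommended effort for a workload, defaulting to medium."""
    return SONNET_46_EFFORT.get(workload, "medium")


for workload in ("chat", "code_generation", "deep_reasoning", "summarization"):
    print(workload, "->", sonnet_effort_for(workload))
```

Sending the returned value explicitly on every request, rather than relying on the `high` default, is what avoids the unexpected latency.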
How to Use the Effort Parameter
Basic API Call (Python)
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "medium"},  # control token spending
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
)
print(response.content[0].text)
```
With Extended Thinking (TypeScript)
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 16000, // thinking tokens count toward max_tokens
  thinking: { type: 'adaptive' }, // adaptive thinking sizes its own reasoning; no budget_tokens
  output_config: { effort: 'high' },
  messages: [
    { role: 'user', content: 'Design a distributed caching system.' }
  ]
});

console.log(response.content);
```
Effort with Tool Use
Lower effort reduces the number of tool calls Claude makes, saving tokens on multi-step tasks:
```python
# Reuses the `client` created in the basic example.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},  # fewer tool calls, faster responses
    tools=[
        {
            "name": "search_web",
            "description": "Search the web for information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Find the latest news about AI regulation."}
    ],
)
```
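To check whether lowering effort actually reduces tool calls on your workload, count the `tool_use` blocks in each response. The SDK returns `response.content` as a list of content blocks, each with a `type` attribute such as `"text"` or `"tool_use"`. The sketch below uses `SimpleNamespace` stand-ins so it runs without an API key. The helper name `count_tool_calls` is hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable


def count_tool_calls(content_blocks: Iterable) -> int:
    """Count tool_use blocks in one response's content list."""
    return sum(1 for block in content_blocks if getattr(block, "type", None) == "tool_use")


# Stand-in for response.content; a real call returns SDK block objects with .type.
fake_content = [
    SimpleNamespace(type="text", text="Let me search for that."),
    SimpleNamespace(type="tool_use", name="search_web", input={"query": "AI regulation news"}),
]
print(count_tool_calls(fake_content))  # prints 1
```

With a live client, call `count_tool_calls(response.content)` after each turn, sum the counts across the turns of an agent loop, and compare the totals for `low` and `high` on the same prompts.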
Combining Effort with Adaptive Thinking
For the best experience, pair effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking dynamically adjusts the thinking budget based on task complexity, while effort sets the overall behavioral tone.
```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,  # thinking tokens count toward max_tokens
    thinking={"type": "adaptive"},  # Claude decides how much to think; no budget_tokens
    output_config={"effort": "medium"},  # balanced token spending
    messages=[
        {"role": "user", "content": "Analyze this financial dataset and identify trends."}
    ],
)
```
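When the same prompt runs at different effort levels, a small builder keeps the adaptive-thinking and effort settings in one place. This is a sketch under the assumption that the API accepts `thinking={"type": "adaptive"}` and `output_config={"effort": ...}`. `build_request` is a hypothetical helper, and the returned dict is meant to be unpacked into `client.messages.create(**kwargs)`.

```python
from typing import Any

EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(prompt: str, model: str, effort: str = "medium",
                  adaptive_thinking: bool = True, max_tokens: int = 16000) -> dict[str, Any]:
    """Assemble Messages API kwargs that pair adaptive thinking with an effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    kwargs: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},  # assumed location of the effort setting
        "messages": [{"role": "user", "content": prompt}],
    }
    if adaptive_thinking:
        # Adaptive thinking sizes its own reasoning, so no budget_tokens is passed.
        kwargs["thinking"] = {"type": "adaptive"}
    return kwargs


request = build_request("Analyze this financial dataset and identify trends.", model="claude-opus-4-6")
print(request["thinking"], request["output_config"])
# With a configured client: response = client.messages.create(**request)
```

Keeping `adaptive_thinking` as a switch reflects that effort also works with thinking disabled.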
Practical Scenarios
Scenario 1: Customer Support Chatbot
- Effort: `low`
- Why: Most queries are simple (order status, FAQs). Low effort gives fast, cheap responses, and Claude still thinks harder when an issue is genuinely complex.
Scenario 2: Code Generation Agent
- Effort: `medium` (Sonnet 4.6) or `high` (Opus 4.6)
- Why: Code generation benefits from moderate reasoning. Medium effort on Sonnet balances speed and quality.
Scenario 3: Research Assistant
- Effort: `max`
- Why: Deep analysis, multi-step reasoning, and thoroughness are critical. Token cost is secondary.
Scenario 4: Long-Running Agent (30+ minutes)
- Effort: `xhigh` (Opus 4.7 only)
- Why: Extended tasks with million-plus token budgets need sustained capability across a long horizon without exhausting tokens prematurely.
Best Practices
- Start with `medium` for Sonnet 4.6 – It's the recommended default for most applications.
- Use `low` for subagents – When Claude is part of a larger pipeline handling simple tasks, low effort saves tokens.
- Pair with adaptive thinking – For maximum efficiency, combine `effort` with `thinking: {type: "adaptive"}`.
- Monitor token usage – Lower effort doesn't guarantee a fixed token count; it's a behavioral signal. Test with your specific workloads.
- Avoid `budget_tokens` on newer models – Use `effort` instead; it's more flexible and future-proof.
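Since effort is a behavioral signal rather than a hard cap, the practical check is to measure. Each Messages API response reports its output token count in `response.usage.output_tokens`. The sketch below aggregates logged `(effort, output_tokens)` pairs into a per-level average. The record format and the `mean_output_tokens` helper are hypothetical, and the sample numbers are illustrative, not measurements.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple


def mean_output_tokens(records: Iterable[Tuple[str, int]]) -> dict[str, float]:
    """Average logged output tokens per effort level."""
    by_level: dict[str, list[int]] = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {effort: mean(counts) for effort, counts in by_level.items()}


# In production each pair would be (effort, response.usage.output_tokens).
# Illustrative numbers only, not measurements.
logged = [("low", 120), ("low", 180), ("medium", 400), ("medium", 600)]
print(mean_output_tokens(logged))
```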
Key Takeaways
- The `effort` parameter controls token spending across text, tool calls, and thinking, so there is no need to switch models for different task complexities.
- Five levels from `low` to `max` let you trade off speed/cost against capability. The default is `high`.
- Sonnet 4.6 users should set `effort` explicitly to avoid unexpected latency; `medium` is the recommended default.
- Combine with adaptive thinking for the best balance of depth and efficiency.
- `effort` replaces `budget_tokens` on Opus 4.6 and Sonnet 4.6; migrate your code to stay compatible with future model releases.