Mastering Claude’s Effort Parameter: Control Thinking Depth, Speed, and Cost
Learn how to use Claude's effort parameter to balance response thoroughness, latency, and token spend on supported models, from simple subagents to deep reasoning tasks.
The effort parameter lets you control how eagerly Claude spends tokens on a response, from low (fast/cheap) to max (deepest reasoning). It works with or without extended thinking and affects text, tool calls, and thinking tokens. This guide explains each level, when to use it, and how to combine it with adaptive thinking for optimal results.
Introduction
Claude is incredibly capable, but sometimes you don’t need its full reasoning power. A quick chat, a simple data extraction, or a subagent handling a narrow task doesn’t require the same depth as a complex code review or a multi-step research analysis. That’s where the effort parameter comes in.
Effort gives you fine-grained control over how many tokens Claude spends on a response—without switching models. You can dial up to max for the deepest reasoning, or dial down to low for speed and cost savings. Best of all, it works whether or not you have extended thinking enabled.
In this guide, you’ll learn:
- What the effort parameter is and how it differs from budget_tokens
- Each effort level and when to use it
- How to combine effort with adaptive thinking
- Practical code examples for Python and TypeScript
- Tips for optimizing cost and latency
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how thoroughly it should approach a request. At high (the default), Claude spends as many tokens as needed for excellent results. At max, it goes even further—ideal for the hardest problems. At low, it conserves tokens, skipping unnecessary reasoning and making fewer tool calls.
Important: Effort is not a strict token budget. Claude will still think deeply on difficult problems even at lower levels—it just won’t think as much as it would at higher levels.
Supported Models
| Model | Effort Levels | Notes |
|---|---|---|
| Claude Mythos Preview | max, high, medium, low | Full support |
| Claude Opus 4.7 | max, xhigh, high, medium, low | xhigh for long-horizon tasks |
| Claude Opus 4.6 | max, high, medium, low | Replaces budget_tokens |
| Claude Sonnet 4.6 | max, high, medium, low | Replaces budget_tokens |
| Claude Opus 4.5 | high, medium, low | No max or xhigh |
Deprecation note: budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6 but will be removed in a future release. Use effort instead.
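The table above can be turned into a small client-side check so that an unsupported level is caught before a request is sent. This is a minimal sketch: the dictionary keys are informal labels chosen for illustration, not API model IDs, and `SUPPORTED_EFFORT` and `check_effort` are hypothetical names.

```python
# Effort levels per model, copied from the Supported Models table.
# Keys are informal labels for illustration, not official API model IDs.
SUPPORTED_EFFORT = {
    "mythos-preview": {"max", "high", "medium", "low"},
    "opus-4.7": {"max", "xhigh", "high", "medium", "low"},
    "opus-4.6": {"max", "high", "medium", "low"},
    "sonnet-4.6": {"max", "high", "medium", "low"},
    "opus-4.5": {"high", "medium", "low"},
}


def check_effort(model: str, effort: str) -> str:
    """Return `effort` unchanged if `model` supports it, else raise ValueError."""
    levels = SUPPORTED_EFFORT.get(model)
    if levels is None:
        raise ValueError(f"unknown model label: {model!r}")
    if effort not in levels:
        raise ValueError(
            f"{model} does not support effort={effort!r}; "
            f"choose one of {sorted(levels)}"
        )
    return effort
```

For example, `check_effort("opus-4.7", "xhigh")` returns `"xhigh"`, while `check_effort("opus-4.5", "max")` raises a `ValueError`.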
Effort Levels Explained
low – Maximum Efficiency
- Best for: Simple tasks, high-volume chat, subagents, non-coding use cases
- Behavior: Significant token savings. Claude may skip thinking entirely for straightforward problems.
- Trade-off: Some capability reduction. Not suitable for complex reasoning.
medium – Balanced
- Best for: Agentic tasks that need a balance of speed, cost, and performance
- Behavior: Moderate token savings. Claude still thinks on difficult problems, but less than at higher levels.
- Recommended default for Sonnet 4.6: Best balance for most applications.
high – Default Capability
- Best for: Complex reasoning, difficult coding, agentic tasks
- Behavior: Equivalent to omitting the parameter. Claude spends as many tokens as needed.
- Trade-off: No cost optimization, but full capability.
xhigh – Extended Capability (Opus 4.7 only)
- Best for: Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions
- Behavior: Designed for sustained, deep reasoning over very long contexts.
max – Absolute Maximum
- Best for: The hardest problems requiring deepest possible reasoning
- Behavior: No constraints on token spending. Available on Mythos, Opus 4.7, Opus 4.6, and Sonnet 4.6.
- Trade-off: Highest cost and latency.
How Effort Affects All Tokens
Unlike budget_tokens, which only controlled thinking tokens, effort affects every token in the response:
- Text responses and explanations – Less verbose at lower levels
- Tool calls and function arguments – Fewer tool calls at lower levels
- Extended thinking – Less thinking depth when enabled
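One way to observe this effect is to tally what a response contains at different effort levels. The sketch below counts content blocks by type. It assumes each block exposes a `type` attribute with values such as `text`, `tool_use`, and `thinking`, as the Anthropic SDK's response objects do. `count_blocks` is a hypothetical helper name, and the sample blocks are stand-ins rather than real API output.

```python
from collections import Counter
from types import SimpleNamespace


def count_blocks(content) -> Counter:
    """Count response content blocks by their `type` attribute."""
    return Counter(block.type for block in content)


# Stand-in blocks for illustration; a real `response.content` list has the same shape.
sample = [
    SimpleNamespace(type="thinking"),
    SimpleNamespace(type="text"),
    SimpleNamespace(type="tool_use"),
    SimpleNamespace(type="tool_use"),
]

counts = count_blocks(sample)
print(counts["tool_use"], counts["thinking"], counts["text"])
```

Running the same prompt at two effort levels and comparing the counts gives a rough picture of how many tool calls and thinking blocks each level produces.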
Combining Effort with Adaptive Thinking
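For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide how much thinking to use based on the problem, while effort sets a soft ceiling on how much it spends overall. Because effort is a behavioral signal rather than a strict budget, that ceiling is a tendency, not a hard cap.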
Example: With effort: "low" and adaptive thinking, Claude will think only when absolutely necessary, and even then, minimally. With effort: "max", it will think deeply on every request.
Practical Code Examples
Python (using the Anthropic SDK)
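```python
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    # Assumed model ID for Sonnet 4.6; use any model from the Supported Models table
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set effort to low for a quick, concise answer.
    # Effort is passed inside output_config, not as a top-level argument.
    output_config={"effort": "low"},
)

print(response.content[0].text)
```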
TypeScript (using the Anthropic SDK)
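```typescript
import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

async function main() {
  const response = await client.messages.create({
    // Assumed model ID for Sonnet 4.6; use any model from the Supported Models table
    model: 'claude-sonnet-4-6',
    max_tokens: 4096,
    system: 'You are a helpful assistant.',
    messages: [
      { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
    ],
    // Use medium effort for a balanced response.
    // Effort is passed inside output_config, not as a top-level field.
    output_config: { effort: 'medium' }
  });

  // Content blocks are a union type, so narrow to a text block before reading .text
  const block = response.content[0];
  if (block.type === 'text') {
    console.log(block.text);
  }
}

main();
```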
With Adaptive Thinking
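```python
# Reuses the `client` created in the Python example above
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID for Sonnet 4.6
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},  # deepest reasoning with adaptive thinking
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step..."}
    ],
)

# With thinking enabled, thinking blocks can precede the answer, so print only text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)
```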
Recommended Configurations
For Sonnet 4.6
| Use Case | Effort Level | Why |
|---|---|---|
| Chat / Q&A | low | Fast, cheap, good enough |
| Agentic coding | medium | Best balance |
| Complex code generation | high | Full capability |
| Hardest problems | max | No compromises |
For Opus 4.7
| Use Case | Effort Level | Why |
|---|---|---|
| Quick research | medium | Balanced depth |
| Multi-hour coding session | xhigh | Sustained reasoning |
| Scientific analysis | max | Deepest thinking |
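The two tables above can be encoded as a lookup so that callers pick an effort level by use case instead of hard-coding it. This is a minimal sketch: the model labels, the use-case keys, and the `recommended_effort` function are hypothetical names, and the fallback to high mirrors the parameter's default described earlier.

```python
# Recommendations copied from the tables above.
# Model labels and use-case keys are illustrative names, not API values.
RECOMMENDED = {
    "sonnet-4.6": {
        "chat": "low",
        "agentic_coding": "medium",
        "complex_code_generation": "high",
        "hardest_problems": "max",
    },
    "opus-4.7": {
        "quick_research": "medium",
        "multi_hour_coding": "xhigh",
        "scientific_analysis": "max",
    },
}


def recommended_effort(model: str, use_case: str) -> str:
    """Look up the suggested effort level, falling back to the default ("high")."""
    return RECOMMENDED.get(model, {}).get(use_case, "high")


print(recommended_effort("sonnet-4.6", "chat"))              # low
print(recommended_effort("opus-4.7", "multi_hour_coding"))   # xhigh
print(recommended_effort("opus-4.7", "something_else"))      # high
```

Keeping the mapping in one place makes it easy to adjust a level after measuring real latency and cost.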
Tips for Optimizing Cost and Latency
- Start with medium for Sonnet 4.6 – It’s the recommended default and avoids unexpected latency.
- Use low for subagents – Subagents handling narrow tasks don’t need deep reasoning.
- Reserve max for the hardest 10% of requests – It’s powerful but expensive.
- Combine with adaptive thinking – Let Claude decide when to think, while you control the ceiling.
- Monitor token usage – Effort affects all tokens, so track total spend per request.
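The last tip can be sketched as a small tally of tokens per effort level. The example assumes each response reports `input_tokens` and `output_tokens` in its usage data, as the Messages API does. `UsageTracker` is a hypothetical helper, and the numbers recorded below are made up for illustration.

```python
from collections import defaultdict


class UsageTracker:
    """Accumulate request and token counts per effort level."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"requests": 0, "input": 0, "output": 0})

    def record(self, effort: str, input_tokens: int, output_tokens: int) -> None:
        """Add one response's token counts to the running totals for `effort`."""
        row = self.totals[effort]
        row["requests"] += 1
        row["input"] += input_tokens
        row["output"] += output_tokens

    def average_output(self, effort: str) -> float:
        """Mean output tokens per request at `effort`, or 0.0 if none recorded."""
        row = self.totals.get(effort)
        if not row or row["requests"] == 0:
            return 0.0
        return row["output"] / row["requests"]


tracker = UsageTracker()
tracker.record("low", input_tokens=120, output_tokens=80)    # illustrative numbers
tracker.record("low", input_tokens=100, output_tokens=120)
tracker.record("max", input_tokens=120, output_tokens=2400)
print(tracker.average_output("low"))  # 100.0
```

With a real client, the two counts would come from `response.usage.input_tokens` and `response.usage.output_tokens` after each call.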
Conclusion
The effort parameter is a powerful tool for fine-tuning Claude’s behavior. Whether you’re building a high-volume chatbot, a deep research agent, or anything in between, you now have a single dial to control thoroughness, speed, and cost—without switching models.
By combining effort with adaptive thinking, you get the best of both worlds: Claude decides when to think, and you decide how much.
Key Takeaways
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6 and works on all supported models.
- Effort affects all tokens – text, tool calls, and thinking – giving you broad control over spend.
- Use medium as your default for Sonnet 4.6 to balance speed, cost, and performance.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal results.
- Reserve max and xhigh for the most demanding tasks; use low for simple or high-volume workloads.