Mastering Claude's Effort Parameter: Control Token Spend and Response Depth
Learn how to use Claude's effort parameter to balance response thoroughness with token efficiency. Includes practical examples, effort levels, and best practices.
Claude's effort parameter lets you control how many tokens the model spends on responses, from low (fast/cheap) to max (deepest reasoning). This guide shows you how to set it, when to use each level, and how it compares to budget_tokens.
If you've ever wished you could tell Claude to "think harder" or "be more concise" without switching models, the effort parameter is what you need. It controls how many tokens Claude spends on a response, which directly affects speed, cost, and reasoning depth.
In this guide, you'll learn what the effort parameter is, how it differs from the older budget_tokens approach, and exactly how to use it in your API calls. By the end, you'll be able to dial in the perfect balance of capability and efficiency for any task.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding to requests. Instead of setting a hard token budget (which could cut off thinking mid-stream), effort adjusts Claude's natural tendency to think deeply or respond quickly.
Key insight: Effort affects all tokens in the response, not just thinking tokens. This includes:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
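Because effort shapes every block Claude emits, one way to see its effect is to tally a response's content blocks by type. The helper below, `count_block_types`, is hypothetical and not part of the SDK. It assumes you pass the `content` list as plain dicts (for example from `response.model_dump()["content"]`), which also lets it run without an API call.

```python
from collections import Counter


def count_block_types(content: list) -> Counter:
    """Count content blocks by their "type" field, e.g. "thinking", "text", "tool_use".

    `content` is assumed to be a list of dicts such as response.model_dump()["content"].
    """
    return Counter(block["type"] for block in content)


# Hand-written stand-in for a real response's content list.
example_content = [
    {"type": "thinking", "thinking": "Compare the two options..."},
    {"type": "text", "text": "I'll look up the current weather."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
]

print(count_block_types(example_content))
```

Running the same prompt at two effort levels and comparing these counts, along with `response.usage.output_tokens`, makes the difference concrete.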
Supported Models
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
On Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted on these models, it is deprecated and will be removed in a future release.
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
"high" produces exactly the same behavior as omitting the parameter entirely.
How to Use the Effort Parameter in API Calls
Basic Usage (Python)
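```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort (see the list above)
    max_tokens=1024,
    output_config={"effort": "low"},  # or "medium", "high", "max"; availability can vary by model
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)
print(response.content[0].text)
```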
Basic Usage (TypeScript)
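```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model that supports effort
  max_tokens: 1024,
  output_config: { effort: 'low' },
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
});

// content is a union of block types; narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}
```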
Combining with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This lets Claude decide whether to think based on how complex the problem is, while effort controls how deeply it thinks when it does.
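```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    thinking={"type": "adaptive"},  # Claude decides whether to think at all
    output_config={"effort": "medium"},  # effort sets how much it spends when it does
    messages=[
        {"role": "user", "content": "Write a Python script to analyze sales data."}
    ],
)
# A thinking block may come first, so pick out the first text block explicitly.
print(next(block.text for block in response.content if block.type == "text"))
```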
Using Max Effort for Deep Reasoning
When you need Claude's absolute best reasoning—for complex math proofs, multi-step planning, or thorough code review—use effort: "max".
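```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes of the form 4k + 3."}
    ],
)
print(response.content[0].text)
```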
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:
- Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
- High effort: For tasks requiring deeper reasoning or complex analysis.
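The three recommendations above can be encoded as a small lookup so that every Sonnet 4.6 call sets effort explicitly. The workload categories and the `effort_for_workload` helper are hypothetical; adapt them to your own task taxonomy.

```python
# Hypothetical workload categories mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "complex_analysis": "high",
}


def effort_for_workload(workload: str) -> str:
    """Pick an explicit effort level, falling back to "medium" (the recommended default)."""
    return SONNET_46_EFFORT.get(workload, "medium")


print(effort_for_workload("chat"))       # low
print(effort_for_workload("summarize"))  # medium (unlisted workloads fall back)
```

Assuming effort is passed via `output_config`, a call would then include `output_config={"effort": effort_for_workload("chat")}`. Falling back to "medium" rather than omitting effort avoids silently inheriting the high default.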
Effort vs. budget_tokens: What's the Difference?
| Aspect | Effort | budget_tokens (deprecated) |
|---|---|---|
| Control type | Behavioral signal | Hard token limit |
| Affects all tokens? | Yes (text, tools, thinking) | Only thinking tokens |
| Can skip thinking? | Yes (at lower levels) | No (thinking always runs when enabled) |
| Recommended? | Yes | No (deprecated on Opus 4.6/Sonnet 4.6) |
Effort is the better choice for three reasons:
- It's a soft signal rather than a hard limit, so there is no budget for Claude to run into mid-thought.
- It affects tool calls, giving you broader cost control.
- At lower levels, Claude can skip thinking entirely for simple problems, saving even more tokens.
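Migrating an existing request mostly means swapping the thinking configuration. The sketch below, a hypothetical `migrate_request` helper, rewrites a request dict that uses thinking={"type": "enabled", "budget_tokens": N} into the adaptive-thinking-plus-effort form. It assumes effort goes in `output_config`. There is no documented one-to-one mapping from a token budget to an effort level, so the function takes the level as an argument instead of guessing one.

```python
import copy


def migrate_request(request: dict, effort: str = "medium") -> dict:
    """Return a copy of `request` using adaptive thinking + effort instead of budget_tokens.

    Assumes effort is sent as output_config={"effort": ...}; any other
    output_config keys already on the request are preserved.
    """
    migrated = copy.deepcopy(request)  # leave the caller's dict untouched
    thinking = migrated.get("thinking", {})
    if thinking.get("type") == "enabled":
        # budget_tokens is deprecated on Opus 4.6 / Sonnet 4.6; let Claude decide when to think.
        migrated["thinking"] = {"type": "adaptive"}
    output_config = migrated.setdefault("output_config", {})
    output_config["effort"] = effort
    return migrated


old = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Refactor this module."}],
}
new = migrate_request(old, effort="medium")
print(new["thinking"], new["output_config"])
# {'type': 'adaptive'} {'effort': 'medium'}
```

Pick the effort level per call site based on the task, then compare output quality and token usage against the old budget before rolling the change out.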
Practical Tips and Best Practices
1. Start with Medium for Most Tasks
Unless you need maximum reasoning or the absolute lowest cost, medium effort offers the sweet spot for most applications.
2. Use Low Effort for Subagents
When building multi-agent systems, set subagents to low effort. They typically handle simpler, well-defined tasks where speed matters more than deep reasoning.
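In code, this usually means the orchestrator and its subagents share a request builder but pass different effort levels. The sketch below only builds request dicts and makes no API call. `build_agent_request`, the role names, the token limits, and the choice of "medium" for the orchestrator are illustrative assumptions, and effort is assumed to travel in `output_config`.

```python
# Illustrative effort assignment per agent role (an assumption, not an API feature).
ROLE_EFFORT = {"orchestrator": "medium", "subagent": "low"}


def build_agent_request(role: str, prompt: str, model: str = "claude-sonnet-4-6") -> dict:
    """Build keyword arguments for client.messages.create() for one agent."""
    return {
        "model": model,
        "max_tokens": 1024 if role == "subagent" else 4096,
        "output_config": {"effort": ROLE_EFFORT[role]},
        "messages": [{"role": "user", "content": prompt}],
    }


subtasks = ["List the files touched by this diff.", "Summarize the test failures."]
subagent_requests = [build_agent_request("subagent", task) for task in subtasks]
print([r["output_config"]["effort"] for r in subagent_requests])  # ['low', 'low']
```

Each dict can be sent with `client.messages.create(**request)`. Keeping the effort choice in one table makes it easy to bump a single role up a level if its results turn out too shallow.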
3. Reserve Max for Complex Problems
max effort can significantly increase token usage. Use it only for problems that genuinely require Claude's deepest reasoning capabilities.
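One way to keep max effort for the problems that need it is to escalate: try a cheaper level first and move up only when the answer fails a check you define. This is a technique of our choosing, not something the API does for you. `solve_with_escalation`, the `attempt` callback, and the acceptance check are hypothetical, and the callback stands in for a real `client.messages.create` call. xhigh is left off the ladder because the table positions it for long-running agentic work; add it if your model and workload call for it.

```python
from typing import Callable, Optional, Tuple

EFFORT_LADDER = ("low", "medium", "high", "max")


def solve_with_escalation(
    attempt: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    start: str = "medium",
) -> Tuple[Optional[str], str]:
    """Call attempt(effort) at increasing effort levels until is_acceptable passes.

    Returns (answer, effort_used); answer is None if even "max" fails the check.
    """
    for effort in EFFORT_LADDER[EFFORT_LADDER.index(start):]:
        answer = attempt(effort)
        if is_acceptable(answer):
            return answer, effort
    return None, EFFORT_LADDER[-1]


# Stub that only "succeeds" at high effort, to show the control flow without an API call.
def fake_attempt(effort: str) -> str:
    return "PASS" if effort in ("high", "max") else "FAIL"


print(solve_with_escalation(fake_attempt, lambda a: a == "PASS"))  # ('PASS', 'high')
```

The trade-off is latency: a hard problem pays for every failed cheaper attempt before reaching the level that works, so this fits best when most requests succeed early.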
4. Combine with Adaptive Thinking
On models that support adaptive thinking, pair effort with thinking: {type: "adaptive"} for optimal results. This gives Claude the flexibility to skip thinking when it's not needed.
5. Monitor Token Usage
Lower effort doesn't guarantee a fixed token count; it's a behavioral signal. Always monitor your actual token usage and adjust accordingly.
Common Pitfalls to Avoid
- Don't assume low effort = no thinking. On sufficiently difficult problems, Claude will still think—just less than at higher levels.
- Don't use budget_tokens on new models. If you're using Opus 4.6 or Sonnet 4.6, switch to effort immediately.
- Don't forget to set effort explicitly on Sonnet 4.6. The default is high, which may cause unexpected latency if you're expecting faster responses.
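Because effort is a signal rather than a cap (tip 5), and low effort can still think on hard problems, it pays to record what each level actually costs on your workload. A minimal sketch, assuming you log `response.usage.output_tokens` after each call; the `UsageLog` class is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class UsageLog:
    """Collect output-token counts per effort level (hypothetical helper)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, List[int]] = defaultdict(list)

    def record(self, effort: str, output_tokens: int) -> None:
        # In practice, pass response.usage.output_tokens from each API response.
        self._tokens[effort].append(output_tokens)

    def summary(self) -> Dict[str, dict]:
        """Return call count, mean, and max output tokens for each effort level."""
        return {
            effort: {"calls": len(counts), "mean": mean(counts), "max": max(counts)}
            for effort, counts in self._tokens.items()
        }


log = UsageLog()
for effort, tokens in [("low", 120), ("low", 480), ("medium", 900)]:
    log.record(effort, tokens)
print(log.summary())
```

Tracking the maximum alongside the mean surfaces the occasional long response at a low level, which is exactly the case the first pitfall warns about.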
Conclusion
The effort parameter is a game-changer for Claude API users who want fine-grained control over token spend and response depth. By choosing the right effort level for each task, you can optimize for speed, cost, or capability—all with a single model.
Whether you're building a high-volume chat application, a deep-reasoning agent, or anything in between, effort gives you the dial you need to get the best results.
Key Takeaways
- Effort controls token spend across all response types—text, tool calls, and thinking—giving you broader cost control than budget_tokens.
- Five levels, from low to max, let you trade speed and cost against reasoning depth for any task.
- Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best experience.
- On Sonnet 4.6, always set effort explicitly to avoid unexpected latency from the default high setting.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, so migrate your code to avoid future breakage.