Mastering Claude's Effort Parameter: Control Token Spend and Response Depth
Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes practical code examples and recommended settings for Sonnet 4.6 and Opus models.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from 'low' to 'max' to trade off between thoroughness and efficiency, with practical API examples and recommended defaults for Sonnet 4.6.
Introduction
When building applications with Claude, you often face a trade-off: do you want the deepest possible reasoning, or do you need fast, cost-effective responses? Traditionally, you'd need to switch between different models to achieve this balance. With Claude's effort parameter, you can control this behavior using a single model.
The effort parameter lets you adjust how much work Claude puts into a response. It affects not only extended thinking but also tool calls and text generation, which gives you control over token consumption and response quality.
In this guide, you'll learn:
- What the effort parameter is and how it works
- The available effort levels and when to use each
- How to implement effort in your API calls (with code examples)
- Recommended settings for Claude Sonnet 4.6
- How effort compares to the legacy budget_tokens parameter
How the Effort Parameter Works
The effort parameter is a behavioral signal that tells Claude how thoroughly it should process your request. It's available on all supported models without any beta header—just add it to your API request.
Key points:
- By default, Claude uses high effort, spending as many tokens as needed for excellent results.
- Setting effort to "high" produces exactly the same behavior as omitting the parameter.
- The parameter affects all tokens in the response: text, tool calls, function arguments, and extended thinking.
- Lower effort means Claude makes fewer tool calls and provides shorter, more direct responses.
- Effort is not a strict token budget—it's a behavioral guide. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels.
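The key points above can be sketched as a small request builder. This is a minimal sketch rather than part of the SDK: `build_request`, `effective_effort`, and `DEFAULT_EFFORT` are hypothetical names, `claude-sonnet-4-6` is an assumed model ID, and the sketch assumes the Messages API accepts effort inside an `output_config` object. It only builds keyword arguments, so nothing is sent over the network.

```python
from typing import Any, Optional

# Per the key points above, omitting effort behaves the same as "high".
DEFAULT_EFFORT = "high"


def build_request(prompt: str, effort: Optional[str] = None) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is nested under output_config.
        kwargs["output_config"] = {"effort": effort}
    return kwargs


def effective_effort(kwargs: dict[str, Any]) -> str:
    """Return the effort level a request would run at."""
    return kwargs.get("output_config", {}).get("effort", DEFAULT_EFFORT)
```

A request built with no effort argument and one built with `"high"` resolve to the same level, which mirrors the equivalence described above. To send a request, the result could be unpacked as `client.messages.create(**build_request("Hello", "low"))`.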
Supported Models
The effort parameter is supported by:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future release.
Effort Levels and Use Cases
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending. | Tasks requiring the deepest possible reasoning (e.g., complex mathematical proofs, multi-step strategic planning). Available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. |
| xhigh | Extended capability for long-horizon work. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions. Available on Opus 4.7. |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks. |
| medium | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance. |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks, high-volume chat, subagents, and latency-sensitive workloads. |
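Because max and xhigh are limited to certain models, it can help to check a level before sending a request. The sketch below encodes only the availability stated in this guide. The names `SUPPORTED_MODELS`, `RESTRICTED_LEVELS`, and `is_level_available` are hypothetical, and the keys are display names from the table, not API model IDs.

```python
# Models listed under "Supported Models" in this guide (display names, not API IDs).
SUPPORTED_MODELS = {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6", "Opus 4.5"}

# Levels the table restricts to specific models.
RESTRICTED_LEVELS = {
    "max": {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"},
    "xhigh": {"Opus 4.7"},
}

# Levels the table lists without any model restriction.
GENERAL_LEVELS = {"low", "medium", "high"}


def is_level_available(model_name: str, level: str) -> bool:
    """Return True if the guide lists `level` as available on `model_name`."""
    if model_name not in SUPPORTED_MODELS:
        return False
    if level in GENERAL_LEVELS:
        return True
    # Unknown levels fall through to an empty set and are rejected.
    return model_name in RESTRICTED_LEVELS.get(level, set())
```

For example, `is_level_available("Opus 4.5", "max")` is false under this table, so a caller could fall back to high instead.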
Recommended Settings for Sonnet 4.6
Sonnet 4.6 defaults to high effort. If you don't explicitly set the parameter, you'll get the full reasoning depth by default. For most applications, Anthropic recommends:
- Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
Important: Always explicitly set the effort parameter when using Sonnet 4.6 to avoid unexpected latency.
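One way to make sure effort is always set explicitly is to route every Sonnet 4.6 request through a lookup that returns a level for the workload. This is a sketch under assumptions: the workload labels and the names `SONNET_46_EFFORT_BY_WORKLOAD` and `sonnet_46_effort` are hypothetical, and the mapping simply restates the recommendations above.

```python
# Workload labels are illustrative; the levels follow the recommendations above.
SONNET_46_EFFORT_BY_WORKLOAD = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "latency_sensitive": "low",
}


def sonnet_46_effort(workload: str) -> str:
    """Return an explicit effort level for a Sonnet 4.6 workload.

    Unrecognized workloads fall back to "medium", the recommended default,
    so the caller never relies on the implicit "high" default.
    """
    return SONNET_46_EFFORT_BY_WORKLOAD.get(workload, "medium")
```

The fallback is a design choice: it keeps unexpected workloads on the recommended default instead of silently running at high effort.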
Using Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking:
```json
{
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "medium"
  }
}
```
Adaptive thinking allows Claude to decide when to use extended thinking based on the complexity of the task. When combined with effort, you get a powerful system that automatically adjusts both thinking depth and overall token spend.
Code Examples
Python (using the Anthropic SDK)
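```python
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

# Low effort for fast, cost-effective responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    # effort is passed inside output_config, not as a top-level argument
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.content[0].text)
```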
TypeScript (using the Anthropic SDK)
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Medium effort for balanced performance
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }
  ]
});

// content is a union of block types, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}
```
Using Effort with Tool Calls
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    output_config={"effort": "low"},  # Reduces the number of tool calls
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"],
            },
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Paris and London?"}
    ],
)

# With low effort, Claude may make fewer tool calls or combine them
```
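To see whether a lower effort level actually changes tool use, you can count the tool_use blocks in a response. The sketch below is illustrative: `count_tool_calls` is a hypothetical helper, and the example blocks are stand-ins built with `SimpleNamespace`, not real API output. It relies only on each content block exposing a `type` attribute.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def count_tool_calls(content: Iterable[Any]) -> int:
    """Count tool_use blocks in a Messages API response's content list."""
    return sum(1 for block in content if getattr(block, "type", None) == "tool_use")


# Stand-in blocks shaped like SDK content blocks, for illustration only.
example_content = [
    SimpleNamespace(type="text", text="Checking both cities."),
    SimpleNamespace(type="tool_use", name="get_weather", input={"city": "Paris"}),
    SimpleNamespace(type="tool_use", name="get_weather", input={"city": "London"}),
]

print(count_tool_calls(example_content))  # 2
```

With a real response, the call would be `count_tool_calls(response.content)`, repeated at each effort level you want to compare.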
Best Practices
- Start with medium for Sonnet 4.6: This gives you the best balance for most applications. Only increase to high or max when you need deeper reasoning.
- Use low for high-volume or latency-sensitive workloads: If you're building a chatbot or handling many concurrent requests, low effort can significantly reduce costs and response times.
- Combine with adaptive thinking: For maximum flexibility, use thinking: {type: "adaptive"} alongside your chosen effort level. This lets Claude decide when to engage extended thinking.
- Test with your specific use case: The optimal effort level depends on your application. Run A/B tests to find the sweet spot between quality and cost.
- Monitor token usage: Lower effort levels should reduce token consumption. Track your usage to validate that the parameter is having the desired effect.
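The testing and monitoring advice above can be supported by a small aggregation over logged usage. This is a sketch under assumptions: `mean_output_tokens` is a hypothetical helper, and it expects you to have recorded `(effort, output_tokens)` pairs yourself, for example from `response.usage.output_tokens` on each call.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable


def mean_output_tokens(records: Iterable[tuple[str, int]]) -> dict[str, float]:
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs logged by the caller.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for effort, output_tokens in records:
        grouped[effort].append(output_tokens)
    return {effort: mean(values) for effort, values in grouped.items()}
```

Comparing the averages for the same prompt set at two effort levels gives a quick check that the parameter is reducing token spend as expected. Quality still needs to be evaluated separately.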
Effort vs. budget_tokens
If you're migrating from budget_tokens (used with Opus 4.6 and Sonnet 4.6), here's what you need to know:
| Aspect | budget_tokens | effort |
|---|---|---|
| Control type | Strict token budget | Behavioral signal |
| Requires thinking | Yes | No |
| Affects tool calls | Indirectly | Directly |
| Status | Deprecated | Recommended |
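A migration along the lines of the table can be sketched as a function that rewrites request arguments. The assumptions here: `migrate_to_effort` is a hypothetical helper, the legacy request sets `thinking` to a dict containing `budget_tokens`, and effort is passed under `output_config`. The guide defines no mapping from budget size to effort level, so the caller chooses the level explicitly.

```python
import copy
from typing import Any


def migrate_to_effort(old_kwargs: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of the request kwargs that uses effort instead of budget_tokens."""
    new_kwargs = copy.deepcopy(old_kwargs)  # leave the caller's dict untouched

    thinking = new_kwargs.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Swap the fixed thinking budget for adaptive thinking.
        new_kwargs["thinking"] = {"type": "adaptive"}

    # Preserve any other output_config settings and set the chosen effort.
    output_config = dict(new_kwargs.get("output_config") or {})
    output_config["effort"] = effort
    new_kwargs["output_config"] = output_config
    return new_kwargs
```

Requests that never set `budget_tokens` pass through with only the effort level added, so the helper can be applied uniformly across a codebase.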
Conclusion
The effort parameter is a powerful tool for optimizing Claude's behavior in production applications. By choosing the right effort level, you can balance response quality, speed, and cost without switching between different models. Whether you're building a high-volume chatbot or a deep reasoning agent, effort gives you the control you need.
Key Takeaways
- Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6; use it instead of the deprecated parameter.
- Lower effort reduces all token spend, including tool calls, not just thinking tokens.
- Medium effort is the recommended default for Sonnet 4.6, balancing speed, cost, and performance.
- Combine effort with adaptive thinking for the most flexible and efficient configuration.
- Effort is a behavioral signal, not a strict budget—Claude will still think deeply on hard problems even at low effort levels.