Mastering Claude's Effort Parameter: Control Token Spend Without Sacrificing Quality
Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across all API calls, with practical code examples and recommended settings.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from low to max, combine it with adaptive thinking, and optimize for speed, cost, or capability across different use cases.
Introduction
Every Claude API call is a balancing act. You want the best possible response, but you also care about speed and cost. Traditionally, you had to choose between different models or manually set budget_tokens to control thinking depth. The effort parameter changes that entirely.
Effort gives you a single, intuitive dial to control how eagerly Claude spends tokens on any request—whether it's generating text, making tool calls, or performing extended thinking. This guide covers everything you need to know to use effort effectively, including recommended settings for Claude Sonnet 4.6 and Opus 4.7.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how much token spend is appropriate for a given request. It's not a strict budget—Claude will still think deeply on hard problems even at lower effort levels—but it strongly influences how thorough the response is.
Key benefits:
- Works without enabling extended thinking
- Affects all token spend, including tool calls and function arguments
- Single model can serve both quick chat and deep reasoning tasks
- Available on all supported models with no beta header required
Supported Models
| Model | Effort Levels | Notes |
|---|---|---|
| Claude Mythos Preview | low, medium, high, max | Full support |
| Claude Opus 4.7 | low, medium, high, xhigh, max | xhigh for long-horizon tasks |
| Claude Opus 4.6 | low, medium, high, max | Replaces budget_tokens |
| Claude Sonnet 4.6 | low, medium, high, max | Replaces budget_tokens |
| Claude Opus 4.5 | low, medium, high | Basic support |
Note: For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
Effort Levels Explained
max
- Description: Absolute maximum capability with no constraints on token spending.
- Use case: Tasks requiring the deepest possible reasoning and most thorough analysis.
- Available on: Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6.
xhigh (Opus 4.7 only)
- Description: Extended capability for long-horizon work.
- Use case: Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions.
high (default)
- Description: High capability. Equivalent to not setting the parameter.
- Use case: Complex reasoning, difficult coding problems, agentic tasks.
medium
- Description: Balanced approach with moderate token savings.
- Use case: Agentic tasks that require a balance of speed, cost, and performance.
low
- Description: Most efficient. Significant token savings with some capability reduction.
- Use case: Simpler tasks that need the best speed and lowest costs, such as subagents.
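The support table and the per-level availability notes can be captured in a small lookup so that an unsupported level is caught before a request is sent. This is a minimal sketch, not part of any SDK: SUPPORTED_EFFORT and check_effort are hypothetical names, the keys are display names rather than API model identifiers, and Opus 4.5 is listed without max, following the "Available on" line for max.

```python
# Effort levels per model, transcribed from this guide.
# Keys are display names for illustration, not API model identifiers.
SUPPORTED_EFFORT = {
    "Claude Mythos Preview": ("low", "medium", "high", "max"),
    "Claude Opus 4.7": ("low", "medium", "high", "xhigh", "max"),
    "Claude Opus 4.6": ("low", "medium", "high", "max"),
    "Claude Sonnet 4.6": ("low", "medium", "high", "max"),
    "Claude Opus 4.5": ("low", "medium", "high"),
}


def check_effort(model_name: str, effort: str) -> str:
    """Return effort unchanged if the model supports it, else raise ValueError."""
    levels = SUPPORTED_EFFORT.get(model_name)
    if levels is None:
        raise ValueError(f"unknown model: {model_name!r}")
    if effort not in levels:
        raise ValueError(
            f"{model_name} does not support effort {effort!r}; "
            f"choose one of {', '.join(levels)}"
        )
    return effort
```

Calling check_effort("Claude Sonnet 4.6", "xhigh") raises a ValueError, since xhigh is listed for Opus 4.7 only.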
How Effort Works in Practice
When you set effort to high, Claude behaves exactly as if you omitted the parameter entirely. At max, it will think more and potentially make more tool calls. At low, it will be more conservative—skipping thinking for simple problems and making fewer tool calls.
This is powerful because it affects all tokens in the response:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
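Because high is the default, a request builder can leave the setting out unless a caller asks for something specific. The sketch below only assembles keyword arguments and makes no network call. It assumes effort is nested under an output_config field in the Messages API request; build_request is a hypothetical helper name.

```python
from typing import Any, Optional


def build_request(
    model: str,
    prompt: str,
    max_tokens: int = 1024,
    effort: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble Messages API keyword arguments.

    When effort is None the field is omitted, which this guide
    describes as equivalent to "high".
    """
    kwargs: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is passed inside output_config.
        kwargs["output_config"] = {"effort": effort}
    return kwargs
```

The result can be unpacked into a call such as client.messages.create(**build_request(model, prompt, effort="low")).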
Recommended Settings for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token consumption, Anthropic recommends explicitly setting effort:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases.
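The two recommendations above amount to a simple rule, sketched here. The workload labels and the function name are hypothetical and exist only to illustrate the rule; adjust them to whatever categories your application already tracks.

```python
# Hypothetical workload labels that map to low effort on Sonnet 4.6.
LOW_EFFORT_WORKLOADS = frozenset({"chat", "high_volume", "latency_sensitive"})


def recommended_sonnet_effort(workload: str) -> str:
    """Return "low" for chat, high-volume, or latency-sensitive work, else "medium"."""
    return "low" if workload in LOW_EFFORT_WORKLOADS else "medium"
```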
Code Examples
Python SDK
import anthropic

client = anthropic.Anthropic()

# Effort is passed inside output_config, not as a top-level argument.
# Model IDs are chosen to match the supported-models table; check the
# current model list for the exact identifiers.

# Low effort for simple, fast responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# High effort for complex reasoning
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on information theory."}
    ],
)

# Max effort for deepest analysis
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Design a complete architecture for a distributed database system."}
    ],
)
TypeScript SDK
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Effort is passed inside output_config, not as a top-level field.
// Model IDs are chosen to match the supported-models table.

// Medium effort for balanced performance
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// With adaptive thinking (recommended)
const responseWithThinking = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 4096,
  output_config: { effort: 'high' },
  thinking: { type: 'adaptive' },
  messages: [
    { role: 'user', content: 'Solve this complex math problem step by step.' }
  ]
});
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking by setting thinking: {type: "adaptive"}. This allows Claude to dynamically decide how much thinking is needed based on the problem complexity and the effort level you've set.
At high and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
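One way to see this behavior in your own traffic is to build the request with adaptive thinking enabled and then check whether the response actually contains a thinking block. The sketch below is a hedged illustration that runs without a network call: adaptive_request and used_thinking are hypothetical helpers, the output_config nesting is an assumption about the request shape, and content blocks are treated as plain dictionaries with a "type" key.

```python
from typing import Any


def adaptive_request(
    model: str, prompt: str, effort: str, max_tokens: int = 4096
) -> dict[str, Any]:
    """Assemble keyword arguments that pair an effort level with adaptive thinking."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        # Assumption: effort is passed inside output_config.
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


def used_thinking(content_blocks: list[dict[str, Any]]) -> bool:
    """Return True if any response content block is a thinking block."""
    thinking_types = {"thinking", "redacted_thinking"}
    return any(block.get("type") in thinking_types for block in content_blocks)
```

With the Python SDK, the block list could be taken from the response after converting it to a dictionary (for example response.to_dict()["content"], if your SDK version provides that helper). Logging used_thinking per effort level shows how often Claude skips thinking at lower settings.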
Practical Use Cases
1. Multi-tier Agent Systems
Use different effort levels for different agents in a system:
# Orchestrator agent: high effort for planning
orchestrator_effort = "high"

# Sub-agent for simple lookups: low effort
sub_agent_effort = "low"

# Code generation agent: medium effort
code_agent_effort = "medium"
2. Cost-Sensitive Applications
For high-volume chat applications, use low effort to reduce token consumption while maintaining acceptable quality for simple queries.
3. Deep Research Tasks
For research or analysis tasks requiring maximum thoroughness, use max effort on Opus 4.7 or Mythos Preview.
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Start with medium for most applications and adjust based on observed quality and cost.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token efficiency.
- Use low for sub-agents and simple classification tasks to minimize costs.
- Reserve max for complex reasoning where you need the absolute best quality.
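Adjusting "based on observed quality and cost" requires some record of what each effort level actually consumed. A minimal sketch, assuming you log one (effort, output_tokens) pair per request, for example from the usage.output_tokens field of each response; mean_output_tokens is a hypothetical helper, and the numbers in the usage note are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(samples):
    """Average output tokens per effort level.

    samples is an iterable of (effort, output_tokens) pairs.
    """
    by_effort = defaultdict(list)
    for effort, output_tokens in samples:
        by_effort[effort].append(output_tokens)
    return {effort: mean(values) for effort, values in by_effort.items()}
```

For instance, mean_output_tokens([("low", 100), ("low", 200), ("medium", 400)]) groups the two low samples together and reports one average per level. Pair these averages with a quality check on the same requests before lowering effort in production.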
Limitations and Considerations
- Effort is a behavioral signal, not a strict token budget. Claude may still think deeply on hard problems even at low effort.
- The xhigh level is currently only available on Claude Opus 4.7.
- Lower effort levels may reduce quality on complex tasks, so always test with your specific use case.
Key Takeaways
- Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.
- Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
- Affects all token spend, including text, tool calls, and extended thinking.
- Combine with adaptive thinking for the best balance of quality and efficiency.
- Set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.