Mastering Claude's Effort Parameter: Control Thinking Depth and Token Spend
Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes code examples, effort levels, and best practices for Opus and Sonnet models.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn the five effort levels (low, medium, high, xhigh, max), how to set them in API calls, and when to use each to balance performance, speed, and cost.
The effort parameter lets you steer how many tokens Claude spends on each response. Whether you're building a high-volume chat application, a complex agentic system, or a cost-sensitive tool, understanding effort is key to getting the best performance-to-cost ratio.
In this guide, you'll learn:
- What the effort parameter is and how it works
- The five effort levels and when to use each
- How effort interacts with extended thinking
- Practical code examples for Python and TypeScript
- Best practices for different use cases
What Is the Effort Parameter?
The effort parameter lets you control how "eager" Claude is about spending tokens when responding to requests. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise the effort to max for the absolute highest capability, or lower it to low for faster, cheaper responses.
Key advantages:
- Works without extended thinking — effort affects all tokens, including text responses and tool calls
- Controls tool call frequency — lower effort means fewer tool calls, saving tokens
- Single-model flexibility — you can trade off between thoroughness and efficiency without switching models
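A minimal sketch of where the setting lives in a request, assuming effort is sent in the request body as output_config={"effort": ...}. The build_request helper is hypothetical. It leaves output_config out entirely when no effort is given, which leaves Claude at its default (high), and it never adds a thinking field, since effort applies to text and tool calls on its own.

```python
from typing import Any, Optional

# Levels listed in this guide; "high" is the default when effort is omitted.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(prompt: str, model: str, effort: Optional[str] = None,
                  max_tokens: int = 1024) -> dict[str, Any]:
    """Return keyword arguments for client.messages.create (hypothetical helper).

    Assumes effort is passed as output_config={"effort": ...}. No "thinking"
    key is added: effort works without extended thinking.
    """
    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {effort!r}")
        request["output_config"] = {"effort": effort}
    # Omitting output_config leaves Claude at its default (high) effort.
    return request


req = build_request("What is the capital of France?", "claude-sonnet-4-6", effort="low")
print(req["output_config"])  # {'effort': 'low'}
```

With a real client you would unpack the result: client.messages.create(**req).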
Supported Models
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
budget_tokens parameter.
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing speed/cost balance |
| high (default) | High capability. Equivalent to omitting the parameter. | Complex reasoning, coding, agentic tasks |
| xhigh | Extended capability for long-horizon work. Available on Opus 4.7. | Long-running agentic/coding tasks (>30 min) |
| max | Absolute maximum capability with no constraints. | Deepest reasoning, most thorough analysis |
Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems — but it will think less than it would at higher levels for the same problem.
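The supported-models list and the table above can be encoded as a small pre-flight check. This is a sketch under stated assumptions: the model names are the display names used in this guide, not API model IDs, and the rule that xhigh is limited to Opus 4.7 is taken straight from the table. The API itself remains the final authority on which combinations it accepts.

```python
# Display names from this guide's "Supported Models" list (not API model IDs).
SUPPORTED_MODELS = {
    "Claude Mythos Preview",
    "Claude Opus 4.7",
    "Claude Opus 4.6",
    "Claude Sonnet 4.6",
    "Claude Opus 4.5",
}

# Ordered from least to most token spend.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Per the table, xhigh is listed as available on Opus 4.7.
XHIGH_MODELS = {"Claude Opus 4.7"}


def check_effort(model_name: str, effort: str) -> None:
    """Raise ValueError if this guide's tables rule out the combination."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"{model_name} is not listed as supporting effort")
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == "xhigh" and model_name not in XHIGH_MODELS:
        raise ValueError("xhigh is listed only for Claude Opus 4.7")


check_effort("Claude Sonnet 4.6", "medium")  # passes silently
```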
How Effort Works with Extended Thinking
When you combine effort with adaptive thinking (thinking: {type: "adaptive"}), Claude adjusts its thinking depth to the complexity of each problem. This is the recommended configuration for most use cases.
At high (default) and max effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
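One way to observe this is to inspect a response's content blocks: the Messages API returns extended thinking as blocks of type "thinking" alongside "text" and "tool_use" blocks. The sketch below tallies block types so you can compare runs at different effort levels. It only reads the .type attribute that SDK content blocks carry; the SimpleNamespace objects are stand-ins for a real response.content list.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def tally_blocks(content: Iterable) -> Counter:
    """Count content blocks by type (e.g. "thinking", "text", "tool_use")."""
    return Counter(block.type for block in content)


def used_thinking(content: Iterable) -> bool:
    """True if the response contains any thinking (or redacted thinking) block."""
    counts = tally_blocks(content)
    return counts["thinking"] + counts["redacted_thinking"] > 0


# Stand-in for response.content from a low-effort call on a simple prompt.
fake_content = [SimpleNamespace(type="text", text="Paris.")]
print(used_thinking(fake_content))  # False
```

Running the same prompt at low and max effort and comparing the tallies gives a direct view of when Claude chose to skip thinking.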
Code Examples
Python (using the Anthropic SDK)
Effort is a field in the request body, output_config={"effort": ...}; leave it out to keep the default, high.

```python
import anthropic

client = anthropic.Anthropic()

# Low effort: fast and cheap, for simple tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},
)

# Medium effort: balanced for agentic tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)

# High effort (default): complex reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Explain the implications of quantum computing on cryptography."}],
    # Omitting output_config leaves effort at its default, "high"
)

# Max effort: deepest reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16384,
    messages=[{"role": "user", "content": "Review this proof and identify any gaps in the argument: ..."}],
    output_config={"effort": "max"},
)
```
TypeScript (using the Anthropic SDK)
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' },
});

// Medium effort with adaptive thinking
const response2 = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  thinking: { type: 'adaptive' },
  messages: [{ role: 'user', content: 'Debug this code: ...' }],
  output_config: { effort: 'medium' },
});

// Max effort for complex analysis
const response3 = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 16384,
  thinking: { type: 'adaptive' },
  messages: [{ role: 'user', content: 'Analyze this legal contract...' }],
  output_config: { effort: 'max' },
});
```
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency and cost, explicitly set effort when using this model:
- Medium effort (recommended default): Best balance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters more than depth.
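The two recommendations above reduce to a simple rule. The helper below is hypothetical and its workload flags are illustrative; it returns the level to send as the effort value on Sonnet 4.6 requests.

```python
def sonnet_46_effort(coding_or_agentic: bool, latency_sensitive: bool) -> str:
    """Pick an explicit effort level for Claude Sonnet 4.6 (hypothetical helper).

    - Agentic coding, tool-heavy workflows, code generation -> "medium"
    - High-volume or latency-sensitive chat / non-coding work -> "low"
    - Anything else falls back to the recommended default, "medium"
    """
    if coding_or_agentic:
        return "medium"
    if latency_sensitive:
        return "low"
    return "medium"


print(sonnet_46_effort(coding_or_agentic=False, latency_sensitive=True))  # low
```

The sketch lets coding work win when both flags are set, on the assumption that code quality matters more than shaving latency; reverse the checks if your priorities differ.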
Best Practices
1. Start with medium, then adjust
For new applications, begin with medium effort. Monitor response quality and token usage, then adjust up or down based on your specific needs.
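The start-at-medium-then-adjust loop can be sketched as stepping one level at a time through the ordered effort levels. The quality score, target, and margin are placeholders for whatever evaluation you already run; they are not part of the API.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def adjust_effort(current: str, quality: float, target: float,
                  margin: float = 0.05, allow_xhigh: bool = False) -> str:
    """Move one effort level up if quality misses the target, or one level
    down if it clears the target by more than `margin`; otherwise stay put.

    `quality`, `target`, and `margin` are hypothetical scores in [0, 1] from
    your own evaluation. xhigh is skipped unless the model supports it.
    """
    levels = [lvl for lvl in EFFORT_LEVELS if allow_xhigh or lvl != "xhigh"]
    i = levels.index(current)
    if quality < target and i < len(levels) - 1:
        return levels[i + 1]
    if quality > target + margin and i > 0:
        return levels[i - 1]
    return current


print(adjust_effort("medium", quality=0.72, target=0.80))  # high
print(adjust_effort("medium", quality=0.95, target=0.80))  # low
```

Moving a single step per evaluation round keeps changes easy to attribute when you compare quality and token usage across rounds.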
2. Use adaptive thinking with effort
Combine effort with thinking: {type: "adaptive"} for the best experience. This lets Claude decide when to think deeply and when to respond quickly, saving tokens on simple queries.
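In Python the combination is two fields on one request. This sketch assumes the request fields thinking={"type": "adaptive"} and output_config={"effort": ...}; with_adaptive_thinking is a hypothetical helper that copies an existing request dict instead of mutating it.

```python
import copy
from typing import Any


def with_adaptive_thinking(request: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of `request` with adaptive thinking and an effort level.

    Adaptive thinking lets Claude decide per request whether to think; the
    effort level biases how much it spends when it does.
    """
    updated = copy.deepcopy(request)
    updated["thinking"] = {"type": "adaptive"}
    # Merge into any existing output_config rather than replacing it.
    updated["output_config"] = {**updated.get("output_config", {}), "effort": effort}
    return updated


base = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Debug this code: ..."}],
}
request = with_adaptive_thinking(base, "medium")
# With a real client: anthropic.Anthropic().messages.create(**request)
```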
3. Match effort to task complexity
- Simple Q&A, classification, extraction: low
- Multi-step agents, code generation: medium
- Complex reasoning, analysis: high
- Long-running agentic or coding work on Opus 4.7: xhigh
- Research-grade problems, deep analysis: max
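If one client routes many kinds of requests, this mapping can live in a lookup table. The category names are illustrative labels, not API values; only the effort strings are what you would actually send.

```python
# Illustrative task categories mapped to the suggested effort levels.
EFFORT_BY_TASK = {
    "qa": "low",
    "classification": "low",
    "extraction": "low",
    "multi_step_agent": "medium",
    "code_generation": "medium",
    "complex_reasoning": "high",
    "analysis": "high",
    "long_running_agent": "xhigh",  # per the table, listed for Opus 4.7
    "research": "max",
    "deep_analysis": "max",
}


def effort_for_task(category: str) -> str:
    """Look up an effort level, falling back to "high" (the API default)."""
    return EFFORT_BY_TASK.get(category, "high")


print(effort_for_task("extraction"))  # low
```

Falling back to high means an unrecognized category behaves exactly as if effort had been omitted.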
4. Consider cost implications
Lower effort levels can significantly reduce token spend, especially on tool calls. For high-volume applications, even a 20% reduction in tokens per call can lead to substantial savings.
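To make the savings claim concrete, the estimate is calls per month times tokens per call times the fractional reduction times the price per token. Every number in the sketch is a placeholder, including the per-million-token price; substitute your own traffic and the current published rate for your model. It counts output tokens only, since that is where effort's thinking, text, and tool-call savings show up.

```python
def monthly_savings_usd(calls_per_month: int, avg_output_tokens: float,
                        reduction: float, usd_per_million_tokens: float) -> float:
    """Estimated monthly saving from cutting output tokens by `reduction`.

    `reduction` is a fraction (0.2 means 20% fewer output tokens per call).
    """
    tokens_saved = calls_per_month * avg_output_tokens * reduction
    return tokens_saved / 1_000_000 * usd_per_million_tokens


# Placeholder inputs: 1M calls/month, 500 output tokens/call, 20% reduction,
# and a hypothetical $15 per million output tokens.
print(monthly_savings_usd(1_000_000, 500, 0.20, 15.0))
```

With these placeholder inputs, 100 million fewer output tokens per month works out to $1,500.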
5. Test with representative workloads
Effort affects behavior differently depending on the problem. Always test with your actual use case to find the optimal level.
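A simple harness for this sends the same representative prompts at several effort levels and sums usage.output_tokens for each level. The create function is a parameter, so the sketch can be exercised offline; in real use you would pass client.messages.create. The output_config={"effort": ...} request shape is an assumption.

```python
from types import SimpleNamespace
from typing import Callable, Sequence


def compare_efforts(create: Callable, model: str, prompts: Sequence[str],
                    levels: Sequence[str] = ("low", "medium", "high"),
                    max_tokens: int = 4096) -> dict[str, int]:
    """Total output tokens per effort level across the same prompts."""
    totals: dict[str, int] = {}
    for level in levels:
        total = 0
        for prompt in prompts:
            response = create(
                model=model,
                max_tokens=max_tokens,
                messages=[{"role": "user", "content": prompt}],
                output_config={"effort": level},
            )
            total += response.usage.output_tokens
        totals[level] = total
    return totals


# Offline stand-in for client.messages.create: pretends higher effort
# produces more output tokens so the harness can run locally.
def fake_create(**kwargs):
    tokens = {"low": 100, "medium": 250, "high": 600}[kwargs["output_config"]["effort"]]
    return SimpleNamespace(usage=SimpleNamespace(output_tokens=tokens))


print(compare_efforts(fake_create, "claude-sonnet-4-6", ["q1", "q2"]))
# {'low': 200, 'medium': 500, 'high': 1200}
```

Token counts are only half the picture; review the outputs at each level with your own quality checks before settling on one.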
Common Pitfalls
- Assuming low effort means no thinking: Claude will still think on difficult problems, just less deeply.
- Forgetting to set effort on Sonnet 4.6: Defaults to high, which may be more expensive than needed.
- Using effort without adaptive thinking: While effort works without thinking, combining them yields better results.
- Expecting strict token budgets: Effort is a behavioral signal, not a hard limit.
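Two of these pitfalls can be caught mechanically before a request goes out. The lint_request helper is hypothetical: it inspects a kwargs dict of the kind passed to client.messages.create, and the Sonnet 4.6 check matches on the model ID claude-sonnet-4-6, which you should adjust to the IDs you actually use.

```python
from typing import Any


def lint_request(request: dict[str, Any]) -> list[str]:
    """Return warnings for effort-related pitfalls in a request dict."""
    warnings = []
    effort = request.get("output_config", {}).get("effort")
    model = request.get("model", "")
    if effort is None and model.startswith("claude-sonnet-4-6"):
        warnings.append("Sonnet 4.6 without explicit effort defaults to high")
    if effort is not None and request.get("thinking", {}).get("type") != "adaptive":
        # Not an error: effort works without thinking, but pairing is recommended.
        warnings.append("effort set without adaptive thinking")
    return warnings


print(lint_request({"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": []}))
# ['Sonnet 4.6 without explicit effort defaults to high']
```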
Key Takeaways
- Effort controls token spend across all response types, including text, tool calls, and extended thinking — without requiring thinking to be enabled.
- Five levels (low, medium, high, xhigh, max) let you trade off between speed/cost and capability, all with a single model.
- Combine effort with adaptive thinking for the best balance of performance and efficiency.
- Explicitly set effort on Sonnet 4.6 to avoid unexpected latency and cost from the default high setting.
- Start with medium effort for most applications, then adjust based on observed quality and token usage.