Mastering Claude's Effort Parameter: Control Thinking Depth, Cost, and Speed
Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and latency. Includes code examples, recommended levels, and best practices.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response quality and speed/cost, with practical API examples and recommended defaults for different use cases.
Introduction
Claude is incredibly powerful, but with great power comes great token consumption. Every deep reasoning step, every tool call, and every carefully crafted explanation costs tokens—and therefore time and money. But what if you could dial Claude's "thinking effort" up or down depending on the task?
That's exactly what the effort parameter does. Introduced across Claude's latest models (including Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and the Mythos Preview), effort gives you fine-grained control over how many tokens Claude spends on a response. It's a single knob that affects everything from reasoning depth to tool call frequency.
In this guide, you'll learn:
- How the effort parameter works under the hood
- When to use each effort level (low, medium, high, xhigh, max)
- How to set effort in the API with Python and TypeScript
- Best practices for combining effort with adaptive thinking
- Real-world trade-offs between speed, cost, and capability
How the Effort Parameter Works
By default, Claude uses high effort—spending as many tokens as needed for excellent results. Setting effort to "high" is identical to omitting the parameter entirely.
The effort parameter is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.
Crucially, effort affects all tokens in the response:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
This differs from budget_tokens, which only constrained thinking tokens.
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes with million-token budgets (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing a balance of speed, cost, and performance |
| low | Most efficient, significant token savings | Simpler tasks, subagents, high-volume chat |
Important: max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is currently exclusive to Opus 4.7.
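The availability rules above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: the function name `check_effort` and the short family labels (`"opus-4.7"`, `"sonnet-4.6"`, and so on) are hypothetical and are not real API model IDs. It also assumes that low, medium, and high need no per-model check, which the guide does not state explicitly.

```python
# Availability transcribed from the table above. The family labels used as
# keys are illustrative only; they are NOT real API model identifiers.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

RESTRICTED_LEVELS = {
    "max": {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"},
    "xhigh": {"opus-4.7"},
}


def check_effort(model_family: str, effort: str) -> str:
    """Return `effort` unchanged if the guide lists it for `model_family`."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    # Levels without an entry here are assumed to be unrestricted.
    allowed = RESTRICTED_LEVELS.get(effort)
    if allowed is not None and model_family not in allowed:
        raise ValueError(f"{effort!r} is not listed for {model_family!r}")
    return effort
```

Failing early in your own code gives a clearer error than a rejected API request, and keeps the availability rules in one place as new models are added.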
Recommended Defaults for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set the effort level:
- Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
- Low: For high-volume or latency-sensitive workloads—chat, non-coding use cases where faster turnaround matters.
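One way to apply these defaults consistently is a small lookup table. This is only a sketch: the workload labels and the `recommended_effort` helper are hypothetical names chosen for illustration, and the mapping simply restates the two recommendations above.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def recommended_effort(workload: str) -> str:
    """Look up an effort level, falling back to the recommended default."""
    return SONNET_46_EFFORT.get(workload, "medium")
```

Unknown workloads fall back to medium rather than relying on the implicit high default.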
Setting Effort in the API
Python (using the Anthropic SDK)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort for fast, cheap responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    output_config={"effort": "low"},  # or "medium", "high", "max"
)
print(response.content[0].text)
TypeScript (using the Anthropic SDK)
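import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ],
  output_config: { effort: 'low' } // or 'medium', 'high', 'max'
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);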
Combining with Adaptive Thinking
For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide whether to think on each request, saving tokens on simple queries while still reasoning deeply on complex ones.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Claude decides per request whether to think
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}
    ],
    output_config={"effort": "medium"},
)
Practical Use Cases by Effort Level
Low Effort: High-Volume Chat & Subagents
When you're running a swarm of subagents or handling thousands of simple queries per minute, every token counts. Low effort can reduce token spend by 30-50% on straightforward tasks.
Example: A customer support bot answering FAQs.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,  # short answers only
    messages=[{"role": "user", "content": "What are your return policy hours?"}],
    output_config={"effort": "low"},
)
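To get a feel for what the 30-50% figure quoted above could mean for a budget, the arithmetic can be sketched as follows. Everything here is an assumption for illustration: the helper name `output_cost_range` is hypothetical, the request volume, average output length, and price are inputs you must supply from your own measurements and pricing, and the savings range is simply the guide's claim taken at face value. Only output tokens are counted, since that is what effort influences.

```python
def output_cost_range(requests, avg_output_tokens, usd_per_million_output_tokens,
                      savings=(0.30, 0.50)):
    """Estimate output-token cost in USD before and after lowering effort.

    Returns (baseline, best_case, worst_case), where best_case applies the
    larger saving and worst_case the smaller one.
    """
    baseline = requests * avg_output_tokens * usd_per_million_output_tokens / 1_000_000
    smaller_saving, larger_saving = savings
    best_case = baseline * (1 - larger_saving)
    worst_case = baseline * (1 - smaller_saving)
    return baseline, best_case, worst_case


# Usage with made-up numbers: one million requests, 200 output tokens each,
# and a placeholder price of 10 USD per million output tokens.
baseline, best_case, worst_case = output_cost_range(1_000_000, 200, 10.0)
print(f"baseline ${baseline:,.0f}, with low effort ${best_case:,.0f} to ${worst_case:,.0f}")
```

Because effort is a behavioral signal rather than a budget, treat the result as a planning range and confirm it against measured usage.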
Medium Effort: Agentic Coding & Tool-Heavy Workflows
Medium is the sweet spot for most production applications. Claude will still reason deeply when needed but won't over-analyze simple steps.
Example: A code generation agent that calls tools to read files, write code, and run tests.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    tools=[
        {
            "name": "read_file",
            "description": "Read the contents of a file",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"}
                },
                "required": ["path"],
            },
        }
    ],
    messages=[{"role": "user", "content": "Refactor the main function in app.py to use async/await"}],
    output_config={"effort": "medium"},
)
High Effort: Complex Reasoning & Difficult Problems
Use high (or omit the parameter) when you need Claude's full reasoning capability—complex math, multi-step planning, nuanced analysis.
Example: Analyzing a legal contract for potential issues.
Max Effort: The Absolute Best Claude Can Do
Reserve max for your hardest problems where token cost is secondary to getting the right answer. This is ideal for research, advanced mathematics, or debugging elusive bugs.
Migration from budget_tokens
If you've been using budget_tokens on Opus 4.6 or Sonnet 4.6, it's time to switch. The budget_tokens parameter is deprecated and will be removed in a future model release. Replace it with effort:
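# Old way (deprecated): a fixed budget that applies to thinking tokens only
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,  # must exceed budget_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[...]
)

# New way (recommended): adaptive thinking plus an effort level
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[...],
    output_config={"effort": "medium"},
)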
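If many call sites still carry a budget_tokens setting, the swap can be centralized in one helper. This is a minimal sketch under stated assumptions: `migrate_thinking_config` is a hypothetical name, and the old budget value is discarded rather than translated, because this guide gives no mapping from a token budget to an effort level. You choose the replacement effort yourself.

```python
def migrate_thinking_config(thinking: dict, effort: str = "medium"):
    """Swap a fixed thinking budget for adaptive thinking plus an effort level.

    Returns (new_thinking, effort). If `thinking` carries no budget_tokens,
    it is returned as a copy and the effort slot is None, meaning there was
    nothing to migrate.
    """
    if thinking.get("type") == "enabled" and "budget_tokens" in thinking:
        return {"type": "adaptive"}, effort
    return dict(thinking), None


new_thinking, new_effort = migrate_thinking_config(
    {"type": "enabled", "budget_tokens": 2048}
)
```

The returned pair can then be passed into the request in whatever form your SDK version expects.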
Best Practices
- Start with medium for most applications—it's the best balance of speed, cost, and capability.
- Use low for subagents and high-volume, simple tasks to maximize throughput.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token efficiency.
- Explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high.
- Monitor token usage in production: the effort parameter is a behavioral signal, so actual token spend may vary.
- Test with your specific workload—the optimal effort level depends on task complexity and your latency/cost requirements.
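The monitoring and testing advice above can start as a simple tally. The sketch below assumes you log one (effort, output_tokens) pair per request yourself, for example from the `usage.output_tokens` field of each response. The helper name `mean_output_tokens` and the record shape are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records):
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs collected by
    your own logging; nothing here calls the API.
    """
    by_effort = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_effort.items()}
```

Comparing these averages on the same set of prompts at two effort levels shows how much a lower setting actually saves for your workload.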
Limitations & Considerations
- Effort is a behavioral signal, not a strict budget. At lower levels, Claude may still think deeply on hard problems.
- The xhigh level is currently only available on Claude Opus 4.7.
- Lower effort may reduce tool call frequency and response quality on complex tasks.
- This feature is eligible for Zero Data Retention (ZDR): for organizations with a ZDR arrangement, data is not stored after the API response is returned.
Key Takeaways
- The effort parameter controls token spending across all response types—text, tool calls, and thinking—giving you a single knob for cost/speed optimization.
- Use medium as your default for most applications, low for high-volume simple tasks, and high or max for complex reasoning.
- Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of depth and efficiency.
- Migrate from budget_tokens to effort on Opus 4.6 and Sonnet 4.6; the old parameter is deprecated.
- Always explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high setting.