Mastering Claude's Effort Parameter: Control Thinking Depth, Token Usage, and Cost
Learn how to use Claude's effort parameter to control token spending, thinking depth, and response thoroughness across models like Opus 4.6, Sonnet 4.6, and Mythos Preview.
Claude's effort parameter lets you control how many tokens the model spends on reasoning and responses. Set it to 'low' for fast, cheap answers; 'high' for thorough analysis; or 'max' for the deepest reasoning. It works with or without extended thinking and affects all tokens including tool calls.
Introduction
Claude is incredibly capable, but sometimes you don't need the full force of its reasoning engine. Maybe you're building a high-volume chatbot where speed matters more than depth, or perhaps you're running a complex agent that needs to think deeply about every step. Until recently, you had limited control over this trade-off. Enter the effort parameter.
The effort parameter is an API feature that lets you dial Claude's token consumption up or down, controlling how eager the model is to spend tokens on reasoning, explanations, and tool calls. It is available on Claude Opus 4.5, Opus 4.6, Sonnet 4.6, Opus 4.7, and the Claude Mythos Preview model. This guide shows how to set it, when to use each level, and how to combine it with adaptive thinking.
How the Effort Parameter Works
By default, Claude operates at high effort — it spends as many tokens as needed to produce excellent results. The effort parameter gives you a sliding scale:
- max: Absolute maximum capability, no constraints on token spending.
- xhigh: Extended capability for long-horizon work (Opus 4.7 only).
- high: Default behavior. Equivalent to omitting the parameter.
- medium: Balanced approach with moderate token savings.
- low: Most efficient. Significant token savings with some capability reduction.
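Effort applies to every token Claude generates in a response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
On Opus 4.6 and Sonnet 4.6, effort is also the recommended replacement for the deprecated budget_tokens parameter.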
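The default behavior described above (omitting effort is the same as asking for high) can be sketched as a small request builder. This is a minimal sketch, not an SDK helper: `build_request` and `EFFORT_LEVELS` are hypothetical names, the model ID is a placeholder, and the code assumes effort is nested under `output_config`, as in Anthropic's API reference at the time of writing. It only builds keyword arguments, so it runs without network access.

```python
from typing import Any, Optional

# Levels named in this guide, from most to least token spending.
EFFORT_LEVELS = ("max", "xhigh", "high", "medium", "low")


def build_request(model: str, prompt: str, effort: Optional[str] = None,
                  max_tokens: int = 1024) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create (hypothetical helper)."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    kwargs: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Omitting effort is equivalent to "high", so only send it when set.
    if effort is not None:
        # Assumption: effort is nested under output_config.
        kwargs["output_config"] = {"effort": effort}
    return kwargs


# Placeholder model ID; substitute one that supports effort.
default_request = build_request("claude-sonnet-4-6", "What is a mutex?")
low_request = build_request("claude-sonnet-4-6", "What is a mutex?", effort="low")
```

The resulting dictionaries can be passed along as `client.messages.create(**low_request)`.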
When to Use Each Effort Level
Low Effort — Speed and Cost Optimization
Use low effort for:
- High-volume chat applications
- Simple Q&A or FAQ bots
- Subagents that handle straightforward tasks
- Latency-sensitive workloads
Medium Effort — The Sweet Spot
Medium effort is the recommended starting point for most production applications, even though the API default is high. It's ideal for:
- Agentic coding tasks
- Tool-heavy workflows
- Code generation
- General-purpose assistants
High Effort — Default Thoroughness
High effort is the default and works well for:
- Complex reasoning tasks
- Difficult coding problems
- Agentic tasks that require careful planning
- Any task where quality is more important than speed
Max Effort — Deepest Reasoning
Max effort is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. Use it for:
- Tasks requiring the deepest possible reasoning
- Scientific analysis
- Complex multi-step problem solving
- Research and analysis
XHigh Effort — Long-Horizon Work (Opus 4.7 Only)
Xhigh is exclusive to Claude Opus 4.7. Use it for:
- Long-running agentic tasks (over 30 minutes)
- Coding tasks with token budgets in the millions
- Extended autonomous workflows
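Because max and xhigh are not available on every model, it can help to validate the level before sending a request. The sketch below encodes only the availability stated in this guide. `SUPPORTED_LEVELS` and `check_effort` are hypothetical names, and the keys are the model names as written here, not API model IDs.

```python
BASE_LEVELS = frozenset({"low", "medium", "high"})

# Availability as stated in this guide; keys are display names, not API IDs.
SUPPORTED_LEVELS = {
    "Opus 4.5": BASE_LEVELS,
    "Opus 4.6": BASE_LEVELS | {"max"},
    "Sonnet 4.6": BASE_LEVELS | {"max"},
    "Opus 4.7": BASE_LEVELS | {"max", "xhigh"},
    "Mythos Preview": BASE_LEVELS | {"max"},
}


def check_effort(model_name: str, effort: str) -> None:
    """Raise ValueError if this guide does not list `effort` for `model_name`."""
    levels = SUPPORTED_LEVELS.get(model_name)
    if levels is None:
        raise ValueError(f"{model_name} is not listed as supporting effort")
    if effort not in levels:
        raise ValueError(
            f"{effort!r} is not listed for {model_name}; "
            f"choose one of {sorted(levels)}"
        )


check_effort("Opus 4.7", "xhigh")   # passes silently
check_effort("Sonnet 4.6", "max")   # passes silently
```

Calling `check_effort("Sonnet 4.6", "xhigh")` raises a `ValueError`, since xhigh is listed for Opus 4.7 only.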
Code Examples
Python SDK
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort: fast and cheap
response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=1024,
    output_config={"effort": "low"},  # effort is nested under output_config
    messages=[
        {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
)

# Medium effort: balanced
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)

# Max effort: deepest reasoning
response = client.messages.create(
    model="claude-opus-4-6",  # max is not available on every model
    max_tokens=4096,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis."}
    ],
)
```
TypeScript SDK
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Low effort
const lowResponse = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model that supports effort
  max_tokens: 1024,
  output_config: { effort: 'low' }, // effort is nested under output_config
  messages: [
    { role: 'user', content: 'Summarize this article in 50 words.' }
  ]
});

// Medium effort
const mediumResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Debug this code and explain the fix.' }
  ]
});
```
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking. Adaptive thinking lets Claude decide when to use extended thinking based on the complexity of the task. When you set effort to a lower level, Claude will still use thinking for hard problems, but it will think less than at higher effort levels.
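```python
# Reuses the client created in the Python SDK example.
response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports adaptive thinking
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Claude decides when to think
    output_config={"effort": "medium"},  # effort guides how much it thinks
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ],
)
```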
This combination gives you the best of both worlds: Claude uses thinking only when necessary, and the effort level controls how deeply it thinks.
Effort vs. budget_tokens (Deprecated)
If you've been using budget_tokens to control thinking depth on Opus 4.6 or Sonnet 4.6, it's time to migrate. The effort parameter replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted on these models, it is deprecated and will be removed in a future model release.
Compared with budget_tokens, effort has several advantages:
- It affects all tokens, not just thinking tokens
- It's a behavioral signal, not a strict budget — Claude adapts to problem difficulty
- It works with or without extended thinking enabled
- It gives finer control over tool call frequency
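One way to approach the migration is to rewrite existing request arguments in a single place. The sketch below is an assumption-laden illustration, not an official migration tool: `migrate_to_effort` is a hypothetical helper, it assumes effort is nested under `output_config`, and it swaps a fixed thinking budget for adaptive thinking. This guide gives no mapping from a token budget to an effort level, so the caller chooses the level explicitly.

```python
import copy
from typing import Any


def migrate_to_effort(request: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of `request` that uses effort instead of budget_tokens."""
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Replace the fixed thinking budget with adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort is nested under output_config; keep other settings.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


# Old-style request with a fixed thinking budget (placeholder model ID).
old_request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Design a distributed caching system."}],
}
new_request = migrate_to_effort(old_request, effort="medium")
```

The original dictionary is left untouched, which makes it easy to run both versions side by side while comparing quality and token usage.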
Best Practices
- Start with medium effort for most applications. It's the best balance of speed, cost, and performance.
- Use low effort for subagents that handle simple, well-defined tasks.
- Use high or max effort for the main agent in complex workflows.
- Combine with adaptive thinking to let Claude decide when to think deeply.
- Monitor token usage and adjust effort based on your cost and latency requirements.
- Test different effort levels with your specific use case — the optimal setting depends on your task complexity.
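The monitoring advice above can be made concrete with a small summary of output tokens per effort level. This is a rough sketch under stated assumptions: `summarize_output_tokens` and `savings_vs` are hypothetical helpers, the records would come from something like `response.usage.output_tokens` logged after each call, and the numbers below are placeholders for illustration, not measurements.

```python
from collections import defaultdict
from statistics import mean


def summarize_output_tokens(records):
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs.
    """
    by_level = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_level.items()}


def savings_vs(baseline, averages):
    """Fractional output-token savings of each level relative to `baseline`."""
    base = averages[baseline]
    return {effort: 1 - avg / base for effort, avg in averages.items()}


# Placeholder numbers for illustration only, not real measurements.
records = [
    ("high", 1000), ("high", 1200),
    ("medium", 700), ("medium", 620),
    ("low", 330),
]
averages = summarize_output_tokens(records)
savings = savings_vs("high", averages)
```

Running the same prompts at each level and comparing the averages alongside a quality check gives a simple basis for choosing a level per workload.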
Key Takeaways
- The effort parameter controls token spending across all response types — text, tool calls, and thinking — giving you fine-grained control over cost and speed.
- Medium effort is the recommended starting point for most production applications, balancing capability with efficiency; the API default remains high.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6; budget_tokens is deprecated and will be removed.
- Combine effort with adaptive thinking for the best experience — Claude will think deeply on hard problems and skip thinking on simple ones.
- Different effort levels suit different use cases: low for speed, medium for balance, high for quality, max for deepest reasoning, and xhigh for long-horizon tasks on Opus 4.7.