Mastering Claude's Effort Parameter: Control Token Spend and Response Depth
Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and cost across different models like Opus 4.6, Sonnet 4.6, and Mythos Preview.
Claude's effort parameter lets you control how eagerly the model spends tokens on responses. Set it to 'low' for fast, cheap answers on simple tasks, 'medium' for balanced agentic work, 'high' for complex reasoning, or 'max' for absolute capability. It works without thinking enabled and affects all tokens including tool calls.
Introduction
Claude is incredibly capable, but sometimes you don't need the full force of its reasoning engine. Maybe you're building a high-volume chatbot where speed matters more than depth, or you're running a complex agent that needs to be cost-efficient over long sessions. Enter the effort parameter — a simple but powerful tool that lets you dial Claude's token spending up or down, all with a single model.
In this guide, you'll learn exactly how effort works, when to use each level, and how to combine it with adaptive thinking for the best results. We'll cover practical code examples, recommended settings for popular models, and common pitfalls to avoid.
What Is the Effort Parameter?
The effort parameter controls how eager Claude is about spending tokens when responding to requests. By default, Claude uses high effort — it spends as many tokens as needed for excellent results. But you can lower it to save tokens (and money) or raise it to max for the absolute highest capability.
Key advantages:
- No thinking required: Effort works even when extended thinking is disabled.
- Affects all tokens: Text responses, tool calls, function arguments, and thinking tokens all scale with effort.
- Behavioral signal, not a strict budget: Claude will still think hard on difficult problems, just less than at higher levels.
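Because effort is a behavioral signal rather than a fixed budget, the actual savings are best measured on your own prompts. The sketch below shows one way to compare levels. It assumes a hypothetical `send` callable that performs one request at a given effort level and returns the output-token count (in real code it would wrap `client.messages.create` and read `response.usage.output_tokens`). The token counts in `fake_send` are made up for illustration.

```python
from typing import Callable, Dict, Iterable


def compare_effort(
    send: Callable[[str], int],
    levels: Iterable[str] = ("low", "medium", "high"),
    baseline: str = "high",
) -> Dict[str, float]:
    """Return output-token usage per level as a fraction of the baseline level.

    `send` is a hypothetical callable: it takes an effort level, performs one
    request at that level, and returns the number of output tokens used.
    """
    usage = {level: send(level) for level in levels}
    if baseline not in usage:
        raise ValueError(f"baseline {baseline!r} must be one of the levels")
    return {level: tokens / usage[baseline] for level, tokens in usage.items()}


# Stand-in for a real API call; these token counts are invented.
def fake_send(level: str) -> int:
    return {"low": 200, "medium": 500, "high": 1000}[level]


ratios = compare_effort(fake_send)
print(ratios)
```

Injecting `send` keeps the comparison logic testable without network access. Averaging over several prompts would give a steadier estimate than a single request.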
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| `max` | Absolute maximum capability, no constraints | Deepest reasoning, complex research |
| `xhigh` | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min) |
| `high` (default) | High capability, equivalent to omitting the parameter | Complex reasoning, difficult coding, agents |
| `medium` | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| `low` | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
Note: `max` is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. `xhigh` is currently available only on Opus 4.7.
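Since not every level is available on every model, it can help to validate the level before sending a request. The following is a minimal sketch built from the availability note above. The helper names are hypothetical, and treating `low`, `medium`, and `high` as available on every listed model is an assumption of this sketch, not something the note states.

```python
from typing import Dict, Set

# Assumption: these three levels are available on every model listed below.
BASE_LEVELS: Set[str] = {"low", "medium", "high"}

# Extra levels per model, taken from the availability note.
EXTRA_LEVELS: Dict[str, Set[str]] = {
    "Mythos Preview": {"max"},
    "Opus 4.7": {"max", "xhigh"},
    "Opus 4.6": {"max"},
    "Sonnet 4.6": {"max"},
}


def supported_levels(model_name: str) -> Set[str]:
    """Return the effort levels this sketch considers valid for a model."""
    if model_name not in EXTRA_LEVELS:
        raise ValueError(f"unknown model: {model_name!r}")
    return BASE_LEVELS | EXTRA_LEVELS[model_name]


def check_effort(model_name: str, level: str) -> str:
    """Return `level` unchanged if the model supports it, else raise ValueError."""
    if level not in supported_levels(model_name):
        raise ValueError(f"{level!r} is not available on {model_name}")
    return level


print(sorted(supported_levels("Opus 4.7")))
```

Failing fast in application code gives a clearer error than discovering an unsupported level at request time.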
How to Use Effort in the API
Basic Usage (Python)
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set effort to low for a quick, concise answer.
    # Effort is passed inside output_config, not as a top-level argument.
    output_config={"effort": "low"},
)

print(response.content[0].text)
```
With Extended Thinking (TypeScript)
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 4096,
  system: 'You are a research assistant.',
  messages: [
    { role: 'user', content: 'Analyze the long-term economic impacts of AI automation.' }
  ],
  // budget_tokens is still accepted on Opus 4.6 but deprecated
  thinking: {
    type: 'enabled',
    budget_tokens: 2048
  },
  // High effort for thorough analysis; effort goes inside output_config
  output_config: { effort: 'high' }
});

console.log(response.content);
```
Combining with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (`thinking: {type: "adaptive"}`). Claude then decides dynamically how much to think based on the complexity of each problem, while effort signals how much it should spend overall. Effort remains a behavioral signal rather than a hard cap.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Solve this calculus problem step by step."}],
    thinking={"type": "adaptive"},
    # Effort is passed inside output_config
    output_config={"effort": "medium"},
)
```
Recommended Settings by Model
Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly:
- `medium` (recommended default): Best balance for agentic coding, tool-heavy workflows, and code generation.
- `low`: High-volume or latency-sensitive workloads (chat, non-coding).
- `high`: Tasks requiring deeper reasoning.
Opus 4.6
Opus 4.6 also defaults to high. Use:
- `high` for complex analysis and research.
- `max` for the deepest possible reasoning (available on Opus 4.6+).
Deprecation note: `budget_tokens` is still accepted on Opus 4.6 and Sonnet 4.6 but is deprecated. Use `effort` instead.
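The per-model recommendations above can be captured in a small lookup table so that application code never relies on the implicit default. This is a sketch only: the workload labels and the `recommended_effort` helper are hypothetical names chosen for illustration, and the fallback of `high` mirrors the documented default.

```python
from typing import Dict, Tuple

# (model, workload) -> recommended effort, following the lists above.
# The workload labels are hypothetical names for this sketch.
RECOMMENDED: Dict[Tuple[str, str], str] = {
    ("Sonnet 4.6", "agentic_coding"): "medium",
    ("Sonnet 4.6", "high_volume_chat"): "low",
    ("Sonnet 4.6", "deep_reasoning"): "high",
    ("Opus 4.6", "complex_analysis"): "high",
    ("Opus 4.6", "deepest_reasoning"): "max",
}


def recommended_effort(model_name: str, workload: str, fallback: str = "high") -> str:
    """Look up a recommended effort level, falling back to the default level."""
    return RECOMMENDED.get((model_name, workload), fallback)


print(recommended_effort("Sonnet 4.6", "agentic_coding"))
print(recommended_effort("Opus 4.6", "deepest_reasoning"))
```

Keeping the mapping in one place makes it easy to adjust after measuring latency and token usage on real traffic.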
Practical Scenarios
Scenario 1: High-Volume Customer Support Chat
Set effort to low for fast, cost-effective responses to common questions. Claude will skip unnecessary thinking and produce concise answers.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "What are your business hours?"}],
    # Effort is passed inside output_config
    output_config={"effort": "low"},
)
```
Scenario 2: Complex Code Generation Agent
Use medium or high effort. The agent will make more tool calls and produce more thorough code, but won't overthink simple steps.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Build a REST API for a todo app with Flask."}],
    tools=[
        {
            "name": "write_file",
            "description": "Write code to a file",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        }
    ],
    # Effort also influences how many tool calls Claude makes;
    # it is passed inside output_config
    output_config={"effort": "medium"},
)
```
Scenario 3: Deep Research with Max Effort
For tasks requiring the deepest possible reasoning, set effort to max and enable thinking. Because `budget_tokens` is deprecated on Opus 4.6, this example uses adaptive thinking instead of a fixed thinking budget.

```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Compare the philosophical implications of quantum mechanics vs. general relativity."}],
    thinking={"type": "adaptive"},
    # Effort is passed inside output_config
    output_config={"effort": "max"},
)
```
Best Practices
- Start with medium for most tasks: It's the sweet spot for cost and capability.
- Use low for simple, high-volume workloads: Great for subagents and chat.
- Reserve high/max for complex reasoning: Don't waste tokens on trivial requests.
- Combine with adaptive thinking: Let Claude decide when to think deeply.
- Always set effort explicitly on Sonnet 4.6: Avoid unexpected latency.
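Two of the practices above, setting effort explicitly and pairing it with adaptive thinking, can be enforced by routing every request through a small builder. The sketch below is one way to do that. `build_params` is a hypothetical helper, the model ID is an assumed placeholder, and passing effort inside `output_config` is an assumption to check against the API reference for your SDK version.

```python
from typing import Any, Dict

VALID_EFFORT = {"low", "medium", "high", "xhigh", "max"}


def build_params(
    model: str,
    prompt: str,
    *,
    effort: str,  # keyword-only with no default, so callers must choose a level
    max_tokens: int = 1024,
    adaptive_thinking: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for a Messages API call with explicit effort."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    params: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: effort is nested under output_config.
        "output_config": {"effort": effort},
    }
    if adaptive_thinking:
        params["thinking"] = {"type": "adaptive"}
    return params


# "claude-sonnet-4-6" is an assumed model ID used for illustration.
params = build_params("claude-sonnet-4-6", "Summarize this ticket.", effort="low")
print(params["output_config"], params["thinking"])
```

The resulting dictionary could then be passed as `client.messages.create(**params)`. Omitting `effort` raises a `TypeError` at the call site, which surfaces the missing setting before any request is sent.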
Common Pitfalls
- Assuming low effort means no thinking: Claude will still think on difficult problems, just less.
- Using max effort for simple tasks: You'll burn tokens unnecessarily.
- Forgetting to set effort on Sonnet 4.6: It defaults to `high`, which may be slower than expected.
- Mixing effort with budget_tokens: On Opus 4.6 and Sonnet 4.6, use `effort` instead of the deprecated `budget_tokens`.
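Two of these pitfalls can be checked mechanically before a request is sent: a missing effort setting on Sonnet 4.6, and `budget_tokens` on a model where it is deprecated. The following is a minimal sketch of such a check. `lint_request` is a hypothetical helper, and it assumes that model IDs contain the substrings `sonnet-4-6` or `opus-4-6` and that effort is nested under `output_config`.

```python
from typing import Any, Dict, List


def lint_request(params: Dict[str, Any]) -> List[str]:
    """Return warnings for two mechanically checkable effort pitfalls."""
    issues: List[str] = []
    model = params.get("model", "")
    effort = (params.get("output_config") or {}).get("effort")
    thinking = params.get("thinking") or {}

    # Assumption: model IDs contain these substrings.
    is_sonnet_46 = "sonnet-4-6" in model
    is_opus_46 = "opus-4-6" in model

    if is_sonnet_46 and effort is None:
        issues.append("effort not set; Sonnet 4.6 defaults to high")
    if (is_sonnet_46 or is_opus_46) and "budget_tokens" in thinking:
        issues.append("budget_tokens is deprecated on this model; use effort")
    return issues


for issue in lint_request({
    "model": "claude-sonnet-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 2048},
}):
    print(issue)
```

The other two pitfalls, expecting no thinking at low effort and using max on trivial tasks, depend on the task itself and are better caught by reviewing token usage.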
Conclusion
The effort parameter is a versatile tool for controlling Claude's behavior. Whether you need lightning-fast responses for a chatbot or deep reasoning for research, you can tune it to match your exact needs. By combining effort with adaptive thinking and choosing the right level for your use case, you'll get the best performance and cost efficiency from Claude.
Key Takeaways
- Effort controls token spend across all response types (text, tool calls, and thinking) without requiring extended thinking to be enabled.
- Use `low` for speed/cost savings on simple tasks, `medium` for balanced agentic work, and `high`/`max` for deep reasoning.
- Always set effort explicitly on Sonnet 4.6 to avoid defaulting to `high` and incurring unexpected latency.
- Combine effort with adaptive thinking (`thinking: {type: "adaptive"}`) for the best balance of depth and efficiency.
- `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6; migrate to the `effort` parameter.