Mastering Claude’s Effort Parameter: Control Token Spend Without Sacrificing Intelligence
Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost across models like Opus 4.6, Sonnet 4.6, and Mythos Preview.
This guide explains Claude’s effort parameter—a behavioral signal that controls how eagerly Claude spends tokens. You’ll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response quality and speed/cost, with practical API examples and recommended defaults for Sonnet 4.6.
Introduction
When building applications with Claude, you often face a classic trade-off: response quality vs. speed and cost. Do you let Claude think deeply and spend more tokens, or do you push for faster, cheaper responses? Traditionally, you had to switch models or manually set token budgets. Now, with the effort parameter, you can control this balance using a single model—no model swapping required.
Effort is a behavioral signal that tells Claude how eager it should be about spending tokens. It affects everything from text responses and tool calls to extended thinking. This guide will walk you through how effort works, when to use each level, and how to combine it with adaptive thinking for the best results.
How the Effort Parameter Works
By default, Claude uses high effort—spending as many tokens as needed for excellent results. You can raise the level to max for absolute top capability, or lower it to medium or low to save tokens and reduce latency.
Key points:
- Effort affects all tokens in the response, including text, tool calls, and extended thinking.
- It does not require thinking to be enabled.
- Lower effort means Claude will make fewer tool calls and write shorter responses.
- Effort is a behavioral signal, not a strict token budget. On difficult problems, Claude will still think—just less than at higher levels.
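The key points above can be sketched as a small request builder. This is a minimal illustration, not the SDK's own API: `build_request_kwargs` and `VALID_EFFORT` are hypothetical names, `"example-model"` is a placeholder, and nesting effort under `output_config` is an assumption about the request shape that you should confirm against the API reference. Note that thinking is only added when asked for, since effort does not depend on it.

```python
# Effort levels named in this guide, lowest to highest.
VALID_EFFORT = ("low", "medium", "high", "xhigh", "max")


def build_request_kwargs(model, prompt, effort="high",
                         adaptive_thinking=False, max_tokens=1024):
    """Build keyword arguments for a Messages API call (hypothetical helper)."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    kwargs = {
        "model": model,
        # max_tokens is still the hard cap; effort only shapes behavior.
        "max_tokens": max_tokens,
        # Assumption: effort is nested under output_config in the request.
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
    # Effort works without thinking, so thinking is strictly opt-in here.
    if adaptive_thinking:
        kwargs["thinking"] = {"type": "adaptive"}
    return kwargs


kwargs = build_request_kwargs("example-model", "Classify this ticket.", effort="low")
```

The resulting dictionary could then be passed along as `client.messages.create(**kwargs)`.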
Effort Levels Overview
| Level | Description | Typical Use Case |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, subagents, high-volume chat |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing speed and cost balance |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, coding, agentic tasks |
| xhigh | Extended capability for long-horizon work. Available on Opus 4.7 only. | Long-running agentic/coding tasks (>30 min) |
| max | Absolute maximum capability with no constraints. Available on select models. | Deepest reasoning, most thorough analysis |
When to Use Each Effort Level
Low Effort
Use low when you need the fastest possible responses and can accept some reduction in quality. Ideal for:
- High-volume chat applications
- Simple Q&A or classification tasks
- Subagents that handle straightforward subtasks
Medium Effort
medium is the sweet spot for most production applications. It offers a good balance of speed, cost, and performance. Recommended for:
- Agentic coding workflows
- Tool-heavy applications
- Code generation where latency matters
High Effort (Default)
Stick with high when you need Claude’s full reasoning power. This is the default behavior, so you don’t need to set it explicitly. Use for:
- Complex problem-solving
- Difficult coding tasks
- Multi-step agentic workflows
Xhigh Effort (Opus 4.7 Only)
xhigh is designed for long-running tasks that require sustained deep reasoning over millions of tokens. Available only on Claude Opus 4.7.
Max Effort
max removes all constraints on token spending, giving you Claude’s absolute best performance. Available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
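The availability rules in this section can be captured as a lookup so that a request never asks for a level its model does not offer. This is a sketch under stated assumptions: the keys are the display names used in this guide, not API model IDs; `SUPPORTED_EFFORT` and `resolve_effort` are hypothetical helpers; and the guide only states availability for xhigh and max, so treating low, medium, and high as available everywhere is an assumption. Stepping down to the next lower supported level is one possible policy, not a documented behavior.

```python
# Effort levels in ascending order, as listed in this guide.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]

# Assumption: these three levels are available on every model.
BASE_LEVELS = {"low", "medium", "high"}

# Display names from this guide (not API model IDs).
SUPPORTED_EFFORT = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
}


def resolve_effort(model: str, requested: str) -> str:
    """Return the requested level, or the next lower level the model supports."""
    if requested not in EFFORT_ORDER:
        raise ValueError(f"unknown effort level: {requested!r}")
    supported = SUPPORTED_EFFORT.get(model, BASE_LEVELS)
    # Walk downward from the requested level until a supported one is found.
    start = EFFORT_ORDER.index(requested)
    for level in reversed(EFFORT_ORDER[: start + 1]):
        if level in supported:
            return level
    raise ValueError(f"no supported effort level at or below {requested!r}")
```

With this policy, requesting xhigh on Sonnet 4.6 steps down to high rather than up to max, which keeps token spend from increasing unexpectedly.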
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly when using this model:
- Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
- Low: For high-volume or latency-sensitive workloads—chat, non-coding use cases.
- High: For tasks requiring maximum reasoning depth.
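The recommendations above can be encoded as a small table so each call site sets effort explicitly. The workload labels and the names `SONNET_46_EFFORT` and `sonnet_46_effort` are hypothetical, chosen only for illustration; unknown workloads fall back to medium, the recommended default stated above.

```python
# Workload labels are illustrative; the levels follow the list above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "deep_reasoning": "high",
}


def sonnet_46_effort(workload: str) -> str:
    """Pick an explicit effort level for a Sonnet 4.6 workload."""
    # Medium is the recommended default for anything not listed.
    return SONNET_46_EFFORT.get(workload, "medium")
```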
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking allows Claude to dynamically decide how much to think based on the complexity of the request. When paired with effort, you get fine-grained control:
- At high or max effort, Claude will almost always think.
- At lower effort levels, Claude may skip thinking for simpler problems, saving tokens.
Note: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future release.
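A migration away from budget_tokens might look like the sketch below. It assumes the legacy request carries `thinking={"type": "enabled", "budget_tokens": N}` and that effort is nested under `output_config`; both are assumptions to verify against the API reference. `migrate_to_effort` is a hypothetical helper, and the caller chooses the effort level explicitly, because this guide gives no mapping from token budgets to effort levels.

```python
import copy


def migrate_to_effort(params: dict, effort: str) -> dict:
    """Return a copy of request params using adaptive thinking plus effort."""
    migrated = copy.deepcopy(params)  # leave the caller's dict untouched
    thinking = migrated.get("thinking")
    # Swap a budget-based thinking config for adaptive thinking.
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort is nested under output_config in the request.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


legacy = {
    "model": "example-model",  # placeholder, not a real model ID
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "Explain this stack trace."}],
}
updated = migrate_to_effort(legacy, "medium")
```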
Practical API Examples
Python Example: Setting Effort
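```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6; confirm the exact ID in the models list
    max_tokens=1024,
    # Effort goes inside output_config rather than as a top-level argument
    # (requires a recent SDK version).
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)
print(response.content[0].text)
```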
TypeScript Example: Setting Effort
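```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Sonnet 4.6; confirm the exact ID in the models list
  max_tokens: 1024,
  // Effort goes inside output_config rather than as a top-level field.
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// Content blocks are a union type; narrow to a text block before reading .text.
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}
```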
Example with Adaptive Thinking
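```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6; confirm the exact ID in the models list
    max_tokens=2048,
    output_config={"effort": "high"},  # effort goes inside output_config
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step: ..."}
    ],
)

# With thinking enabled, the first block may be a thinking block, so print text blocks only.
for block in response.content:
    if block.type == "text":
        print(block.text)
```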
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency.
- Start with medium for most production applications, then adjust based on observed performance.
- Use low effort for subagents that handle simple, well-defined tasks.
- Combine with adaptive thinking for optimal token efficiency on mixed-complexity workloads.
- Monitor token usage across effort levels to find the right balance for your use case.
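One way to monitor usage is to log the output token count for each request alongside its effort level and compare averages. The sketch below is a minimal illustration: `mean_output_tokens` is a hypothetical helper, and the sample records are made-up numbers, not measurements. In a real application the counts would come from `response.usage.output_tokens` on each Messages API response.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records):
    """Average output tokens per effort level.

    records: iterable of (effort_level, output_tokens) pairs.
    """
    by_effort = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_effort.items()}


# Made-up sample data, for illustration only.
sample = [("low", 100), ("low", 200), ("high", 900)]
averages = mean_output_tokens(sample)
```

Comparing these averages against quality checks on the same requests shows whether a lower level is saving enough tokens to justify any drop in quality.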
Limitations and Considerations
- Effort is a behavioral signal, not a hard budget. Claude may still think on difficult problems at low effort, just less than it would at higher levels.
- xhigh is only available on Claude Opus 4.7.
- max is not available on all models; check the documentation for your specific model.
- Effort does not replace the need for max_tokens; always set a reasonable token limit.
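Because effort is not a hard budget, a response can still stop at the max_tokens cap. The sketch below checks for that case using the `stop_reason` field of a Messages API response, which is `"max_tokens"` when the cap was hit. The helper names and the doubling policy are hypothetical choices for illustration.

```python
def hit_token_cap(stop_reason: str) -> bool:
    """True when the response stopped because it reached max_tokens."""
    return stop_reason == "max_tokens"


def next_max_tokens(current: int, ceiling: int) -> int:
    """Double the limit for a retry, without exceeding a caller-chosen ceiling."""
    return min(current * 2, ceiling)
```

A caller might check `hit_token_cap(response.stop_reason)` after each request and retry with `next_max_tokens(...)`, or lower the effort level instead if the longer output is not needed.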
Key Takeaways
- Effort controls token spend across all response types (text, tool calls, thinking) without changing models.
- Five levels are available: low, medium, high, xhigh, and max, each suited to different use cases.
- Always set effort explicitly on Sonnet 4.6 to avoid defaulting to high effort unexpectedly.
- Combine with adaptive thinking for the best balance of depth and efficiency.
- Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6; migrate your code to use effort instead.