Mastering Claude's Effort Parameter: Balance Performance and Cost
Learn how to use Claude's effort parameter to control token spending, optimize response thoroughness, and reduce costs across all API interactions.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to implement them in your API calls to balance performance and cost.
When building applications with Claude, one of the most powerful yet underutilized controls is the effort parameter. It lets you signal how many tokens, and therefore how much cost and latency, Claude should be willing to spend on each request. Whether you're building a high-volume chatbot or a deep reasoning agent, understanding effort is key to optimizing both performance and budget.
What Is the Effort Parameter?
The effort parameter controls how eager Claude is about spending tokens when responding to requests. It's a behavioral signal that influences all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
Supported Models
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
Effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
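Because budget_tokens is deprecated, existing requests will eventually need to move to effort. The sketch below rewrites a request dict that uses `thinking={"type": "enabled", "budget_tokens": N}` into one that uses adaptive thinking plus an explicit effort level. `migrate_to_effort` is a hypothetical helper, and it assumes effort is sent in the request body as `output_config={"effort": ...}`. There is no documented one-to-one mapping from a token budget to an effort level, so the caller chooses the level.

```python
from copy import deepcopy

# Ordered from least to most eager token spending, matching the effort levels table.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def migrate_to_effort(params: dict, effort: str = "high") -> dict:
    """Hypothetical helper: return a copy of a Messages API request dict with
    the deprecated budget_tokens setting replaced by adaptive thinking plus an
    explicit effort level.

    Assumes effort is sent as output_config={"effort": ...}. The caller picks
    the level; no mapping from a token budget to a level is attempted.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    new_params = deepcopy(params)  # never mutate the caller's request
    thinking = new_params.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed thinking budget and let Claude decide when to think.
        new_params["thinking"] = {"type": "adaptive"}
    # Merge into any existing output_config instead of overwriting it.
    output_config = dict(new_params.get("output_config") or {})
    output_config["effort"] = effort
    new_params["output_config"] = output_config
    return new_params


old_request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Plan a schema migration."}],
}
new_request = migrate_to_effort(old_request, effort="medium")
# new_request can then be sent with client.messages.create(**new_request)
```

The helper copies the request rather than editing it in place, so the old and new configurations can be compared side by side during a rollout.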
Effort Levels Explained
There are five effort levels, each suited to different use cases:
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending | Deepest possible reasoning, most thorough analysis |
| xhigh | Extended capability for long-horizon work (Opus 4.7 only) | Long-running agentic and coding tasks (30+ minutes) with token budgets in the millions |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks requiring a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks needing best speed and lowest costs, such as subagents |
Important: Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
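The table and the note above can be turned into a small guard that rejects unknown levels, rejects `xhigh` on models other than Opus 4.7, and sends nothing at all for `high`, since that is identical to omitting the parameter. `effort_kwargs` is a hypothetical helper; it assumes effort is passed as `output_config={"effort": ...}` and that the Opus 4.7 model ID contains `opus-4-7` (check the models overview for exact IDs).

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def effort_kwargs(model: str, effort: str) -> dict:
    """Hypothetical helper: validate an effort level for a model and return the
    extra keyword arguments for client.messages.create().

    Assumptions: effort is sent as output_config={"effort": ...}, and the
    Claude Opus 4.7 model ID contains "opus-4-7".
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level {effort!r}; expected one of {EFFORT_LEVELS}")
    if effort == "xhigh" and "opus-4-7" not in model:
        raise ValueError("xhigh effort is only available on Claude Opus 4.7")
    if effort == "high":
        # "high" behaves exactly like omitting the parameter, so send nothing.
        return {}
    return {"output_config": {"effort": effort}}


# Usage (requires an API key):
# client.messages.create(model=model, max_tokens=1024, messages=msgs,
#                        **effort_kwargs(model, "low"))
```

Returning an empty dict for `high` keeps those requests byte-for-byte identical to the default. If a request already carries other `output_config` fields, merge the dicts instead of unpacking both.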
How Effort Works Under the Hood
The effort parameter affects all token spend, including tool calls. For example, at lower effort levels, Claude will make fewer tool calls. This gives you much greater control over efficiency compared to older methods like budget_tokens.
At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
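One way to see these effects in your own traffic is to send the same prompt at two effort levels and compare the mix of content blocks: at lower levels you would expect fewer `tool_use` blocks and, with thinking enabled, thinking blocks that are sometimes absent. The helper below is a minimal sketch; `block_profile` is a hypothetical name, and the sample content list is made-up data, not real model output. Comparing `response.usage.output_tokens` across runs gives a second, cost-oriented signal.

```python
from collections import Counter


def block_profile(content) -> Counter:
    """Count content block types (thinking, text, tool_use, ...) in a response.

    Accepts SDK block objects (which expose .type) or plain dicts parsed from
    the raw JSON response body.
    """
    return Counter(
        block["type"] if isinstance(block, dict) else block.type
        for block in content
    )


# Illustrative, made-up content list; "search_docs" is a hypothetical tool.
sample_content = [
    {"type": "thinking", "thinking": "Check the docs first."},
    {"type": "tool_use", "id": "toolu_01", "name": "search_docs", "input": {"q": "effort"}},
    {"type": "text", "text": "Here is the answer."},
]
print(block_profile(sample_content))
```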
Practical Implementation
Basic Usage in Python
```python
import anthropic

client = anthropic.Anthropic()

# Low effort for high-volume, simple tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    output_config={"effort": "low"},  # Fast and cheap for simple questions
)
print(response.content[0].text)
```
Medium Effort for Balanced Performance
```python
# Medium effort for agentic coding tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a senior software engineer.",
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
    output_config={"effort": "medium"},
)
```
Max Effort for Deep Reasoning
```python
# Max effort for complex analysis
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    system="You are a research scientist.",
    messages=[
        {"role": "user", "content": "Analyze the implications of quantum computing on current encryption standards."}
    ],
    output_config={"effort": "max"},
)
```
TypeScript Example
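```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function getResponse() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: 'You are a helpful assistant.',
    messages: [
      { role: 'user', content: 'Summarize this article.' }
    ],
    output_config: { effort: 'low' } // Low effort for a quick, cheap summary
  });
  // content is a union of block types; narrow to a text block before reading .text
  const block = response.content[0];
  if (block.type === 'text') {
    console.log(block.text);
  }
}

getResponse();
```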
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking:
```python
# Adaptive thinking plus medium effort
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
    output_config={"effort": "medium"},
)
```
Adaptive thinking allows Claude to dynamically decide when to engage extended thinking, while effort signals how many tokens Claude should be willing to spend overall, including on thinking. This combination is particularly powerful for applications that handle a mix of simple and complex queries.
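With adaptive thinking on, a response may or may not begin with a thinking block, so reading `response.content[0].text` is no longer safe. A minimal sketch of a helper that separates the thinking text from the final answer follows; `split_blocks` is a hypothetical name, and it handles both SDK block objects and plain dicts from the raw JSON response.

```python
def _field(block, name):
    # SDK objects expose attributes; raw JSON blocks are dicts.
    return block[name] if isinstance(block, dict) else getattr(block, name)


def split_blocks(content):
    """Return (thinking_text, answer_text) from a Messages API response's content.

    Block types other than "thinking" and "text" (for example tool_use or
    redacted_thinking) are ignored.
    """
    thinking_parts, text_parts = [], []
    for block in content:
        kind = _field(block, "type")
        if kind == "thinking":
            thinking_parts.append(_field(block, "thinking"))
        elif kind == "text":
            text_parts.append(_field(block, "text"))
    return "\n".join(thinking_parts), "".join(text_parts)


# Usage with a live response:
# thinking, answer = split_blocks(response.content)
# print(answer)
```

An empty thinking string is a normal outcome here: at lower effort levels Claude may skip thinking entirely for simple requests.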
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
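The two recommendations can be encoded as a tiny selector so that every Sonnet 4.6 request sets effort explicitly and never silently falls back to `high`. This is one reading of the guidance, not an official rule: `sonnet_46_effort` and its two boolean flags are hypothetical, and the request shape assumes effort is sent as `output_config={"effort": ...}`.

```python
def sonnet_46_effort(latency_sensitive: bool, coding: bool) -> str:
    """Pick an explicit effort level for Claude Sonnet 4.6.

    Hypothetical helper: the two flags stand in for however your
    application classifies a workload.
    """
    if latency_sensitive and not coding:
        # High-volume or latency-sensitive chat and other non-coding work.
        return "low"
    # Recommended default: agentic coding, tool-heavy workflows, code generation.
    return "medium"


params = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hi! What can you help with?"}],
    # Always set effort explicitly so the request never defaults to high.
    "output_config": {"effort": sonnet_46_effort(latency_sensitive=True, coding=False)},
}
```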
When to Use Each Level
Low Effort
- Best for: Simple Q&A, basic classification, high-volume chatbots, subagents
- Benefits: Fastest response times, lowest cost
- Trade-off: Reduced capability on complex problems
Medium Effort
- Best for: Agentic tasks, coding assistance, tool-heavy workflows
- Benefits: Good balance of speed, cost, and performance
- Trade-off: May not be sufficient for the most complex reasoning
High Effort (Default)
- Best for: Complex reasoning, difficult coding problems, detailed analysis
- Benefits: High capability on complex tasks
- Trade-off: Higher token usage and cost
XHigh Effort (Opus 4.7 Only)
- Best for: Long-running agentic tasks (30+ minutes), tasks with token budgets in the millions
- Benefits: Extended capability for sustained reasoning
- Trade-off: Very high token usage, above high effort
Max Effort
- Best for: Deepest possible reasoning, most thorough analysis
- Benefits: Absolute maximum capability
- Trade-off: No constraints on token spending
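The guidance above amounts to a lookup from task type to effort level. The sketch below encodes it as a dictionary with a fallback to `high`, the API default. The category names are hypothetical labels for the "Best for" entries, and the sketch assumes the Opus 4.7 model ID contains `opus-4-7`.

```python
# Hypothetical task categories, each mapped to the level suggested for it
# in "When to Use Each Level".
EFFORT_BY_TASK = {
    "simple_qa": "low",
    "classification": "low",
    "subagent": "low",
    "agentic": "medium",
    "coding_assist": "medium",
    "tool_heavy": "medium",
    "complex_reasoning": "high",
    "detailed_analysis": "high",
    "long_running_agent": "xhigh",
    "deepest_reasoning": "max",
}


def effort_for(task: str, model: str) -> str:
    """Return an effort level for a task category on a given model.

    Unknown categories fall back to "high", the API default. Assumes the
    Opus 4.7 model ID contains "opus-4-7"; on other models xhigh tasks
    fall back to "high".
    """
    effort = EFFORT_BY_TASK.get(task, "high")
    if effort == "xhigh" and "opus-4-7" not in model:
        effort = "high"
    return effort
```

Falling back from `xhigh` to `high` rather than `max` keeps spending conservative on models without `xhigh`. If you prefer capability over cost for those tasks, change the fallback to `max`.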
Common Pitfalls to Avoid
- Using high effort for simple tasks: This wastes tokens and increases latency without meaningful quality gains.
- Not setting effort explicitly on Sonnet 4.6: The default is `high`, which may cause unexpected latency.
- Assuming effort is a strict budget: Effort is a behavioral signal, not a hard limit. Claude may still think deeply on difficult problems even at low effort.
- Forgetting to combine with adaptive thinking: For maximum efficiency, use `thinking: {"type": "adaptive"}` alongside effort.
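Several of these pitfalls are visible in the request itself, so a pre-flight check can catch them before a request is sent. `lint_request` is a hypothetical helper; it assumes effort lives in `output_config["effort"]` and that Sonnet 4.6 model IDs contain `sonnet-4-6`. Whether high effort is wasted on a simple task depends on the prompt, so that pitfall is not checked.

```python
def lint_request(params: dict) -> list:
    """Return warnings for effort-related pitfalls in a Messages API request dict.

    Hypothetical helper. Assumes effort lives in output_config["effort"] and
    that Sonnet 4.6 model IDs contain "sonnet-4-6".
    """
    warnings = []
    model = params.get("model", "")
    effort = (params.get("output_config") or {}).get("effort")
    thinking = params.get("thinking") or {}

    if "sonnet-4-6" in model and effort is None:
        warnings.append("Sonnet 4.6 request has no explicit effort; it will default to high.")
    if thinking.get("type") != "adaptive":
        warnings.append('Adaptive thinking is off; consider thinking={"type": "adaptive"}.')
    if "budget_tokens" in thinking:
        warnings.append("budget_tokens is deprecated; prefer effort with adaptive thinking.")
    return warnings


for warning in lint_request({"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": []}):
    print("warning:", warning)
```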
Key Takeaways
- Effort controls token spend across all response types—text, tool calls, and extended thinking—giving you fine-grained control over cost and performance.
- Five levels exist: `low`, `medium`, `high`, `xhigh` (Opus 4.7 only), and `max`, each suited to different use cases.
- Combine with adaptive thinking (`thinking: {"type": "adaptive"}`) for the best balance of capability and efficiency.
- Explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default `high` setting.
- Effort is a behavioral signal, not a strict budget: Claude will still think deeply on hard problems even at lower levels, but it will think less than at higher levels.