Mastering Claude's Effort Parameter: Balance Performance, Speed, and Cost
Learn how to use Claude's effort parameter to control token spending, response thoroughness, and cost across API calls. Includes code examples and best practices.
This guide explains Claude's effort parameter, which lets you control how many tokens Claude spends on responses. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to set them in your API calls to get the right balance of speed, cost, and capability.
Introduction
Claude is incredibly capable, but sometimes you don't need its full reasoning power for every task. Maybe you're building a high-volume chatbot where speed matters more than deep analysis, or perhaps you're running complex agentic workflows that need maximum thoughtfulness. Enter the effort parameter — a powerful new tool that lets you dial Claude's token spending up or down, all with a single model.
This guide will walk you through everything you need to know about the effort parameter: what it does, the five effort levels, when to use each, and how to implement it in your API calls.
What Is the Effort Parameter?
The effort parameter controls how eager Claude is about spending tokens when responding to requests. It's a behavioral signal, not a strict token budget. At higher effort levels, Claude thinks more deeply and produces more thorough responses. At lower levels, it becomes more efficient — skipping unnecessary reasoning, making fewer tool calls, and using fewer tokens overall.
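Since effort is a signal rather than a budget, max_tokens is still the hard ceiling on a response. One minimal check is to look at stop_reason. This sketch assumes you work with the response as a plain dict, for example via response.model_dump() in the Python SDK. hit_token_cap is a hypothetical helper name.

```python
from typing import Any, Mapping


def hit_token_cap(response: Mapping[str, Any]) -> bool:
    """Return True if a Messages API response stopped because it reached
    max_tokens. Effort only shapes how eagerly Claude spends tokens;
    max_tokens remains the hard limit, so a truncated answer shows up here."""
    return response.get("stop_reason") == "max_tokens"


# Example response dicts, shaped like the API's JSON (illustrative values only).
complete = {"stop_reason": "end_turn", "usage": {"output_tokens": 312}}
truncated = {"stop_reason": "max_tokens", "usage": {"output_tokens": 1024}}
```

If requests at higher effort levels often trip this check, consider raising max_tokens for those requests.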
Key benefits:
- Works without extended thinking — You can use effort even when thinking is disabled.
- Affects all tokens — Including text, tool calls, and extended thinking (when enabled).
- Single model, multiple modes — No need to switch between different Claude models for different tasks.
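Effort works with thinking disabled and also governs tool calls. A request can therefore carry it with no thinking configuration at all. The sketch below passes effort inside the Messages API's output_config field. The get_weather tool and the build_low_effort_request helper are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical tool definition, used only to show effort alongside tools.
WEATHER_TOOL: Dict[str, Any] = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_low_effort_request(model: str, prompt: str, tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return keyword arguments for client.messages.create() that set effort
    to "low" while leaving extended thinking disabled (no "thinking" key)."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "output_config": {"effort": "low"},
    }


params = build_low_effort_request("claude-sonnet-4-6", "Is it raining in Paris?", [WEATHER_TOOL])
```

With the Python SDK, these would be passed as client.messages.create(**params).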
Supported Models
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
Together with adaptive thinking, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
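Moving existing code off budget_tokens can be a mechanical change to the request. The sketch below rewrites a request dict from {"type": "enabled", "budget_tokens": ...} to adaptive thinking plus an explicit effort. migrate_from_budget_tokens is a hypothetical helper, not an SDK function. It deliberately does not convert a token budget into an effort level. The caller picks the level, ideally after testing on real traffic.

```python
import copy
from typing import Any, Dict

VALID_EFFORTS = {"low", "medium", "high", "xhigh", "max"}


def migrate_from_budget_tokens(params: Dict[str, Any], effort: str) -> Dict[str, Any]:
    """Return a copy of `params` that replaces a budget_tokens thinking config
    with adaptive thinking and sets output_config.effort.

    Hypothetical helper: the caller chooses the effort level, because this
    sketch assumes no fixed mapping from a token budget to an effort level.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")
    migrated = copy.deepcopy(params)  # never mutate the caller's dict
    if migrated.get("thinking", {}).get("type") == "enabled":
        # Replacing the whole thinking config drops budget_tokens.
        migrated["thinking"] = {"type": "adaptive"}
    # Keep any other output_config keys and set effort alongside them.
    migrated["output_config"] = {**migrated.get("output_config", {}), "effort": effort}
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Plan a database migration."}],
}
migrated = migrate_from_budget_tokens(legacy, "medium")
```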
The Five Effort Levels
| Level | Description | Best For |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents, latency-sensitive workloads |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing a balance of speed, cost, and performance |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding, agentic tasks |
| xhigh | Extended capability for long-horizon work. Available on Opus 4.7. | Long-running agentic and coding tasks (over 30 min) with million-token budgets |
| max | Absolute maximum capability with no constraints. Available on Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6. | Deepest possible reasoning, most thorough analysis |
Note: Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
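The table and note translate directly into a small lookup that can catch an unsupported level before a request is sent. This sketch transcribes the table as written. It assumes low, medium, and high, which the table lists without restrictions, are available on every supported model. The model keys are hypothetical short labels, not API model IDs.

```python
from typing import Optional

# Availability transcribed from the table above. Keys are hypothetical
# short labels for the supported models, not API model IDs.
SUPPORTED_MODELS = frozenset({"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6", "opus-4.5"})

EFFORT_AVAILABILITY = {
    "low": SUPPORTED_MODELS,
    "medium": SUPPORTED_MODELS,
    "high": SUPPORTED_MODELS,
    "xhigh": frozenset({"opus-4.7"}),
    "max": frozenset({"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}),
}


def resolve_effort(model: str, effort: Optional[str] = None) -> str:
    """Return the effective effort level for `model`.

    An omitted effort resolves to "high", matching the note that "high"
    behaves exactly like leaving the parameter out.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"effort is not supported on {model!r}")
    level = "high" if effort is None else effort
    if level not in EFFORT_AVAILABILITY:
        raise ValueError(f"unknown effort level: {level!r}")
    if model not in EFFORT_AVAILABILITY[level]:
        raise ValueError(f"{level!r} effort is not available on {model!r}")
    return level
```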
How Effort Works in Practice
At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems — but it will still think on sufficiently difficult problems, just less than it would at higher effort levels.
This adaptive behavior is a key advantage: you don't have to guess whether a problem is "hard enough" to warrant thinking. Claude handles that decision automatically.
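Claude makes the decision to think per request, so you can observe it directly instead of guessing. One minimal approach is to measure how often the response content includes a thinking block at a given effort level. This sketch assumes content lists are available as plain dicts, for example from response.model_dump()["content"]. The helper names are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def thinking_was_used(content: Iterable[Mapping[str, Any]]) -> bool:
    """Return True if any block in a response's content list is a thinking
    block (including redacted thinking)."""
    return any(block.get("type") in ("thinking", "redacted_thinking") for block in content)


def thinking_rate(responses: List[List[Mapping[str, Any]]]) -> float:
    """Fraction of responses (each given as its content list) that contained thinking."""
    if not responses:
        return 0.0
    return sum(thinking_was_used(content) for content in responses) / len(responses)


# Illustrative content lists: a simple answer and one preceded by thinking.
simple = [{"type": "text", "text": "Paris."}]
hard = [
    {"type": "thinking", "thinking": "Compare eviction policies first."},
    {"type": "text", "text": "Use a consistent-hashing ring with LRU eviction."},
]
```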
Recommended Defaults for Sonnet 4.6
Sonnet 4.6 defaults to high effort. Anthropic recommends explicitly setting effort to avoid unexpected latency:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
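One way to follow this recommendation is to build every Sonnet 4.6 request through a helper that always sets effort. The helper uses medium by default and low for latency-sensitive traffic. This is a sketch: sonnet_46_request is a hypothetical helper, and claude-sonnet-4-6 is assumed to be the Sonnet 4.6 model ID.

```python
from typing import Any, Dict


def sonnet_46_request(prompt: str, latency_sensitive: bool = False, max_tokens: int = 2048) -> Dict[str, Any]:
    """Build keyword arguments for client.messages.create() that set effort
    explicitly for Sonnet 4.6 instead of relying on its high default:
    "medium" normally, "low" for high-volume or latency-sensitive traffic."""
    effort = "low" if latency_sensitive else "medium"
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID for Sonnet 4.6
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
    }


chat_params = sonnet_46_request("Where is my order?", latency_sensitive=True)
agent_params = sonnet_46_request("Refactor this module to use dependency injection.")
```

A call would then look like client.messages.create(**agent_params). Routing every request through one helper keeps the model from silently falling back to the high default.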
Code Examples
Python (using the Anthropic SDK)
```python
import anthropic

client = anthropic.Anthropic()

# Low effort: fast and efficient
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "adaptive"},
    output_config={"effort": "low"},  # effort is set inside output_config
)
# Print only the text blocks; thinking blocks, if any, are skipped
print("".join(block.text for block in response.content if block.type == "text"))

# High effort: thorough reasoning
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Design a distributed caching system."}],
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
)
print("".join(block.text for block in response.content if block.type == "text"))
```
TypeScript (using the Anthropic SDK)
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Medium effort: balanced
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Explain the difference between TCP and UDP.' }],
  thinking: { type: 'adaptive' },
  output_config: { effort: 'medium' },
});
console.log(response.content);

// Max effort: maximum capability (max is not available on every model; see the table above)
const responseMax = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Prove the Riemann Hypothesis.' }],
  thinking: { type: 'adaptive' },
  output_config: { effort: 'max' },
});
console.log(responseMax.content);
```
cURL (raw API call)
```bash
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Summarize this article."}],
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "low"}
  }'
```
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking:
```json
{
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "medium"
  }
}
```
Adaptive thinking lets Claude decide how much thinking is needed for each request, while effort sets the overall willingness to spend tokens. Together, they give you fine-grained control over the cost-quality tradeoff.
Practical Use Cases
1. High-Volume Chatbot (Low Effort)
If you're building a customer support chatbot that handles thousands of simple queries per day, use effort: "low". You'll get fast responses and significantly lower token costs.
2. Code Generation Agent (Medium Effort)
For a coding assistant that generates functions, writes tests, or refactors code, effort: "medium" provides a great balance. Claude will still reason through complex logic but won't overthink simple tasks.
3. Research Assistant (High Effort)
When analyzing research papers, comparing arguments, or generating detailed reports, stick with effort: "high" (or omit the parameter entirely). You want Claude to be thorough.
4. Long-Running Agent (XHigh Effort)
For agents that run for 30+ minutes and consume millions of tokens (e.g., autonomous coding agents, data pipeline orchestrators), use effort: "xhigh" on Claude Opus 4.7. This gives Claude the headroom to maintain deep reasoning over long horizons.
5. Scientific Analysis (Max Effort)
When you need the absolute best possible answer — proving theorems, solving open problems, or generating novel insights — use effort: "max" on Claude Opus 4.7 or Mythos Preview. Be prepared for higher token costs.
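The five use cases can live in one table in code, so that effort is chosen per workload in a single place rather than at each call site. In this sketch the profile names and request_for_use_case are hypothetical. The caller supplies the model ID, which must support the chosen level (xhigh, for example, is listed only for Opus 4.7).

```python
from typing import Any, Dict

# Effort levels for the use cases above. Profile names are hypothetical labels.
USE_CASE_EFFORT: Dict[str, str] = {
    "support_chatbot": "low",
    "code_generation_agent": "medium",
    "research_assistant": "high",
    "long_running_agent": "xhigh",  # Opus 4.7 only, per the table
    "scientific_analysis": "max",
}


def request_for_use_case(use_case: str, model: str, prompt: str, max_tokens: int) -> Dict[str, Any]:
    """Build Messages API keyword arguments with the effort level for `use_case`.
    The caller supplies a model ID that supports that level."""
    try:
        effort = USE_CASE_EFFORT[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
    }


params = request_for_use_case(
    "code_generation_agent", "claude-sonnet-4-6", "Write unit tests for a date parser.", 4096
)
```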
Best Practices
- Start with medium effort — It's the sweet spot for most applications.
- Use adaptive thinking — Combine effort with thinking: {type: "adaptive"} for optimal results.
- Profile your workload — Test different effort levels on a representative sample of your requests to find the best cost-quality tradeoff.
- Set effort explicitly — Don't rely on defaults, especially with Sonnet 4.6, to avoid unexpected latency.
- Monitor token usage — Track your token spend per request to validate that effort is having the desired effect.
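For profiling and monitoring, a small aggregation over logged usage is often enough to show whether effort changes spending. This sketch assumes you record response.usage.input_tokens and response.usage.output_tokens with the effort level you sent. UsageRecord and the sample numbers are hypothetical and illustrative, not measurements.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass
class UsageRecord:
    """One request's token usage, e.g. copied from response.usage."""
    effort: str
    input_tokens: int
    output_tokens: int


def mean_output_tokens_by_effort(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Average output tokens per effort level across a sample of requests."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        grouped[record.effort].append(record.output_tokens)
    return {effort: mean(counts) for effort, counts in grouped.items()}


# Illustrative numbers only, not measurements.
sample = [
    UsageRecord("low", 500, 120),
    UsageRecord("low", 480, 180),
    UsageRecord("medium", 510, 400),
]
averages = mean_output_tokens_by_effort(sample)
```

Comparing these averages with quality ratings for the same requests shows where a lower effort level stops paying off.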
Key Takeaways
- The effort parameter lets you control Claude's token spending across five levels: low, medium, high, xhigh, and max.
- Lower effort = faster, cheaper responses with some capability reduction; higher effort = more thorough, more expensive responses.
- Effort works without extended thinking and affects all tokens, including tool calls.
- Combine effort with adaptive thinking for the best balance of cost, speed, and quality.
- Start with medium effort for most applications, then adjust based on your specific needs and observed performance.