Mastering Claude's Effort Parameter: Control Thinking Depth for Cost and Speed
This guide explains how to use Claude's effort parameter to control how eagerly the model spends tokens, so you can trade response thoroughness against speed and API cost on every supported model.
Claude's effort parameter lets you steer how many tokens the model spends when responding to a request. Whether you're building a high-volume chatbot that needs fast replies or a deep reasoning agent tackling complex problems, effort lets you trade capability against speed and cost without switching models.
In this guide, you'll learn exactly how effort works, when to use each level, and how to implement it in your API calls with practical code examples.
What Is the Effort Parameter?
The effort parameter controls how "eager" Claude is about spending tokens when generating responses. It affects all tokens in the output, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
This differs from budget_tokens, which only affected thinking tokens. Effort applies to the entire response.
Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think deeply on sufficiently difficult problems—it just won't think as much as it would at higher levels.
Supported Models
The effort parameter is available on all supported models without any beta header. Currently supported models include:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
On Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future release.
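A migration away from budget_tokens can be sketched as a small helper that rewrites a legacy thinking config. This is an illustration under stated assumptions, not an official migration tool: the function names effort_from_budget and migrate_thinking are hypothetical, and the token thresholds that pick an effort level are arbitrary starting points to tune against your own workloads.

```python
from typing import Optional


def effort_from_budget(budget_tokens: int) -> str:
    """Map a legacy thinking budget to an effort level.

    The thresholds are hypothetical, not an official mapping.
    """
    if budget_tokens <= 4_000:
        return "low"
    if budget_tokens <= 16_000:
        return "medium"
    return "high"


def migrate_thinking(thinking: dict) -> tuple[dict, Optional[str]]:
    """Return (new thinking config, suggested effort) for a legacy config.

    Configs that carry no budget_tokens are copied unchanged, with no
    effort suggestion.
    """
    if thinking.get("type") == "enabled" and "budget_tokens" in thinking:
        return {"type": "adaptive"}, effort_from_budget(thinking["budget_tokens"])
    return dict(thinking), None
```

The helper only proposes a level. Passing it to the API, and confirming that the result is good enough for the task, stays with the caller.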
Effort Levels Explained
Claude offers five effort levels, each suited to different use cases:
| Level | Description | Best For |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending | Deepest possible reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic and coding tasks (over 30 min) with token budgets in the millions (Opus 4.7 only) |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks needing best speed and lowest costs, such as subagents |
When the parameter is omitted, Claude defaults to high effort.
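Because max and xhigh are limited to specific models, it can help to validate a level before sending a request. The sketch below transcribes the availability notes from the table. The helper name check_effort is hypothetical, and the keys are display names rather than API model IDs, so you would map them to the identifiers you actually use.

```python
# Model-restricted levels, transcribed from the table above.
# Keys are display names, not API model IDs.
RESTRICTED_LEVELS = {
    "max": {
        "Claude Mythos Preview",
        "Claude Opus 4.7",
        "Claude Opus 4.6",
        "Claude Sonnet 4.6",
    },
    "xhigh": {"Claude Opus 4.7"},
}
ALL_LEVELS = ("low", "medium", "high", "xhigh", "max")


def check_effort(model_name: str, level: str) -> None:
    """Raise ValueError if the level is unknown or not offered on the model."""
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    allowed = RESTRICTED_LEVELS.get(level)
    if allowed is not None and model_name not in allowed:
        raise ValueError(f"{level!r} effort is not available on {model_name}")
```

Failing early in your own code gives a clearer error message than waiting for the API to reject the request.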
Recommended Settings for Sonnet 4.6
Sonnet 4.6 defaults to high effort, which can introduce unexpected latency. For most applications, explicitly set the effort level:
- Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters most.
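The recommendation above can be captured in a small lookup so that every Sonnet 4.6 request sets effort explicitly. This is a minimal sketch: the function name sonnet_effort and the workload labels are hypothetical and only mirror the categories named in the two bullets.

```python
# Workloads the guidance above singles out for low effort on Sonnet 4.6.
# The category names are hypothetical labels for this sketch.
LOW_EFFORT_WORKLOADS = {"chat", "high_volume", "latency_sensitive"}


def sonnet_effort(workload: str) -> str:
    """Pick an explicit effort level for a Sonnet 4.6 request."""
    if workload in LOW_EFFORT_WORKLOADS:
        return "low"
    # Medium is the recommended default for agentic coding, tool-heavy
    # workflows, code generation, and anything unclassified.
    return "medium"
```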
How to Use the Effort Parameter in the API
Basic Usage
Here's how to set the effort parameter in a standard API call:
Python (using the Anthropic SDK):

    import anthropic

    # Reads ANTHROPIC_API_KEY from the environment
    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        # Effort is nested under output_config, not passed as a top-level argument
        output_config={"effort": "medium"},
        messages=[
            {"role": "user", "content": "Explain the theory of relativity in simple terms."}
        ],
    )
    print(response.content[0].text)
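TypeScript (using the Anthropic SDK):

    import Anthropic from '@anthropic-ai/sdk';

    // Reads ANTHROPIC_API_KEY from the environment
    const client = new Anthropic();

    const response = await client.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 8192,
      // Effort is nested under output_config, not passed as a top-level field
      output_config: { effort: 'medium' },
      messages: [
        { role: 'user', content: 'Explain the theory of relativity in simple terms.' }
      ]
    });

    // Content blocks are a union type, so narrow to a text block before reading .text
    const first = response.content[0];
    if (first.type === 'text') {
      console.log(first.text);
    }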
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide when to use extended thinking based on the problem complexity:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        thinking={"type": "adaptive"},  # Let Claude decide when to think
        output_config={"effort": "medium"},
        messages=[
            {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
        ],
    )
With adaptive thinking enabled, Claude engages extended thinking for difficult problems and skips it for simpler ones, which saves tokens on easy requests.
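When thinking is enabled, the response content can begin with a thinking block, so reading content[0].text is not reliable. A minimal sketch of a safer accessor follows. The helper name extract_text is hypothetical, and the demo uses SimpleNamespace stand-ins in place of a real response, assuming only that each block exposes a type attribute and that text blocks expose text.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join the text of all text blocks, skipping thinking and tool-use blocks."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-in blocks for illustration; pass response.content in real code.
demo = [
    SimpleNamespace(type="thinking", thinking="Integrate by parts twice..."),
    SimpleNamespace(type="text", text="The antiderivative is "),
    SimpleNamespace(type="text", text="(2 - x^2) cos(x) + 2x sin(x) + C."),
]
print(extract_text(demo))
```

The same accessor works whether or not Claude chose to think on a given request, which suits adaptive thinking where that choice varies per call.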
Using Effort with Tool Calls
Effort also controls how many tool calls Claude makes. Lower effort means fewer, more targeted tool calls:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": "low"},
        tools=[
            {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"],
                },
            }
        ],
        messages=[
            {"role": "user", "content": "What's the weather like in Tokyo and New York?"}
        ],
    )
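Because get_weather accepts a single location, Claude needs one call per city here regardless of effort. The difference shows up around those calls: at low effort Claude is more likely to issue them directly with little preamble and a terse answer, while at high or max effort it is more likely to explain its plan, make follow-up calls, and summarize in more detail.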
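To see how effort changes tool use on your own prompts, you can count the tool_use blocks in each response. The sketch below is one way to do that. The helper name tool_calls_by_name is hypothetical, and the demo blocks are SimpleNamespace stand-ins that mimic the type, name, and input attributes of tool-use blocks.

```python
from collections import Counter
from types import SimpleNamespace


def tool_calls_by_name(content) -> Counter:
    """Count tool_use blocks in a response's content, grouped by tool name."""
    return Counter(block.name for block in content if block.type == "tool_use")


# Stand-in blocks for illustration; pass response.content in real code.
demo = [
    SimpleNamespace(type="text", text="Checking both cities."),
    SimpleNamespace(type="tool_use", name="get_weather", input={"location": "Tokyo"}),
    SimpleNamespace(type="tool_use", name="get_weather", input={"location": "New York"}),
]
print(tool_calls_by_name(demo))
```

Logging these counts next to the effort level of each request gives a concrete basis for comparing levels.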
Practical Use Cases
1. High-Volume Customer Support Chat
For a chatbot handling thousands of simple queries per minute, use low effort:
    def handle_support_query(user_message):
        """Answer a simple support question as cheaply and quickly as possible."""
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            output_config={"effort": "low"},
            system="You are a helpful customer support agent. Answer concisely.",
            messages=[{"role": "user", "content": user_message}],
        )
        return response.content[0].text
2. Complex Code Generation Agent
For an agent that needs to reason deeply about architecture and edge cases, use max effort:
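    def generate_complex_feature(request):
        """Generate code with the deepest reasoning the model offers."""
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=16000,
            output_config={"effort": "max"},
            thinking={"type": "adaptive"},
            system="You are an expert software architect. Generate production-ready code with full error handling.",
            messages=[{"role": "user", "content": request}],
        )
        # With thinking enabled, the first block may be a thinking block,
        # so collect the text blocks instead of reading content[0]
        return "".join(b.text for b in response.content if b.type == "text")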
3. Balanced Subagent Orchestration
When building a multi-agent system, use medium effort for subagents to save costs while maintaining quality:
    class SubAgent:
        """A worker agent that runs one role at medium effort."""

        def __init__(self, role):
            self.role = role

        def process(self, task):
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=4096,
                output_config={"effort": "medium"},
                system=f"You are a {self.role} subagent. Complete tasks efficiently.",
                messages=[{"role": "user", "content": task}],
            )
            return response.content[0].text
Best Practices
- Start with medium for Sonnet 4.6: This gives you the best balance of speed, cost, and performance for most applications.
- Use adaptive thinking alongside effort: The combination gives Claude the flexibility to think deeply when needed while respecting your effort preference.
- Test different levels on your specific tasks: The optimal effort level depends on your use case. Run A/B tests to find the sweet spot.
- Monitor token usage: Track your token consumption across effort levels to understand the cost implications.
- Don't over-optimize for simple tasks: If your task is straightforward (e.g., translation, summarization), low effort is often sufficient.
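For the testing and monitoring practices above, a small aggregation is often enough to compare levels. The sketch below averages output tokens per effort level. The helper name summarize_usage is hypothetical, and it assumes you record an (effort level, output tokens) pair per call, for example from response.usage.output_tokens.

```python
from collections import defaultdict
from statistics import mean


def summarize_usage(records):
    """Average output tokens per effort level.

    records: iterable of (effort_level, output_tokens) pairs collected
    from your own calls.
    """
    by_level = defaultdict(list)
    for level, output_tokens in records:
        by_level[level].append(output_tokens)
    return {level: mean(tokens) for level, tokens in by_level.items()}
```

Comparing these averages against a quality metric for the same prompts shows whether a cheaper level is good enough for a given task.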
Limitations and Considerations
- Effort is not a hard budget: At lower levels, Claude may still spend significant tokens on genuinely hard problems.
- xhigh is only available on Opus 4.7: For long-running agentic tasks, use Opus 4.7 with xhigh effort.
- Zero Data Retention (ZDR): The effort parameter is eligible for ZDR. When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
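Since effort is not a hard budget, max_tokens remains the actual ceiling on output length, and a response that hits it is cut off. A minimal check for that case is sketched below. The helper name was_truncated is hypothetical; it relies on the response's stop_reason field being "max_tokens" when the cap was reached, and the demo uses SimpleNamespace stand-ins for responses.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """True when generation stopped because it hit the max_tokens cap."""
    return response.stop_reason == "max_tokens"


# Stand-in responses for illustration; pass a real response in practice.
cut_off = SimpleNamespace(stop_reason="max_tokens")
complete = SimpleNamespace(stop_reason="end_turn")
print(was_truncated(cut_off), was_truncated(complete))
```

If truncation shows up often at a given effort level, raising max_tokens or lowering effort are the two obvious adjustments to try.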
Key Takeaways
- The effort parameter controls all token spend (text, tool calls, and thinking), giving you comprehensive control over cost and speed.
- Five effort levels (low, medium, high, xhigh, max) let you tune the balance for any task, from simple chatbots to deep reasoning agents.
- Combine effort with adaptive thinking (thinking: {"type": "adaptive"}) so Claude can decide dynamically when to engage extended thinking.
- For Sonnet 4.6, explicitly set effort to avoid unexpected latency; medium is the recommended default for most applications.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6. Migrate your code to the new parameter before budget_tokens is removed.