Mastering Claude's Effort Parameter: Balance Performance and Cost in Your API Calls
Learn how to use Claude's effort parameter to control token spending, optimize response thoroughness, and reduce costs across all API interactions including tool calls and extended thinking.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn the five effort levels (low, medium, high, xhigh, max), how to implement them in API calls, and practical strategies for balancing performance with cost across different use cases.
Introduction
When building applications with Claude, one of the most important decisions you'll make is how to balance response quality against token usage and cost. The effort parameter gives you fine-grained control over this trade-off, allowing you to dial Claude's thoroughness up or down with a single API parameter.
Introduced alongside Claude Opus 4.7 and Sonnet 4.6, effort replaces the older budget_tokens parameter as the recommended way to control thinking depth. It applies to every kind of output (text, tool calls, and extended thinking), so one setting influences your total token spend.
In this guide, you'll learn:
- What the effort parameter does and how it differs from token budgets
- The five effort levels and when to use each
- How to implement effort in your API calls with code examples
- Best practices for Sonnet 4.6 and other supported models
How the Effort Parameter Works
By default, Claude operates at high effort—spending as many tokens as needed to produce excellent results. The effort parameter lets you adjust this behavior:
- Raise effort to max for the absolute highest capability
- Lower effort to medium or low for faster, cheaper responses
Key Advantages
- No thinking required: Effort works even when extended thinking is disabled
- Affects all token spend: Including tool calls—lower effort means fewer tool calls
- Single parameter control: One setting influences text, thinking, and tool usage
Effort Levels Explained
Claude supports five effort levels, each suited to different use cases:
| Level | Description | Typical Use Case |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing speed/cost balance |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, coding, agentic tasks |
| xhigh | Extended capability for long-horizon work. | Long-running agentic/coding tasks (30+ min) |
| max | Absolute maximum capability with no constraints. | Deepest reasoning, most thorough analysis |
max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is available only on Opus 4.7.
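The availability rules above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: the family labels (such as "opus-4.7") and the function name are hypothetical, and the sets encode only the restrictions stated in this guide.

```python
# Hypothetical family labels for illustration; these are not real API model IDs.
MAX_FAMILIES = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}
XHIGH_FAMILIES = {"opus-4.7"}
ALL_LEVELS = ("low", "medium", "high", "xhigh", "max")


def is_level_available(family: str, level: str) -> bool:
    """Return True if this guide lists `level` as usable on `family`."""
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    if level == "max":
        return family in MAX_FAMILIES
    if level == "xhigh":
        return family in XHIGH_FAMILIES
    # The guide states no restriction for low, medium, or high, so this
    # sketch assumes the family supports the effort parameter at all.
    return True


print(is_level_available("sonnet-4.6", "xhigh"))
```

A check like this lets an application fall back to high when xhigh or max is requested for a family that does not list it.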
Implementing Effort in Your API Calls
Basic Usage
Add the effort parameter to your Messages API request. Here's an example using Python:
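import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute the ID of a model that supports effort
    max_tokens=8192,
    # In the Messages API, effort is passed inside output_config
    # rather than as a top-level argument.
    output_config={"effort": "low"},  # Options: "low", "medium", "high", "xhigh", "max"
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)
print(response.content[0].text)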
TypeScript Example
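import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 8192,
  // Effort is passed inside output_config rather than as a top-level field.
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list of dictionaries.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text.
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}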
Combining with Extended Thinking
For maximum control, combine effort with adaptive thinking:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16384,
    thinking={"type": "adaptive"},  # let Claude decide how much to think per request
    output_config={"effort": "high"},  # effort is passed inside output_config
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
)
Using Effort with Tool Calls
Effort affects tool call frequency. Lower effort means Claude will make fewer, more targeted tool calls:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    output_config={"effort": "low"},  # Fewer tool calls, faster responses
    tools=[
        {
            "name": "search_database",
            "description": "Search the company database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"],
            },
        }
    ],
    messages=[
        {"role": "user", "content": "Find all customers who purchased in the last month."}
    ],
)
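To observe this effect in your own workload, you can count the tool_use blocks in each response and compare the totals across effort levels. A minimal sketch, assuming response content blocks expose a `type` attribute as they do in the Python SDK; the helper name is hypothetical and the stand-in objects exist only for demonstration.

```python
from types import SimpleNamespace


def count_tool_calls(content_blocks) -> int:
    """Count the tool_use blocks in a response's content list."""
    return sum(
        1 for block in content_blocks
        if getattr(block, "type", None) == "tool_use"
    )


# Stand-in blocks for demonstration; in practice pass response.content.
demo_content = [
    SimpleNamespace(type="text", text="Let me look that up."),
    SimpleNamespace(
        type="tool_use",
        name="search_database",
        input={"query": "purchases in the last month"},
    ),
]
print(count_tool_calls(demo_content))
```

Logging this count alongside the effort level for each request gives a simple way to confirm whether a lower setting actually reduces tool usage for your prompts.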
Recommended Settings for Sonnet 4.6
Sonnet 4.6 defaults to high effort. Set the effort level explicitly so that this default does not introduce unexpected latency:
- Medium effort (recommended default): Best balance for most applications, including agentic coding, tool-heavy workflows, and code generation
- Low effort: For high-volume or latency-sensitive workloads, such as chat and non-coding use cases where speed matters
# Recommended for most applications
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
    # Explicitly set to avoid unexpected latency; effort is passed inside output_config.
    output_config={"effort": "medium"},
    messages=[...]
)
Practical Strategies
1. Tier Your Effort by Task Complexity
Create a simple mapping based on task difficulty:
def get_effort_level(task_type):
    if task_type == "simple_chat":
        return "low"
    elif task_type == "code_generation":
        return "medium"
    elif task_type == "complex_reasoning":
        return "high"
    elif task_type == "deep_research":
        return "max"
    else:
        return "high"  # Default
2. Use Effort for Cost Optimization
For production systems, start with medium and monitor response quality. Only increase to high or max when you observe quality degradation.
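One way to apply this idea per request is to retry at a higher level only when a quality check fails. The sketch below is an illustration under stated assumptions, not a prescribed method: `run` and `is_acceptable` are hypothetical callables that you supply (for example, a wrapper around your API call and a validator for its output), and each retry costs an additional request.

```python
from typing import Callable, Sequence


def escalate_effort(
    run: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    levels: Sequence[str] = ("medium", "high", "max"),
) -> tuple[str, str]:
    """Try each effort level in order and stop at the first acceptable output.

    Returns (level, output). If no output passes the check, the result
    from the last level tried is returned.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    for level in levels:
        output = run(level)
        if is_acceptable(output):
            break
    return level, output


# Demonstration with stand-in callables instead of real API calls.
def fake_run(level: str) -> str:
    return "detailed answer" if level != "medium" else "thin answer"


level, output = escalate_effort(fake_run, lambda text: text.startswith("detailed"))
print(level, output)
```

Tracking how often requests escalate past medium is also a direct measure of the quality degradation this section asks you to monitor.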
3. Combine with Adaptive Thinking
Adaptive thinking (thinking: {type: "adaptive"}) automatically adjusts thinking depth based on the problem. Combined with effort, you get two layers of optimization:
- Effort controls overall token spend
- Adaptive thinking fine-tunes thinking depth per request
Important Notes
- Zero Data Retention: The effort feature is eligible for ZDR. When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
- Deprecation: budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6. Use effort instead.
- Behavioral signal: Effort is not a strict token budget. Claude may still think deeply on hard problems even at low effort.
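Because effort is a behavioral signal rather than a cap, max_tokens remains the hard limit on output length, and it is worth recording what each response actually spent. A minimal sketch, assuming the response object exposes `usage.output_tokens` and `stop_reason` as in the Python SDK; the helper name is hypothetical and the stand-in response is for demonstration only.

```python
from types import SimpleNamespace


def summarize_spend(response) -> dict:
    """Report output tokens used and whether the reply hit the max_tokens cap."""
    return {
        "output_tokens": response.usage.output_tokens,
        "truncated": response.stop_reason == "max_tokens",
    }


# Stand-in response for demonstration; in practice pass the API response.
demo_response = SimpleNamespace(
    usage=SimpleNamespace(output_tokens=4096),
    stop_reason="max_tokens",
)
print(summarize_spend(demo_response))
```

If low-effort requests are frequently truncated, that suggests the task is harder than the effort setting implies, and either max_tokens or the effort level may need adjusting.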
Key Takeaways
- Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6
- Five levels available: low, medium, high, xhigh (Opus 4.7 only), and max
- Affects all token spend: Text, thinking, and tool calls are all influenced by the effort setting
- Explicitly set effort for Sonnet 4.6 to avoid unexpected latency—medium is recommended as the default
- Combine with adaptive thinking for the best balance of performance and cost efficiency