Mastering Claude's Effort Parameter: Balance Performance, Speed, and Cost
Learn how to use Claude's effort parameter to control token spending, optimize response thoroughness, and reduce costs across all supported models.
The effort parameter lets you control how many tokens Claude spends on responses, from max (deepest reasoning) to low (fastest, cheapest). Use it to trade off capability for speed and cost without switching models.
Claude's effort parameter gives you fine-grained control over how many tokens Claude uses when responding to a request. Instead of switching between Claude models to balance capability and cost, you can adjust a single parameter to get the behavior you need, from fast answers to deep, multi-step reasoning.
This guide explains everything you need to know about the effort parameter: how it works, when to use each level, and practical code examples to implement it today.
What Is the Effort Parameter?
The effort parameter (effort) controls how "eager" Claude is about spending tokens when generating responses. It affects all tokens in the output, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking tokens (when enabled)
Unlike budget_tokens, which only controlled thinking depth, the effort parameter gives you a single, unified dial for token efficiency across your entire application.
Important: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
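The sketch below shows the deprecated and recommended request shapes side by side, as keyword-argument dicts for client.messages.create(). It makes two assumptions. First, the helper names are hypothetical. Second, effort is sent inside an output_config object alongside adaptive thinking. Confirm the field placement against the API reference for your SDK version.

```python
def legacy_thinking_kwargs(budget_tokens: int) -> dict:
    """Deprecated style: a fixed extended-thinking budget (hypothetical helper)."""
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


def effort_thinking_kwargs(effort: str) -> dict:
    """Recommended style: adaptive thinking steered by an effort level (hypothetical helper)."""
    return {
        "thinking": {"type": "adaptive"},
        # Assumption: effort is nested inside output_config
        "output_config": {"effort": effort},
    }


old = legacy_thinking_kwargs(10_000)
new = effort_thinking_kwargs("medium")
print(old)
print(new)
```

Either dict can be unpacked into a request, for example client.messages.create(model=..., max_tokens=..., messages=..., **new).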
Supported Models
The effort parameter is generally available and requires no beta header. Currently supported models include:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
Effort Levels Explained
Claude offers five effort levels, each suited to different use cases:
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (30+ min) |
| high | High capability (default behavior) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
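An invalid or unsupported level is easiest to catch before the request is sent, so a small client-side check can encode the table above. This is only a sketch. validate_effort is a hypothetical helper. It assumes Opus 4.7 model IDs contain the substring "opus-4-7", and it relies on this guide's note that xhigh is currently available only on Opus 4.7.

```python
# Effort levels from the table, ordered from most to least token-hungry
EFFORT_LEVELS = ("max", "xhigh", "high", "medium", "low")


def validate_effort(model: str, effort: str) -> str:
    """Return effort unchanged if it looks valid for model; raise ValueError otherwise."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    # Assumption: Opus 4.7 model IDs contain "opus-4-7"
    if effort == "xhigh" and "opus-4-7" not in model:
        raise ValueError(f"xhigh is only available on Claude Opus 4.7, not {model!r}")
    return effort


print(validate_effort("claude-sonnet-4-6", "medium"))  # medium
```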
How to Use the Effort Parameter
Basic API Usage (Python)
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort: fastest, cheapest
response = client.messages.create(
    model="claude-sonnet-4-6",  # Claude Sonnet 4.6
    max_tokens=1024,
    # effort is set inside output_config.
    # Options: "low", "medium", "high", "xhigh" (Opus 4.7 only), "max"
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)
print(response.content[0].text)
```
TypeScript / Node.js Example
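```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function main() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    // effort is set inside output_config (needs an SDK version that includes it)
    output_config: { effort: 'medium' },
    messages: [
      { role: 'user', content: 'Write a Python script to scrape a website.' }
    ]
  });
  // content is a union of block types; narrow to a text block before reading .text
  const block = response.content[0];
  if (block.type === 'text') {
    console.log(block.text);
  }
}

main().catch(console.error);
```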
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This allows Claude to dynamically decide when to use extended thinking based on the complexity of the task.
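```python
# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    output_config={"effort": "high"},
    thinking={"type": "adaptive"},  # Claude decides per request whether to think
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
)
```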
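With thinking enabled, the response content can include thinking blocks ahead of the final answer. That makes reading response.content[0].text unsafe. The sketch below pulls out only the text blocks. response_text is a hypothetical helper, and the demo uses SimpleNamespace stand-ins instead of real API objects.

```python
from types import SimpleNamespace


def response_text(content) -> str:
    """Join the text of all "text" blocks, skipping thinking and other block types."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-in blocks shaped like a thinking-enabled response (not real API output)
demo_content = [
    SimpleNamespace(type="thinking", thinking="Let me work through this..."),
    SimpleNamespace(type="text", text="The answer is 12."),
]
print(response_text(demo_content))  # The answer is 12.
```

With a real response, call response_text(response.content).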
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always explicitly set the effort level when using this model:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
- High effort: For tasks requiring maximum capability, such as complex reasoning or difficult coding problems.
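One way to apply these recommendations is a small lookup from workload type to effort level, so that every Sonnet 4.6 call sets effort explicitly. The workload category names below are hypothetical. The mapping follows the list above and falls back to medium, the recommended default.

```python
# Hypothetical workload categories mapped to the Sonnet 4.6 recommendations
SONNET_46_EFFORT = {
    "chat": "low",
    "high_volume": "low",
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "complex_reasoning": "high",
    "difficult_coding": "high",
}


def sonnet_effort(workload: str) -> str:
    """Return the effort level for a workload, defaulting to the recommended medium."""
    return SONNET_46_EFFORT.get(workload, "medium")


print(sonnet_effort("chat"))  # low
```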
Practical Use Cases
1. Cost-Optimized Customer Support Chat
Use low effort for routine queries and medium for escalated issues:
```python
def handle_support_request(query, is_complex=False):
    # Escalated (complex) queries get medium effort; routine ones get low
    effort_level = "medium" if is_complex else "low"
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        output_config={"effort": effort_level},
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text
```
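The is_complex flag has to come from somewhere. A crude, hypothetical heuristic treats long messages, or messages that mention escalation topics, as complex. The keyword list and word-count threshold are placeholders to tune against your own support traffic, not recommendations.

```python
# Hypothetical escalation keywords; tune these against your own support logs
ESCALATION_KEYWORDS = ("refund", "outage", "security", "legal", "data loss")


def looks_complex(query: str, max_simple_words: int = 60) -> bool:
    """Crude heuristic: long queries or ones mentioning escalation topics are complex."""
    text = query.lower()
    if len(text.split()) > max_simple_words:
        return True
    return any(keyword in text for keyword in ESCALATION_KEYWORDS)


print(looks_complex("How do I reset my password?"))  # False
print(looks_complex("I need a refund for a double charge"))  # True
```

In use, this would feed the handler: handle_support_request(query, is_complex=looks_complex(query)).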
2. Multi-Agent Systems
Route sub-tasks to low-effort agents and complex reasoning to high-effort agents:
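```python
class Agent:
    """A minimal role-based agent that always calls Claude at a fixed effort level."""

    def __init__(self, role, effort_level):
        self.role = role
        self.effort = effort_level

    def process(self, task):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            output_config={"effort": self.effort},
            system=f"You are a {self.role} agent.",
            messages=[{"role": "user", "content": task}],
        )
        return response.content[0].text


# Create agents with different effort levels
fast_agent = Agent("data collector", "low")
reasoning_agent = Agent("analyst", "high")

# Use each agent for the work it suits: cheap extraction, then deeper analysis.
# Without tools Claude cannot fetch live data, so pass the source text in.
report_text = "..."  # hypothetical placeholder for your source document
raw_data = fast_agent.process(f"Extract the ticker symbols and prices from this text: {report_text}")
analysis = reasoning_agent.process(f"Analyze this data: {raw_data}")
```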
3. Token Budget Management
Track token usage across effort levels to optimize costs:
```python
def query_with_effort(prompt, effort_level):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": effort_level},
        messages=[{"role": "user", "content": prompt}],
    )
    # usage reports the input and output token counts for this request
    usage = response.usage
    print(f"Effort: {effort_level}")
    print(f"Input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")
    return response.content[0].text
```
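To turn those printed counts into a comparison, you can record usage per effort level and price it. This sketch assumes you supply per-million-token prices from Anthropic's current pricing page. UsageSample and cost_by_effort are hypothetical names, and the token counts and prices in the demo are placeholders, not real rates.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    effort: str
    input_tokens: int
    output_tokens: int


def estimate_cost(sample: UsageSample, input_per_mtok: float, output_per_mtok: float) -> float:
    """Estimated cost of one request, given prices per million tokens."""
    return (sample.input_tokens * input_per_mtok
            + sample.output_tokens * output_per_mtok) / 1_000_000


def cost_by_effort(samples, input_per_mtok: float, output_per_mtok: float) -> dict:
    """Sum estimated cost per effort level across many recorded requests."""
    totals: dict = {}
    for s in samples:
        cost = estimate_cost(s, input_per_mtok, output_per_mtok)
        totals[s.effort] = totals.get(s.effort, 0.0) + cost
    return totals


# Placeholder token counts and prices, for illustration only
samples = [UsageSample("low", 100, 200), UsageSample("high", 100, 900)]
print(cost_by_effort(samples, input_per_mtok=1.0, output_per_mtok=5.0))
```

With real responses, build each sample as UsageSample(effort_level, response.usage.input_tokens, response.usage.output_tokens).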
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency.
- Combine with adaptive thinking for optimal results; Claude will decide when to think deeply.
- Start with medium for most applications, then adjust up or down based on observed performance.
- Use low for high-volume, latency-sensitive tasks like chat or simple Q&A.
- Reserve max and xhigh for tasks that genuinely require the deepest reasoning, such as complex code generation or multi-step analysis.
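The "start with medium, then adjust" advice can be automated as an escalation ladder. A minimal sketch follows. run_with_escalation is a hypothetical helper. call stands in for your own wrapper around client.messages.create, and is_good_enough is whatever quality check fits your task. xhigh is left out of the ladder because it is Opus 4.7 only.

```python
from typing import Callable, Tuple

# Cheapest to most capable. "xhigh" is left out because it is Opus 4.7 only.
ESCALATION_ORDER = ("low", "medium", "high", "max")


def run_with_escalation(
    task: str,
    call: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
    start: str = "medium",
) -> Tuple[str, str]:
    """Run task at start effort, stepping up a level until is_good_enough accepts it.

    Returns (effort_used, result). If no level passes, returns the "max" attempt.
    """
    effort, result = start, ""
    for effort in ESCALATION_ORDER[ESCALATION_ORDER.index(start):]:
        result = call(task, effort)
        if is_good_enough(result):
            break
    return effort, result


# Demo with a fake call; in practice call would wrap client.messages.create
def fake_call(task: str, effort: str) -> str:
    return f"{effort}:{task}"


print(run_with_escalation("summarize", fake_call, lambda r: r.startswith("high")))
# ('high', 'high:summarize')
```

Each step up repeats the full request, so this pattern fits tasks where a failed cheap attempt is easy to detect.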
Limitations and Considerations
- Effort is a behavioral signal, not a strict token budget. Claude may still use significant tokens for difficult problems even at low effort.
- At max effort, token usage can be very high, so monitor your costs carefully.
- The xhigh level is currently only available on Claude Opus 4.7.
- When using low effort, expect some reduction in response quality for complex tasks.
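Because effort is not a hard limit, max_tokens remains the real cap on output. A reply that reaches that cap comes back with stop_reason set to "max_tokens". Below is a minimal sketch of a retry wrapper that raises the cap when this happens. create_with_headroom is a hypothetical helper, and the demo uses a fake create() rather than the real client. With the SDK, you could pass client.messages.create as create, along with your model, messages, and output_config as keyword arguments. Each retry is a full new request, so this trades extra cost for a complete answer.

```python
from types import SimpleNamespace
from typing import Any, Callable


def create_with_headroom(create: Callable[..., Any], max_tokens: int, ceiling: int, **kwargs: Any) -> Any:
    """Call create(); if the reply was cut off at max_tokens, double the cap and retry.

    Stops once the reply finishes or the cap reaches ceiling.
    """
    while True:
        response = create(max_tokens=max_tokens, **kwargs)
        # stop_reason "max_tokens" means output hit the hard cap before finishing
        if response.stop_reason != "max_tokens" or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)


# Demo: a fake create() that needs at least 3000 tokens to finish
calls = []


def fake_create(max_tokens: int, **kwargs: Any) -> SimpleNamespace:
    calls.append(max_tokens)
    reason = "end_turn" if max_tokens >= 3000 else "max_tokens"
    return SimpleNamespace(stop_reason=reason, max_tokens=max_tokens)


result = create_with_headroom(fake_create, max_tokens=1000, ceiling=16000)
print(calls, result.stop_reason)  # [1000, 2000, 4000] end_turn
```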
Key Takeaways
- The effort parameter gives you granular control over Claude's token spending, affecting all output types including text, tool calls, and thinking tokens.
- Five effort levels let you trade off between capability and efficiency: low, medium, high, xhigh, and max.
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Combine effort with adaptive thinking for the best balance of performance and cost.
- Use lower effort levels for simple tasks and higher levels for complex reasoning to optimize your token budget without switching models.