Mastering Claude's Effort Parameter: Balance Performance and Cost
Learn how to use Claude's effort parameter to control token spending, response thoroughness, and API costs. Includes code examples and best practices for all effort levels.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from 'low' to 'max' to balance speed, cost, and capability for different use cases.
Introduction
Claude's effort parameter lets you influence how many tokens Claude spends when responding to a request. Whether you're building a high-volume chat application, a complex agentic system, or a cost-sensitive tool, choosing the right effort level lets you trade thoroughness against speed and cost.
This guide covers everything you need to know: how effort works, the different levels available, practical code examples, and recommended configurations for common scenarios.
What Is the Effort Parameter?
The effort parameter controls how "eager" Claude is about spending tokens when generating responses. By default, Claude uses high effort, which means it will spend as many tokens as needed for excellent results. You can raise this to max for the absolute highest capability, or lower it to low for maximum speed and cost savings.
Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on difficult problems—just less than it would at higher levels.
Supported Models
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
On Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter as the recommended way to control thinking depth.
How Effort Affects Responses
The effort parameter influences all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost/performance balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
Note: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
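The support matrix in the table can be encoded as a small lookup so that an unsupported level is caught before a request is sent. This is a minimal sketch: `SUPPORTED_LEVELS` and `check_effort` are hypothetical names, and the keys are the display names used in this guide rather than API model IDs.

```python
# Effort levels per model, transcribed from the table in this guide.
# Keys are display names from the guide, not API model IDs.
BASE_LEVELS = {"low", "medium", "high"}

SUPPORTED_LEVELS = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
    "Claude Opus 4.5": BASE_LEVELS,
}


def check_effort(model: str, effort: str) -> str:
    """Return `effort` if the table lists it for `model`, else raise ValueError."""
    levels = SUPPORTED_LEVELS.get(model)
    if levels is None:
        raise ValueError(f"{model!r} is not listed as supporting effort")
    if effort not in levels:
        raise ValueError(f"{model!r} does not list effort level {effort!r}")
    return effort
```

Calling `check_effort("Claude Opus 4.5", "max")` raises `ValueError`, since the table lists max only for Mythos, Opus 4.7, Opus 4.6, and Sonnet 4.6.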
Recommended Effort for Sonnet 4.6
Sonnet 4.6 defaults to high effort. For most applications, you should explicitly set the effort level to avoid unexpected latency:
- Medium effort (recommended default): Best balance of speed, cost, and performance for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads like chat and non-coding use cases.
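One way to apply these recommendations consistently is to keep them in a single mapping that the rest of the application consults. The sketch below assumes hypothetical workload labels (`agentic_coding`, `chat`, and so on) and a hypothetical `pick_effort` helper; only the effort values come from the guidance above.

```python
# Recommended Sonnet 4.6 effort per workload, following the guidance above.
# Workload labels are hypothetical; rename them to match your application.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def pick_effort(workload: str, default: str = "medium") -> str:
    """Return the recommended effort for a workload, falling back to `default`."""
    return SONNET_46_EFFORT.get(workload, default)
```

Falling back to medium for unknown workloads mirrors the recommended default; switch the fallback to low if latency matters more than thoroughness.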
Code Examples
Python (using the Anthropic SDK)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort for simple, fast responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},  # effort is passed inside output_config
)
print(response.content[0].text)

# Medium effort for balanced performance
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)

# Max effort for deep reasoning (max is not available on every model)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    system="You are a research scientist.",
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography."}],
    output_config={"effort": "max"},
)
TypeScript (using the Anthropic SDK)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Low effort for fast, simple responses
const lowResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' }, // effort is passed inside output_config
});
// content is a union of block types, so narrow to a text block before reading .text
const firstBlock = lowResponse.content[0];
if (firstBlock.type === 'text') console.log(firstBlock.text);

// Medium effort for balanced performance
const mediumResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  system: 'You are a coding assistant.',
  messages: [{ role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }],
  output_config: { effort: 'medium' },
});
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},  # let Claude decide when and how much to think
    messages=[{"role": "user", "content": "Solve this complex math problem..."}],
    output_config={"effort": "medium"},
)
Adaptive thinking lets Claude decide how much to think based on the difficulty of the problem, while effort guides how readily it spends tokens overall. Because effort is a behavioral signal rather than a strict budget, max_tokens remains the hard cap on output length.
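If several call sites share the same combination of adaptive thinking and effort, the request can be assembled in one place. The following is a sketch under stated assumptions: `build_request` is a hypothetical helper, the model ID `claude-sonnet-4-6` is assumed, and effort is assumed to be nested under `output_config`; check all three against the current API reference before relying on them.

```python
def build_request(prompt: str, effort: str = "medium", max_tokens: int = 4096) -> dict:
    """Build keyword arguments for client.messages.create() (hypothetical helper)."""
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID; confirm against your models list
        "max_tokens": max_tokens,  # hard cap on output; effort does not replace it
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},  # assumption: effort is nested here
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_request("Solve this complex math problem...")
# With the SDK installed and an API key configured:
# response = anthropic.Anthropic().messages.create(**request)
```

Keeping the request a plain dictionary makes it easy to log or unit-test the chosen effort without making a network call.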
Practical Use Cases
1. High-Volume Customer Support Chat
Use low effort for simple FAQ responses where speed is critical:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "What are your business hours?"}],
    output_config={"effort": "low"},  # effort is passed inside output_config
)
2. Agentic Coding Assistant
Use medium effort as your default for tool-heavy coding workflows:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[...],  # placeholder: replace with your tool definitions
    messages=[{"role": "user", "content": "Refactor this module to use async/await."}],
    output_config={"effort": "medium"},  # effort is passed inside output_config
)
3. Deep Research Analysis
Use max effort on Opus 4.7 for the most thorough reasoning:
response = client.messages.create(
    model="claude-opus-4-7",  # assumed ID for Opus 4.7; confirm against your models list
    max_tokens=16384,
    messages=[{"role": "user", "content": "Compare and contrast the economic policies of..."}],
    output_config={"effort": "max"},  # effort is passed inside output_config
)
Best Practices
- Always set effort explicitly with Sonnet 4.6 to avoid unexpected latency.
- Start with medium for most agentic and coding tasks, then adjust based on observed performance.
- Use low effort for subagents in multi-agent systems where each subagent handles simple, well-defined tasks.
- Combine with adaptive thinking for optimal cost-performance tradeoffs.
- Monitor token usage across effort levels to understand your cost profile.
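To make the monitoring advice concrete, the sketch below averages output tokens per effort level from logged calls. It assumes you record an `(effort, output_tokens)` pair after each request, for example from `response.usage.output_tokens`; `summarize_usage` is a hypothetical helper and the sample numbers are made up for illustration, not measurements.

```python
from collections import defaultdict
from statistics import mean


def summarize_usage(records):
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs, e.g. collected
    from `response.usage.output_tokens` after each call.
    """
    by_level = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {level: mean(tokens) for level, tokens in by_level.items()}


# Made-up numbers for illustration only, not measurements.
sample = [("low", 120), ("low", 80), ("medium", 400), ("medium", 600)]
print(summarize_usage(sample))
```

Comparing these averages on your own traffic shows whether a lower effort level actually saves tokens for your prompts, which is the evidence needed before changing a default.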
Key Takeaways
- The effort parameter controls token spending across all response types (text, tool calls, thinking).
- Five levels are available: low, medium, high (default), xhigh (Opus 4.7 only), and max.
- Lower effort reduces capability but improves speed and cost; higher effort does the opposite.
- For Sonnet 4.6, use medium as your recommended default for most applications.
- Combine effort with adaptive thinking for the best balance of performance and efficiency.
- Effort replaces the deprecated budget_tokens parameter on Opus 4.6 and Sonnet 4.6.