Mastering Claude's Effort Parameter: Balance Performance, Speed, and Cost
This guide explains how to use Claude's effort parameter to control token spending and response depth. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to combine effort with adaptive thinking for optimal API performance.
Introduction
Claude's effort parameter lets you control how many tokens Claude spends on a response, including any thinking it does before answering. By adjusting effort, you can trade response thoroughness against token efficiency with a single model, without switching to a smaller or larger one.
This matters for developers who want to reduce cost and latency while keeping output quality high. Whether you're building a simple chatbot or a complex agentic system, understanding effort helps you get more out of Claude.
What Is the Effort Parameter?
The effort parameter controls how eager Claude is about spending tokens when responding to requests. It affects all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
Supported Models
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
Note: On Claude Opus 4.6 and Claude Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending | Deepest possible reasoning, most thorough analysis |
| xhigh | Extended capability for long-horizon work | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks that require a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
Note: max is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. xhigh is available only on Claude Opus 4.7.
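As a rough sketch of the availability rules above, the lookup below records which levels each model accepts and falls back to the nearest lower level when the requested one is unavailable. The helper name `resolve_effort`, the fallback policy, and the use of display names (rather than API model identifiers) as keys are illustrative assumptions, not part of the API.

```python
# Availability as stated in the table and note above. Keys are display
# names for readability, not API model identifiers.
BASE_LEVELS = ("low", "medium", "high")
ORDER = ("low", "medium", "high", "xhigh", "max")  # least to most effort

SUPPORTED = {
    "Claude Mythos Preview": {*BASE_LEVELS, "max"},
    "Claude Opus 4.7": {*BASE_LEVELS, "xhigh", "max"},
    "Claude Opus 4.6": {*BASE_LEVELS, "max"},
    "Claude Sonnet 4.6": {*BASE_LEVELS, "max"},
    "Claude Opus 4.5": set(BASE_LEVELS),
}


def resolve_effort(model: str, requested: str) -> str:
    """Return `requested` if the model supports it, else the nearest lower level.

    The fallback rule is an assumption made for this sketch.
    """
    if model not in SUPPORTED:
        raise ValueError(f"{model!r} is not listed as supporting effort")
    if requested not in ORDER:
        raise ValueError(f"unknown effort level: {requested!r}")
    idx = ORDER.index(requested)
    # Walk downward from the requested level until a supported one is found.
    for level in reversed(ORDER[: idx + 1]):
        if level in SUPPORTED[model]:
            return level
    raise AssertionError("unreachable: every listed model supports 'low'")
```

For example, `resolve_effort("Claude Sonnet 4.6", "xhigh")` falls back to `"high"`, because xhigh is limited to Opus 4.7.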
How to Use Effort in the API
Basic Usage (Python)

    import anthropic

    client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-sonnet-4-6",  # Must be a model that supports effort
        max_tokens=8192,
        output_config={"effort": "medium"},  # Control token spending
        messages=[
            {"role": "user", "content": "Write a detailed analysis of quantum computing's impact on cryptography."}
        ],
    )
    print(response.content[0].text)

Basic Usage (TypeScript)
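
    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic(); // Reads ANTHROPIC_API_KEY from the environment

    const response = await client.messages.create({
      model: 'claude-sonnet-4-6', // Must be a model that supports effort
      max_tokens: 8192,
      output_config: { effort: 'medium' }, // Control token spending
      messages: [
        { role: 'user', content: "Write a detailed analysis of quantum computing's impact on cryptography." }
      ]
    });

    // Content blocks are a union type, so narrow before reading text
    const first = response.content[0];
    if (first.type === 'text') console.log(first.text);
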
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide how much thinking to use based on the problem complexity, while respecting your effort preference.

    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-6",  # Must be a model that supports effort
        max_tokens=8192,
        thinking={"type": "adaptive"},  # Enable adaptive thinking
        output_config={"effort": "medium"},  # Control overall token spend
        messages=[
            {"role": "user", "content": "Debug this complex Python code and explain the fix..."}
        ],
    )

Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
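The guidance above can be captured in a small helper so that effort is always set explicitly for Sonnet 4.6 rather than left at the default. This is only a sketch: the function name `sonnet_46_effort` and its two flags are hypothetical, chosen to mirror the two bullets.

```python
def sonnet_46_effort(latency_sensitive: bool = False, high_volume: bool = False) -> str:
    """Pick an explicit effort level for Sonnet 4.6 (hypothetical helper).

    Low effort for high-volume or latency-sensitive workloads,
    medium effort as the default for everything else.
    """
    if latency_sensitive or high_volume:
        return "low"
    return "medium"
```

Keeping the choice in one place makes it easy to revisit once you have measured latency and cost for your own workload.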
Practical Scenarios
Scenario 1: High-Volume Customer Support Chat
For a chatbot handling simple FAQs, use low effort to minimize latency and cost:

    # Reuses the client created in Basic Usage
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        output_config={"effort": "low"},  # Minimize latency and cost
        messages=[
            {"role": "user", "content": "What are your business hours?"}
        ],
    )

Scenario 2: Complex Code Review Agent
For a code review agent that needs deep analysis, use high or max effort:

    # Reuses the client created in Basic Usage
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=16384,
        output_config={"effort": "high"},  # Use "max" for the deepest analysis
        thinking={"type": "adaptive"},
        messages=[
            {"role": "user", "content": "Review this pull request for security vulnerabilities and performance issues..."}
        ],
    )

Scenario 3: Long-Running Agentic Task
For tasks that run over 30 minutes with token budgets in the millions, use xhigh (Opus 4.7 only):
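
    # Reuses the client created in Basic Usage.
    # Requests with a very large max_tokens should be streamed.
    with client.messages.stream(
        model="claude-opus-4-7",  # xhigh is available only on Opus 4.7
        max_tokens=200000,  # Lower this if it exceeds the model's output limit
        output_config={"effort": "xhigh"},
        thinking={"type": "adaptive"},
        messages=[
            {"role": "user", "content": "Refactor this entire codebase to use TypeScript with proper types and tests..."}
        ],
    ) as stream:
        response = stream.get_final_message()
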
Effort vs. Token Budget
| Aspect | Effort | budget_tokens |
|---|---|---|
| Type | Behavioral signal | Strict token limit |
| Flexibility | Adapts to problem difficulty | Fixed cap |
| Thinking required | No | Yes |
| Tool calls affected | Yes | No |
| Future support | Active development | Deprecated |
In short, compared with budget_tokens, effort:
- Doesn't require thinking to be enabled
- Affects all token spend, including tool calls
- Adapts intelligently to problem difficulty
Best Practices
- Start with medium effort for most applications, then adjust based on observed performance and cost.
- Combine with adaptive thinking for optimal results — let Claude decide when to think deeply.
- Set effort explicitly when using Sonnet 4.6 to avoid default high latency.
- Monitor token usage across different effort levels to find your sweet spot.
- Use low effort for subagents and simple tasks where speed matters more than depth.
- Reserve max effort for the most challenging problems that require absolute best performance.
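To make the monitoring advice concrete, here is a minimal sketch that averages output tokens per effort level. It assumes you log one `(effort, output_tokens)` pair per request yourself, for example from `response.usage.output_tokens`; the function name and record format are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_output_tokens(records):
    """Return the average output-token count for each effort level.

    `records` is an iterable of (effort, output_tokens) pairs collected by
    your own logging; this record format is an assumption of the sketch.
    """
    by_level = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {level: mean(counts) for level, counts in by_level.items()}
```

Comparing these averages against your quality checks for the same prompts is one way to find the lowest effort level that still meets your bar.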
Key Takeaways
- Effort controls token spending across all response types (text, tools, thinking) without requiring thinking to be enabled.
- Five levels available: low, medium, high, xhigh (Opus 4.7), and max — each offering a different trade-off between capability and efficiency.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of performance and cost.
- Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.