Mastering Claude's Effort Parameter: Optimize Token Spend and Response Depth
Learn how to use Claude's effort parameter to control token spending, balance speed and capability, and optimize costs across API calls and agentic workflows.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels from low to max to trade off between speed, cost, and response thoroughness, with practical code examples for the API.
When building with Claude, the effort parameter is one of the most useful and most often overlooked controls. It signals how much work, including thinking, Claude should put into a response, giving you control over token consumption, latency, and output quality without switching models.
In this guide, you'll learn what the effort parameter is, how it works across different Claude models, and how to use it effectively in your API calls to balance performance and cost.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding to requests. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise it to max for the absolute highest capability, or lower it to medium or low for faster, cheaper responses.
Key characteristics:
- Available on all supported models without any beta header
- Works with or without extended thinking enabled
- Affects all tokens in the response: text, tool calls, and thinking tokens
- Replaces the deprecated `budget_tokens` parameter on Opus 4.6 and Sonnet 4.6
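Because effort is just another request field, it can help to centralize it in one helper. The sketch below validates the level and assembles keyword arguments for `client.messages.create`, adding adaptive thinking only when asked, since effort works with or without it. It assumes effort is sent in the request body as `output_config={"effort": ...}`; the helper name `build_request` and the model IDs are illustrative, so check the API reference for the levels each model accepts.

```python
from typing import Any

# Levels named in this guide; "xhigh" is listed as Opus 4.7 only
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(
    model: str,
    prompt: str,
    effort: str = "high",
    adaptive_thinking: bool = False,
    max_tokens: int = 4096,
) -> dict[str, Any]:
    """Return kwargs for client.messages.create with an explicit effort level.

    Assumption: effort is sent in the request body as output_config.effort.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")
    params: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }
    if adaptive_thinking:
        # Effort also works without thinking, so this part is optional
        params["thinking"] = {"type": "adaptive"}
    return params


fast = build_request("claude-sonnet-4-6", "What is 2 + 2?", effort="low")
deep = build_request("claude-opus-4-6", "Prove the lemma.", effort="max", adaptive_thinking=True)
print(fast["output_config"], "thinking" in fast)
print(deep["output_config"], deep["thinking"])
# With a real client: client.messages.create(**deep)
```

Raising on an unknown level catches typos locally instead of sending a request the API would reject.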
Effort Levels Explained
| Level | Description | Best For |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deep reasoning, complex analysis, research-grade tasks |
| xhigh | Extended capability for long-horizon work (Opus 4.7 only) | Long-running agentic and coding tasks (>30 min) |
| high | Default behavior, excellent results | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost/performance balance |
| low | Most efficient, significant token savings | Simple tasks, high-volume chat, subagents |
Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—it just thinks less than it would at higher levels for the same problem.
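The table can be turned into a simple lookup. The sketch below maps hypothetical task categories to the levels in the "Best For" column and falls back to the default `high` when a task is unknown or when the target model lacks `xhigh` (listed above as Opus 4.7 only). The category names and the `supports_xhigh` flag are illustrative assumptions, not API fields.

```python
# Hypothetical task categories mapped to the levels suggested in the table
TASK_EFFORT = {
    "research": "max",
    "long_agentic": "xhigh",
    "complex_coding": "high",
    "balanced_agentic": "medium",
    "simple_chat": "low",
    "subagent": "low",
}


def choose_effort(task: str, supports_xhigh: bool = False) -> str:
    """Pick an effort level for a task category, defaulting to "high"."""
    effort = TASK_EFFORT.get(task, "high")  # unknown tasks get the API default
    if effort == "xhigh" and not supports_xhigh:
        effort = "high"  # fall back to the default on models without xhigh
    return effort


for task in ("simple_chat", "long_agentic", "research", "unlabeled"):
    print(task, choose_effort(task))
```

Keeping the mapping in one place makes it easy to adjust a category after profiling, without touching call sites.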
How Effort Works Under the Hood
When you set the effort parameter, Claude adjusts how much reasoning and output it produces. With adaptive thinking enabled, Claude almost always thinks before responding at high and max effort; at lower levels, it may skip thinking on simpler problems and answer directly.
This affects:
- Text responses: Shorter, more direct answers at low effort; longer, more thorough explanations at high effort
- Tool calls: Fewer tool calls at low effort; more thorough tool usage at high effort
- Extended thinking: Deeper reasoning chains at higher effort levels
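To see these effects on your own traffic, you can tally the content blocks a response contains and compare the tallies across effort levels. The sketch below works on plain dictionaries shaped like Messages API content blocks, each with a `type` such as `"text"`, `"tool_use"`, or `"thinking"`. Converting an SDK response with `response.model_dump()` is an assumption about your SDK version, and the `get_weather` tool in the sample is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_blocks(content: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Count content blocks by type and total the characters of visible text."""
    blocks = list(content)
    counts = Counter(block["type"] for block in blocks)
    text_chars = sum(len(block.get("text", "")) for block in blocks if block["type"] == "text")
    return {
        "text_blocks": counts["text"],
        "tool_calls": counts["tool_use"],
        "thinking_blocks": counts["thinking"],
        "text_chars": text_chars,
    }


# Hand-written response content for illustration (not real model output)
sample = [
    {"type": "thinking", "thinking": "..."},
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"city": "Paris"}},
]
print(summarize_blocks(sample))
```

Running the same prompt at `low` and `high` and comparing the two summaries shows whether a lower level actually shortened text, dropped tool calls, or skipped thinking for your workload.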
Using Effort with the API
Basic Usage (Python)
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    # Effort is a request-body field under output_config, not an HTTP header
    output_config={"effort": "low"},
)

print(response.content[0].text)
```
Using Effort with Extended Thinking
For maximum capability, combine effort with adaptive thinking:
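```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Claude decides when and how much to think
    messages=[
        {"role": "user", "content": "Design a distributed caching system for a global e-commerce platform."}
    ],
    output_config={"effort": "max"},
)

# With thinking enabled, the first block may be a thinking block; print only text
print("".join(block.text for block in response.content if block.type == "text"))
```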
TypeScript Example
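```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function getResponse() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 4096,
    messages: [
      { role: 'user', content: 'Summarize this 50-page document.' }
    ],
    // Effort goes in the request body under output_config
    output_config: { effort: 'medium' },
  });

  // Content blocks are a union type; narrow to text blocks before reading .text
  for (const block of response.content) {
    if (block.type === 'text') console.log(block.text);
  }
}

getResponse();
```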
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. For most applications, explicitly set effort to avoid unexpected latency:
- Medium (recommended default): Best balance for agentic coding, tool-heavy workflows, and code generation
- Low: For high-volume or latency-sensitive workloads like chat and simple Q&A
Practical Scenarios
Scenario 1: High-Volume Customer Support Chat
Use low effort for simple, repetitive queries where speed matters more than depth:
```python
import anthropic

client = anthropic.Anthropic()

def handle_support_query(user_message: str) -> str:
    # Low effort: short, direct answers for simple, repetitive questions
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
        output_config={"effort": "low"},
    )
    return response.content[0].text
```
Scenario 2: Complex Code Review
Use max effort for thorough analysis:
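```python
import anthropic

client = anthropic.Anthropic()

def review_code(code_snippet: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=8192,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": f"Review this code:\n\n{code_snippet}"}],
        output_config={"effort": "max"},
    )
    # With thinking enabled, content[0] may be a thinking block; keep only text
    return "".join(block.text for block in response.content if block.type == "text")
```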
Scenario 3: Multi-Agent System
Use different effort levels for different agents in a multi-agent setup:
```python
# Orchestrator agent: high effort for planning
orchestrator_effort = "high"

# Research subagent: medium effort for balanced performance
research_effort = "medium"

# Simple data extraction subagent: low effort for speed
extraction_effort = "low"
```
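Those three levels only matter once each agent's requests carry them. A minimal sketch of that wiring is below; the pipeline shape (plan, extract, research, synthesize) is hypothetical, and `call(prompt, effort)` stands in for your own wrapper around `client.messages.create` that would pass `output_config={"effort": effort}`. Injecting `call` keeps the orchestration testable without network access.

```python
from typing import Callable

# Effort per role, matching the variables above
ROLE_EFFORT = {"orchestrator": "high", "research": "medium", "extraction": "low"}

# call(prompt, effort) -> text; a hypothetical wrapper around client.messages.create
Call = Callable[[str, str], str]


def run_pipeline(call: Call, question: str, documents: list[str]) -> str:
    """Orchestrator plans, subagents extract and research, orchestrator answers."""
    plan = call(f"Plan how to answer: {question}", ROLE_EFFORT["orchestrator"])
    facts = [call(f"Extract key facts:\n{doc}", ROLE_EFFORT["extraction"]) for doc in documents]
    notes = call(f"Plan:\n{plan}\n\nFacts:\n" + "\n".join(facts), ROLE_EFFORT["research"])
    return call(f"Answer {question!r} using these notes:\n{notes}", ROLE_EFFORT["orchestrator"])


if __name__ == "__main__":
    def echo(prompt: str, effort: str) -> str:
        # Stub model for a dry run: reports which effort each step used
        return f"<{effort}>"

    print(run_pipeline(echo, "Which release fixed the bug?", ["changelog A", "changelog B"]))
```

Only the two orchestrator calls run at high effort; the per-document extraction calls, which scale with input size, run at low effort.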
Effort vs. Budget Tokens
If you're migrating from budget_tokens on Opus 4.6 or Sonnet 4.6, here's what changed:
| Feature | budget_tokens (deprecated) | effort (recommended) |
|---|---|---|
| Control type | Hard token limit | Behavioral signal |
| Flexibility | Fixed budget per request | Adaptive to problem difficulty |
| Future support | Will be removed | Long-term supported |
| Works without thinking | No | Yes |
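In practice, migrating a request usually means replacing the fixed thinking budget with adaptive thinking plus an explicit effort level. The sketch below rewrites a kwargs dict that uses `thinking={"type": "enabled", "budget_tokens": N}` into that form. Because effort is a behavioral signal rather than a token count, it deliberately does not derive a level from `N` and takes it from the caller instead. The function name is hypothetical, and the `output_config.effort` field is assumed.

```python
import copy
from typing import Any


def migrate_budget_to_effort(params: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Return a copy of messages.create kwargs with budget_tokens replaced by effort.

    Assumes the older shape thinking={"type": "enabled", "budget_tokens": N}
    and the newer thinking={"type": "adaptive"} plus output_config.effort.
    """
    migrated = copy.deepcopy(params)  # leave the caller's dict untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
    output_config = dict(migrated.get("output_config") or {})
    output_config.setdefault("effort", effort)  # keep an effort the caller already set
    migrated["output_config"] = output_config
    return migrated


old = {
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Plan the database migration."}],
}
new = migrate_budget_to_effort(old, effort="medium")
print(new["thinking"], new["output_config"])
```

Running this over stored request templates, then profiling the result, is safer than guessing an equivalence between budgets and effort levels.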
Best Practices
- Start with medium effort for most applications, then adjust based on observed performance and cost.
- Use adaptive thinking alongside effort for the best experience on complex tasks.
- Profile your workload: Run the same prompt at different effort levels to measure latency and quality differences.
- Combine with `max_tokens`: Set a reasonable `max_tokens` limit as a safety net, even at max effort.
- Monitor token usage: Track input and output tokens to calculate cost savings when lowering effort.
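Profiling and token monitoring can share one harness. The sketch below times a caller-supplied function at each effort level and records its reported token usage. `call_model` is a hypothetical stand-in for your own wrapper around `client.messages.create` (for example, one returning `response.usage.input_tokens` and `response.usage.output_tokens`), and the per-million-token prices are parameters you fill in from current pricing; the numbers in the dry run are placeholders, not measurements.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EffortResult:
    effort: str
    seconds: float
    input_tokens: int
    output_tokens: int

    def cost(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Estimated cost in dollars, given prices per million tokens."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


# call_model(prompt, effort) -> (input_tokens, output_tokens); supplied by you
CallModel = Callable[[str, str], tuple[int, int]]


def profile_efforts(
    call_model: CallModel,
    prompt: str,
    efforts: Iterable[str] = ("low", "medium", "high"),
) -> list[EffortResult]:
    """Run the same prompt at each effort level and record latency and usage."""
    results = []
    for effort in efforts:
        start = time.perf_counter()
        input_tokens, output_tokens = call_model(prompt, effort)
        elapsed = time.perf_counter() - start
        results.append(EffortResult(effort, elapsed, input_tokens, output_tokens))
    return results


if __name__ == "__main__":
    # Fake model for a dry run: output grows with effort (made-up numbers)
    fake_output = {"low": 200, "medium": 500, "high": 1200}

    def fake_call(prompt: str, effort: str) -> tuple[int, int]:
        return len(prompt.split()), fake_output[effort]

    for r in profile_efforts(fake_call, "Summarize the attached incident report."):
        print(f"{r.effort:>6}: {r.seconds:.4f}s, {r.output_tokens} output tokens, "
              f"${r.cost(3.0, 15.0):.6f} at placeholder prices")
```

For real comparisons, run several prompts per level and judge output quality alongside latency and cost, since effort trades all three.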
Limitations and Considerations
- Some older models do not support effort; check the model documentation for compatibility.
- At low effort, Claude may skip thinking for simple problems, which can reduce quality on borderline-complex tasks.
- The `xhigh` level is currently only available on Claude Opus 4.7.
- Effort is a behavioral signal, so actual token savings vary with problem difficulty.
Key Takeaways
- The effort parameter lets you control token spending by adjusting how eagerly Claude thinks before responding, from low (fast/cheap) to max (deep/thorough).
- Effort applies to all tokens in a response (text, tool calls, and extended thinking), giving you broad control over response behavior.
- Medium effort is the recommended default for most applications, especially with Sonnet 4.6, balancing speed, cost, and capability.
- Combine effort with adaptive thinking for optimal results on complex tasks requiring deep reasoning.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, offering more flexible, behavior-driven control without hard token limits.