Mastering Claude's Effort Parameter: Control Thinking Depth and Token Spend
Learn how to use Claude's effort parameter to control response thoroughness, reduce token costs, and optimize latency across supported models, including Opus 4.7 and Sonnet 4.6.
This guide explains how to use Claude's effort parameter to control how eagerly the model spends tokens, from max (deepest reasoning) to low (fastest, cheapest). You'll learn effort levels, recommended defaults for Sonnet 4.6, code examples, and how it replaces budget_tokens.
Introduction
Claude's effort parameter gives you fine-grained control over how thoroughly the model thinks before responding. By adjusting effort, you can trade off between response quality and token efficiency—all with a single model call. This feature is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
Whether you're building a high-volume chat application that needs low latency, or a deep reasoning agent that requires maximum capability, the effort parameter lets you dial in the perfect balance.
How Effort Works
By default, Claude uses high effort—spending as many tokens as needed for excellent results. You can raise it to max for absolute highest capability, or lower it to medium or low to be more conservative with token usage.
The effort parameter affects all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
Effort is a broader control than budget_tokens because:
- It doesn't require thinking to be enabled
- It can affect all token spend, including tool calls (lower effort means fewer tool calls)
Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels.
Effort Levels Explained
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Opus 4.7, Opus 4.6, Sonnet 4.6, Mythos Preview) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing balance of speed, cost, and performance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
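The availability notes in the table can be turned into a small pre-flight check. The following is a minimal sketch, not part of any SDK: the `EFFORT_SUPPORT` mapping and `check_effort` helper are hypothetical names, the model labels are informal rather than API model IDs, and the Opus 4.5 entry assumes it supports low through high because it is listed as supporting effort but not `max`.

```python
# Availability per model, based on the table above. The keys are informal
# labels for illustration, not API model IDs.
EFFORT_SUPPORT = {
    "Mythos Preview": {"low", "medium", "high", "max"},
    "Opus 4.7": {"low", "medium", "high", "xhigh", "max"},
    "Opus 4.6": {"low", "medium", "high", "max"},
    "Sonnet 4.6": {"low", "medium", "high", "max"},
    # Assumption: Opus 4.5 supports effort but neither xhigh nor max.
    "Opus 4.5": {"low", "medium", "high"},
}


def check_effort(model_label: str, effort: str) -> str:
    """Return `effort` unchanged if the model supports it, else raise ValueError."""
    if model_label not in EFFORT_SUPPORT:
        raise ValueError(f"unknown model label: {model_label!r}")
    supported = EFFORT_SUPPORT[model_label]
    if effort not in supported:
        raise ValueError(
            f"{effort!r} is not available on {model_label}; "
            f"choose one of {sorted(supported)}"
        )
    return effort
```

Calling `check_effort("Opus 4.7", "xhigh")` returns the level unchanged, while `check_effort("Sonnet 4.6", "xhigh")` raises before any request is sent.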
Recommended Defaults for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort:
- Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
- Low: For high-volume or latency-sensitive workloads—chat and non-coding use cases.
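One way to make these defaults explicit in application code is a small selector that never leaves effort unset. This is a sketch under stated assumptions: the workload labels and the `sonnet_46_effort` function are hypothetical and only encode the two recommendations above.

```python
from typing import Optional

# Hypothetical workload labels; rename them to match your own routing logic.
LOW_EFFORT_WORKLOADS = {"chat", "high_volume", "latency_sensitive"}
VALID_LEVELS = {"low", "medium", "high", "max"}


def sonnet_46_effort(workload: str, override: Optional[str] = None) -> str:
    """Pick an explicit effort level for a Sonnet 4.6 request."""
    if override is not None:
        if override not in VALID_LEVELS:
            raise ValueError(f"unsupported effort level: {override!r}")
        return override
    if workload in LOW_EFFORT_WORKLOADS:
        return "low"
    # Everything else falls back to the recommended default.
    return "medium"
```

Unknown workloads fall back to medium rather than the model's high default, which matches the recommendation to set effort explicitly.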
Code Examples
Python (with Anthropic SDK)
import anthropic

client = anthropic.Anthropic()

# Low effort - fast and cheap
# Effort is passed in the output_config request field, not as an HTTP header.
response = client.messages.create(
    model="claude-sonnet-4-6",  # a model that supports effort; confirm the current ID
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},
)

# Medium effort - balanced
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)

# Max effort - deepest reasoning (requires a model that supports max)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    system="You are a research assistant.",
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography."}],
    output_config={"effort": "max"},
)
TypeScript (with Anthropic SDK)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort
// Effort is passed in the output_config request field, not as an HTTP header.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // a model that supports effort; confirm the current ID
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' },
});

// Medium effort (recommended default for Sonnet 4.6)
const response2 = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  system: 'You are a coding assistant.',
  messages: [{ role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }],
  output_config: { effort: 'medium' },
});

// Max effort (requires a model that supports max)
const response3 = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 4096,
  system: 'You are a research assistant.',
  messages: [{ role: 'user', content: 'Analyze the implications of quantum computing on cryptography.' }],
  output_config: { effort: 'max' },
});
Using with Extended Thinking
Combine effort with adaptive thinking for the best experience:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # a model that supports effort; confirm the current ID
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Claude decides when and how much to think
    system="You are a research assistant.",
    messages=[{"role": "user", "content": "Solve this complex math problem step by step."}],
    output_config={"effort": "high"},  # effort goes in output_config, not a header
)
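When thinking is enabled, a response can contain thinking blocks alongside text blocks, so it can help to separate the two when comparing effort levels. The helper below is a rough sketch: `split_thinking_and_text` is a hypothetical name, it assumes each content block exposes a `type` attribute plus `thinking` or `text`, and it uses stand-in objects so it runs without an API call.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def split_thinking_and_text(content: Iterable) -> Tuple[str, str]:
    """Join the thinking blocks and the text blocks of a response separately."""
    thinking_parts, text_parts = [], []
    for block in content:
        block_type = getattr(block, "type", None)
        if block_type == "thinking":
            thinking_parts.append(getattr(block, "thinking", "") or "")
        elif block_type == "text":
            text_parts.append(getattr(block, "text", "") or "")
        # Other block types (for example tool calls) are ignored here.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Stand-in blocks so the sketch runs without an API call.
example = [
    SimpleNamespace(type="thinking", thinking="Try induction on n."),
    SimpleNamespace(type="text", text="The claim holds for all n >= 1."),
]
thinking, answer = split_thinking_and_text(example)
print(len(thinking), len(answer))
```

With a real response, the same function could be called on `response.content`; at lower effort levels the thinking portion may be shorter or absent.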
Effort vs. budget_tokens
For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
| Aspect | effort | budget_tokens |
|---|---|---|
| Scope | Affects all tokens (text, tools, thinking) | Only affects thinking tokens |
| Precision | Behavioral signal (not strict) | Strict token budget |
| Simplicity | 5 levels (low/medium/high/xhigh/max) | Requires numeric value |
| Future-proof | ✅ Recommended | ❌ Deprecated |
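A migration away from budget_tokens could be scripted along the following lines. Treat this as a sketch with loud assumptions: the thresholds in `budget_to_effort` are purely illustrative (no official mapping from token budgets to effort levels is given here), `migrate_request` is a hypothetical helper, and it assumes effort is sent in the `output_config` request field alongside adaptive thinking.

```python
import copy


def budget_to_effort(budget_tokens: int) -> str:
    """Map a legacy thinking budget to an effort level (illustrative thresholds)."""
    if budget_tokens < 4096:
        return "low"
    if budget_tokens < 16384:
        return "medium"
    return "high"


def migrate_request(kwargs: dict) -> dict:
    """Return a copy of request kwargs with budget_tokens replaced by effort."""
    migrated = copy.deepcopy(kwargs)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        effort = budget_to_effort(thinking["budget_tokens"])
        migrated["thinking"] = {"type": "adaptive"}
        # Assumption: effort lives under `output_config`; an effort value
        # that is already set explicitly is left untouched.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated
```

Because effort is a behavioral signal rather than a strict budget, any mapping like this is only a starting point and should be validated against your own workloads.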
Best Practices
- Start with medium for Sonnet 4.6 – Explicitly set effort to avoid unexpected latency.
- Use low for high-volume chat – When speed and cost matter most, and tasks are simple.
- Use max for complex reasoning – When you need Claude's deepest thinking (e.g., mathematical proofs, multi-step analysis).
- Combine with adaptive thinking – For the best balance of depth and efficiency.
- Test different levels – Run benchmarks with your specific use case to find the optimal effort level.
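The last practice, testing different levels, can be sketched as a small harness that records average output tokens and latency per effort level. The `benchmark_effort` function and its `send` callback are hypothetical: `send(prompt, effort)` is assumed to issue one request and return the number of output tokens it used.

```python
import time
from typing import Callable, Dict, Iterable, Sequence


def benchmark_effort(
    send: Callable[[str, str], int],
    prompts: Iterable[str],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> Dict[str, Dict[str, float]]:
    """Run every prompt at every effort level and average tokens and latency."""
    prompt_list = list(prompts)
    if not prompt_list:
        raise ValueError("at least one prompt is required")
    results: Dict[str, Dict[str, float]] = {}
    for level in levels:
        total_tokens = 0
        start = time.perf_counter()
        for prompt in prompt_list:
            total_tokens += send(prompt, level)
        elapsed = time.perf_counter() - start
        results[level] = {
            "avg_output_tokens": total_tokens / len(prompt_list),
            "avg_seconds": elapsed / len(prompt_list),
        }
    return results
```

With the Anthropic SDK, `send` could wrap a `client.messages.create(...)` call and return `response.usage.output_tokens`. Token counts and latency alone do not measure answer quality, so pair the numbers with a task-specific quality check.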
Limitations
- Not a strict budget: At lower effort levels, Claude may still think deeply on very difficult problems.
- Model availability: max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is only available on Opus 4.7.
- No beta header required: The effort parameter works on all supported models without special headers.
Key Takeaways
- Effort controls token spend across all response types – text, tool calls, and extended thinking – giving you broad control over cost and latency.
- Five levels from low to max let you dial in the perfect balance for your use case, from simple chat to deep reasoning.
- Explicitly set effort for Sonnet 4.6 – the default is high, which may be more than you need. Medium is recommended for most applications.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6 – migrate your code to use the new parameter before budget_tokens is removed.
- Combine with adaptive thinking for the best experience, allowing Claude to decide when to think deeply while respecting your effort preference.