Mastering Claude’s Effort Parameter: Control Token Spend and Response Depth
Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost across all supported models. Includes practical code examples and recommended settings.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between thoroughness and efficiency, with practical API examples and recommended defaults for Sonnet 4.6.
Introduction
Claude is incredibly capable, but sometimes you don’t need the full firepower. Whether you’re building a high-volume chat application, a cost-sensitive subagent, or a deep reasoning system, controlling how much effort Claude puts into each response can save you tokens, reduce latency, and still deliver excellent results.
The effort parameter gives you that control. It’s a simple, single-model way to dial Claude’s token spending up or down—without switching models or sacrificing quality when you need it most.
In this guide, you’ll learn:
- What the effort parameter is and how it works
- When to use each effort level
- How to set effort in the API (with code examples)
- Recommended defaults for Claude Sonnet 4.6
- How effort compares to the deprecated budget_tokens parameter
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding to requests. It affects all tokens in the response—including text explanations, tool calls, and extended thinking (when enabled).
Key points:
- Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
- No beta header required.
- Replaces budget_tokens as the recommended way to control thinking depth (for Opus 4.6 and Sonnet 4.6).
- Works with or without extended thinking enabled.
Effort Levels and Their Use Cases
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min, token budgets in millions) – Opus 4.7 only |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach, moderate token savings | Agentic tasks needing speed, cost, and performance balance |
| low | Most efficient, significant token savings | Simpler tasks, high-volume subagents, latency-sensitive workloads |
Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—just less than at higher levels.
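The support matrix in the table above can be encoded as a small lookup so that an unsupported combination fails before a request is sent. This is a minimal sketch: the model keys (such as `"opus-4.7"`) and the function `check_effort` are hypothetical names chosen for illustration, not API model IDs, and the matrix only restates the table and availability list in this guide.

```python
# Effort levels each model accepts, restated from this guide's table.
# The keys are illustrative labels, not real API model IDs.
BASE_LEVELS = {"low", "medium", "high"}

SUPPORTED_EFFORT = {
    "mythos-preview": BASE_LEVELS | {"max"},
    "opus-4.7": BASE_LEVELS | {"xhigh", "max"},
    "opus-4.6": BASE_LEVELS | {"max"},
    "sonnet-4.6": BASE_LEVELS | {"max"},
    "opus-4.5": set(BASE_LEVELS),
}


def check_effort(model: str, effort: str = "high") -> str:
    """Return `effort` if `model` supports it, otherwise raise ValueError."""
    if model not in SUPPORTED_EFFORT:
        raise ValueError(f"{model!r} is not listed as supporting effort")
    if effort not in SUPPORTED_EFFORT[model]:
        allowed = ", ".join(sorted(SUPPORTED_EFFORT[model]))
        raise ValueError(
            f"{effort!r} is not available on {model!r}; use one of: {allowed}"
        )
    return effort
```

Calling `check_effort("sonnet-4.6", "xhigh")` raises a `ValueError`, since the table lists xhigh for Opus 4.7 only.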
How Effort Works Under the Hood
When you set effort to high (or omit the parameter), Claude uses its default behavior, spending as many tokens as needed for excellent results.
- At max effort, Claude will almost always engage extended thinking, even for simple requests.
- At low effort, Claude may skip thinking for simpler problems, producing shorter, faster responses.
- Effort affects tool calls too: lower effort means Claude makes fewer tool calls, saving even more tokens.
This differs from the budget_tokens approach, which only limited thinking tokens.
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token spend, explicitly set effort when using Sonnet 4.6:
- medium (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters most.
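One way to keep effort explicit on Sonnet 4.6 is to route every request through a small builder that falls back to medium. The sketch below makes several assumptions: `build_sonnet_request` and the workload labels are hypothetical, the model ID `claude-sonnet-4-6` should be confirmed against the current model list, and the effort level is placed under `output_config`, which is where the API is assumed to expect it. Adjust the key if your SDK version differs.

```python
from typing import Any

# Hypothetical workload labels mapped to the levels recommended above.
SONNET_EFFORT_BY_WORKLOAD = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def build_sonnet_request(prompt: str, workload: str = "agentic_coding",
                         max_tokens: int = 8192) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create with explicit effort."""
    # Unknown workloads fall back to the recommended default, "medium".
    effort = SONNET_EFFORT_BY_WORKLOAD.get(workload, "medium")
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID; confirm before use
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},  # assumed parameter location
    }
```

The result can be unpacked into a call, for example `client.messages.create(**build_sonnet_request("Summarize this ticket.", workload="chat"))`.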
Using Effort in the API
Here’s how to set the effort parameter in both Python and TypeScript.
Python Example
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-6",  # a model that supports effort (assumed ID for Sonnet 4.6)
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Effort level, passed inside output_config
    output_config={"effort": "medium"},
)
print(response.content[0].text)
TypeScript Example
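import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // a model that supports effort (assumed ID for Sonnet 4.6)
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ],
  // Effort level, passed inside output_config
  output_config: { effort: 'medium' }
});
// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);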
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking:
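import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # a model that supports effort (assumed ID for Sonnet 4.6)
    max_tokens=8192,
    thinking={"type": "adaptive"},  # let Claude decide when to think
    output_config={"effort": "high"},  # effort level, passed inside output_config
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
)
# With thinking enabled, the response may contain thinking blocks before the text
for block in response.content:
    if block.type == "text":
        print(block.text)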
Adaptive thinking lets Claude decide when to use extended thinking, while effort signals how eagerly Claude should spend tokens overall.
Effort vs. budget_tokens (Deprecated)
For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
| Aspect | effort | budget_tokens (deprecated) |
|---|---|---|
| Scope | Affects all tokens (text, tools, thinking) | Only affects thinking tokens |
| Granularity | Behavioral levels (low/medium/high/xhigh/max) | Exact token budget |
| Simplicity | Easy to tune | Requires experimentation |
| Future-proof | Yes | Will be removed |
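Migrating an existing request from budget_tokens to effort could look roughly like the sketch below. The cut-offs in `budget_to_effort` are hypothetical starting points, not an official mapping, because effort is a behavioral signal and has no exact token equivalent. The helper names are invented for illustration, and the request shapes (a `thinking` dict carrying `budget_tokens`, and effort under `output_config`) are assumptions to verify against the API reference.

```python
import copy
from typing import Any


def budget_to_effort(budget_tokens: int) -> str:
    """Map a legacy thinking budget to an effort level.

    The thresholds are hypothetical; tune the result against real traffic.
    """
    if budget_tokens <= 2_000:
        return "low"
    if budget_tokens <= 10_000:
        return "medium"
    return "high"


def migrate_request(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `request` that uses effort instead of budget_tokens."""
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking") or {}
    budget = thinking.get("budget_tokens")
    if budget is None:
        return migrated  # nothing to migrate
    # Replace the fixed budget with adaptive thinking plus an effort level.
    migrated["thinking"] = {"type": "adaptive"}
    migrated.setdefault("output_config", {})["effort"] = budget_to_effort(budget)
    return migrated
```

Because the function works on a deep copy, the original request dict is left untouched and can be kept for side-by-side comparison during rollout.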
Best Practices
- Start with medium for Sonnet 4.6 – It’s the sweet spot for most applications.
- Use low for high-volume subagents – When you have many parallel agents doing simple tasks, low saves tokens and reduces latency.
- Reserve max for critical deep reasoning – Use it only when you need the absolute best answer, like complex analysis or debugging.
- Combine with adaptive thinking – For models that support it, adaptive thinking + effort gives you the best of both worlds.
- Monitor token usage – Effort is a signal, not a hard limit. Always monitor actual token spend in production.
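Since effort is not a hard limit, it can help to record actual token counts per effort level and compare them. The following is a minimal sketch: `EffortUsageTracker` is a hypothetical helper, and it assumes each response exposes `usage.input_tokens` and `usage.output_tokens`. The sample values are made up and stand in for real responses.

```python
from collections import defaultdict
from types import SimpleNamespace


class EffortUsageTracker:
    """Accumulate token counts per effort level to compare actual spend."""

    def __init__(self) -> None:
        self._totals = defaultdict(lambda: {"requests": 0, "input": 0, "output": 0})

    def record(self, effort: str, usage) -> None:
        # `usage` is any object with input_tokens and output_tokens attributes.
        bucket = self._totals[effort]
        bucket["requests"] += 1
        bucket["input"] += usage.input_tokens
        bucket["output"] += usage.output_tokens

    def average_output(self, effort: str) -> float:
        """Mean output tokens per request at the given effort level."""
        bucket = self._totals.get(effort)
        if not bucket or bucket["requests"] == 0:
            return 0.0
        return bucket["output"] / bucket["requests"]


# Stand-in usage objects; in production pass response.usage instead.
tracker = EffortUsageTracker()
tracker.record("low", SimpleNamespace(input_tokens=50, output_tokens=120))
tracker.record("low", SimpleNamespace(input_tokens=60, output_tokens=80))
tracker.record("high", SimpleNamespace(input_tokens=50, output_tokens=900))
print(tracker.average_output("low"))   # 100.0
print(tracker.average_output("high"))  # 900.0
```

Tracking averages per level makes it easier to see whether a lower setting actually reduces spend for a given workload before changing defaults.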
Key Takeaways
- The effort parameter lets you control Claude’s token spending across all response types (text, tools, thinking).
- Effort levels range from low (fastest, cheapest) to max (most thorough), with high as the default.
- For Sonnet 4.6, explicitly set effort to medium as a recommended default to balance speed, cost, and performance.
- Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6, and works without extended thinking enabled.
- Combine effort with adaptive thinking for the best results on supported models.