Mastering Claude’s Effort Parameter: Control Token Spend Without Sacrificing Intelligence
Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes code examples, recommended levels, and best practices for Opus 4.6, Sonnet 4.6, and more.
This guide explains Claude’s effort parameter, which lets you control how eagerly Claude spends tokens. You’ll learn how to set effort levels (low, medium, high, max), combine it with adaptive thinking, and see practical API examples to optimize speed and cost for your use case.
Introduction
Every Claude API call is a trade-off between thoroughness and efficiency. Do you want Claude to think deeply and produce the most complete answer possible? Or do you need a fast, low-cost response for a high-volume task? Historically, you had to choose between different models or fiddle with token budgets. Now, with the effort parameter, you can control this balance using a single model.
Effort is a behavioral signal that tells Claude how eager it should be about spending tokens. It works across all response types—text, tool calls, and extended thinking—and it’s available on Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6, and Opus 4.5. This guide will show you exactly how to use it, when to choose each level, and how to combine it with adaptive thinking for the best results.
How the Effort Parameter Works
By default, Claude uses high effort, meaning it will spend as many tokens as needed to produce excellent results. You can lower the effort to save tokens and speed up responses, or raise it to max for the absolute highest capability.
Key points:
- Effort affects all tokens in the response: text, tool calls, and thinking (when enabled).
- It does not require extended thinking to be enabled.
- Lower effort means Claude may skip thinking for simple problems and make fewer tool calls.
- Setting effort: "high" is identical to omitting the parameter entirely.
Note: For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
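As a rough illustration of the key points above, the helper below builds the keyword arguments for a Messages API call and attaches an effort setting only when one is requested, since high is the default. The name build_request is hypothetical, and nesting effort under an output_config field is an assumption of this sketch; check the field name against the API reference for your SDK version.

```python
from typing import Any, Optional

# Effort levels named in this guide.
VALID_EFFORT = {"low", "medium", "high", "xhigh", "max"}


def build_request(model: str, prompt: str, max_tokens: int = 1024,
                  effort: Optional[str] = None) -> dict[str, Any]:
    """Build keyword arguments for a Messages API call (hypothetical helper)."""
    if effort is not None and effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    params: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Assumption: effort is nested under output_config. Leaving it out
    # keeps the model at its default level, which this guide says is "high".
    if effort is not None:
        params["output_config"] = {"effort": effort}
    return params
```

The returned dict can be unpacked into the SDK call, for example client.messages.create(**build_request(model_id, "Hello", effort="low")), where model_id is whichever supported model you use.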
Effort Levels and Use Cases
| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability, default behavior | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed, cost, and performance balance |
| low | Most efficient, significant token savings | Simpler tasks, subagents, high-volume chat |
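The availability notes in the table can be captured in a small lookup so that a request is rejected locally before it reaches the API. This is a minimal sketch: the keys are the display names used in this guide, not API model IDs, and supports_effort is a hypothetical helper.

```python
# Models this guide lists as supporting the effort parameter (display names).
ALL_MODELS = {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6", "Opus 4.5"}

# Which models accept each level, transcribed from the table above.
EFFORT_SUPPORT = {
    "low": ALL_MODELS,
    "medium": ALL_MODELS,
    "high": ALL_MODELS,
    "xhigh": {"Opus 4.7"},
    "max": {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"},
}


def supports_effort(model: str, effort: str) -> bool:
    """Return True if the guide lists `effort` as available on `model`."""
    if effort not in EFFORT_SUPPORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    return model in EFFORT_SUPPORT[effort]
```

A caller might fall back to high when supports_effort returns False for xhigh or max, rather than sending a request the model cannot honor.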
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly:
- Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
- Low: For high-volume or latency-sensitive workloads—chat, non-coding tasks where speed matters most.
- High: For tasks that need maximum quality from Sonnet 4.6.
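One way to apply these recommendations consistently is a lookup from workload type to effort level. The workload labels below are hypothetical names chosen for this sketch; only the level assigned to each follows the list above.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 levels recommended above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "latency_sensitive": "low",
    "max_quality": "high",
}


def sonnet_effort_for(workload: str) -> str:
    """Pick an effort level for a workload, defaulting to the recommended medium."""
    return SONNET_46_EFFORT.get(workload, "medium")
```

Falling back to medium for unknown workloads mirrors the recommended default, so a new workload type never silently inherits the higher-latency high setting.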
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking by setting thinking: {type: "adaptive"}. This lets Claude dynamically decide when to think based on the problem difficulty and your effort level.
- At high and max effort, Claude will almost always think.
- At lower effort levels, Claude may skip thinking for simpler problems, saving tokens.
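To see how often Claude actually thinks at a given effort level, you can inspect the content blocks of each response. The sketch below uses stand-in objects instead of real API responses. The helper names are hypothetical, and placing effort under output_config is an assumption to verify against the API reference.

```python
from types import SimpleNamespace
from typing import Any, Iterable

THINKING_BLOCK_TYPES = {"thinking", "redacted_thinking"}


def adaptive_params(effort: str) -> dict[str, Any]:
    """Request fields that combine adaptive thinking with an effort level."""
    # Assumption: effort is nested under output_config.
    return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}


def used_thinking(content_blocks: Iterable[Any]) -> bool:
    """Return True if any block in a response's content is a thinking block."""
    return any(getattr(block, "type", None) in THINKING_BLOCK_TYPES
               for block in content_blocks)


# Stand-ins for response.content: a simple answer and one preceded by thinking.
simple_answer = [SimpleNamespace(type="text", text="Paris")]
hard_answer = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="The bug is an off-by-one error."),
]
```

Logging used_thinking(response.content) across a sample of real traffic gives a quick read on whether a lower effort level is skipping thinking where you expect it to.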
Practical API Examples
Python (using the Anthropic SDK)
import anthropic

client = anthropic.Anthropic()

# Low effort for fast, cheap responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},  # effort is passed inside output_config
    messages=[
        {"role": "user", "content": "Summarize this email in one sentence."}
    ],
)
print(response.content[0].text)

# High effort for complex reasoning (equivalent to omitting the setting)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Debug this Python code and explain the fix..."}
    ],
)

# Max effort with adaptive thinking
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis..."}
    ],
)
TypeScript (using the Anthropic SDK)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Medium effort for balanced agentic tasks
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' }, // effort is passed inside output_config
  messages: [
    { role: 'user', content: 'Write a function to fetch and parse JSON from an API.' }
  ]
});
// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);

// Low effort for high-volume chat
const fastResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 512,
  output_config: { effort: 'low' },
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
});
REST API (raw curl)
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "output_config": {"effort": "medium"},
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Start with medium for most applications: it gives the best balance of speed, cost, and quality.
- Use low for subagents or high-volume pipelines where each call must be fast and cheap.
- Reserve max for the hardest problems: deep mathematical proofs, complex multi-step reasoning, or tasks where you need Claude’s absolute best.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) to let Claude decide when to engage extended thinking, saving even more tokens on simple queries.
- Monitor token usage: lower effort levels can significantly reduce your bill, especially for tool-heavy workflows where Claude makes fewer tool calls.
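For the monitoring point, a simple comparison of output token counts between two effort levels is often enough to start. The sketch below assumes you have already collected the counts, for example from the usage.output_tokens field of each response. The function name and the sample numbers are hypothetical.

```python
def output_token_savings(baseline_tokens: int, candidate_tokens: int) -> float:
    """Fraction of output tokens saved by the candidate run relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - candidate_tokens) / baseline_tokens


# Hypothetical counts: the same prompt run at high effort and at low effort.
high_effort_tokens = 1200
low_effort_tokens = 450
savings = output_token_savings(high_effort_tokens, low_effort_tokens)
print(f"Output tokens saved: {savings:.1%}")  # prints "Output tokens saved: 62.5%"
```

A negative result means the candidate run used more tokens than the baseline, which is worth flagging since effort is a behavioral signal rather than a hard budget.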
Limitations and Considerations
- Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than at higher levels.
- The xhigh level is currently only available on Claude Opus 4.7.
- Effort affects all tokens, including tool calls. Lower effort means fewer tool calls, which may reduce the quality of multi-step agentic tasks.
- Zero Data Retention (ZDR) is supported—data sent with effort is not stored after the API response is returned.
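Because lower effort can reduce quality on harder tasks, one possible mitigation is to start low and retry at a higher level only when the result fails a check. This is a sketch of that pattern, not something the API provides: run_with_escalation, the call function, and the is_good_enough check are all hypothetical and supplied by your application.

```python
from typing import Callable, Sequence


def run_with_escalation(
    call: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> tuple[str, str]:
    """Try each effort level in order and stop at the first acceptable result.

    Returns (level, result). If no result passes the check, the last
    attempt is returned so the caller still has something to work with.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    for level in levels:
        result = call(level)  # e.g. wraps an API request made at this effort level
        if is_good_enough(result):
            break
    return level, result
```

Escalation trades extra latency on hard inputs for lower cost on easy ones, so it fits workloads where most requests are simple and the quality check is cheap.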
Key Takeaways
- Effort lets you control token spend across text, tool calls, and thinking with a single parameter—no need to switch models.
- Five levels (low, medium, high, xhigh, max) give you fine-grained control from fastest/cheapest to most thorough.
- Combine with adaptive thinking for optimal efficiency: Claude decides when to think based on problem difficulty and your effort setting.
- Always set effort explicitly with Sonnet 4.6 to avoid the extra latency of the default high setting.
- Start with medium for most use cases, and only go to max for the hardest problems or low for high-volume, simple tasks.