Mastering Claude’s Effort Parameter: Control Thinking Depth for Speed & Cost
Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and latency. Includes code examples, effort levels, and best practices for Opus 4.6, Sonnet 4.6, and Mythos Preview.
The effort parameter lets you control how eagerly Claude spends tokens on a response. Set it to 'low' for fast, cheap answers on simple tasks, or 'max' for the deepest reasoning on complex problems. It works across all response tokens, including tool calls and thinking.
Introduction
Claude’s effort parameter is a powerful new way to fine-tune the trade-off between response quality and token efficiency. Instead of switching between different models or manually setting token budgets, you can now tell a single Claude model how hard it should think — from a quick, cost-effective answer to a deep, multi-step reasoning marathon.
This guide explains how effort works, when to use each level, and how to integrate it into your API calls. By the end, you’ll know exactly how to dial in the right balance for your use case.
How Effort Works
The effort parameter is a behavioral signal, not a strict token budget. When you set effort to high (the default), Claude spends as many tokens as needed for excellent results. Lower levels tell Claude to be more conservative — it may skip thinking for simple problems, but it will still think deeply when the problem truly requires it.
Effort affects all tokens in the response:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
This differs from budget_tokens, which only controlled thinking tokens. Effort is a unified control: at lower levels it also reduces the number of tool calls Claude makes.
Note: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. budget_tokens is deprecated and will be removed in a future model release.
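As a rough illustration of that migration, the sketch below rewrites a legacy request that pins a thinking budget into one that uses effort instead. The helper name migrate_to_effort is hypothetical, and the code assumes the effort value is carried in the request's output_config object; if your SDK version exposes it differently, change the one line marked in the comments.

```python
from copy import deepcopy


def migrate_to_effort(params: dict, effort: str = "high") -> dict:
    """Return a copy of request params that uses effort instead of budget_tokens.

    Hypothetical helper for illustration; it does not call the API.
    """
    new_params = deepcopy(params)  # leave the caller's dict untouched

    thinking = new_params.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed thinking budget and let the model decide how much to think.
        new_params["thinking"] = {"type": "adaptive"}

    # Assumption: effort lives under the request's output_config object.
    new_params.setdefault("output_config", {})["effort"] = effort
    return new_params


legacy = {
    "model": "your-model-id",  # placeholder, not a real model ID
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Summarize this report."}],
}
migrated = migrate_to_effort(legacy, effort="medium")
```

The migrated dict can then be passed to the client as keyword arguments, for example client.messages.create(**migrated).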
Effort Levels
| Level | Description | Best For |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic and coding tasks (>30 min) with token budgets in the millions (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing a balance of speed, cost, and performance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume or latency-sensitive workloads |
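Because two of the levels are limited to specific models, it can help to check a level before sending a request. The following is a minimal sketch of such a check. The function name and the short model-family labels are hypothetical (they are not API model IDs), and it assumes low, medium, and high are accepted wherever effort is supported.

```python
# Levels assumed to be accepted by every model that supports effort.
COMMON_LEVELS = {"low", "medium", "high"}

# Levels restricted to particular models, taken from the table above.
# The family labels are illustrative shorthand, not real model IDs.
RESTRICTED_LEVELS = {
    "max": {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"},
    "xhigh": {"opus-4.7"},
}


def is_supported(level: str, model_family: str) -> bool:
    """Return True if the effort level is listed for the given model family."""
    if level in COMMON_LEVELS:
        return True
    # Unknown levels fall through to an empty set and are rejected.
    return model_family in RESTRICTED_LEVELS.get(level, set())
```

A caller might fall back to high when is_supported returns False rather than failing the request outright.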
Recommended Default for Sonnet 4.6
Sonnet 4.6 defaults to high effort, so leaving effort unset can produce higher latency than you expect. Set it explicitly; the recommended settings are:
- Medium effort — Best balance of speed, cost, and performance for most applications (agentic coding, tool-heavy workflows, code generation).
- Low effort — For high-volume or latency-sensitive workloads (chat, non-coding use cases).
Using Effort in the API
Effort is available on all supported models with no beta header required. You can set it directly in the request body.
Python Example
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=1024,
    output_config={"effort": "medium"},  # or "low", "high", "max"
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)
print(response.content[0].text)
TypeScript Example
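import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model that supports effort
  max_tokens: 1024,
  output_config: { effort: 'medium' }, // or 'low', 'high', 'max'
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// The first content block is not guaranteed to be text, so narrow before reading it.
const first = response.content[0];
if (first.type === 'text') console.log(first.text);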
Combining with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This lets Claude dynamically decide how much thinking to use based on the problem complexity, while still respecting your effort signal.
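# Reuses the client created in the Python example above.
response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=4096,
    thinking={"type": "adaptive"},  # let Claude decide how much to think
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum computing on cryptography."}
    ],
)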
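When thinking is enabled, the response content can contain thinking blocks ahead of the text, so reading content[0].text is not reliable. The sketch below collects only the text blocks. The helper name extract_text is hypothetical, and the demo uses stand-in objects instead of a live API response.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of every block whose type is "text".

    Thinking and tool-use blocks are skipped because they carry no .text field.
    """
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in blocks for illustration; real ones come from response.content.
demo_blocks = [
    SimpleNamespace(type="thinking", thinking="(model reasoning)"),
    SimpleNamespace(type="text", text="Quantum computers threaten RSA. "),
    SimpleNamespace(type="text", text="Lattice-based schemes are one response."),
]
print(extract_text(demo_blocks))
```

With a real response, call extract_text(response.content) in place of indexing the first block.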
When to Use Each Effort Level
Low Effort
- Use for: Simple Q&A, classification, summarization, chat, subagents that don’t need deep reasoning.
- Benefits: Fastest response times, lowest token cost.
- Trade-off: Reduced capability on complex problems. Claude may skip thinking entirely for easy tasks.
Medium Effort
- Use for: Agentic coding, tool-heavy workflows, code generation, multi-step tasks that need some reasoning but not maximum depth.
- Benefits: Good balance of speed, cost, and performance. Recommended default for Sonnet 4.6.
- Trade-off: Slightly less thorough than high on very difficult problems.
High Effort (Default)
- Use for: Complex reasoning, difficult coding problems, agentic tasks where quality is paramount.
- Benefits: Excellent results, Claude spends as many tokens as needed.
- Trade-off: Higher latency and cost compared to lower levels.
Max Effort
- Use for: The absolute hardest problems — mathematical proofs, deep scientific analysis, multi-hour agentic tasks.
- Benefits: No constraints on token spending, maximum capability.
- Trade-off: Highest cost and latency. Available only on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
XHigh Effort (Opus 4.7 only)
- Use for: Long-running agentic and coding tasks lasting over 30 minutes, with token budgets in the millions.
- Benefits: Extended capability for sustained, complex workflows.
- Trade-off: Very high token consumption.
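The guidance in this section can be encoded as a simple lookup so that application code picks a level from the kind of task rather than hard-coding it per call. This is only a sketch: the task labels and the choose_effort function are hypothetical, and the mapping should be tuned against your own quality and cost measurements.

```python
# Hypothetical task labels mapped to the effort levels suggested in this section.
EFFORT_BY_TASK = {
    "classification": "low",
    "summarization": "low",
    "chat": "low",
    "subagent": "low",
    "agentic_coding": "medium",
    "code_generation": "medium",
    "tool_workflow": "medium",
    "complex_reasoning": "high",
    "hard_coding": "high",
    "math_proof": "max",
    "scientific_analysis": "max",
}


def choose_effort(task_kind: str, default: str = "high") -> str:
    """Return the suggested effort level, falling back to the default for unknown tasks."""
    return EFFORT_BY_TASK.get(task_kind, default)
```

The fallback is high because that is the level Claude uses when effort is not set. xhigh is left out of the mapping since it applies to a single model.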
Practical Tips
- Start with medium for Sonnet 4.6: It’s the best default for most applications. Explicitly set effort to avoid unexpected latency.
- Use low for high-volume pipelines: If you’re processing thousands of simple requests (e.g., classification, extraction), low effort can dramatically reduce costs.
- Reserve max for the hardest 5% of problems: Don’t use it for everyday tasks. The token cost can be 10x or more compared to medium.
- Combine with adaptive thinking: This gives Claude the flexibility to think only when necessary, while still respecting your effort level.
- Monitor token usage: Effort is a behavioral signal, not a hard budget. Always track your actual token spend and adjust accordingly.
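One way to act on the last tip is to record the output token count of each request alongside the effort level it used, then compare averages. The sketch below assumes you collect (effort, output_tokens) pairs yourself, for example from response.usage.output_tokens after each call. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records):
    """Average output tokens per effort level.

    records: iterable of (effort_level, output_tokens) pairs gathered by the caller,
    e.g. records.append((effort, response.usage.output_tokens)) after each request.
    """
    by_level = defaultdict(list)
    for level, tokens in records:
        by_level[level].append(tokens)
    return {level: mean(values) for level, values in by_level.items()}
```

Comparing these averages on a representative sample of your own prompts gives a more reliable picture than any general multiplier.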
Key Takeaways
- Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6. It controls all response tokens, not just thinking.
- Five levels let you dial in the balance of speed, cost, and capability: low, medium, high (default), xhigh (Opus 4.7 only), and max.
- No beta header required: Effort works on all supported models out of the box.
- Combine with adaptive thinking for the best experience, especially on mixed workloads.
- Explicitly set effort for Sonnet 4.6 to avoid unexpected latency; medium is the recommended default for most applications.