Mastering Claude’s Effort Parameter: Balance Speed, Cost, and Reasoning Depth
Learn how to use Claude's effort parameter to control token spend, response thoroughness, and latency across API calls. Includes code examples and best practices for Opus and Sonnet models.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on reasoning and responses. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to implement them in API calls with Python and TypeScript examples.
When building applications with Claude, you often face a trade-off: do you want the deepest possible reasoning, or do you need fast, cost-effective responses? Historically, you had to choose between different models or manually tweak token budgets. With the effort parameter, Anthropic gives you a single, elegant dial to control how eagerly Claude spends tokens—without switching models or enabling extended thinking.
This guide covers everything you need to know about the effort parameter: how it works, the five effort levels, practical code examples, and recommended settings for popular models like Claude Sonnet 4.6 and Claude Opus 4.7.
What Is the Effort Parameter?
The effort parameter is an API field, set inside the request's output_config object, that tells Claude how eagerly to spend tokens when generating a response. It affects all tokens in the output: text, tool calls, function arguments, and extended thinking (if enabled).
Key benefits:
- No need to enable thinking to use effort (though combining them is recommended).
- Controls tool call frequency—lower effort means fewer tool calls, saving tokens.
- Works across all supported models without beta headers.
Supported models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
Note: For Claude Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter. While budget_tokens still works, it will be removed in a future release.
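A minimal migration sketch for the note above, in plain Python with no network calls. The helper name migrate_budget_tokens is hypothetical, and the sketch assumes that effort is sent inside an output_config object and that adaptive thinking is requested with {"type": "adaptive"}; check both against the current API reference before relying on it. The target effort level is left to the caller, because this guide gives no mapping from a token budget to an effort level.

```python
import copy


def migrate_budget_tokens(params: dict, effort: str = "high") -> dict:
    """Return a copy of request params with budget_tokens replaced by effort.

    Hypothetical helper: the output_config/effort placement is an assumption.
    """
    migrated = copy.deepcopy(params)  # never mutate the caller's dict
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed budget and let the model decide how much to think.
        migrated["thinking"] = {"type": "adaptive"}
    # Merge effort into any existing output_config instead of overwriting it.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Refactor this function."}],
}
print(migrate_budget_tokens(legacy, effort="medium"))
```

Because the helper works on a copy, you can run it over stored request templates and diff the results before changing any production call.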
How Effort Levels Work
Claude offers five effort levels, each designed for a specific use case:
| Level | Description | Typical Use Case |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents, latency-sensitive workloads. |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing a trade-off between speed, cost, and performance. |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, difficult coding, general agentic tasks. |
| xhigh | Extended capability for long-horizon work. Available only on Opus 4.7. | Long-running coding/agentic tasks (30+ minutes) with million-token budgets. |
| max | Absolute maximum capability with no constraints on token spending. | Deepest possible reasoning, most thorough analysis. |
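One way to keep the levels in the table from drifting into typos or unsupported combinations is to validate them before a request is built. The sketch below is illustrative only: validate_effort and the model family labels ("opus-4.7", "sonnet-4.6") are hypothetical names, not API model IDs, and the only rule encoded is the one stated in the table, that xhigh is limited to Opus 4.7.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical labels for model families; these are not API model IDs.
XHIGH_FAMILIES = {"opus-4.7"}


def validate_effort(level: str, model_family: str) -> str:
    """Return the level unchanged if it is valid for the given model family."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    if level == "xhigh" and model_family not in XHIGH_FAMILIES:
        raise ValueError(f"xhigh is not available on {model_family}")
    return level


print(validate_effort("medium", "sonnet-4.6"))
print(validate_effort("xhigh", "opus-4.7"))
```

Failing fast in your own code gives a clearer error than waiting for the API to reject the request.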
Recommended Settings for Sonnet 4.6
Claude Sonnet 4.6 defaults to high effort. If you don't set it explicitly, you may experience higher latency than expected. Anthropic recommends:
- Medium effort as your new default: Best balance of speed, cost, and performance for most applications. Ideal for agentic coding, tool-heavy workflows, and code generation.
- Low effort for high-volume or latency-sensitive workloads: Suitable for chat and non-coding use cases where faster turnaround is prioritized.
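The recommendations above can be sketched as a small lookup that always sets effort explicitly, so a request never falls back to the high default by accident. The workload labels and the sonnet_request helper are hypothetical, the model ID is assumed, and the sketch assumes effort is passed inside output_config; it only builds keyword arguments and makes no API call.

```python
# Workload labels are hypothetical; the levels follow the recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def sonnet_request(prompt: str, workload: str, max_tokens: int = 2048) -> dict:
    """Build Messages API keyword arguments with an explicit effort level."""
    effort = SONNET_46_EFFORT.get(workload, "medium")  # medium when unsure
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},  # assumed field placement
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = sonnet_request("Summarize this ticket.", workload="chat")
print(kwargs["output_config"])
```

The resulting dictionary can be unpacked into the SDK call, for example client.messages.create(**kwargs).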
Combining Effort with Adaptive Thinking
For the best experience, pair effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide dynamically how much thinking to do based on the problem complexity, while effort sets the overall token budget ceiling.
At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving even more tokens.
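Because Claude may skip thinking at lower effort levels, code that reads a response should not assume a fixed position for the text block. The sketch below assumes content blocks expose a type attribute with the values "thinking" and "text", as in the Anthropic Python SDK; the helper names are hypothetical, and the stand-in objects replace a real API response.

```python
from types import SimpleNamespace


def used_thinking(content_blocks) -> bool:
    """Return True if any content block in a response is a thinking block."""
    return any(getattr(block, "type", None) == "thinking" for block in content_blocks)


def first_text(content_blocks) -> str:
    """Return the first text block's text, skipping any thinking blocks."""
    for block in content_blocks:
        if getattr(block, "type", None) == "text":
            return block.text
    return ""


# Stand-in objects that mimic the shape of response.content; no API call is made.
with_thinking = [
    SimpleNamespace(type="thinking", thinking="Compare both branches..."),
    SimpleNamespace(type="text", text="The bug is in the loop bound."),
]
without_thinking = [SimpleNamespace(type="text", text="Paris.")]

print(used_thinking(with_thinking), first_text(with_thinking))
print(used_thinking(without_thinking), first_text(without_thinking))
```

Logging used_thinking per request is also a cheap way to see how often a given effort level triggers thinking on your own prompts.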
Code Examples
Python (using the Anthropic SDK)
import anthropic

client = anthropic.Anthropic()

# Low effort: fast, cheap responses for simple tasks
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID; use any model that supports effort
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
print(response.content[0].text)

# Medium effort: balanced for agentic coding
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)

# Max effort: deepest reasoning for complex analysis
response = client.messages.create(
    model="claude-opus-4-6",  # assumed ID for an Opus model that supports effort
    max_tokens=8192,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis (just kidding; explain the P vs NP problem in detail)."}
    ],
)
TypeScript (using the Anthropic SDK)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort for quick answers
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // assumed ID; use any model that supports effort
  max_tokens: 1024,
  output_config: { effort: 'low' },
  messages: [
    { role: 'user', content: 'Summarize this email in one sentence.' }
  ]
});

// Medium effort with adaptive thinking
const response2 = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  output_config: { effort: 'medium' },
  thinking: { type: 'adaptive' },
  messages: [
    { role: 'user', content: 'Debug this code and explain the fix.' }
  ]
});

// Max effort on Opus with adaptive thinking
// (budget_tokens is deprecated on the 4.6 models, so it is not used here)
const response3 = await client.messages.create({
  model: 'claude-opus-4-6', // assumed ID for an Opus model that supports effort
  max_tokens: 16384,
  output_config: { effort: 'max' },
  thinking: { type: 'adaptive' },
  messages: [
    { role: 'user', content: 'Design a distributed caching system.' }
  ]
});
cURL Example
# The model ID is an assumption; use any model that supports effort.
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 2048,
    "output_config": {"effort": "medium"},
    "messages": [
      {"role": "user", "content": "Explain quantum entanglement simply."}
    ]
  }'
Practical Use Cases
1. Cost-Sensitive Chatbots
Set effort to low for FAQ bots or simple Q&A. You'll save 30-50% on tokens compared to high, with minimal quality loss for straightforward questions.
2. Agentic Coding Assistants
Use medium effort as your default. It provides enough reasoning for multi-step tool use and code generation without the latency of high. Reserve high or max only for the most complex debugging sessions.
3. Long-Running Research Agents
For agents that run 30+ minutes and consume millions of tokens, use xhigh (Opus 4.7 only). This gives Claude the headroom to maintain deep reasoning over extended interactions.
4. Subagents in a Multi-Agent System
Subagents handling narrow, well-defined tasks (e.g., data extraction, formatting) can safely use low effort. Reserve high or max for the main orchestrator agent.
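The routing rule for multi-agent systems can be sketched as a small role-based lookup. AgentSpec, effort_for, and the role labels are hypothetical names chosen for illustration; the only logic encoded is the rule above, that subagents run at low and the orchestrator at high or max.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    role: str  # "orchestrator" or "subagent" (hypothetical labels)


def effort_for(agent: AgentSpec, hard_task: bool = False) -> str:
    """Pick an effort level from the agent's role."""
    if agent.role == "orchestrator":
        return "max" if hard_task else "high"
    return "low"  # narrow, well-defined subagent work


agents = [
    AgentSpec("planner", "orchestrator"),
    AgentSpec("extractor", "subagent"),
    AgentSpec("formatter", "subagent"),
]
plan = {agent.name: effort_for(agent) for agent in agents}
print(plan)
```

Keeping the decision in one function makes it easy to raise a single subagent to medium later if its output quality turns out to be the bottleneck.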
Best Practices
- Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the high default.
- Combine with adaptive thinking for optimal token efficiency: let Claude decide when to think, while you control the overall budget.
- Start with medium for most applications, then tune up or down based on observed quality and cost.
- Monitor token usage in production. The effort parameter gives you predictable cost scaling, so use it to set per-request budgets.
- Test with your actual prompts. Effort is behavioral; the same level may perform differently across tasks. Run A/B tests to find your sweet spot.
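To compare effort levels on your own traffic, you need per-level numbers. The sketch below aggregates a hypothetical log format in which each record carries the effort level, the output token count (for example, copied from response.usage.output_tokens), and a measured latency. The summarize_usage helper and the sample values are made up for illustration and are not benchmark results.

```python
from collections import defaultdict
from statistics import mean


def summarize_usage(records):
    """Average output tokens and latency per effort level.

    Each record is a dict with "effort", "output_tokens", and "latency_s" keys
    (a hypothetical log format, not something the API returns in this shape).
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["effort"]].append(record)
    return {
        effort: {
            "requests": len(rows),
            "avg_output_tokens": mean(row["output_tokens"] for row in rows),
            "avg_latency_s": mean(row["latency_s"] for row in rows),
        }
        for effort, rows in grouped.items()
    }


# Made-up numbers for illustration only; they are not benchmark results.
sample = [
    {"effort": "low", "output_tokens": 200, "latency_s": 1.0},
    {"effort": "low", "output_tokens": 300, "latency_s": 2.0},
    {"effort": "medium", "output_tokens": 600, "latency_s": 4.0},
]
for effort, stats in summarize_usage(sample).items():
    print(effort, stats)
```

Pair these averages with a quality score from your own evaluation set; token and latency numbers alone do not tell you whether a lower level is good enough.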
Key Takeaways
- The effort parameter is a single dial that controls token spend across text, tool calls, and thinking—no need to switch models or enable thinking.
- Five levels (low, medium, high, xhigh, max) let you trade off between speed/cost and reasoning depth.
- Medium effort is the recommended default for Sonnet 4.6, balancing performance and latency for most agentic and coding tasks.
- Combine with adaptive thinking for maximum efficiency—Claude decides when to think, while effort sets the ceiling.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6; migrate your code to avoid future breakage.