Mastering the Effort Parameter in Claude API: Balance Cost, Speed, and Intelligence
Learn how to use Claude's effort parameter to control token spending, response thoroughness, and latency. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.7.
The effort parameter lets you control how eagerly Claude spends tokens on a response. Set it from 'low' (fast, cheap, simpler tasks) to 'max' (deepest reasoning). It works across all response tokens—including tool calls and thinking—without requiring extended thinking mode.
Introduction
Every Claude API call is a trade-off between intelligence, speed, and cost. Sometimes you need Claude to reason deeply about a complex codebase; other times you just need a quick classification. Historically, developers had to juggle separate models, thinking budgets, and complex prompt engineering to achieve this balance.
Enter the effort parameter—a single, intuitive control that lets you dial Claude's "eagerness to spend tokens" up or down. Available on Claude Opus 4.5, Opus 4.6, Opus 4.7, Sonnet 4.6, and the new Mythos Preview, effort replaces the older budget_tokens approach and works seamlessly with or without extended thinking.
In this guide, you'll learn:
- What the effort parameter does and how it differs from token budgets
- The five effort levels and when to use each
- Practical code examples in Python and TypeScript
- Best practices for Sonnet 4.6 and Opus 4.7
- How to combine effort with adaptive thinking for maximum efficiency
How the Effort Parameter Works
By default, Claude operates at high effort—spending as many tokens as needed for excellent results. The effort parameter lets you move up or down from this baseline:
- Raise effort → deeper reasoning, more tool calls, longer responses, higher cost
- Lower effort → faster responses, fewer tokens, lower cost, some capability reduction
Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think hard on sufficiently difficult problems—it just won't think as much as it would at higher effort for the same problem.
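One way to act on this trade-off is to choose the level per task category instead of hard-coding it at each call site. The sketch below is a minimal, hypothetical helper: `pick_effort`, `TASK_EFFORT`, and the task category names are illustrative and not part of the Anthropic SDK, and the level names are the five listed in the table in the next section.

```python
# Hypothetical helper: none of these names come from the Anthropic SDK.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Assumed mapping from task category to effort level, loosely following
# the "Best For" guidance in this article. Adjust to your own workloads.
TASK_EFFORT = {
    "classification": "low",
    "subagent": "low",
    "agentic_coding": "medium",
    "complex_reasoning": "high",
    "long_horizon_coding": "xhigh",
    "research": "max",
}


def pick_effort(task: str, default: str = "high") -> str:
    """Return the effort level for a task category, falling back to the default."""
    effort = TASK_EFFORT.get(task, default)
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return effort


print(pick_effort("classification"))  # low
print(pick_effort("unlisted_task"))   # high
```

Keeping the mapping in one place makes it easy to lower or raise effort for a whole class of requests after measuring latency and cost.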
Effort Levels and When to Use Them
| Level | Description | Best For |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, research, complex math (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min, millions of tokens); Opus 4.7 only |
| high | Default behavior, excellent results | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate savings | Agentic tasks needing speed/cost/performance balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat, latency-sensitive workloads |
"high" produces exactly the same behavior as omitting the parameter entirely.
Code Examples
Python (using the Anthropic SDK)
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Low effort: fast, cheap classification.
# The model IDs in these examples are Sonnet 4 / Opus 4 IDs; replace them with
# the ID of a model that supports effort (for example Sonnet 4.6 or Opus 4.6).
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Classify the sentiment as positive, negative, or neutral.",
    messages=[
        {"role": "user", "content": "The product arrived broken and customer service was unhelpful."}
    ],
    # No thinking config: effort works without extended thinking.
    output_config={"effort": "low"},
)
print(response.content[0].text)

# Max effort: deep reasoning for complex code review
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=8192,
    system="You are a senior code reviewer. Find all bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": "Review this Python code..."}
    ],
    # Adaptive thinking instead of a fixed budget_tokens value
    # (a budget must stay below max_tokens).
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},
)
# With thinking on, the first block may be a thinking block, so print only text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
TypeScript (using the Anthropic SDK)
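import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

// Medium effort: balanced for agentic coding.
// Replace the model ID with one that supports effort (for example Sonnet 4.6).
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  system: 'You are a helpful coding assistant. Generate clean, well-documented code.',
  messages: [
    { role: 'user', content: 'Write a React component that fetches and displays user data.' }
  ],
  // Adaptive thinking instead of a fixed budget_tokens value
  // (a budget must stay below max_tokens).
  thinking: { type: 'adaptive' },
  output_config: { effort: 'medium' },
});

// With thinking on, the first block may be a thinking block, so print only text blocks.
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}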
Combining Effort with Adaptive Thinking
For the best experience on Opus 4.6 and Sonnet 4.6, Anthropic recommends combining effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude dynamically decide how much thinking to do based on the problem complexity, while effort sets the overall eagerness level.
# Adaptive thinking + medium effort = optimal balance
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Explain quantum entanglement in simple terms."}],
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
)
When using adaptive thinking, Claude may skip thinking entirely for simple problems at lower effort levels—saving you tokens and latency.
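To see whether thinking was actually skipped for a given request, you can inspect the block types in the response. The sketch below is a minimal illustration, assuming response content blocks expose a `type` attribute with values such as `thinking`, `redacted_thinking`, `text`, and `tool_use`; the helper names are hypothetical, and stand-in objects replace a real API response so the example runs offline.

```python
from types import SimpleNamespace


def block_type_counts(content) -> dict:
    """Count content blocks by their `type` attribute."""
    counts = {}
    for block in content:
        counts[block.type] = counts.get(block.type, 0) + 1
    return counts


def used_thinking(content) -> bool:
    """True if the response contains at least one (possibly redacted) thinking block."""
    counts = block_type_counts(content)
    return counts.get("thinking", 0) + counts.get("redacted_thinking", 0) > 0


# Stand-in objects shaped like SDK content blocks (illustrative data only).
simple_reply = [SimpleNamespace(type="text", text="Entangled particles share one state.")]
hard_reply = [
    SimpleNamespace(type="thinking", thinking="..."),
    SimpleNamespace(type="text", text="Answer"),
]

print(used_thinking(simple_reply))  # False
print(used_thinking(hard_reply))    # True
```

With a real response you would pass `response.content` to `used_thinking`.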
Best Practices for Sonnet 4.6
Sonnet 4.6 defaults to high effort, which can introduce unexpected latency. Anthropic recommends explicitly setting effort when using Sonnet 4.6:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
# Recommended: explicitly set effort to avoid surprises
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a bash script to back up a PostgreSQL database."}],
    output_config={"effort": "medium"},  # Explicitly set, not relying on the default
)
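If many call sites build requests, a small wrapper can guarantee that effort is always set explicitly. This is a hedged sketch rather than an SDK feature: `with_explicit_effort` is a hypothetical helper, and it assumes effort is sent under the request's `output_config` field, so adjust the key if your SDK version expects a different placement.

```python
def with_explicit_effort(request: dict, default_effort: str = "medium") -> dict:
    """Return a copy of the request kwargs with an effort level always set.

    An effort level already present in the request is left untouched.
    """
    merged = dict(request)
    output_config = dict(merged.get("output_config") or {})
    output_config.setdefault("effort", default_effort)
    merged["output_config"] = output_config
    return merged


request = {
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "Write a bash script to back up a PostgreSQL database."}],
}
print(with_explicit_effort(request)["output_config"])  # {'effort': 'medium'}
```

The resulting dict can be unpacked into the client call, for example `client.messages.create(model=..., **with_explicit_effort(request))`.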
Effort and Tool Calls
One of the biggest advantages of the effort parameter is that it affects tool call behavior. At lower effort levels, Claude will make fewer tool calls and choose simpler tool combinations. This can dramatically reduce both latency and cost in agentic workflows.
# Low effort: Claude will be conservative with tool usage
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[
        {
            "name": "search_database",
            "description": "Search the product database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "get_product_details",
            "description": "Get full product details",
            "input_schema": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"}
                },
                "required": ["product_id"]
            }
        }
    ],
    messages=[{"role": "user", "content": "Find me the best laptop under $1000."}],
    output_config={"effort": "low"},
)
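To check how much a lower effort level actually changes tool usage in your own workflow, you can tally tool calls and output tokens across the responses of one agentic run and compare runs at different levels. The sketch below is illustrative: `run_stats` is a hypothetical helper, it works on plain dicts shaped like the API's JSON responses (`content` blocks with a `type`, plus `usage.output_tokens`), and the sample data is made up.

```python
def run_stats(responses: list) -> dict:
    """Total tool calls and output tokens across the responses of one agentic run."""
    tool_calls = sum(
        1
        for response in responses
        for block in response["content"]
        if block["type"] == "tool_use"
    )
    output_tokens = sum(r["usage"]["output_tokens"] for r in responses)
    return {"tool_calls": tool_calls, "output_tokens": output_tokens}


# Made-up data for illustration only; real values come from the API responses.
low_effort_run = [
    {"content": [{"type": "tool_use", "name": "search_database"}],
     "usage": {"output_tokens": 60}},
    {"content": [{"type": "text", "text": "Here is a laptop under $1000."}],
     "usage": {"output_tokens": 140}},
]

print(run_stats(low_effort_run))  # {'tool_calls': 1, 'output_tokens': 200}
```

Running the same prompt set at two effort levels and comparing these totals gives a concrete basis for choosing a level, rather than relying on the general guidance alone.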
Migration from budget_tokens
If you're currently using budget_tokens on Opus 4.6 or Sonnet 4.6, Anthropic recommends migrating to the effort parameter. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
Before:
thinking={"type": "enabled", "budget_tokens": 2048}
After (recommended):
thinking={"type": "adaptive"},
output_config={"effort": "medium"}
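For codebases with many stored request configurations, the rewrite can be scripted. The sketch below is one possible approach, not an official migration tool: `migrate_request` and `effort_for_budget` are hypothetical names, the budget thresholds are assumptions chosen only so that a 2048-token budget maps to medium as in the example above (the source gives no official mapping), and effort is assumed to live under `output_config`.

```python
def effort_for_budget(budget_tokens: int) -> str:
    """Map an old thinking budget to an effort level.

    Thresholds are assumptions for illustration; tune them against your own
    latency and quality measurements.
    """
    if budget_tokens < 2048:
        return "low"
    if budget_tokens < 16000:
        return "medium"
    return "high"


def migrate_request(request: dict) -> dict:
    """Return a copy of the request using adaptive thinking plus effort."""
    migrated = dict(request)
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") == "enabled" and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        output_config = dict(migrated.get("output_config") or {})
        # Keep an effort level that was already set explicitly.
        output_config.setdefault("effort", effort_for_budget(thinking["budget_tokens"]))
        migrated["output_config"] = output_config
    return migrated


old = {"max_tokens": 4096, "thinking": {"type": "enabled", "budget_tokens": 2048}}
print(migrate_request(old))
# {'max_tokens': 4096, 'thinking': {'type': 'adaptive'}, 'output_config': {'effort': 'medium'}}
```

Requests without a fixed thinking budget pass through unchanged, so the function is safe to apply to every stored configuration.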
Key Takeaways
- Effort is a single, unified control that affects all response tokens—text, tool calls, and thinking—giving you fine-grained control over the cost-speed-intelligence trade-off.
- Five levels from low (fastest, cheapest) to max (deepest reasoning) let you match Claude's behavior to your task complexity.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal efficiency on Opus 4.6 and Sonnet 4.6.
- Always explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high setting.
- Migrate from budget_tokens to effort + adaptive thinking for future-proof code that works across all supported models.