Mastering Claude's Effort Parameter: Control Token Spend and Response Depth
Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across API calls, with practical code examples and recommended settings.
Claude's effort parameter lets you control how eagerly the model spends tokens on responses. Set it to 'low' for fast, cheap answers on simple tasks, 'medium' for balanced performance, 'high' (the default) for complex reasoning, 'xhigh' for long-running agentic work on Opus 4.7, or 'max' for the deepest possible analysis. It applies to every response type, including text, tool calls, and extended thinking.
Introduction
Claude is incredibly powerful, but with great power comes... greater token consumption. If you've ever wished you could dial Claude's thoroughness up or down depending on the task, the effort parameter is exactly what you need. Available in the Claude API, it gives you fine-grained control over how eagerly Claude spends tokens on each response, without switching models.
Whether you're building a high-volume chat application that needs lightning-fast replies, or an agentic system that requires deep reasoning over millions of tokens, the effort parameter lets you optimize for speed, cost, or capability—all with a single model.
In this guide, you'll learn:
- What the effort parameter is and how it works
- The five effort levels and when to use each
- How to combine effort with adaptive thinking
- Practical code examples in Python and TypeScript
- Best practices for different use cases
How the Effort Parameter Works
By default, Claude operates at high effort—spending as many tokens as needed to produce excellent results. The effort parameter lets you adjust this behavior:
- Raise effort to `max` for the absolute highest capability on the hardest problems.
- Lower effort to `medium` or `low` to be more conservative with token usage, optimizing for speed and cost.
Effort applies to all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
That makes effort broader than the extended-thinking `budget_tokens` parameter (now deprecated on Opus 4.6 and Sonnet 4.6), which only caps thinking tokens. Effort gives you a single dial to control overall token spend, including tool call frequency. At lower effort levels, Claude will make fewer tool calls and provide shorter explanations.
Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently difficult problems—but it will think less than it would at higher levels for the same problem.
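Because effort is only a behavioral signal, `max_tokens` remains the hard ceiling on output. The sketch below is a minimal illustration, assuming a response object carrying the `stop_reason` and `usage.output_tokens` fields the Messages API returns. `check_output_budget` is a hypothetical helper, and the `SimpleNamespace` object stands in for a real response so the snippet runs offline.

```python
from types import SimpleNamespace


def check_output_budget(response, max_tokens: int) -> dict:
    """Report how much of the hard max_tokens cap a response consumed.

    Hypothetical helper: effort only nudges how many tokens Claude spends,
    so max_tokens is the one strict ceiling worth checking.
    """
    used = response.usage.output_tokens
    return {
        "output_tokens": used,
        "fraction_of_cap": used / max_tokens,
        # The Messages API sets stop_reason to "max_tokens" when the cap cut the reply short.
        "truncated": response.stop_reason == "max_tokens",
    }


# Stand-in shaped like a Messages API response so the sketch runs offline.
fake = SimpleNamespace(stop_reason="end_turn", usage=SimpleNamespace(output_tokens=512))
print(check_output_budget(fake, max_tokens=1024))
# {'output_tokens': 512, 'fraction_of_cap': 0.5, 'truncated': False}
```

If responses are frequently truncated at a given effort level, raise `max_tokens` rather than expecting effort to keep output under the cap.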
Effort Levels and Use Cases
| Level | Description | Typical Use Case |
|---|---|---|
| `max` | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| `xhigh` | Extended capability for long-horizon work | Long-running agentic and coding tasks over 30 minutes with token budgets in the millions (Opus 4.7 only) |
| `high` | High capability (default behavior) | Complex reasoning, difficult coding, agentic tasks |
| `medium` | Balanced approach with moderate token savings | Agentic tasks needing a balance of speed, cost, and performance |
| `low` | Most efficient, significant token savings | Simple tasks, high-volume chat, subagents where speed and cost matter most |
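Since `max` and `xhigh` are limited to specific models, it can help to validate a level before sending a request. A minimal sketch, assuming you track models by family name: the sets below transcribe the table and the Model Support section of this article, the family strings are labels rather than API model IDs, and `effort_output_config` is a hypothetical helper that returns a dict shaped for the `output_config` request field.

```python
# Models listed in the Model Support section as accepting the effort parameter.
# Family names are labels for this sketch, not API model IDs.
EFFORT_MODELS = {
    "Claude Mythos Preview",
    "Claude Opus 4.7",
    "Claude Opus 4.6",
    "Claude Sonnet 4.6",
    "Claude Opus 4.5",
}

# Levels the table restricts to specific models; other levels have no extra limit.
RESTRICTED_LEVELS = {
    "max": {"Claude Mythos Preview", "Claude Opus 4.7", "Claude Opus 4.6", "Claude Sonnet 4.6"},
    "xhigh": {"Claude Opus 4.7"},
}

ALL_LEVELS = ("low", "medium", "high", "xhigh", "max")


def effort_output_config(level: str, model_family: str) -> dict:
    """Return {"effort": level} after checking the combination against the table."""
    if model_family not in EFFORT_MODELS:
        raise ValueError(f"{model_family} is not listed as supporting effort")
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    allowed = RESTRICTED_LEVELS.get(level)
    if allowed is not None and model_family not in allowed:
        raise ValueError(f"{level!r} effort is not listed for {model_family}")
    return {"effort": level}


print(effort_output_config("xhigh", "Claude Opus 4.7"))  # {'effort': 'xhigh'}
```

With this check, a call such as `effort_output_config("max", "Claude Opus 4.5")` fails fast with a `ValueError` instead of reaching the API.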
Recommended Defaults for Sonnet 4.6
Sonnet 4.6 defaults to high effort. For most applications, explicitly set the effort level to avoid unexpected latency:
- Medium effort (recommended default): Best balance of speed, cost, and performance for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads like chat and non-coding use cases.
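One way to apply these recommendations consistently is a small lookup that every Sonnet 4.6 request goes through. This is a sketch, not an SDK feature: the workload labels and `sonnet_46_effort` are hypothetical names, and unlisted workloads fall back to medium, the recommended default, rather than silently inheriting the model's high-effort default.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
    "non_coding": "low",
}


def sonnet_46_effort(workload: str) -> str:
    """Pick an explicit effort level; unknown workloads get the medium default."""
    return SONNET_46_EFFORT.get(workload, "medium")


print(sonnet_46_effort("chat"))            # low
print(sonnet_46_effort("agentic_coding"))  # medium
print(sonnet_46_effort("unknown"))         # medium
```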
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This allows Claude to dynamically decide how much thinking to apply based on the problem complexity, while the effort parameter sets the overall behavioral context.
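```python
# Python example: effort + adaptive thinking
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6, a model that supports effort
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # or "low", "high", "max"
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)

# response.content mixes thinking and text blocks; print only the answer text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```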
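```typescript
// TypeScript example: effort + adaptive thinking
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // assumed ID for Sonnet 4.6, a model that supports effort
  max_tokens: 8192,
  thinking: { type: 'adaptive' },
  output_config: { effort: 'medium' }, // or 'low', 'high', 'max'
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// Print only the text blocks, skipping any thinking blocks.
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}
```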
Practical Examples
Example 1: Low Effort for Simple Chat
For a customer support chatbot handling common questions, low effort keeps responses fast and cheap:
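```python
# Reuses `client` from the Python example above.
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed Sonnet 4.6 model ID
    max_tokens=1024,
    output_config={"effort": "low"},  # thinking is not enabled; effort still shapes the reply
    messages=[
        {"role": "user", "content": "What are your business hours?"}
    ],
)
```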
Example 2: Medium Effort for Agentic Coding
For a coding assistant that needs to balance thoroughness with response time:
```python
# Reuses `client` from the Python example above.
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed Sonnet 4.6 model ID
    max_tokens=4096,
    output_config={"effort": "medium"},
    tools=[
        {
            "name": "edit_file",
            "description": "Edit a file in the codebase",
            "input_schema": {
                "type": "object",
                "properties": {
                    "file_path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["file_path", "content"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Add input validation to the user registration endpoint."}
    ],
)
# Any tool calls come back as "tool_use" blocks in response.content.
```
Example 3: Max Effort for Deep Reasoning
For complex mathematical proofs or multi-step analysis:
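```python
# Reuses `client` from the Python example above.
response = client.messages.create(
    model="claude-opus-4-6",  # assumed Opus 4.6 model ID; max effort needs a model listed for it
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)
```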
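With thinking enabled, `response.content` holds `thinking` blocks (text in `.thinking`) alongside `text` blocks (text in `.text`), so reading a max-effort answer usually means separating the two. A minimal sketch: `split_blocks` is a hypothetical helper, and the demo blocks are `SimpleNamespace` stand-ins for SDK content blocks so the snippet runs offline.

```python
from types import SimpleNamespace


def split_blocks(content) -> tuple[list[str], list[str]]:
    """Separate thinking text from answer text in a response's content list.

    Other block types (for example tool_use) are ignored by this sketch.
    """
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
    return thinking, answer


# Stand-in blocks shaped like SDK content blocks, for illustration only.
demo = [
    SimpleNamespace(type="thinking", thinking="Assume sqrt(2) = p/q in lowest terms..."),
    SimpleNamespace(type="text", text="Proof: ..."),
]
thoughts, answer = split_blocks(demo)
print(len(thoughts), answer[0])  # 1 Proof: ...
```

Against a real response, call `split_blocks(response.content)` and display only the answer list to end users.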
Best Practices
- Start with medium effort for most applications. It provides a strong balance of capability and efficiency.
- Use low effort for subagents in multi-agent systems where each subagent handles simple, well-defined tasks.
- Reserve max effort for the most challenging problems where you need Claude's absolute best reasoning.
- Combine with adaptive thinking to let Claude dynamically allocate thinking tokens based on problem difficulty.
- Monitor token usage across effort levels to find the sweet spot for your specific workload. Lower effort doesn't just reduce thinking tokens—it reduces all tokens, including tool calls.
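A simple way to find that sweet spot is to send the same request at several effort levels and compare the results. The sketch below assumes a caller-supplied `send(level)` function that makes one API call and returns the response; `compare_effort_levels` and `placeholder_send` are hypothetical names, and the placeholder token counts are not measurements.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def compare_effort_levels(
    send: Callable[[str], object],
    levels: Iterable[str] = ("low", "medium", "high"),
) -> dict:
    """Send the same request at each effort level and record token usage and tool calls."""
    report = {}
    for level in levels:
        response = send(level)
        report[level] = {
            "output_tokens": response.usage.output_tokens,
            # Lower effort is expected to reduce tool calls as well as text length.
            "tool_calls": sum(1 for b in response.content if b.type == "tool_use"),
        }
    return report


def placeholder_send(level: str):
    """Stand-in for a real API call; token counts are placeholders, not measurements."""
    placeholder_tokens = {"low": 1, "medium": 2, "high": 3}[level]
    return SimpleNamespace(
        usage=SimpleNamespace(output_tokens=placeholder_tokens),
        content=[SimpleNamespace(type="text", text="...")],
    )


print(compare_effort_levels(placeholder_send))
```

Against the real API, `send` could be `lambda level: client.messages.create(model=..., max_tokens=4096, output_config={"effort": level}, messages=[...])`, keeping every other argument fixed so that effort is the only variable between runs.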
Model Support
The effort parameter is available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
On Opus 4.6 and Sonnet 4.6, effort replaces `budget_tokens` as the recommended way to control thinking depth.
Key Takeaways
- The effort parameter controls overall token spend across text, tool calls, and extended thinking—not just thinking tokens.
- Five levels are available: `low`, `medium`, `high` (default), `xhigh` (Opus 4.7 only), and `max`.
- Combine with adaptive thinking (`thinking: {type: "adaptive"}`) for the best experience on supported models.
- Lower effort reduces tool call frequency, making it ideal for high-volume or latency-sensitive applications.
- Effort is a behavioral signal, not a strict budget—Claude will still think deeply on hard problems even at lower levels.