Mastering Claude's Effort Parameter: Control Token Spend and Response Depth
Learn how to use Claude's effort parameter to control token usage, response thoroughness, and cost. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.6.
Claude's effort parameter lets you control how eagerly Claude spends tokens on responses, from max (deepest reasoning) to low (fastest, cheapest). It works with or without extended thinking and replaces budget_tokens on Opus 4.6 and Sonnet 4.6.
If you've ever wished you could dial Claude's thinking up or down depending on the task, your wish has been granted. The effort parameter gives you fine-grained control over how many tokens Claude spends on a response—without switching models. This guide explains everything you need to know to use effort effectively, with practical code examples and best practices.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise it to max for the absolute highest capability, or lower it to low for maximum speed and cost savings.
Key advantages of the effort parameter:
- No thinking required – Works with or without extended thinking enabled
- Affects all tokens – Controls text, tool calls, and thinking tokens
- Single model – No need to switch between different Claude models for different depth levels
Supported Models
The effort parameter is generally available on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
On Opus 4.6 and Sonnet 4.6, effort replaces the budget_tokens parameter as the recommended way to control thinking depth.
Effort Levels Explained
| Level | Description | Best Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, complex research, multi-step analysis |
| xhigh | Extended capability for long-horizon work (Opus 4.7 only) | Long-running agentic/coding tasks over 30 minutes |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |
Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—just less than at higher levels.
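The levels in the table can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: `resolve_effort` and the `model_family` labels (such as "opus-4.7") are hypothetical names used only for illustration, and the rules encoded are just the ones stated in the table above.

```python
from typing import Optional

# Levels and the default are taken from the table above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"

# Hypothetical family labels for illustration, not real API model IDs.
XHIGH_MODEL_FAMILIES = {"opus-4.7"}


def resolve_effort(effort: Optional[str], model_family: str) -> str:
    """Validate an effort level, falling back to the documented default."""
    if effort is None:
        return DEFAULT_EFFORT
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == "xhigh" and model_family not in XHIGH_MODEL_FAMILIES:
        raise ValueError("xhigh is listed for Opus 4.7 only")
    return effort


print(resolve_effort(None, "sonnet-4.6"))
print(resolve_effort("xhigh", "opus-4.7"))
```

Validating locally only catches typos and unsupported combinations early. It does not change how many tokens Claude actually spends, since effort remains a behavioral signal.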
How to Use the Effort Parameter
Basic Usage (Python SDK)
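import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # effort needs a model from the supported list
    max_tokens=8192,
    # Effort is passed inside output_config. Options: low, medium, high, max
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)
print(response.content[0].text)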
With Extended Thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Adaptive thinking pairs well with effort
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ],
)
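When thinking is enabled, the response content can contain thinking blocks ahead of the text, so reading the first block's text is not reliable. The sketch below shows one way to collect only the text blocks. `extract_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real SDK content blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of all blocks whose type is "text", skipping the rest."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in for response.content from a thinking-enabled request.
fake_content = [
    SimpleNamespace(type="thinking", thinking="(model reasoning)"),
    SimpleNamespace(type="text", text="Use consistent hashing "),
    SimpleNamespace(type="text", text="with replication."),
]

print(extract_text(fake_content))
```

With a real response, the same helper would be called as `extract_text(response.content)`.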
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  output_config: { effort: 'high' },
  messages: [
    { role: 'user', content: 'Write a Python script to analyze CSV data.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}
Recommended Effort Levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort, which may introduce unexpected latency. Anthropic recommends explicitly setting effort:
- Medium (recommended default) – Best balance for most applications: agentic coding, tool-heavy workflows, code generation
- Low – For high-volume or latency-sensitive workloads: chat, non-coding use cases
- High – For tasks requiring maximum capability from Sonnet
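These recommendations can be captured as a small lookup so every Sonnet 4.6 call site sets effort explicitly. This is only a sketch: the workload labels and the `sonnet_effort_for` helper are hypothetical names, and the mapping simply restates the list above.

```python
# Workload labels are illustrative; the levels mirror the list above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "latency_sensitive": "low",
    "maximum_capability": "high",
}


def sonnet_effort_for(workload: str) -> str:
    """Return the suggested effort, using medium for unlisted workloads."""
    return SONNET_46_EFFORT.get(workload, "medium")


for workload in ("chat", "code_generation", "maximum_capability"):
    print(workload, "->", sonnet_effort_for(workload))
```

Falling back to medium for unknown workloads follows the "recommended default" above. Treat it as a starting point to tune rather than a fixed rule.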
Practical Scenarios
Scenario 1: Cost-Sensitive Subagents
If you're building a multi-agent system where subagents handle simple classification or extraction tasks, use low effort to minimize token spend:
def classify_document(text):
    """Classify a document's priority at low effort to keep token spend down."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        output_config={"effort": "low"},
        messages=[
            {"role": "user", "content": f"Classify this document as 'urgent', 'normal', or 'low': {text}"}
        ],
    )
    return response.content[0].text
Scenario 2: Deep Research Tasks
For complex research or multi-step reasoning, use max effort with adaptive thinking:
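def deep_research(query):
    """Run a max-effort analysis and return only the final text."""
    # Stream the request: with a large max_tokens the call can run long
    # enough that the SDK expects streaming instead of one blocking call.
    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=64000,
        thinking={"type": "adaptive"},
        output_config={"effort": "max"},
        messages=[
            {"role": "user", "content": f"Conduct a thorough analysis of: {query}"}
        ],
    ) as stream:
        message = stream.get_final_message()
    # With thinking enabled, content may start with thinking blocks,
    # so collect the text blocks instead of reading content[0].
    return "".join(block.text for block in message.content if block.type == "text")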
Scenario 3: Balanced Agentic Workflows
For a coding agent that needs both speed and quality, use medium effort:
def code_review_agent(code_snippet):
    """Review code at medium effort; Claude may answer with a tool call."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": "medium"},
        tools=[
            {
                "name": "suggest_improvements",
                "description": "Suggest code improvements",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "suggestions": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["suggestions"],
                },
            }
        ],
        messages=[
            {"role": "user", "content": f"Review this code: {code_snippet}"}
        ],
    )
    # Return the full response so the caller can inspect any tool_use blocks
    return response
Effort vs. budget_tokens
If you're migrating from budget_tokens on Opus 4.6 or Sonnet 4.6, here's what changed:
| Feature | budget_tokens (deprecated) | effort (recommended) |
|---|---|---|
| Scope | Thinking tokens only | All tokens (text, tools, thinking) |
| Precision | Exact token budget | Behavioral signal |
| Flexibility | Requires thinking enabled | Works without thinking |
| Future-proof | Will be removed | Actively supported |
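A migration can be sketched as replacing a fixed thinking budget with adaptive thinking plus a starting effort level. There is no official conversion between the two, so the cut-offs below are arbitrary assumptions chosen only for illustration, and `suggest_effort` and `migrate_thinking` are hypothetical helpers.

```python
from typing import Dict, Tuple


def suggest_effort(budget_tokens: int) -> str:
    """Map an old thinking budget to a starting effort level.

    The thresholds are illustrative assumptions, not documented values.
    """
    if budget_tokens < 4_000:
        return "low"
    if budget_tokens < 16_000:
        return "medium"
    return "high"


def migrate_thinking(old_thinking: Dict) -> Tuple[Dict, str]:
    """Return (new thinking config, suggested effort) for an old config."""
    budget = old_thinking.get("budget_tokens")
    # Without a budget to go on, fall back to the documented default.
    effort = suggest_effort(budget) if budget is not None else "high"
    return {"type": "adaptive"}, effort


old = {"type": "enabled", "budget_tokens": 8_000}
new_thinking, effort = migrate_thinking(old)
print(new_thinking, effort)
```

Because effort is a behavioral signal covering all tokens, the suggested level is only a first guess. Compare results on your own workload before settling on one.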
Best Practices
- Start with medium – For most applications, medium offers the best balance of speed, cost, and quality
- Combine with adaptive thinking – thinking: {type: "adaptive"} pairs naturally with effort levels
- Test with your workload – Run A/B tests to find the optimal effort level for your specific use case
- Use low for subagents – Simple classification, extraction, or routing tasks don't need high effort
- Reserve max for complex tasks – Only use max when you truly need the deepest reasoning
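The "test with your workload" advice can be sketched as a small comparison harness. `compare_effort_levels` is a hypothetical helper that takes a `send(prompt, effort)` callable returning an output token count. In real use that callable would wrap an API request and return `response.usage.output_tokens`; here a fake stands in so the example runs offline.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Sequence


def compare_effort_levels(
    send: Callable[[str, str], int],
    prompts: Iterable[str],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> Dict[str, float]:
    """Return the mean output token count per effort level."""
    prompt_list = list(prompts)
    return {
        level: mean([send(prompt, level) for prompt in prompt_list])
        for level in levels
    }


def fake_send(prompt: str, effort: str) -> int:
    """Offline stand-in: pretends higher effort produces more tokens."""
    base = {"low": 10, "medium": 20, "high": 40}[effort]
    return base + len(prompt)


results = compare_effort_levels(fake_send, ["ab", "abcd"])
for level, avg_tokens in results.items():
    print(level, avg_tokens)
```

This measures token spend only. A useful A/B test would also score answer quality and latency for each level.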
Common Pitfalls
- Expecting strict budgets – Effort is a signal, not a hard limit. Claude may still spend significant tokens on genuinely hard problems at low effort.
- Ignoring Sonnet defaults – Sonnet 4.6 defaults to high effort. Always set it explicitly to avoid unexpected latency.
- Using max unnecessarily – max effort can dramatically increase token usage. Only use it when the task genuinely requires maximum capability.
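Because effort is not a hard limit, max_tokens remains the hard cap on output length, and a response that reaches it ends with a stop_reason of "max_tokens". A minimal sketch of detecting that case follows. `hit_token_cap` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real API responses.

```python
from types import SimpleNamespace


def hit_token_cap(response) -> bool:
    """Return True if the response stopped because it reached max_tokens."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Stand-ins for real responses, used only so the example runs offline.
truncated = SimpleNamespace(stop_reason="max_tokens")
complete = SimpleNamespace(stop_reason="end_turn")

for label, response in (("truncated", truncated), ("complete", complete)):
    if hit_token_cap(response):
        print(label, "-> cut off; consider raising max_tokens")
    else:
        print(label, "-> finished normally")
```

If truncation shows up often at a given effort level, raising max_tokens is usually a more direct fix than lowering effort, since a lower level does not guarantee shorter output.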
Key Takeaways
- Effort controls token spend across text, thinking, and tool calls, not just thinking tokens like the deprecated budget_tokens
- Five levels available: low, medium, high (default), xhigh (Opus 4.7 only), and max
- Works without extended thinking enabled, making it universally applicable
- Sonnet 4.6 users should explicitly set effort to avoid unexpected latency from the high default
- Combine with adaptive thinking for the best balance of depth and efficiency on complex tasks