Mastering Claude’s Effort Parameter: Control Token Spend Without Sacrificing Intelligence
Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across all supported models, with practical API examples and recommended settings.
This guide explains Claude’s effort parameter, which lets you control how eagerly Claude spends tokens on responses. You’ll learn the five effort levels (low, medium, high, xhigh, max), how to combine effort with adaptive thinking, and the recommended defaults for Sonnet 4.6, so you can tune speed, cost, and capability.
Introduction
If you’ve ever wished you could dial Claude’s thoroughness up or down, spending fewer tokens on simple tasks while reserving maximum reasoning for the hardest problems, the effort parameter is what you need. Available on Claude’s latest models, effort controls how readily Claude spends tokens on each response, and it works whether or not extended thinking is enabled.
This guide covers everything you need to know to start using effort effectively: how it works, the five effort levels, recommended defaults for different models, and practical API examples in Python and TypeScript.
What Is the Effort Parameter?
The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding to requests. It affects all tokens in the response—including text explanations, tool calls, function arguments, and extended thinking (when enabled).
Key advantages:
- No need to enable extended thinking – effort works independently.
- Controls tool call volume – lower effort means fewer tool calls, giving you greater efficiency gains.
- Single model, multiple behaviors – you can switch between effort levels (from low up to max) without changing models.
Important: For Claude Opus 4.6 and Sonnet 4.6, effort replaces the deprecated `budget_tokens` parameter. While `budget_tokens` is still accepted, it will be removed in a future release.
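A minimal sketch of that migration, assuming a request is held as a plain dict of keyword arguments. The helper name `migrate_to_effort` is hypothetical, and nesting effort under `output_config` is an assumption about how current Anthropic SDKs accept it, so check the API reference for your SDK version. The text above does not define a mapping from budget sizes to effort levels, so the caller picks the level explicitly.

```python
import copy


def migrate_to_effort(params: dict, effort: str = "high") -> dict:
    """Return a copy of `params` that uses effort instead of budget_tokens.

    Hypothetical helper: the request shape (thinking / output_config keys)
    is an assumption, not something confirmed by this guide.
    """
    migrated = copy.deepcopy(params)

    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Swap the fixed thinking budget for adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}

    # Merge into any existing output_config instead of overwriting it.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


old_request = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Summarize this changelog."}],
}

new_request = migrate_to_effort(old_request, effort="medium")
print(new_request["thinking"], new_request["output_config"])
```

The original dict is left untouched, which makes it easy to send both versions and compare the results before switching over.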
How Effort Levels Work
There are five effort levels, each suited to different use cases:
| Level | Description | Typical Use Case |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks needing best speed and lowest cost (e.g., subagents, chat). |
| medium | Balanced approach with moderate token savings. | Agentic tasks requiring a balance of speed, cost, and performance. |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, difficult coding problems, agentic tasks. |
| xhigh | Extended capability for long-horizon work. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions. |
| max | Absolute maximum capability with no constraints on token spending. | Tasks requiring the deepest possible reasoning and most thorough analysis. |
xhigh is available only on Claude Opus 4.7. max is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6.
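The availability rules above can be encoded as a small pre-flight check. This is a sketch under stated assumptions: the function names are hypothetical, the keys are display names rather than API model IDs, and it assumes that low, medium, and high are accepted by every model that supports effort at all.

```python
# Levels assumed to be available on every effort-capable model.
BASE_LEVELS = {"low", "medium", "high"}

# Extra levels per model, transcribed from the availability note above.
EXTRA_LEVELS = {
    "Claude Opus 4.7": {"xhigh", "max"},
    "Claude Mythos Preview": {"max"},
    "Claude Opus 4.6": {"max"},
    "Claude Sonnet 4.6": {"max"},
}


def supported_levels(model_name: str) -> set:
    """Return the effort levels this sketch believes a model accepts."""
    return BASE_LEVELS | EXTRA_LEVELS.get(model_name, set())


def check_effort(model_name: str, effort: str) -> str:
    """Return `effort` unchanged, or raise ValueError if it is not listed."""
    allowed = supported_levels(model_name)
    if effort not in allowed:
        raise ValueError(
            f"{effort!r} is not listed for {model_name}; "
            f"expected one of {sorted(allowed)}"
        )
    return effort


print(check_effort("Claude Opus 4.7", "xhigh"))
print(sorted(supported_levels("Claude Sonnet 4.6")))
```

Failing fast on the client side avoids spending a round trip on a request the API would reject.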
Recommended Defaults for Sonnet 4.6
Sonnet 4.6 defaults to high effort. To avoid unexpected latency and costs, Anthropic recommends explicitly setting effort:
- Medium (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
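These recommendations can be captured as a lookup so that every call site sets effort explicitly. A minimal sketch: the workload labels, the function names, the `claude-sonnet-4-6` model ID, and the `output_config` nesting are all assumptions made for illustration.

```python
# Workload labels are hypothetical; the levels follow the recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def sonnet_effort(workload: str) -> str:
    """Pick an effort level, falling back to the recommended default (medium)."""
    return SONNET_46_EFFORT.get(workload, "medium")


def build_request(prompt: str, workload: str, max_tokens: int = 2048) -> dict:
    """Build keyword arguments for a Messages API call with explicit effort."""
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": max_tokens,
        "output_config": {"effort": sonnet_effort(workload)},  # assumed nesting
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_request("Rename this variable across the repo.", "agentic_coding")
print(request["output_config"])
```

The resulting dict could be passed to the SDK as `client.messages.create(**request)`.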
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide, request by request, how much thinking a problem needs, while effort signals how much Claude should spend overall.
At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens without sacrificing quality on easy tasks.
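Because thinking may or may not happen on a given request, code that reads a response should not assume the first content block is text. The sketch below separates thinking blocks from text blocks. It assumes blocks expose a `type` attribute plus `text` or `thinking`, as in the Anthropic Python SDK, and it uses stand-in objects so it runs without an API key. The helper names are hypothetical.

```python
from types import SimpleNamespace


def final_text(content) -> str:
    """Join the text blocks of a response, skipping thinking blocks."""
    return "".join(block.text for block in content if block.type == "text")


def used_thinking(content) -> bool:
    """Report whether the response contains at least one thinking block."""
    return any(block.type == "thinking" for block in content)


# Stand-ins for response.content at two effort levels (illustrative only).
with_thinking = [
    SimpleNamespace(type="thinking", thinking="Integrate by parts twice."),
    SimpleNamespace(type="text", text="The antiderivative is ..."),
]
without_thinking = [SimpleNamespace(type="text", text="Paris.")]

for content in (with_thinking, without_thinking):
    print(used_thinking(content), final_text(content))
```

Logging `used_thinking` per request is one way to see how often lower effort levels skip thinking on your own traffic.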
Practical API Examples
Python (using the Anthropic SDK)

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Low effort – fast, cheap responses for simple tasks
    # (effort is nested under output_config; confirm against your SDK version)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        output_config={"effort": "low"},
        messages=[
            {"role": "user", "content": "What is the capital of France?"}
        ],
    )
    print(response.content[0].text)

    # High effort – thorough reasoning for complex problems
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": "high"},
        messages=[
            {"role": "user", "content": "Explain the implications of quantum entanglement on modern cryptography."}
        ],
    )
    print(response.content[0].text)

    # Max effort – absolute maximum capability
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=8192,
        output_config={"effort": "max"},
        messages=[
            {"role": "user", "content": "Design a novel algorithm for distributed consensus that tolerates Byzantine faults with minimal latency."}
        ],
    )
    print(response.content[0].text)

TypeScript (using the Anthropic SDK)
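
    import Anthropic from "@anthropic-ai/sdk";

    const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    // Medium effort – balanced for agentic workflows
    // (effort is nested under output_config; confirm against your SDK version)
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 2048,
      output_config: { effort: "medium" },
      messages: [
        { role: "user", content: "Write a Python script to scrape a website and extract all image URLs." }
      ]
    });
    // Content blocks are a union type, so narrow to text blocks before reading .text
    for (const block of response.content) {
      if (block.type === "text") console.log(block.text);
    }

    // Combining effort with adaptive thinking
    const thinkingResponse = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 4096,
      output_config: { effort: "high" },
      thinking: { type: "adaptive" },
      messages: [
        { role: "user", content: "Solve this complex math problem step by step: ∫(x^2 * e^x) dx" }
      ]
    });
    // Thinking blocks may come before the answer, so print only the text blocks
    for (const block of thinkingResponse.content) {
      if (block.type === "text") console.log(block.text);
    }
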
Using effort with tool calls
Effort also controls how many tool calls Claude makes. Lower effort means fewer tool calls, which can significantly reduce latency and cost in agentic workflows.

    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        output_config={"effort": "low"},  # fewer tool calls, faster responses
        tools=[
            {
                "name": "search_web",
                "description": "Search the web for information",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    },
                    "required": ["query"],
                },
            }
        ],
        messages=[
            {"role": "user", "content": "Find the latest news about AI regulation."}
        ],
    )
    # The response may contain tool_use blocks as well as text blocks
    print(response.content)

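To see how effort changes tool call volume in practice, you can count the tool_use blocks in each response. A minimal sketch with stand-in objects so it runs offline: it assumes tool_use blocks carry `name` and `input` attributes, as in the Anthropic Python SDK, and the helper name and sample data are hypothetical.

```python
from types import SimpleNamespace


def tool_calls(content) -> list:
    """Return (name, input) pairs for every tool_use block in a response."""
    return [(b.name, b.input) for b in content if b.type == "tool_use"]


# Stand-in for response.content; a real response comes from the API.
sample_content = [
    SimpleNamespace(type="text", text="Let me search for that."),
    SimpleNamespace(
        type="tool_use",
        id="toolu_example",
        name="search_web",
        input={"query": "AI regulation news"},
    ),
]

calls = tool_calls(sample_content)
print(len(calls), calls)
```

Summing `len(tool_calls(response.content))` over a full agent loop gives a per-task tool call count to compare across effort levels.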
Best Practices
- Start with medium for Sonnet 4.6 – It provides the best balance for most applications. Only increase to `high` or `max` when you need deeper reasoning.
- Use `low` for high-volume subagents – If you’re running many parallel agents (e.g., for data extraction or classification), low effort saves tokens without significant quality loss.
- Combine with adaptive thinking – For models that support it, `thinking: {type: "adaptive"}` lets Claude decide when to think, while effort sets the overall budget.
- Test with your specific workload – Effort is a behavioral signal, not a strict budget. Run A/B tests to find the optimal level for your use case.
- Monitor token usage – Use the API’s usage statistics to compare token spend across effort levels and adjust accordingly.
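The testing and monitoring advice above can be sketched as a small harness that runs the same prompt at several effort levels and records token usage. The `send` callable is injected so the sketch runs offline. The token counts in `fake_send` are made-up placeholders, not measurements, and the function names are hypothetical. The `usage.input_tokens` and `usage.output_tokens` fields mirror the Messages API response.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def compare_effort_levels(
    send: Callable[[str], object],
    levels: Iterable[str] = ("low", "medium", "high"),
) -> dict:
    """Call `send(level)` once per effort level and collect token usage."""
    report = {}
    for level in levels:
        usage = send(level).usage
        report[level] = {
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
        }
    return report


def fake_send(level: str):
    """Offline stand-in for an API call; the numbers are placeholders."""
    placeholder_output = {"low": 120, "medium": 480, "high": 1500}[level]
    return SimpleNamespace(
        usage=SimpleNamespace(input_tokens=42, output_tokens=placeholder_output)
    )


report = compare_effort_levels(fake_send)
for level, usage in report.items():
    print(level, usage["output_tokens"])
```

With a real client, `send` could be a lambda that calls `client.messages.create` with the same prompt and the given level under `output_config` (an assumed parameter location). Pair the token counts with a quality check on the outputs, since token spend alone does not show whether a lower level is good enough.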
Limitations and Considerations
- Not a strict token budget – Claude may still think deeply on hard problems even at `low` effort. The parameter is a signal, not a hard cap.
- Model availability – `xhigh` is only available on Claude Opus 4.7. `max` is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
- Deprecation of budget_tokens – If you’re using `budget_tokens` on Opus 4.6 or Sonnet 4.6, migrate to `effort` as soon as possible.
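Since effort is a signal rather than a cap, the hard limit on output length is still `max_tokens`. A minimal sketch of detecting a response that was cut off by that limit, using the `stop_reason` field of a Messages API response. The helper name and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """True when generation stopped because it reached max_tokens."""
    return response.stop_reason == "max_tokens"


# Stand-ins for API responses so the sketch runs offline.
cut_off = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")

print(was_truncated(cut_off), was_truncated(finished))
```

If truncation shows up often at higher effort levels, raising `max_tokens` is usually a more direct fix than lowering effort.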
Key Takeaways
- The effort parameter lets you control token spend across all response types – including text, tool calls, and extended thinking – without needing thinking mode enabled.
- Five levels (low, medium, high, xhigh, max) give you fine-grained control, with `medium` recommended as the default for Sonnet 4.6.
- Combine effort with adaptive thinking for the best balance of capability and efficiency.
- Lower effort reduces tool call volume, making it ideal for high-throughput agentic workflows.
- Always test with your specific workload to find the optimal effort level for your application’s speed, cost, and quality requirements.