Mastering Claude's Effort Parameter: Balance Performance and Cost
Learn how to control Claude's token spending with the effort parameter. Optimize for speed, cost, or deep reasoning across all API responses, including tool calls and extended thinking.
This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response thoroughness and efficiency, with practical code examples for the API.
Introduction
Claude is incredibly powerful, but with great power comes great token consumption. Every response, every tool call, every chain of thought costs tokens — and that translates to latency and expense. Enter the effort parameter: a single API setting that gives you fine-grained control over how "eager" Claude is about spending tokens.
Introduced for Claude Opus 4.6 and Sonnet 4.6, and now available across the latest models including Claude Mythos Preview and Claude Opus 4.7, effort replaces the older budget_tokens parameter as the recommended way to control thinking depth. It works with or without extended thinking enabled, and it affects all tokens — text, tool calls, and thinking blocks.
In this guide, you'll learn:
- How the effort parameter works under the hood
- When to use each effort level
- How to implement effort in your API calls (Python and TypeScript)
- Best practices for combining effort with adaptive thinking
How Effort Works
By default, Claude operates at high effort — spending as many tokens as needed for excellent results. The effort parameter lets you dial this up or down:
- Higher effort → More tokens spent → Deeper reasoning, better quality, higher cost and latency
- Lower effort → Fewer tokens spent → Faster responses, lower cost, some capability reduction
What Effort Affects
Unlike some controls that only affect thinking tokens, effort influences all tokens in the response:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
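Because effort is a single request-level setting, it can be attached in one place when building a request. The helper below is a minimal sketch: `build_request` is a hypothetical name, and it assumes the API accepts effort inside an `output_config` field, so check the current API reference before relying on that placement.

```python
from typing import Any, Dict, Optional

# The five levels named in this guide.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(prompt: str, model: str, effort: Optional[str] = None,
                  max_tokens: int = 1024) -> Dict[str, Any]:
    """Build keyword arguments for client.messages.create (hypothetical helper)."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    params: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Omitting effort leaves the default (high) in place, so the field is
    # only added when the caller asks for a specific level.
    if effort is not None:
        # Assumption: effort is passed inside output_config.
        params["output_config"] = {"effort": effort}
    return params
```

The resulting dictionary can be unpacked into a call such as `client.messages.create(**build_request(...))`. The same setting then applies to text, tool calls, and thinking in that response.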
Effort Levels and When to Use Them
| Level | Description | Best For |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, subagents, high-volume chat, latency-sensitive workloads |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing speed/cost balance, tool-heavy workflows, code generation |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, difficult coding, agentic tasks |
| xhigh | Extended capability for long-horizon work. (Opus 4.7 only) | Long-running agentic/coding tasks (30+ min) with million-token budgets |
| max | Absolute maximum capability with no constraints. (Mythos, Opus 4.6+, Sonnet 4.6) | Deepest possible reasoning, most thorough analysis |
Note: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
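Since xhigh and max are limited to certain models, it can help to check a requested level before sending a request. The sketch below transcribes the availability notes from the table. The family labels are illustrative keys rather than official model IDs, and falling back to high is a design choice made for this example, not documented API behavior.

```python
from typing import Dict, Set

# Levels the table lists without any model restriction.
BASE_LEVELS: Set[str] = {"low", "medium", "high"}

# Extra levels per model family, transcribed from the table above.
# The keys are illustrative labels, not official API model IDs.
EXTRA_LEVELS: Dict[str, Set[str]] = {
    "mythos-preview": {"max"},
    "opus-4.7": {"xhigh", "max"},
    "opus-4.6": {"max"},
    "sonnet-4.6": {"max"},
}

ALL_LEVELS: Set[str] = BASE_LEVELS | {"xhigh", "max"}


def supported_levels(family: str) -> Set[str]:
    """Return the effort levels listed for a model family."""
    return BASE_LEVELS | EXTRA_LEVELS.get(family, set())


def resolve_effort(family: str, requested: str) -> str:
    """Return the requested level if listed for the family, else 'high'."""
    if requested not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {requested!r}")
    if requested in supported_levels(family):
        return requested
    # Fallback chosen for this sketch: high matches the default behavior.
    return "high"
```

For example, `resolve_effort("sonnet-4.6", "xhigh")` falls back to `"high"`, while `resolve_effort("opus-4.7", "xhigh")` is returned unchanged.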
Recommended Defaults for Sonnet 4.6
Sonnet 4.6 defaults to high effort. Anthropic recommends explicitly setting effort to avoid unexpected latency:
- Medium effort — Best balance for most applications (agentic coding, tool-heavy workflows, code generation)
- Low effort — For high-volume or latency-sensitive workloads (chat, non-coding use cases)
Combining Effort with Adaptive Thinking
For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide when to think based on the complexity of the problem, while effort controls how freely it spends tokens overall.
At high (default) and max effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems — saving tokens without sacrificing quality on hard tasks.
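One practical consequence: with adaptive thinking, a response may or may not begin with a thinking block, so code should not assume the first content block is text. The helper below is a minimal sketch (`extract_text` is a hypothetical name) that assumes each content block exposes a `type` attribute, and a `text` attribute when the type is "text".

```python
from typing import Any, Iterable


def extract_text(content_blocks: Iterable[Any]) -> str:
    """Join the text of all text blocks, skipping thinking and tool-use blocks."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )
```

Calling `extract_text(response.content)` then returns the same answer text whether or not Claude chose to think first.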
Code Examples
Python (using the Anthropic SDK)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort: fast and cheap for simple tasks.
# Model IDs in these examples are assumed; use any model that supports effort.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # Effort is set inside output_config rather than as a top-level argument
    output_config={"effort": "low"},
)
print(response.content[0].text)

# Medium effort: balanced for agentic workflows
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
    output_config={"effort": "medium"},
)

# Max effort: for the hardest problems
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    system="You are a research mathematician.",
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis."}
    ],
    output_config={"effort": "max"},
    thinking={"type": "adaptive"},
)
TypeScript (using the Anthropic SDK)
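import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort for simple queries.
// Model IDs in these examples are assumed; use any model that supports effort.
const simpleResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'What is 2 + 2?' }
  ],
  // Effort is set inside output_config rather than as a top-level field
  output_config: { effort: 'low' }
});
// Content blocks are a union type, so narrow to a text block before reading .text
const firstBlock = simpleResponse.content[0];
if (firstBlock.type === 'text') {
  console.log(firstBlock.text);
}

// Medium effort with adaptive thinking
const codingResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  system: 'You are a coding assistant.',
  messages: [
    { role: 'user', content: 'Refactor this code for performance.' }
  ],
  output_config: { effort: 'medium' },
  thinking: { type: 'adaptive' }
});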
Best Practices
- Start with medium effort for most applications. It provides the best balance of speed, cost, and quality.
- Use low effort for subagents or simple classification tasks where speed matters more than depth.
- Reserve max effort for the hardest problems — complex math, deep research, or critical decision-making.
- Combine with adaptive thinking to let Claude decide when to think, saving tokens on simple queries.
- Monitor token usage across effort levels to find the sweet spot for your use case.
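To act on the last point, output token counts can be logged per effort level and compared. The class below is a minimal sketch: `EffortUsageLog` is a hypothetical name, and it only assumes that each response reports an output token count (for example `response.usage.output_tokens` in the Python SDK).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class EffortUsageLog:
    """Collect output token counts per effort level (hypothetical helper)."""

    def __init__(self) -> None:
        self._output_tokens: Dict[str, List[int]] = defaultdict(list)

    def record(self, effort: str, output_tokens: int) -> None:
        """Store the output token count of one response."""
        self._output_tokens[effort].append(output_tokens)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Return the request count and mean output tokens for each level."""
        return {
            effort: {"requests": len(counts), "mean_output_tokens": mean(counts)}
            for effort, counts in self._output_tokens.items()
        }
```

After each call, something like `log.record("medium", response.usage.output_tokens)` feeds the log. Comparing `log.summary()` across levels on your own representative prompts shows where lower effort stops saving meaningful tokens.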
Migration from budget_tokens
If you're using budget_tokens with Opus 4.6 or Sonnet 4.6, switch to effort now. The budget_tokens parameter is deprecated and will be removed in a future model release.
Before:
thinking={"type": "enabled", "budget_tokens": 16000}
After:
output_config={"effort": "high"},
thinking={"type": "adaptive"}
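When many call sites build their request parameters as dictionaries, the same change can be applied programmatically. This is a minimal sketch under stated assumptions: `migrate_thinking` is a hypothetical helper, effort is assumed to live under `output_config`, and the target effort level is left to the caller because this guide gives no mapping from token budgets to effort levels.

```python
import copy
from typing import Any, Dict


def migrate_thinking(params: Dict[str, Any], effort: str = "high") -> Dict[str, Any]:
    """Return a copy of params with budget_tokens thinking replaced by effort.

    Hypothetical helper: requests that do not use budget_tokens are
    returned unchanged, and the input dictionary is never mutated.
    """
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    uses_budget = (
        isinstance(thinking, dict)
        and thinking.get("type") == "enabled"
        and "budget_tokens" in thinking
    )
    if uses_budget:
        migrated["thinking"] = {"type": "adaptive"}
        # Assumption: effort is passed inside output_config. An effort value
        # that is already present is kept rather than overwritten.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated
```

Applied to the "Before" parameters above, the helper produces the "After" form. Choose the effort level per workload rather than defaulting every migrated call to high.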
Key Takeaways
- The effort parameter controls token spending across all response types (text, tool calls, thinking) — not just thinking tokens.
- Five levels are available: low, medium, high, xhigh (Opus 4.7), and max (selected models).
- Medium effort is the recommended default for Sonnet 4.6, balancing speed, cost, and performance.
- Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience; Claude thinks only when needed.
- Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6; migrate your code to avoid future breakage.