Guide | Beginner | Pricing | 2026-05-22

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes code examples, effort levels, and best practices for Opus and Sonnet models.

Quick Answer

Claude's effort parameter lets you control how eagerly the model spends tokens on a request, from 'low' (fast, cheap) to 'max' (deepest reasoning). It works across all response tokens, including tool calls and thinking, without requiring extended thinking mode.

Tags: effort parameter, token efficiency, Claude API, extended thinking, cost optimization


If you've ever wished you could dial Claude's "thinking effort" up or down depending on the task, you're in luck. The effort parameter gives you exactly that control—letting you trade off between response thoroughness and token efficiency, all with a single model. Whether you're building a high-volume chatbot or a deep-reasoning coding agent, understanding effort is key to getting the most out of Claude.

In this guide, we'll cover:

What the effort parameter is and how it works
Each effort level and when to use it
Code examples for Python and TypeScript
Best practices for combining effort with adaptive thinking
Recommended effort levels for Claude Sonnet 4.6

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding. It affects all tokens in the response—including text, tool calls, and extended thinking—giving you granular control over cost and latency.

Key points:

Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
No beta header required.
For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience.

Note: When thinking is enabled, Claude will almost always think at high (the default) and max effort. At lower levels, it may skip thinking for simpler problems.

Effort Levels Explained

Level	Description	Best For
`max`	Absolute maximum capability, no constraints on token spend. Available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.	Deepest reasoning, most thorough analysis
`xhigh`	Extended capability for long-horizon work. Available on Opus 4.7 only.	Long-running agentic/coding tasks (>30 min) with token budgets in the millions
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing a balance of speed, cost, and performance
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, subagents, high-volume chat

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—but it will think less than it would at higher levels for the same problem.
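The availability rules in the table can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: the model labels (such as "opus-4.7") and the helpers allowed_levels and check_effort are hypothetical names chosen for illustration, and the sets only restate what the table and the key points above say.

```python
# Availability as described in this guide. The keys are illustrative labels,
# not API model IDs.
SUPPORTED_MODELS = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6", "opus-4.5"}
MAX_MODELS = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}
XHIGH_MODELS = {"opus-4.7"}
BASE_LEVELS = {"low", "medium", "high"}


def allowed_levels(model: str) -> set[str]:
    """Return the effort levels this guide lists for a model (empty if unsupported)."""
    if model not in SUPPORTED_MODELS:
        return set()
    levels = set(BASE_LEVELS)
    if model in MAX_MODELS:
        levels.add("max")
    if model in XHIGH_MODELS:
        levels.add("xhigh")
    return levels


def check_effort(model: str, effort: str) -> str:
    """Return effort unchanged, or raise ValueError if the model does not list it."""
    if effort not in allowed_levels(model):
        raise ValueError(f"effort {effort!r} is not listed for model {model!r}")
    return effort
```

Failing fast in your own code gives a clearer error than a rejected API call, at the cost of keeping these sets in step with the official model list.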

How Effort Affects Your Responses

Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely. The parameter influences:

Text responses and explanations – more thorough at high/max, more concise at low/medium
Tool calls and function arguments – lower effort means fewer tool calls, saving tokens
Extended thinking – when enabled, thinking depth scales with effort level

This approach has two major advantages:

It doesn't require thinking to be enabled – you can use effort even without extended thinking mode.
It affects all token spend – including tool calls, giving much greater control over efficiency.

Code Examples

Python (using the Anthropic SDK)

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Low effort for fast, cheap responses.
# Effort is passed inside output_config rather than as a top-level argument,
# and the model must be one of those listed above as supporting effort.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},  # Fast, minimal token spend
)
print(response.content[0].text)

# Max effort for deep reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    system="You are a research assistant.",
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography."}],
    output_config={"effort": "max"},  # Deepest reasoning, highest token spend
)
print(response.content[0].text)

TypeScript (using the Anthropic SDK)

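import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

// Medium effort for balanced performance.
// Effort is passed inside output_config, and the model must support it.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }],
  output_config: { effort: 'medium' },
});
// content is a list of typed blocks, so print only the text blocks.
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}

// Combining effort with adaptive thinking
const tutorResponse = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 8192,
  thinking: { type: 'adaptive' },
  system: 'You are a math tutor.',
  messages: [{ role: 'user', content: 'Prove the Pythagorean theorem.' }],
  output_config: { effort: 'high' },
});
// Thinking blocks can precede the answer, so skip anything that is not text.
for (const block of tutorResponse.content) {
  if (block.type === 'text') console.log(block.text);
}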

Recommended Effort Levels for Sonnet 4.6

Claude Sonnet 4.6 defaults to high effort. To avoid unexpected latency and cost, explicitly set the effort level when using this model:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters more than depth.
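One way to make sure effort is never left at the default on Sonnet 4.6 is to build every request through a small helper. The sketch below is illustrative only: sonnet_request and its latency_sensitive flag are hypothetical, the model ID is an assumption, and it assumes effort is sent in the request's output_config field (adjust if your SDK version differs).

```python
from typing import Any

SONNET_MODEL = "claude-sonnet-4-6"  # assumed model ID; check your account's model list


def sonnet_request(
    prompt: str,
    latency_sensitive: bool = False,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Build Messages API keyword arguments with effort always set explicitly.

    Medium is the default, as recommended above; low is used for high-volume
    or latency-sensitive traffic.
    """
    effort = "low" if latency_sensitive else "medium"
    return {
        "model": SONNET_MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: effort is carried in output_config.
        "output_config": {"effort": effort},
    }
```

The returned dictionary can be unpacked into the SDK call, for example client.messages.create(**sonnet_request("Summarize this ticket.")).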

Best Practices

1. Match Effort to Task Complexity

Use low for simple Q&A, greetings, or subagents that don't need deep reasoning.
Use medium for most agentic tasks and coding workflows.
Use high for complex reasoning, debugging, or tasks requiring thorough analysis.
Use max only when you need the absolute best possible answer and cost is not a concern.
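The mapping above can live in one lookup table so that every caller picks effort the same way. This is a sketch under stated assumptions: the task labels and the pick_effort helper are hypothetical, and the fallback of "high" simply mirrors the API default described earlier.

```python
# Task labels are illustrative; the levels follow the guidance in the list above.
TASK_EFFORT = {
    "simple_qa": "low",
    "greeting": "low",
    "subagent": "low",
    "agentic": "medium",
    "coding": "medium",
    "complex_reasoning": "high",
    "debugging": "high",
    "best_possible_answer": "max",
}


def pick_effort(task: str, default: str = "high") -> str:
    """Return the effort level for a task label, falling back to the API default."""
    return TASK_EFFORT.get(task, default)
```

Keeping the table in one place makes it easy to adjust a single entry after testing, rather than hunting for hard-coded levels across the codebase.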

2. Combine with Adaptive Thinking

For models that support it (Opus 4.6, Sonnet 4.6, and later), pair effort with thinking: {type: "adaptive"}. Claude then decides per request how much to think, while effort steers how readily it spends tokens doing so. Effort remains guidance rather than a hard cap.
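A small helper can add both settings to an existing request without mutating it. The sketch below is illustrative: with_adaptive_thinking and text_of are hypothetical helpers, and it assumes effort is sent in the request's output_config field. The second helper exists because a response with thinking enabled may start with a thinking block, so reading the first content block is not always safe.

```python
from typing import Any


def with_adaptive_thinking(request: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of request with adaptive thinking enabled and effort set."""
    merged = dict(request)
    merged["thinking"] = {"type": "adaptive"}
    # Assumption: effort is carried in output_config; keep any other keys already there.
    merged["output_config"] = {**request.get("output_config", {}), "effort": effort}
    return merged


def text_of(blocks: list[Any]) -> str:
    """Join the text blocks of a response, skipping thinking and tool-use blocks."""
    return "".join(b.text for b in blocks if getattr(b, "type", None) == "text")
```

With the SDK this might be used as response = client.messages.create(**with_adaptive_thinking(base_request, "medium")), followed by text_of(response.content).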

3. Test with Your Workload

Effort is a behavioral signal, so results can vary by task. Run A/B tests with different effort levels on representative samples to find the sweet spot for your use case.
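A comparison like this can be sketched as a small harness. Everything here is hypothetical scaffolding: compare_efforts is not an SDK function, and the run callback (which you would implement to call Claude at a given effort level and score the answer) is assumed to return a quality score and the output token count.

```python
from statistics import mean
from typing import Callable, Sequence


def compare_efforts(
    prompts: Sequence[str],
    levels: Sequence[str],
    run: Callable[[str, str], tuple[float, int]],
) -> dict[str, dict[str, float]]:
    """Run every prompt at every effort level and average the results.

    run(prompt, effort) must return (quality_score, output_tokens).
    prompts must be non-empty.
    """
    report: dict[str, dict[str, float]] = {}
    for level in levels:
        results = [run(prompt, level) for prompt in prompts]
        report[level] = {
            "mean_score": mean(score for score, _ in results),
            "mean_output_tokens": mean(tokens for _, tokens in results),
        }
    return report
```

Passing the model call in as a callback keeps the harness testable with a stub and leaves the scoring method (exact match, rubric, human rating) up to you.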

4. Monitor Token Usage

Lower effort levels can significantly reduce token spend, especially in tool-heavy workflows. Track your token usage per request to quantify savings.
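Per-request tracking can be as simple as accumulating the usage counts each response reports. The class below is a minimal sketch: UsageTracker and its methods are hypothetical, and it assumes you feed it the input and output token counts from each response (in the Python SDK these are exposed as response.usage.input_tokens and response.usage.output_tokens).

```python
class UsageTracker:
    """Accumulate token usage per effort level and compare against a baseline."""

    def __init__(self) -> None:
        self._totals: dict[str, dict[str, int]] = {}

    def record(self, effort: str, input_tokens: int, output_tokens: int) -> None:
        """Add one request's token counts to the totals for its effort level."""
        totals = self._totals.setdefault(
            effort, {"requests": 0, "input_tokens": 0, "output_tokens": 0}
        )
        totals["requests"] += 1
        totals["input_tokens"] += input_tokens
        totals["output_tokens"] += output_tokens

    def mean_output_tokens(self, effort: str) -> float:
        """Average output tokens per request at this effort level."""
        totals = self._totals.get(effort)
        if not totals:
            raise ValueError(f"no requests recorded for effort {effort!r}")
        return totals["output_tokens"] / totals["requests"]

    def savings_vs(self, effort: str, baseline: str = "high") -> float:
        """Fraction of output tokens saved per request relative to the baseline level."""
        return 1 - self.mean_output_tokens(effort) / self.mean_output_tokens(baseline)
```

After each call you might write tracker.record("low", response.usage.input_tokens, response.usage.output_tokens), then read tracker.savings_vs("low") once enough requests have accumulated at both levels.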

Common Pitfalls to Avoid

Assuming low effort always skips thinking: Claude will still think on hard problems, just less. For truly simple tasks, it may skip thinking entirely.
Using max effort for every request: This can dramatically increase costs and latency. Reserve max for tasks that genuinely need it.
Forgetting to set effort on Sonnet 4.6: The default is high, which may be overkill for simple chat applications. Always set it explicitly.

Conclusion

The effort parameter is a powerful tool for fine-tuning Claude's behavior to match your application's needs. By choosing the right effort level, you can optimize for speed, cost, or depth—without switching models or enabling extended thinking.

Start experimenting with different levels today, and you'll quickly find the right balance for your use case.

Key Takeaways

Effort controls token spend across all response types – text, tool calls, and thinking – without requiring extended thinking mode.
Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
Explicitly set effort on Sonnet 4.6 to avoid unexpected latency; medium is recommended as a default.
Combine with adaptive thinking for the best balance of depth and efficiency on supported models.
Effort is a behavioral signal, not a strict budget – Claude will still think on sufficiently hard problems at lower levels, just less than it would at higher ones.