Guide 2026-05-06

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness with token efficiency. Includes practical examples, effort levels, and best practices.

Quick Answer

Claude's effort parameter lets you control how many tokens the model spends on responses, from low (fast/cheap) to max (deepest reasoning). This guide shows you how to set it, when to use each level, and how it compares to budget_tokens.

effort parameter, token optimization, Claude API, cost control, response depth

If you've ever wished you could tell Claude to "think harder" or "be more concise" without switching models, the effort parameter does exactly that. It controls how many tokens Claude spends on a response, which directly affects speed, cost, and reasoning depth.

In this guide, you'll learn what the effort parameter is, how it differs from the older budget_tokens approach, and exactly how to use it in your API calls. By the end, you'll be able to dial in the perfect balance of capability and efficiency for any task.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding to requests. Instead of setting a hard token budget (which could cut off thinking mid-stream), effort adjusts Claude's natural tendency to think deeply or respond quickly.

Key insight: Effort affects all tokens in the response—not just thinking tokens. This includes:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This means lowering effort can also reduce the number of tool calls Claude makes, giving you much greater control over overall efficiency.
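To see where the tokens in a response actually went, you can count content blocks by type. The sketch below is a minimal illustration: `summarize_blocks` is a hypothetical helper, and the sample data is hand-written to resemble a Messages API response (block types `text`, `tool_use`, `thinking`, plus a `usage` record), not real model output.

```python
from collections import Counter


def summarize_blocks(content, usage):
    """Count content blocks by type and report the output token total."""
    counts = Counter(block["type"] for block in content)
    return {
        "text_blocks": counts.get("text", 0),
        "tool_calls": counts.get("tool_use", 0),
        "thinking_blocks": counts.get("thinking", 0),
        "output_tokens": usage["output_tokens"],
    }


# Hand-written sample shaped like a response; values are illustrative only.
sample_content = [
    {"type": "thinking", "thinking": "..."},
    {"type": "tool_use", "id": "toolu_1", "name": "search", "input": {"q": "x"}},
    {"type": "text", "text": "Done."},
]
sample_usage = {"input_tokens": 120, "output_tokens": 85}

summary = summarize_blocks(sample_content, sample_usage)
print(summary)
```

Running the same prompt at two effort levels and comparing the two summaries is one way to check whether a lower level really reduces tool calls for your workload.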

Supported Models

The effort parameter is available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted on these models, it is deprecated and will be removed in a future release.

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no token constraints	Deepest reasoning, most thorough analysis
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks (>30 min)
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Important: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
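Because "high" and an omitted parameter behave the same, a small wrapper can validate the level and leave the field out when it would be redundant. This is a sketch under assumptions: `effort_kwargs` is a hypothetical helper, and it assumes effort is passed inside an `output_config` object in the request; check your SDK version for the exact field.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"


def effort_kwargs(effort=None):
    """Return request kwargs for an effort level, or {} for the default."""
    if effort is None:
        return {}
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == DEFAULT_EFFORT:
        # "high" matches the behavior of omitting the parameter.
        return {}
    # Assumption: effort is nested under output_config in the request.
    return {"output_config": {"effort": effort}}


low_kwargs = effort_kwargs("low")
high_kwargs = effort_kwargs("high")
print(low_kwargs, high_kwargs)
```

The returned dict can be spread into a request, for example `client.messages.create(**base_request, **effort_kwargs("low"))`.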

How to Use the Effort Parameter in API Calls

Basic Usage (Python)

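import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # must be a model that supports effort
    max_tokens=1024,
    # Effort is set inside output_config, not as a top-level argument.
    output_config={"effort": "low"},  # or "medium", "high", "xhigh", "max"
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)
print(response.content[0].text)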

Basic Usage (TypeScript)

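import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

const response = await client.messages.create({
    model: 'claude-sonnet-4-6', // must be a model that supports effort
    max_tokens: 1024,
    // Effort is set inside output_config, not as a top-level field.
    output_config: { effort: 'low' },
    messages: [
        { role: 'user', content: 'Explain quantum computing in simple terms.' }
    ]
});
// The first content block is text here because thinking is not enabled.
const first = response.content[0];
if (first.type === 'text') console.log(first.text);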

Combining with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This lets Claude dynamically decide whether to think based on the problem complexity, while effort controls how deeply it thinks when it does.

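response = client.messages.create(
    model="claude-sonnet-4-6",  # must be a model that supports effort
    max_tokens=2048,
    thinking={"type": "adaptive"},  # Claude decides whether to think
    output_config={"effort": "medium"},  # how deeply it thinks when it does
    messages=[
        {"role": "user", "content": "Write a Python script to analyze sales data."}
    ],
)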

Using Max Effort for Deep Reasoning

When you need Claude's absolute best reasoning—for complex math proofs, multi-step planning, or thorough code review—use effort: "max".

response = client.messages.create(
    model="claude-opus-4-6",  # must be a model that supports effort
    max_tokens=4096,
    output_config={"effort": "max"},  # highest level; expect higher token usage
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis..."}
    ],
)

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:

Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
High effort: For tasks requiring deeper reasoning or complex analysis.
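The recommendations above can be captured as a lookup table so each call site sets effort explicitly. This is a minimal sketch: the workload labels and the `pick_effort` helper are hypothetical names chosen for illustration, and unknown workloads fall back to the recommended medium default.

```python
# Workload labels are illustrative; the levels follow the list above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "complex_analysis": "high",
}


def pick_effort(workload):
    """Return the suggested effort level, defaulting to medium."""
    return SONNET_46_EFFORT.get(workload, "medium")


for workload in ("chat", "agentic_coding", "complex_analysis", "other"):
    print(workload, "->", pick_effort(workload))
```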

Effort vs. budget_tokens: What's the Difference?

Aspect	Effort	budget_tokens (deprecated)
Control type	Behavioral signal	Hard token limit
Affects all tokens?	Yes (text, tools, thinking)	Only thinking tokens
Can skip thinking?	Yes (at lower levels)	No (always thinks up to budget)
Recommended?	Yes	No (deprecated on Opus 4.6/Sonnet 4.6)

Why effort is better:

It's a soft signal rather than a hard cap, so Claude's thinking isn't cut off mid-thought by a token limit.
It affects tool calls, giving you broader cost control.
At lower levels, Claude can skip thinking entirely for simple problems, saving even more tokens.
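Migrating an existing request can be sketched as a small transformation on the request dictionary. This is an illustration under assumptions: `migrate_to_effort` is a hypothetical helper, the model ID is a placeholder, effort is assumed to live under `output_config`, and the effort level is chosen by the caller because there is no fixed mapping from a token budget to an effort level.

```python
import copy


def migrate_to_effort(request, effort):
    """Return a copy of request using adaptive thinking plus an effort level."""
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Replace the fixed thinking budget with adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort is nested under output_config in the request.
    migrated.setdefault("output_config", {})["effort"] = effort
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",  # placeholder model ID
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "Summarize this report."}],
}
migrated = migrate_to_effort(legacy, "medium")
print(migrated["thinking"], migrated["output_config"])
```

The original dictionary is left untouched, which makes it easy to send both versions and compare token usage before switching over.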

Practical Tips and Best Practices

1. Start with Medium for Most Tasks

Unless you need maximum reasoning or absolute lowest cost, medium effort offers the sweet spot for most applications.

2. Use Low Effort for Subagents

When building multi-agent systems, set subagents to low effort. They typically handle simpler, well-defined tasks where speed matters more than deep reasoning.

3. Reserve Max for Complex Problems

max effort can significantly increase token usage. Use it only for problems that genuinely require Claude's deepest reasoning capabilities.

4. Combine with Adaptive Thinking

Always pair effort with thinking: {type: "adaptive"} for optimal results. This gives Claude the flexibility to skip thinking when it's not needed.

5. Monitor Token Usage

Lower effort doesn't guarantee a fixed token count—it's a behavioral signal. Always monitor your actual token usage and adjust accordingly.
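One simple way to monitor this is to log the output token count of each call next to the effort level used, then compare averages. The sketch below assumes you already record `(effort, output_tokens)` pairs, for example from the `usage` field of each response; `mean_output_tokens` is a hypothetical helper and the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records):
    """Average output tokens per effort level from (effort, tokens) pairs."""
    by_effort = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_effort.items()}


# Made-up log entries; replace with values recorded from real responses.
log = [("low", 100), ("low", 140), ("medium", 300), ("medium", 340)]
averages = mean_output_tokens(log)
print(averages)
```

If the averages for two levels turn out close on your workload, the cheaper level may not be saving much, and quality checks should decide which one to keep.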

Common Pitfalls to Avoid

Don't assume low effort = no thinking. On sufficiently difficult problems, Claude will still think—just less than at higher levels.
Don't use budget_tokens on new models. If you're using Opus 4.6 or Sonnet 4.6, switch to effort immediately.
Don't forget to set effort explicitly on Sonnet 4.6. The default is high, which may cause unexpected latency if you're expecting faster responses.
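Two of these pitfalls can be caught with a pre-flight check on the request dictionary. This is a rough sketch, not an official validation tool: `lint_request` is a hypothetical helper, it identifies models by substring match on placeholder IDs, and it assumes effort is nested under `output_config`.

```python
def lint_request(request):
    """Return a list of warnings for common effort-related pitfalls."""
    warnings = []
    model = request.get("model", "")
    thinking = request.get("thinking") or {}
    effort = (request.get("output_config") or {}).get("effort")

    # Assumption: 4.6-generation model IDs contain "4-6".
    if "4-6" in model and "budget_tokens" in thinking:
        warnings.append("budget_tokens is deprecated on this model; use effort")
    if "sonnet-4-6" in model and effort is None:
        warnings.append("effort not set; Sonnet 4.6 defaults to high")
    return warnings


risky = {
    "model": "claude-sonnet-4-6",  # placeholder model ID
    "thinking": {"type": "enabled", "budget_tokens": 1024},
}
clean = {
    "model": "claude-sonnet-4-6",
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "medium"},
}
print(lint_request(risky))
print(lint_request(clean))
```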

Conclusion

The effort parameter gives Claude API users direct control over token spend and response depth. By choosing the right effort level for each task, you can optimize for speed, cost, or capability with a single model.

Whether you're building a high-volume chat application, a deep-reasoning agent, or anything in between, effort gives you the dial you need to get the best results.

Key Takeaways

Effort controls token spend across all response types—text, tool calls, and thinking—giving you broader cost control than budget_tokens.
Five levels from low to max let you trade off between speed/cost and reasoning depth for any task.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience.
On Sonnet 4.6, always set effort explicitly to avoid unexpected latency from the default high setting.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6—migrate your code to avoid future breakage.