Guide | Beginner | Pricing | 2026-05-15

Mastering the Effort Parameter: Control Claude’s Token Spending for Speed, Cost, and Quality

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes practical API examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.7.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels from low to max, see code examples for Python and TypeScript, and get recommendations for balancing speed, cost, and capability.

Tags: effort parameter, token efficiency, Claude API, extended thinking, cost optimization

Introduction

Every Claude API call comes with a trade-off: more tokens spent usually means better results, but it also means higher latency and cost. What if you could dial that trade-off up or down with a single parameter—without switching models?

That’s exactly what the effort parameter does. Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5, effort lets you control how eagerly Claude spends tokens when responding. Whether you need maximum reasoning depth or lightning-fast answers for simple tasks, effort gives you fine-grained control.

In this guide, you’ll learn:

How effort works and how it differs from budget_tokens
The five effort levels and when to use each
Practical API code examples (Python and TypeScript)
Best practices for Sonnet 4.6 and Opus 4.7
How to combine effort with adaptive thinking

Let’s dive in.

---

How the Effort Parameter Works

By default, Claude uses high effort—spending as many tokens as needed for excellent results. The effort parameter lets you move up or down from that baseline:

Raise effort to max for the absolute highest capability (deeper reasoning, more thorough analysis).
Lower effort to medium or low to save tokens, reduce latency, and cut costs—accepting some reduction in capability.

Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
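The default behavior described above can be sketched as a small request builder. This is a minimal illustration, not the SDK's own helper: `build_request` is a hypothetical function, the model ID is a placeholder, and it assumes effort is sent under an `output_config` field, as in Anthropic's published Messages API examples. Check the current API reference if your SDK version differs.

```python
from typing import Any, Optional

# Effort levels named in this guide, from least to most eager.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(model: str, prompt: str, effort: Optional[str] = None,
                  max_tokens: int = 1024) -> dict[str, Any]:
    """Build keyword arguments for a Messages API call (hypothetical helper).

    Assumption: effort travels as output_config={"effort": ...}.
    """
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Omitting effort behaves the same as "high", so only send it when set.
    if effort is not None:
        request["output_config"] = {"effort": effort}
    return request


# "claude-sonnet-4-6" is a placeholder; substitute a model ID that supports effort.
default_request = build_request("claude-sonnet-4-6", "Summarize this changelog.")
low_request = build_request("claude-sonnet-4-6", "Summarize this changelog.", effort="low")
```

The resulting dictionaries can be passed to the SDK as `client.messages.create(**low_request)`.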

What Effort Affects

Effort influences all tokens in the response, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This is a major advantage over older approaches like budget_tokens:

No thinking required – You can use effort even without enabling extended thinking.
Tool calls included – Lower effort means Claude makes fewer tool calls, giving you broader control over efficiency.
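To make the two points above concrete, here is a sketch of a tool-using request that sets effort without any `thinking` field. The `with_effort` helper, the `lookup_order` tool, and the model ID are all hypothetical, and the `output_config` placement is an assumption based on Anthropic's published API examples.

```python
import copy
from typing import Any


def with_effort(request: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of `request` with an effort level set; nothing else changes."""
    updated = copy.deepcopy(request)
    updated.setdefault("output_config", {})["effort"] = effort
    return updated


# A tool-using request with no `thinking` field at all.
tool_request = {
    "model": "claude-sonnet-4-6",  # placeholder model ID
    "max_tokens": 1024,
    "tools": [{
        "name": "lookup_order",  # hypothetical tool
        "description": "Look up an order by its ID.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }],
    "messages": [{"role": "user", "content": "Where is order 1234?"}],
}

# Effort is applied on its own; extended thinking stays disabled.
low_effort_request = with_effort(tool_request, "low")
```

Because the helper copies the request, the same base payload can be reused at several effort levels when comparing how many tool calls each one produces.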

Effort vs. budget_tokens

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted on those models, it is deprecated and will be removed in a future release.

Important: Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best experience. Adaptive thinking lets Claude dynamically decide how much to think based on the problem, while effort sets the overall eagerness level.
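A migration away from `budget_tokens` might look like the sketch below. It assumes the legacy request used `thinking={"type": "enabled", "budget_tokens": N}` and that effort is sent under `output_config`; `migrate_thinking` is a hypothetical helper. It deliberately does not try to translate a token budget into an effort level, since the guide gives no such mapping. The caller picks the level.

```python
import copy
from typing import Any


def migrate_thinking(request: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Replace a budget_tokens thinking config with adaptive thinking plus effort."""
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking") or {}
    if "budget_tokens" in thinking:
        # Adaptive thinking takes no budget; effort sets the overall eagerness.
        migrated["thinking"] = {"type": "adaptive"}
        migrated.setdefault("output_config", {})["effort"] = effort
    return migrated


legacy_request = {
    "model": "claude-opus-4-6",  # placeholder model ID
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Review this design document."}],
}

migrated_request = migrate_thinking(legacy_request, effort="medium")
```

Requests that never set `budget_tokens` pass through unchanged, so the helper is safe to apply across a mixed codebase.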

---

Effort Levels and Use Cases

Level	Description	Typical Use Case
`max`	Absolute maximum capability with no constraints on token spending. Available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.	Tasks requiring the deepest possible reasoning and most thorough analysis (e.g., complex scientific research, multi-step proofs).
`xhigh`	Extended capability for long-horizon work. Available only on Opus 4.7.	Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions.
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding problems, agentic tasks.
`medium`	Balanced approach with moderate token savings.	Agentic tasks that need a balance of speed, cost, and performance.
`low`	Most efficient. Significant token savings with some capability reduction.	Simpler tasks needing the best speed and lowest costs, such as subagents or high-volume chat.

Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems—but it will think less than it would at higher effort levels for the same problem.
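The availability notes in the table can be encoded as a lookup so that an unsupported level is caught before a request is sent. This is one reading of the table, keyed by display name rather than API model ID. The fallback rule (drop to the highest supported level at or below the requested one) and the ordering of `xhigh` below `max` are assumptions made for illustration.

```python
# Ordering assumed from the level names: low < medium < high < xhigh < max.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]
BASE_LEVELS = {"low", "medium", "high"}

# Availability as stated in this guide, keyed by display name (not API model ID).
SUPPORTED_LEVELS = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
    "Claude Opus 4.5": BASE_LEVELS,
}


def supports(model_name: str, effort: str) -> bool:
    """True if the guide lists `effort` as available on `model_name`."""
    return effort in SUPPORTED_LEVELS.get(model_name, set())


def nearest_supported(model_name: str, effort: str) -> str:
    """Return `effort` if supported, else the highest supported level below it."""
    if model_name not in SUPPORTED_LEVELS:
        raise ValueError(f"{model_name!r} is not listed as supporting effort")
    if effort not in EFFORT_ORDER:
        raise ValueError(f"unknown effort level: {effort!r}")
    levels = SUPPORTED_LEVELS[model_name]
    # Walk downward from the requested level until a supported one is found.
    for candidate in reversed(EFFORT_ORDER[: EFFORT_ORDER.index(effort) + 1]):
        if candidate in levels:
            return candidate
    raise ValueError(f"no supported level at or below {effort!r}")
```

For example, `nearest_supported("Claude Opus 4.6", "xhigh")` falls back to `"high"`, because `xhigh` is listed only for Opus 4.7.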

---

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. If you don’t set the parameter explicitly, you may experience higher latency than expected. Anthropic recommends:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
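One way to apply these recommendations consistently is a small lookup from workload type to effort level. The workload labels below are hypothetical names chosen for this sketch; the mapping itself follows the two recommendations above, with `medium` as the fallback.

```python
# Workload labels are illustrative; the levels follow the Sonnet 4.6 guidance above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "latency_sensitive": "low",
}


def recommended_effort(workload: str) -> str:
    """Return the suggested effort for a workload, defaulting to medium."""
    return SONNET_46_EFFORT.get(workload, "medium")


chat_effort = recommended_effort("chat")              # "low"
coding_effort = recommended_effort("agentic_coding")  # "medium"
```

Keeping the mapping in one place makes it easy to adjust a single workload after measuring its latency and quality.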

---

API Code Examples

Python

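import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def first_text(response):
    """Return the first text block; with thinking enabled, content[0] may be a thinking block."""
    return next(block.text for block in response.content if block.type == "text")


# Low effort: fast, cost-effective
response = client.messages.create(
    model="claude-sonnet-4-6",  # must be a model that supports effort; confirm the exact ID
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    thinking={"type": "adaptive"},
    output_config={"effort": "low"},  # effort is passed inside output_config
)
print(first_text(response))

# Max effort: deepest reasoning
response = client.messages.create(
    model="claude-opus-4-6",  # max is not available on every model; see the table above
    max_tokens=4096,
    system="You are a research assistant.",
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis."}
    ],
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},
)
print(first_text(response))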

TypeScript

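import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// With thinking enabled, content[0] may be a thinking block, so pick the first text block.
function firstText(message: Anthropic.Message): string {
  for (const block of message.content) {
    if (block.type === 'text') return block.text;
  }
  return '';
}

// Medium effort: balanced
const balancedResponse = await client.messages.create({
  model: 'claude-sonnet-4-6', // must be a model that supports effort; confirm the exact ID
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  thinking: { type: 'adaptive' },
  output_config: { effort: 'medium' }, // effort is passed inside output_config
});
console.log(firstText(balancedResponse));

// Low effort: high-volume chat
const supportResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 512,
  system: 'You are a customer support agent.',
  messages: [
    { role: 'user', content: 'How do I reset my password?' }
  ],
  thinking: { type: 'adaptive' },
  output_config: { effort: 'low' },
});
console.log(firstText(supportResponse));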

---

Best Practices

1. Always Set Effort Explicitly for Sonnet 4.6

Sonnet 4.6 defaults to high effort. If you don’t set it, you may get higher latency than expected. Make it a habit to always include effort in your requests.

2. Combine with Adaptive Thinking

For the best results, always pair effort with thinking: {type: "adaptive"}. This lets Claude dynamically decide how much to think based on the problem complexity, while effort sets the overall eagerness level.

3. Start with Medium, Tune from There

If you’re unsure which level to use, start with medium. It provides a good balance of speed, cost, and performance. Then adjust up or down based on your specific needs.
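Tuning can be as simple as running your own evaluation set at each level and keeping the lowest level that still meets your quality bar. The sketch below assumes you already have per-level scores; `EvalResult`, `cheapest_passing`, and the numbers in the example are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Optional

# Ordering assumed from the level names: low < medium < high < xhigh < max.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


@dataclass
class EvalResult:
    effort: str
    quality: float      # score from your own evaluation, 0.0 to 1.0
    output_tokens: int  # mean output tokens per request at this level


def cheapest_passing(results: list[EvalResult], min_quality: float) -> Optional[str]:
    """Return the lowest effort level whose quality meets the bar, or None."""
    passing = [r for r in results if r.quality >= min_quality]
    if not passing:
        return None
    return min(passing, key=lambda r: EFFORT_ORDER.index(r.effort)).effort


# Placeholder numbers for illustration only; substitute your own measurements.
example_results = [
    EvalResult("low", 0.70, 300),
    EvalResult("medium", 0.90, 800),
    EvalResult("high", 0.95, 2000),
]
chosen_effort = cheapest_passing(example_results, min_quality=0.85)
```

With these placeholder scores the function selects `medium`, the lowest level that clears the 0.85 bar. A `None` result means no tested level was good enough, which is a signal to try a higher level or a different model.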

4. Use Low Effort for Subagents and Simple Tasks

If you’re building multi-agent systems or handling high-volume chat, low effort can dramatically reduce costs and latency while still maintaining acceptable quality for straightforward tasks.

5. Reserve Max for the Hardest Problems

max effort is powerful but expensive. Use it only for tasks that genuinely require the deepest possible reasoning—complex scientific analysis, multi-step proofs, or high-stakes decision-making.

---

Frequently Asked Questions

Does effort work without extended thinking?

Yes! Effort affects all tokens, including text responses and tool calls, even when thinking is not enabled.

Can I use effort with older models?

Effort is supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5. Older models do not support this parameter.

Is effort a hard token limit?

No. Effort is a behavioral signal, not a strict budget. At lower levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem. The hard cap on output length remains max_tokens.

What happened to budget_tokens?

budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future release. Use effort instead.

---

Key Takeaways

Effort gives you fine-grained control over Claude’s token spending, affecting text, tool calls, and extended thinking.
Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max (Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6).
Always combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best experience.
Sonnet 4.6 defaults to high effort – set it explicitly to avoid unexpected latency.
Start with medium for most use cases, then tune based on your speed, cost, and quality requirements.