BeClaude
Guide · Beginner · Pricing · 2026-05-22

Mastering Claude’s Effort Parameter: Smarter Token Control for Cost & Speed

Learn how to use Claude's effort parameter to control thinking depth, reduce token spend, and optimize API costs across Opus and Sonnet models.

Quick Answer

This guide explains how Claude’s effort parameter lets you trade off between response thoroughness and token efficiency. You’ll learn each effort level (low, medium, high, xhigh, max), when to use them, and how to implement them in Python and TypeScript for real-world savings.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost control

Claude is incredibly capable, but that capability comes with a cost—both in latency and token spend. If you’ve ever wished you could dial down Claude’s thinking on simple tasks or crank it up for deep reasoning, the effort parameter is exactly what you need.

Introduced alongside Claude Opus 4.6 and Sonnet 4.6, effort gives you fine-grained control over how eagerly Claude spends tokens. On those models it replaces the older budget_tokens parameter for controlling thinking depth, and it applies even when extended thinking is not enabled.

In this guide, you’ll learn:

  • What the effort parameter is and how it works
  • Each effort level and when to use it
  • How to implement effort in Python and TypeScript
  • Best practices for balancing speed, cost, and quality

Let’s dive in.

---

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how much token spend is appropriate for a given request. It affects all tokens in the response—text, tool calls, and extended thinking.

Key advantages:

  • No thinking required – You can use effort without enabling extended thinking.
  • Controls tool calls – Lower effort means fewer tool calls, saving even more tokens.
  • Single model, multiple modes – No need to switch between different Claude models for different complexity levels.

Note: Effort is a signal, not a strict budget. Claude may still think deeply on hard problems even at low effort, but it will think less than it would at higher levels.
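
The "single model, multiple modes" idea can be sketched as a small helper that builds request arguments for the same model at different effort levels. This is only an illustration: the helper name build_request is hypothetical, and nesting effort under output_config is an assumption about the current Messages API that you should verify against your SDK version.

```python
from typing import Any, Optional

# Levels named in this guide; availability varies by model.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(prompt: str, model: str, effort: Optional[str] = None,
                  max_tokens: int = 4096) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create (hypothetical helper)."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is passed inside output_config.
        # Leaving it out falls back to the default behavior (high).
        request["output_config"] = {"effort": effort}
    return request


# Same model, two different modes.
quick = build_request("Classify this ticket: 'refund please'",
                      "claude-sonnet-4-6", effort="low")
deep = build_request("Find the race condition in this scheduler.",
                     "claude-sonnet-4-6", effort="high")
```

Either dictionary could then be passed along as client.messages.create(**quick). Because effort is a signal rather than a budget, max_tokens remains the only hard cap in the request.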

---

Effort Levels Explained

| Level | Description | Best For |
|-------|-------------|----------|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents, latency-sensitive workloads. |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing a mix of speed, cost, and performance. Recommended default for Sonnet 4.6. |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, difficult coding, agentic tasks. Default behavior. |
| xhigh | Extended capability for long-horizon work. | Long-running agentic and coding tasks (30+ min) with token budgets in the millions. Available on Opus 4.7 only. |
| max | Absolute maximum capability with no constraints. | Deepest reasoning and most thorough analysis. Available on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6. |
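
Since xhigh and max are limited to certain models, it can help to check a level before sending a request. The sketch below transcribes the availability listed in the table. The family labels are illustrative keys rather than official model IDs, and treating low, medium, and high as available on every listed family is an assumption, since the table does not restrict them.

```python
# Effort levels per model family, transcribed from the table above.
# Family labels are illustrative keys, not official model IDs.
BASE_LEVELS = frozenset({"low", "medium", "high"})  # assumed available everywhere
EXTRA_LEVELS = {
    "mythos-preview": frozenset({"max"}),
    "opus-4.7": frozenset({"xhigh", "max"}),
    "opus-4.6": frozenset({"max"}),
    "sonnet-4.6": frozenset({"max"}),
}


def supported_levels(family: str) -> frozenset:
    """Return the effort levels the table lists for a model family."""
    if family not in EXTRA_LEVELS:
        raise KeyError(f"no effort data for model family: {family!r}")
    return BASE_LEVELS | EXTRA_LEVELS[family]


def is_supported(family: str, level: str) -> bool:
    """True if the table lists this level for this family."""
    return level in supported_levels(family)
```

For example, is_supported("sonnet-4.6", "xhigh") is False because the table restricts xhigh to Opus 4.7.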
---

When to Use Each Level

Low Effort – Speed & Cost First

Use low when you need fast responses and can tolerate some quality loss. Great for:
  • Simple Q&A or classification
  • High-throughput chatbots
  • Subagents that handle trivial subtasks

Medium Effort – The Sweet Spot

For most production applications, medium offers the best balance. It’s the recommended default for Sonnet 4.6 to avoid unexpected latency. Use it for:
  • Agentic coding workflows
  • Tool-heavy applications
  • Code generation

High Effort – Default Power

If you don’t set effort, Claude defaults to high. This is ideal for:
  • Complex reasoning
  • Difficult debugging
  • Tasks where quality is more important than speed

X-High & Max Effort – Maximum Depth

Reserve these for your hardest problems. xhigh (Opus 4.7 only) is built for long-running agents. max is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6, and should be used sparingly because it consumes the most tokens.
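
The guidance in this section could be captured as a simple lookup, so that each call site names its task type instead of hard-coding a level. The task labels and the choose_effort helper are hypothetical. Only the level assigned to each task comes from the recommendations above.

```python
# Task labels are hypothetical; the level for each follows the guidance above.
EFFORT_BY_TASK = {
    "simple_qa": "low",
    "classification": "low",
    "chat": "low",
    "trivial_subagent": "low",
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "complex_reasoning": "high",
    "hard_debugging": "high",
    "long_running_agent": "xhigh",  # Opus 4.7 only, per this guide
    "hardest_problems": "max",
}

# Matches the behavior when effort is omitted.
DEFAULT_EFFORT = "high"


def choose_effort(task: str) -> str:
    """Pick an effort level for a task label, falling back to the default."""
    return EFFORT_BY_TASK.get(task, DEFAULT_EFFORT)
```

Falling back to high for unknown labels mirrors the API default, so an unclassified task behaves as if effort had not been set at all.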

---

How to Use Effort in the API

Python Example

import anthropic

client = anthropic.Anthropic()

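response = client.messages.create(
    model="claude-sonnet-4-6",  # effort requires a model that supports it, such as Sonnet 4.6
    max_tokens=4096,
    # effort is nested under output_config in the Messages API
    output_config={"effort": "medium"},  # or "low", "high", "max"
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
)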

print(response.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

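const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // effort requires a model that supports it, such as Sonnet 4.6
  max_tokens: 4096,
  // effort is nested under output_config in the Messages API
  output_config: { effort: 'medium' }, // or 'low', 'high', 'max'
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);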

---

Combining Effort with Adaptive Thinking

For the best experience, pair effort with adaptive thinking (thinking: {type: "adaptive"}). This lets Claude decide dynamically how much thinking each problem needs, while effort guides how readily it spends tokens overall. Because effort is a signal and not a hard cap, max_tokens remains the only strict limit.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    # effort is nested under output_config in the Messages API
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ]
)

Note: On Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
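
A migration away from budget_tokens might look like the sketch below, which rewrites a request dictionary to use adaptive thinking plus an effort level. The migrate_thinking helper is hypothetical, and nesting effort under output_config is an assumption about the current Messages API. The guide gives no mapping from token budgets to effort levels, so the caller chooses the level explicitly.

```python
from typing import Any


def migrate_thinking(request: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Return a copy of `request` that uses adaptive thinking and effort
    instead of a fixed budget_tokens value (hypothetical helper)."""
    migrated = dict(request)  # shallow copy; the caller's dict is left untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        # Assumption: effort is passed inside output_config.
        output_config = dict(migrated.get("output_config") or {})
        output_config.setdefault("effort", effort)  # keep an existing effort if set
        migrated["output_config"] = output_config
    return migrated


old_request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4000},
    "messages": [{"role": "user", "content": "Design a distributed caching system."}],
}
new_request = migrate_thinking(old_request, effort="medium")
```

Requests that never set budget_tokens pass through unchanged, so the helper can be applied to every call site during a gradual migration.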

---

Best Practices

  • Start with medium for Sonnet 4.6 – It avoids unexpected latency while maintaining strong performance.
  • Use low for high-volume or latency-sensitive endpoints – Especially for chat or simple classification.
  • Reserve max for your hardest problems – It’s powerful but expensive.
  • Combine with adaptive thinking – This gives Claude the flexibility to think only when needed.
  • Monitor token usage – Effort is a signal, not a hard limit. Always track actual spend.
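
The last practice, tracking actual spend, can be sketched by recording the usage fields of each response and averaging output tokens per effort level. The helper names are hypothetical, and the token counts below are placeholders standing in for real responses, not measurements. The sketch assumes only that a response exposes usage.input_tokens and usage.output_tokens.

```python
from collections import defaultdict
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageRecord:
    effort: str
    input_tokens: int
    output_tokens: int


def record_usage(response, effort: str) -> UsageRecord:
    """Capture the token counts reported on a Messages API response."""
    usage = response.usage
    return UsageRecord(effort, usage.input_tokens, usage.output_tokens)


def mean_output_tokens(records) -> dict:
    """Average output tokens per effort level."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.effort].append(record.output_tokens)
    return {effort: sum(vals) / len(vals) for effort, vals in grouped.items()}


def fake_response(input_tokens: int, output_tokens: int):
    """Placeholder with the same usage fields as a real response."""
    return SimpleNamespace(
        usage=SimpleNamespace(input_tokens=input_tokens, output_tokens=output_tokens)
    )


# Placeholder numbers for illustration only.
records = [
    record_usage(fake_response(120, 80), "low"),
    record_usage(fake_response(120, 120), "low"),
    record_usage(fake_response(120, 400), "high"),
]
averages = mean_output_tokens(records)
```

In production, the same records could be emitted to whatever metrics system you already use, so that a change in effort level shows up as a measurable change in spend.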
---

Key Takeaways

  • Effort controls token spend across text, tool calls, and thinking – no need to enable extended thinking to use it.
  • Five levels available: low, medium, high, xhigh (Opus 4.7 only), and max.
  • Medium is the recommended default for Sonnet 4.6 to balance speed, cost, and quality.
  • Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6 – migrate your code now.
  • Combine with adaptive thinking for the most efficient and capable experience.