Guide | Beginner | Pricing | 2026-05-22

Mastering Claude’s Effort Parameter: Smarter Token Control for Cost & Speed

Learn how to use Claude's effort parameter to control thinking depth, reduce token spend, and optimize API costs across Opus and Sonnet models.

Quick Answer

This guide explains how Claude’s effort parameter lets you trade off between response thoroughness and token efficiency. You’ll learn each effort level (low, medium, high, xhigh, max), when to use them, and how to implement them in Python and TypeScript for real-world savings.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost control

Claude is incredibly capable, but that capability comes with a cost—both in latency and token spend. If you’ve ever wished you could dial down Claude’s thinking on simple tasks or crank it up for deep reasoning, the effort parameter is exactly what you need.

Introduced alongside Claude Opus 4.6 and Sonnet 4.6, effort gives you fine-grained control over how eagerly Claude spends tokens. It supersedes the older budget_tokens setting used with extended thinking, and it works on every model that supports it, even when extended thinking is turned off.

In this guide, you’ll learn:

What the effort parameter is and how it works
Each effort level and when to use it
How to implement effort in Python and TypeScript
Best practices for balancing speed, cost, and quality

Let’s dive in.

---

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how much token spend is appropriate for a given request. It affects all tokens in the response—text, tool calls, and extended thinking.

Key advantages:

No thinking required – You can use effort without enabling extended thinking.
Controls tool calls – Lower effort tends to mean fewer tool calls, which saves even more tokens.
Single model, multiple modes – No need to switch between different Claude models for different complexity levels.

Note: Effort is a signal, not a strict budget. Claude may still think deeply on hard problems even at low effort—but it will think less than it would at higher levels.

---

Effort Levels Explained

Level	Description	Best For
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, high-volume chat, subagents, latency-sensitive workloads.
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing a mix of speed, cost, and performance. Recommended default for Sonnet 4.6.
`high`	High capability. Equivalent to omitting the parameter.	Complex reasoning, difficult coding, agentic tasks. Default behavior.
`xhigh`	Extended capability for long-horizon work.	Long-running agentic and coding tasks (30+ min) with token budgets in the millions. Available on Opus 4.7 only.
`max`	Absolute maximum capability with no constraints.	Deepest reasoning and most thorough analysis. Available on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6.
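To make the availability column concrete, here is a minimal sketch of a pre-flight check that rejects an effort level a model family does not offer. The family labels (such as "opus-4.7") and the check_effort helper are hypothetical names chosen for this example, not API model IDs or SDK functions; the restrictions are simply transcribed from the table above.

```python
# Effort levels described in the table above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Levels with restricted availability, transcribed from the table.
# The family labels are hypothetical keys for this sketch, not API model IDs.
RESTRICTED_LEVELS = {
    "xhigh": {"opus-4.7"},
    "max": {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"},
}


def check_effort(family: str, level: str) -> str:
    """Return `level` if the table lists it for `family`, else raise ValueError."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    allowed = RESTRICTED_LEVELS.get(level)
    if allowed is not None and family not in allowed:
        raise ValueError(f"effort {level!r} is not listed for {family!r}")
    return level


print(check_effort("sonnet-4.6", "max"))
```

Failing fast on the client side like this avoids spending a round trip on a request the API would likely reject anyway.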

---

When to Use Each Level

Low Effort – Speed & Cost First

Use low when you need fast responses and can tolerate some quality loss. Great for:

Simple Q&A or classification
High-throughput chatbots
Subagents that handle trivial subtasks

Medium Effort – The Sweet Spot

For most production applications, medium offers the best balance. It’s the recommended default for Sonnet 4.6 to avoid unexpected latency. Use it for:

Agentic coding workflows
Tool-heavy applications
Code generation

High Effort – Default Power

If you don’t set effort, Claude defaults to high. This is ideal for:

Complex reasoning
Difficult debugging
Tasks where quality is more important than speed

X-High & Max Effort – Maximum Depth

Reserve these for your hardest problems. xhigh (Opus 4.7 only) is built for long-running agents. max is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6, and should be used sparingly because of its high token consumption.
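One way to apply this guidance in code is a small routing table that maps a task category to an effort level. The sketch below is only an illustration: the category names and the pick_effort helper are hypothetical, and the mapping mirrors the recommendations in this section rather than any official rule.

```python
# Per this guide, omitting effort behaves like "high", so use it as the fallback.
DEFAULT_EFFORT = "high"

# Hypothetical task categories mapped to the levels recommended in this section.
TASK_EFFORT = {
    "classification": "low",
    "simple_qa": "low",
    "chat": "low",
    "subagent": "low",
    "agentic_coding": "medium",
    "tool_use": "medium",
    "code_generation": "medium",
    "complex_reasoning": "high",
    "debugging": "high",
}


def pick_effort(task_kind: str) -> str:
    """Look up the effort level for a task category, falling back to the default."""
    return TASK_EFFORT.get(task_kind, DEFAULT_EFFORT)


print(pick_effort("classification"))
print(pick_effort("something_else"))
```

xhigh and max are left out of the table on purpose, so that the most expensive levels are always an explicit choice at the call site.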

---

How to Use Effort in the API

Python Example

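import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    # Use a model that supports effort; confirm the exact ID in the current model list.
    model="claude-sonnet-4-6",
    max_tokens=4096,
    # Effort is passed inside output_config: "low", "medium", "high", "xhigh", or "max".
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
)
print(response.content[0].text)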

TypeScript Example

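import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  // Use a model that supports effort; confirm the exact ID in the current model list.
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  // Effort is passed inside output_config: 'low', 'medium', 'high', 'xhigh', or 'max'.
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text.
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}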

---

Combining Effort with Adaptive Thinking

For the best experience, pair effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide, request by request, how much thinking a problem needs, while effort steers how generous it is with tokens overall. Remember that effort is a signal rather than a hard cap.

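# Reuses the client created in the Python example above.
response = client.messages.create(
    # Use a model that supports effort; confirm the exact ID in the current model list.
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    # Effort is passed inside output_config.
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ],
)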
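When thinking is enabled, the response's content list can begin with a thinking block, so reading content[0].text is fragile. The sketch below shows one way to collect only the text blocks. The extract_text helper is a hypothetical name, and the SimpleNamespace objects are stand-ins for real SDK content blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of every block whose type is "text", skipping other block types."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in blocks that mimic the shape of response.content (assumed for illustration).
blocks = [
    SimpleNamespace(type="thinking", thinking="Compare eviction strategies first."),
    SimpleNamespace(type="text", text="Use consistent hashing."),
]
print(extract_text(blocks))
```

With a real response, the call would be extract_text(response.content).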

Note: On Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
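If you have existing calls that set budget_tokens, a small shim can help during migration. The following is a rough sketch under stated assumptions, not an official mapping: the migrate_thinking_kwargs helper is hypothetical, the token thresholds are arbitrary placeholders you should tune against your own measurements, and it assumes effort is carried in an output_config dictionary.

```python
def migrate_thinking_kwargs(kwargs: dict, low_max: int = 2_000, medium_max: int = 10_000) -> dict:
    """Return a copy of `kwargs` with a budget_tokens thinking config replaced
    by adaptive thinking plus an effort level.

    The thresholds are placeholder assumptions, not documented equivalences.
    """
    thinking = kwargs.get("thinking") or {}
    if "budget_tokens" not in thinking:
        return dict(kwargs)  # nothing to migrate

    budget = thinking["budget_tokens"]
    if budget <= low_max:
        effort = "low"
    elif budget <= medium_max:
        effort = "medium"
    else:
        effort = "high"

    migrated = dict(kwargs)
    migrated["thinking"] = {"type": "adaptive"}
    # Preserve any other output_config keys the caller already set.
    migrated["output_config"] = {**kwargs.get("output_config", {}), "effort": effort}
    return migrated


legacy = {
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 16_000},
}
print(migrate_thinking_kwargs(legacy))
```

The function returns a new dictionary and leaves the input untouched, so old and new request shapes can be compared side by side while you validate quality and cost.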

---

Best Practices

Start with medium for Sonnet 4.6 – It avoids unexpected latency while maintaining strong performance.
Use low for high-volume or latency-sensitive endpoints – Especially for chat or simple classification.
Reserve max for your hardest problems – It’s powerful but expensive.
Combine with adaptive thinking – This gives Claude the flexibility to think only when needed.
Monitor token usage – Effort is a signal, not a hard limit. Always track actual spend.
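Because effort is only a signal, it helps to measure what each level actually costs you. Here is a minimal sketch of a per-level tracker. The EffortUsageTracker class is a hypothetical helper; it assumes each response exposes usage.input_tokens and usage.output_tokens, and the SimpleNamespace objects stand in for real usage data so the example runs offline.

```python
from collections import defaultdict
from types import SimpleNamespace


class EffortUsageTracker:
    """Accumulate token counts per effort level so levels can be compared."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
        )

    def record(self, effort: str, usage) -> None:
        """Add one response's usage (an object with input_tokens/output_tokens)."""
        totals = self._totals[effort]
        totals["requests"] += 1
        totals["input_tokens"] += usage.input_tokens
        totals["output_tokens"] += usage.output_tokens

    def mean_output_tokens(self, effort: str) -> float:
        """Average output tokens per request at this level (0.0 if none recorded)."""
        totals = self._totals[effort]
        if totals["requests"] == 0:
            return 0.0
        return totals["output_tokens"] / totals["requests"]


tracker = EffortUsageTracker()
# Stand-in usage objects; with a real call you would pass response.usage.
tracker.record("low", SimpleNamespace(input_tokens=120, output_tokens=80))
tracker.record("low", SimpleNamespace(input_tokens=100, output_tokens=120))
print(tracker.mean_output_tokens("low"))
```

Comparing the averages across levels on your own traffic shows whether a lower level really delivers the savings you expect.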

---

Key Takeaways

Effort controls token spend across text, tool calls, and thinking – no need to enable extended thinking to use it.
Five levels available: low, medium, high, xhigh (Opus 4.7 only), and max.
Medium is the recommended default for Sonnet 4.6 to balance speed, cost, and quality.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6 – migrate your code now.
Combine with adaptive thinking for the most efficient and capable experience.