Guide | Beginner | Pricing | 2026-05-22

Mastering Claude’s Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost across all supported models. Includes practical code examples and recommended settings.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between thoroughness and efficiency, with practical API examples and recommended defaults for Sonnet 4.6.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost control

Introduction

Claude is incredibly capable, but sometimes you don’t need the full firepower. Whether you’re building a high-volume chat application, a cost-sensitive subagent, or a deep reasoning system, controlling how much effort Claude puts into each response can save you tokens, reduce latency, and still deliver excellent results.

The effort parameter gives you that control. It lets you dial Claude’s token spending up or down on a single model, without switching models or giving up quality where you need it most.

In this guide, you’ll learn:

What the effort parameter is and how it works
When to use each effort level
How to set effort in the API (with code examples)
Recommended defaults for Claude Sonnet 4.6
How effort compares to the deprecated budget_tokens parameter

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding to requests. It affects all tokens in the response—including text explanations, tool calls, and extended thinking (when enabled).

Key points:

Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
No beta header required.
Replaces budget_tokens as the recommended way to control thinking depth (for Opus 4.6 and Sonnet 4.6).
Works with or without extended thinking enabled.

Effort Levels and Their Use Cases

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no token constraints	Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6)
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks (>30 min, token budgets in millions) – Opus 4.7 only
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach, moderate token savings	Agentic tasks needing speed, cost, and performance balance
`low`	Most efficient, significant token savings	Simpler tasks, high-volume subagents, latency-sensitive workloads

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—just less than at higher levels.
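
The level table above can be turned into a small pre-flight check that runs before a request is sent. The following is a minimal sketch, not part of any SDK: `SUPPORTED_LEVELS` and `check_effort` are hypothetical names, the keys are the display names used in this guide rather than API model IDs, and the Opus 4.5 entry assumes only low, medium, and high because the table does not list it under max or xhigh.

```python
# Hypothetical pre-flight check built from the level table in this guide.
BASE_LEVELS = {"low", "medium", "high"}

# Keys are display names from this guide, not API model IDs.
SUPPORTED_LEVELS = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
    # Assumption: the table does not list Opus 4.5 under max or xhigh.
    "Claude Opus 4.5": set(BASE_LEVELS),
}

# The guide states that omitting the parameter is equivalent to high.
DEFAULT_EFFORT = "high"


def check_effort(model_name, effort=None):
    """Return the effort level to send, or raise ValueError if unsupported."""
    if model_name not in SUPPORTED_LEVELS:
        raise ValueError(f"{model_name!r} is not listed as supporting effort")
    level = DEFAULT_EFFORT if effort is None else effort
    if level not in SUPPORTED_LEVELS[model_name]:
        allowed = ", ".join(sorted(SUPPORTED_LEVELS[model_name]))
        raise ValueError(
            f"{level!r} is not available on {model_name}; allowed: {allowed}"
        )
    return level
```

Failing fast on the client side is a design choice for clearer error messages; the API remains the authority on which levels a model accepts.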

How Effort Works Under the Hood

Setting effort to high produces the same behavior as omitting the parameter: Claude spends as many tokens as it needs for excellent results.

At max effort, Claude will almost always engage extended thinking, even for simple requests.
At low effort, Claude may skip thinking for simpler problems, producing shorter, faster responses.
Effort affects tool calls too: lower effort means Claude makes fewer tool calls, saving even more tokens.

This gives you a much greater degree of control over efficiency compared to the old budget_tokens approach, which only limited thinking tokens.

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token spend, explicitly set effort when using Sonnet 4.6:

medium (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters most.
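
One way to make sure effort is always set explicitly for Sonnet 4.6 is to build request arguments through a single helper. This is a sketch under stated assumptions: the workload labels and the `sonnet_request_kwargs` helper are hypothetical, the model ID `claude-sonnet-4-6` is assumed, and effort is assumed to be passed inside `output_config`.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def sonnet_request_kwargs(workload, prompt, max_tokens=8192):
    """Build keyword arguments for messages.create with an explicit effort."""
    # Unknown workloads fall back to medium, the recommended default.
    effort = SONNET_46_EFFORT.get(workload, "medium")
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID for Sonnet 4.6
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: effort is passed inside output_config.
        "output_config": {"effort": effort},
    }


kwargs = sonnet_request_kwargs("chat", "Summarize this ticket in one line.")
print(kwargs["output_config"])
```

The resulting dictionary can be unpacked into a call such as `client.messages.create(**kwargs)`, so no request silently inherits the high default.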

Using Effort in the API

Here’s how to set the effort parameter in both Python and TypeScript.

Python Example

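import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # effort needs a supported model such as Sonnet 4.6
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set the effort level; it is passed inside output_config
    output_config={"effort": "medium"},
)

# Without thinking enabled, the first content block is the text answer
print(response.content[0].text)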

TypeScript Example

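import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // effort needs a supported model such as Sonnet 4.6
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ],
  // Set the effort level; it is passed inside output_config
  output_config: { effort: 'medium' }
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}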

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

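# Reuses the client created in the Python example above
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Let Claude decide when extended thinking is worthwhile
    thinking={"type": "adaptive"},
    # Effort is passed inside output_config
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
)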

Adaptive thinking lets Claude decide when to use extended thinking, while effort signals how many tokens Claude should be willing to spend overall. Effort remains a behavioral signal, not a hard budget.
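
When thinking is enabled, the response can contain thinking blocks ahead of the answer text, so reading `response.content[0].text` is not always safe. The sketch below shows one way to separate the two. The `split_blocks` helper is a hypothetical name, and the stand-in blocks only mimic the `type`, `thinking`, and `text` attributes of SDK content blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from answer text in a list of content blocks."""
    thinking_parts = []
    text_parts = []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Other block types (for example tool_use) are ignored in this sketch.
    return "".join(thinking_parts), "".join(text_parts)


# Stand-in blocks that mimic the attributes of SDK content blocks.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Work through the algebra first."),
    SimpleNamespace(type="text", text="x = 4"),
]

reasoning, answer = split_blocks(fake_content)
print(answer)  # prints "x = 4"
```

With a real response, the call would be `split_blocks(response.content)`.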

Effort vs. budget_tokens (Deprecated)

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Aspect	`effort`	`budget_tokens` (deprecated)
Scope	Affects all tokens (text, tools, thinking)	Only affects thinking tokens
Granularity	Behavioral levels (low/medium/high/xhigh/max)	Exact token budget
Simplicity	Easy to tune	Requires experimentation
Future-proof	Yes	Will be removed
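
For code that still sends budget_tokens, a migration might look like the following sketch. Everything here is illustrative: `effort_for_budget` and `migrate_request` are hypothetical helpers, and the token thresholds are invented for the example because this guide does not define a mapping from budgets to effort levels. Tune them against your own workloads. The sketch also assumes effort is passed inside `output_config`.

```python
import copy


def effort_for_budget(budget_tokens):
    """Map a legacy thinking budget to an effort level.

    The thresholds are assumptions made for this example only.
    """
    if budget_tokens <= 4000:
        return "low"
    if budget_tokens <= 16000:
        return "medium"
    return "high"


def migrate_request(kwargs):
    """Return a copy of kwargs that uses adaptive thinking plus effort."""
    migrated = copy.deepcopy(kwargs)
    thinking = migrated.get("thinking") or {}
    budget = thinking.get("budget_tokens")
    if budget is None:
        return migrated  # no budget_tokens present, nothing to migrate
    # Replace the fixed thinking budget with adaptive thinking.
    migrated["thinking"] = {"type": "adaptive"}
    # Keep an effort value the caller already set; otherwise derive one.
    output_config = migrated.setdefault("output_config", {})
    output_config.setdefault("effort", effort_for_budget(budget))
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Plan the refactor."}],
}
print(migrate_request(legacy)["output_config"])
```

Working on a deep copy leaves the legacy arguments untouched, which makes it easier to compare old and new behavior side by side during a rollout.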

Best Practices

Start with medium for Sonnet 4.6 – It’s the sweet spot for most applications.
Use low for high-volume subagents – When you have many parallel agents doing simple tasks, low saves tokens and reduces latency.
Reserve max for critical deep reasoning – Use it only when you need the absolute best answer, like complex analysis or debugging.
Combine with adaptive thinking – For models that support it, adaptive thinking + effort gives you the best of both worlds.
Monitor token usage – Effort is a signal, not a hard limit. Always monitor actual token spend in production.
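
Because effort is a signal rather than a limit, monitoring can be as simple as tracking average output tokens per effort level. The sketch below is one possible approach: the `EffortUsageTracker` class and the ceiling values are hypothetical, and in production the counts would come from the usage data on each response (for example `response.usage.output_tokens` in the Python SDK).

```python
class EffortUsageTracker:
    """Accumulate output-token counts per effort level (illustrative only)."""

    def __init__(self):
        self._totals = {}
        self._counts = {}

    def record(self, effort, output_tokens):
        """Add one response's output-token count under its effort level."""
        self._totals[effort] = self._totals.get(effort, 0) + output_tokens
        self._counts[effort] = self._counts.get(effort, 0) + 1

    def average(self, effort):
        """Return mean output tokens for a level, or 0.0 if none recorded."""
        count = self._counts.get(effort, 0)
        if count == 0:
            return 0.0
        return self._totals[effort] / count

    def over_ceiling(self, ceilings):
        """Return the levels whose average exceeds the given ceiling."""
        return sorted(
            level
            for level, ceiling in ceilings.items()
            if self.average(level) > ceiling
        )


tracker = EffortUsageTracker()
tracker.record("low", 100)
tracker.record("low", 300)
tracker.record("medium", 400)

# Hypothetical ceilings; choose values that match your own cost targets.
print(tracker.over_ceiling({"low": 150, "medium": 500}))  # prints ['low']
```

A level that regularly exceeds its ceiling is a hint to revisit the prompt or the workload mix, since lowering effort alone does not guarantee a token cap.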

Key Takeaways

The effort parameter lets you control Claude’s token spending across all response types (text, tools, thinking).
Effort levels range from low (fastest, cheapest) to max (most thorough), with high as the default.
For Sonnet 4.6, explicitly set effort to medium as a recommended default to balance speed, cost, and performance.
Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6, and works without extended thinking enabled.
Combine effort with adaptive thinking for the best results on supported models.