BeClaude
Guide | Beginner | Pricing | 2026-05-22

Mastering Claude’s Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost across all supported models. Includes practical code examples and recommended settings.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between thoroughness and efficiency, with practical API examples and recommended defaults for Sonnet 4.6.

effort parameter, token optimization, Claude API, extended thinking, cost control

Introduction

Claude is incredibly capable, but sometimes you don’t need the full firepower. Whether you’re building a high-volume chat application, a cost-sensitive subagent, or a deep reasoning system, controlling how much effort Claude puts into each response can save you tokens, reduce latency, and still deliver excellent results.

The effort parameter gives you that control. It dials Claude’s token spending up or down on a single model, so you can economize on routine requests without switching models and still ask for full depth when a task needs it.

In this guide, you’ll learn:

  • What the effort parameter is and how it works
  • When to use each effort level
  • How to set effort in the API (with code examples)
  • Recommended defaults for Claude Sonnet 4.6
  • How effort compares to the deprecated budget_tokens parameter

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding to requests. It affects all tokens in the response—including text explanations, tool calls, and extended thinking (when enabled).

Key points:

  • Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
  • No beta header required.
  • Replaces budget_tokens as the recommended way to control thinking depth (for Opus 4.6 and Sonnet 4.6).
  • Works with or without extended thinking enabled.

Effort Levels and Their Use Cases

| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min, token budgets in millions); Opus 4.7 only |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach, moderate token savings | Agentic tasks needing speed, cost, and performance balance |
| low | Most efficient, significant token savings | Simpler tasks, high-volume subagents, latency-sensitive workloads |

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems, just less than at higher levels.
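
The availability notes in the table can be encoded as a small lookup so that an unsupported level is caught before a request is sent. This is a minimal sketch, not part of any SDK: the family labels (`opus-4.7`, `sonnet-4.6`, and so on) and the helper name `check_effort` are hypothetical names chosen for the example, not API model IDs. Opus 4.5 is given only low, medium, and high here because the table does not list it for max or xhigh.

```python
# Hypothetical labels for model families; these are not API model IDs.
BASE_LEVELS = {"low", "medium", "high"}

EFFORT_LEVELS = {
    "mythos-preview": BASE_LEVELS | {"max"},
    "opus-4.7": BASE_LEVELS | {"xhigh", "max"},
    "opus-4.6": BASE_LEVELS | {"max"},
    "sonnet-4.6": BASE_LEVELS | {"max"},
    "opus-4.5": BASE_LEVELS,  # assumption: only the three base levels
}


def check_effort(family: str, effort: str = "high") -> str:
    """Return `effort` if the table lists it for `family`, else raise ValueError."""
    try:
        allowed = EFFORT_LEVELS[family]
    except KeyError:
        raise ValueError(f"unknown model family: {family!r}") from None
    if effort not in allowed:
        raise ValueError(
            f"{effort!r} is not listed for {family}; choose one of {sorted(allowed)}"
        )
    return effort
```

Calling `check_effort("sonnet-4.6", "xhigh")` raises a `ValueError`, while `check_effort("opus-4.7", "xhigh")` returns the level unchanged.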

How Effort Works Under the Hood

When you set effort to high (or omit the parameter), Claude behaves as it does by default, spending as many tokens as it needs for a thorough result.

  • At max effort, Claude will almost always engage extended thinking, even for simple requests.
  • At low effort, Claude may skip thinking for simpler problems, producing shorter, faster responses.
  • Effort affects tool calls too: at lower effort Claude tends to make fewer tool calls, which saves further tokens.

This gives you broader control over efficiency than the old budget_tokens approach, which only limited thinking tokens.

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token spend, explicitly set effort when using Sonnet 4.6:

  • medium (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters most.
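
One way to make sure effort is always set explicitly for Sonnet 4.6 is to derive it from the kind of workload. The sketch below assumes hypothetical workload labels (`agentic_coding`, `chat`, and so on) and a hypothetical helper name. Only the medium/low split comes from the recommendations above.

```python
# Hypothetical workload labels mapped to the levels recommended above.
SONNET_46_DEFAULTS = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def sonnet_effort(workload: str) -> str:
    """Pick an explicit effort level for Sonnet 4.6.

    Workloads that are not in the mapping fall back to medium,
    the recommended general-purpose default.
    """
    return SONNET_46_DEFAULTS.get(workload, "medium")
```

Falling back to medium rather than to the API default of high keeps token spend predictable for workloads nobody has classified yet.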

Using Effort in the API

Here’s how to set the effort parameter in both Python and TypeScript.

Python Example

import anthropic

client = anthropic.Anthropic()

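response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set the effort level; the Messages API takes it inside output_config
    output_config={"effort": "medium"},
)

# Print only the text blocks of the reply
for block in response.content:
    if block.type == "text":
        print(block.text)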

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

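const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model that supports effort
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ],
  // Set the effort level; the Messages API takes it inside output_config
  output_config: { effort: 'medium' }
});

// Print only the text blocks of the reply
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}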

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

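response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=8192,
    thinking={"type": "adaptive"},  # let Claude decide when to think
    output_config={"effort": "high"},  # effort goes inside output_config
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ]
)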

Adaptive thinking lets Claude decide when to use extended thinking, while effort signals how freely Claude should spend tokens overall. Neither setting is a hard token limit.
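
If several call sites need the same combination, the request arguments can be assembled in one place. The sketch below only builds a dictionary and sends nothing. It rests on two assumptions that should be checked against the current API reference: that `claude-sonnet-4-6` is the model ID for Sonnet 4.6, and that effort is passed inside an `output_config` field. The helper name `build_request` is hypothetical.

```python
from typing import Any


def build_request(prompt: str, effort: str = "high", adaptive: bool = True) -> dict[str, Any]:
    """Assemble keyword arguments for client.messages.create (nothing is sent here)."""
    kwargs: dict[str, Any] = {
        "model": "claude-sonnet-4-6",  # assumed model ID for Sonnet 4.6
        "max_tokens": 8192,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: effort is passed inside output_config
        "output_config": {"effort": effort},
    }
    if adaptive:
        # Let Claude decide per request whether to use extended thinking
        kwargs["thinking"] = {"type": "adaptive"}
    return kwargs


request = build_request("Solve this complex math problem step by step.")
```

With an API key configured, `anthropic.Anthropic().messages.create(**request)` would send the request.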

Effort vs. budget_tokens (Deprecated)

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

| Aspect | effort | budget_tokens (deprecated) |
|---|---|---|
| Scope | Affects all tokens (text, tools, thinking) | Only affects thinking tokens |
| Granularity | Behavioral levels (low/medium/high/max) | Exact token budget |
| Simplicity | Easy to tune | Requires experimentation |
| Future-proof | Yes | Will be removed |
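
For code that still passes budget_tokens, a migration might look roughly like the sketch below. It is illustrative only: there is no official mapping from a token budget to an effort level, so the cut-offs are placeholders to be tuned against your own workloads. The helper name is hypothetical, and both the switch to adaptive thinking and the `output_config` field are assumptions about the request shape.

```python
import copy
from typing import Any

# Illustrative cut-offs only: (upper bound on budget_tokens, effort level).
# These numbers are placeholders for this sketch, not published guidance.
PLACEHOLDER_CUTOFFS = [(4_000, "low"), (16_000, "medium")]


def migrate_budget_tokens(kwargs: dict[str, Any], cutoffs=PLACEHOLDER_CUTOFFS) -> dict[str, Any]:
    """Return a copy of `kwargs` with thinking.budget_tokens replaced by an effort level."""
    migrated = copy.deepcopy(kwargs)  # never mutate the caller's dictionary
    thinking = migrated.get("thinking") or {}
    budget = thinking.get("budget_tokens")
    if budget is None:
        return migrated  # nothing to migrate
    effort = "high"  # budgets above every cut-off map to the default level
    for upper, level in cutoffs:
        if budget <= upper:
            effort = level
            break
    # Assumption: adaptive thinking plus output_config.effort replaces the budget
    migrated["thinking"] = {"type": "adaptive"}
    migrated.setdefault("output_config", {})["effort"] = effort
    return migrated
```

Because effort also influences text and tool calls, a migrated request will not behave identically to the budgeted one, so compare token usage before and after the switch.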

Best Practices

  • Start with medium for Sonnet 4.6 – It’s the sweet spot for most applications.
  • Use low for high-volume subagents – When you have many parallel agents doing simple tasks, low saves tokens and reduces latency.
  • Reserve max for critical deep reasoning – Use it only when you need the absolute best answer, like complex analysis or debugging.
  • Combine with adaptive thinking – For models that support it, adaptive thinking + effort gives you the best of both worlds.
  • Monitor token usage – Effort is a signal, not a hard limit. Always monitor actual token spend in production.
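
Monitoring can start as simply as averaging the usage reported on each response, grouped by the effort level that was requested. The sketch below assumes you record `(effort, response.usage)` pairs yourself. The helper name is hypothetical, and the sample values are made up purely to exercise the function, not measurements.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_usage(records):
    """Average input/output tokens per effort level.

    `records` is an iterable of (effort, usage) pairs, where `usage` has
    `input_tokens` and `output_tokens` attributes like `response.usage`.
    """
    totals = defaultdict(lambda: {"calls": 0, "input": 0, "output": 0})
    for effort, usage in records:
        bucket = totals[effort]
        bucket["calls"] += 1
        bucket["input"] += usage.input_tokens
        bucket["output"] += usage.output_tokens
    return {
        effort: {
            "calls": b["calls"],
            "avg_input": b["input"] / b["calls"],
            "avg_output": b["output"] / b["calls"],
        }
        for effort, b in totals.items()
    }


# Made-up numbers purely to exercise the function; not measurements.
sample = [
    ("low", SimpleNamespace(input_tokens=100, output_tokens=200)),
    ("low", SimpleNamespace(input_tokens=100, output_tokens=400)),
    ("high", SimpleNamespace(input_tokens=100, output_tokens=900)),
]
report = summarize_usage(sample)
```

Comparing `avg_output` across levels on real traffic shows whether a lower effort setting is actually saving tokens for your prompts.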

Key Takeaways

  • The effort parameter lets you control Claude’s token spending across all response types (text, tools, thinking).
  • Effort levels range from low (fastest, cheapest) to max (most thorough), with high as the default.
  • For Sonnet 4.6, explicitly set effort to medium as a recommended default to balance speed, cost, and performance.
  • Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6, and works without extended thinking enabled.
  • Combine effort with adaptive thinking for the best results on supported models.