BeClaude
Guide · 2026-05-05

Mastering Claude's Effort Parameter: Optimize Token Usage for Speed, Cost, and Capability

Learn how to control Claude's thinking depth with the effort parameter. Balance response thoroughness and token efficiency across models like Opus 4.6 and Sonnet 4.6.

Quick Answer

This guide explains how to use Claude's effort parameter to control token spending, trading off between response thoroughness and efficiency. You'll learn effort levels, recommended settings for Sonnet 4.6, and how to combine effort with adaptive thinking.

Tags: effort parameter, token optimization, Claude API, cost efficiency, adaptive thinking

Introduction

When building applications with Claude, you often face a trade-off: do you want the deepest possible reasoning, or do you need fast, cost-effective responses? Traditionally, you'd switch between models to balance these needs. But with the effort parameter, you can now control this behavior within a single model.

Effort lets you tell Claude how eager it should be about spending tokens when responding. Think of it as a dial: turn it up for maximum capability on complex problems, or turn it down for speed and savings on simpler tasks. This guide covers everything you need to know to use effort effectively.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that influences how thoroughly Claude processes your request. It affects all tokens in the response—including text, tool calls, and extended thinking (when enabled). This is a key advantage over older methods like budget_tokens, which only controlled thinking tokens.

Key benefits:
  • Works without enabling extended thinking
  • Affects tool call frequency (lower effort = fewer tool calls)
  • Single model, multiple behavior profiles

Supported Models

Effort is generally available on:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it's deprecated and will be removed in a future release.
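
As a rough illustration of that migration, the sketch below rewrites a dictionary of request parameters so that a budget_tokens-style thinking block becomes adaptive thinking plus an effort level. It assumes effort is sent in the request body as output_config={"effort": ...} and that adaptive thinking is requested with {"type": "adaptive"}. The helper name migrate_to_effort, the model ID, and the sample values are hypothetical, and the effort level is chosen by the caller rather than derived from the old budget.

```python
import copy


def migrate_to_effort(params, effort="high"):
    """Return a copy of request params that uses effort instead of budget_tokens.

    Assumption: effort lives under params["output_config"]["effort"].
    """
    new_params = copy.deepcopy(params)  # leave the caller's dict untouched
    thinking = new_params.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Swap the fixed thinking budget for adaptive thinking.
        new_params["thinking"] = {"type": "adaptive"}
    output_config = dict(new_params.get("output_config") or {})
    output_config["effort"] = effort
    new_params["output_config"] = output_config
    return new_params


# Hypothetical legacy request; the model ID is an assumption.
legacy = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4000},
    "messages": [{"role": "user", "content": "Summarize this report."}],
}
migrated = migrate_to_effort(legacy, effort="medium")
print(migrated["thinking"], migrated["output_config"])
```

The helper copies its input, so existing call sites can be migrated one at a time and compared against the legacy request.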

Effort Levels Explained

| Level | Description | Typical Use Case |
|--------|--------------------------------------------------|------------------|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simpler tasks, subagents, high-volume workloads |

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.
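
The availability notes in the table can be encoded as a small pre-flight check, so an unsupported combination is caught before a request is sent. This is a minimal sketch: the model labels (such as "opus-4.7") are shorthand invented for this example rather than API model IDs, and is_supported is a hypothetical helper.

```python
# Effort levels in ascending order, as listed in the table above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Restrictions taken from the table; keys are illustrative labels, not API IDs.
MAX_MODELS = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}
XHIGH_MODELS = {"opus-4.7"}
ALL_MODELS = MAX_MODELS | {"opus-4.5"}


def is_supported(model, effort):
    """Return True if the table lists this effort level for this model."""
    if model not in ALL_MODELS or effort not in EFFORT_LEVELS:
        return False
    if effort == "max":
        return model in MAX_MODELS
    if effort == "xhigh":
        return model in XHIGH_MODELS
    return True  # low, medium, and high apply to every supported model


print(is_supported("opus-4.7", "xhigh"))
print(is_supported("sonnet-4.6", "xhigh"))
```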

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly:

  • Medium (recommended default): Best balance of speed, cost, and performance. Ideal for agentic coding, tool-heavy workflows, and code generation.
  • Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases.
  • High: For tasks requiring maximum capability.
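
One way to make "always set effort explicitly" hard to forget is to route every Sonnet 4.6 call through a lookup keyed by workload type. The sketch below mirrors the three recommendations above. The workload names and the sonnet_effort helper are made up for illustration, and falling back to medium for unknown workloads is an assumption based on medium being the recommended default.

```python
# Workload labels are hypothetical; the levels follow the recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
    "max_capability": "high",
}


def sonnet_effort(workload):
    """Pick an explicit effort level, falling back to the recommended medium."""
    return SONNET_46_EFFORT.get(workload, "medium")


print(sonnet_effort("chat"))
print(sonnet_effort("something_else"))
```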

How to Use Effort in the API

Python Example

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # a Sonnet 4.6 model ID; the older Sonnet 4 ID is not in the supported list
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    # Set the effort level; it is sent in the request body, not as an HTTP header
    output_config={"effort": "medium"},
)

print(response.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

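const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ],
  // Set the effort level in the request body
  output_config: { effort: 'medium' }
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}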

With Extended Thinking (Adaptive)

For the best experience, combine effort with adaptive thinking:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Let Claude decide how much to think for each request
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
    # Effort is sent in the request body alongside the thinking config
    output_config={"effort": "high"},
)
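
When thinking is enabled, the response may contain thinking blocks ahead of the text, so reading response.content[0].text is not guaranteed to work. The sketch below collects only the text blocks. It assumes each content block exposes a type attribute and that text blocks carry a text attribute. The extract_text helper is hypothetical, and the SimpleNamespace objects are stand-ins for real response blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content_blocks):
    """Join the text of every block whose type is "text", skipping all others."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in blocks shaped like a response that starts with a thinking block.
blocks = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="The answer is 4."),
]
answer = extract_text(blocks)
print(answer)
```

With a real response, the call would be extract_text(response.content).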

Practical Use Cases

1. Cost-Sensitive Production Apps

Use low effort for high-volume customer support chatbots where most queries are simple. You'll save tokens and reduce latency while maintaining acceptable quality.

2. Multi-Agent Systems

Assign different effort levels to different agents:

  • Coordinator agent: low effort (fast routing decisions)
  • Research agent: high effort (deep analysis)
  • Code generation agent: medium effort (balanced)
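
The per-agent assignment above could be centralized in one place, so each agent's requests always carry the intended effort. This is a minimal sketch under a few assumptions: effort is passed as output_config={"effort": ...}, the agent names and the build_request helper are hypothetical, and the default model ID is illustrative.

```python
# Agent roles and levels follow the list above; the role keys are made up.
AGENT_EFFORT = {
    "coordinator": "low",   # fast routing decisions
    "research": "high",     # deep analysis
    "codegen": "medium",    # balanced
}


def build_request(agent, prompt, model="claude-sonnet-4-6", max_tokens=4096):
    """Build keyword arguments for a Messages API call for the given agent."""
    if agent not in AGENT_EFFORT:
        raise ValueError(f"unknown agent: {agent!r}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": AGENT_EFFORT[agent]},
    }


request = build_request("coordinator", "Route this ticket to the right team.")
print(request["output_config"])
```

The resulting dictionary can be unpacked into the client call, for example client.messages.create(**request).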

3. Tiered User Experience

Offer users a choice:

  • Quick mode: low effort (free tier)
  • Balanced mode: medium effort (standard tier)
  • Deep mode: high or max effort (premium tier)

Best Practices

  • Always set effort explicitly for Sonnet 4.6 to avoid unexpected latency from the default high setting.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token usage.
  • Test different levels on your specific use case—the token savings vary by task complexity.
  • Monitor token usage to quantify savings and adjust effort levels accordingly.
  • Use max sparingly—it's designed for the most demanding tasks and will consume more tokens.
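
To quantify savings as suggested above, one option is to run the same prompts at two effort levels and compare the output token counts reported in each response's usage.output_tokens field. The sketch below shows only the arithmetic. The token counts are invented placeholder numbers, not measurements, and output_token_savings is a hypothetical helper.

```python
def output_token_savings(baseline_tokens, candidate_tokens):
    """Return the percentage of output tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100 * (baseline_tokens - candidate_tokens) / baseline_tokens


# Placeholder per-request output token counts for the same three prompts.
high_run = [1200, 950, 1430]    # effort="high"
medium_run = [800, 700, 1006]   # effort="medium"

savings = output_token_savings(sum(high_run), sum(medium_run))
print(f"{savings:.1f}% fewer output tokens at medium effort")
```

A negative result means the candidate level used more tokens than the baseline, which is worth checking for on tasks where lower effort leads to retries.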

Key Takeaways

  • Effort controls token spending across all response types—text, tool calls, and thinking—giving you fine-grained control over cost and speed.
  • Medium effort is the recommended default for Sonnet 4.6, balancing performance and efficiency for most applications.
  • Combine effort with adaptive thinking for the best results, especially on complex tasks.
  • Lower effort levels still allow deep thinking on hard problems—Claude adapts its behavior based on task difficulty.
  • Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, so migrate your code to use the new parameter.