Guide · Beginner · Pricing · 2026-05-22

Mastering Claude's Effort Parameter: Control Thinking Depth and Token Spend

Learn how to use Claude's effort parameter to control response thoroughness, reduce token costs, and optimize latency on supported models, including Opus 4.7 and Sonnet 4.6.

Quick Answer

This guide explains how to use Claude's effort parameter to control how eagerly the model spends tokens, from max (deepest reasoning) to low (fastest, cheapest). You'll learn effort levels, recommended defaults for Sonnet 4.6, code examples, and how it replaces budget_tokens.

Tags: effort parameter, token optimization, extended thinking, Claude API, cost control

Introduction

Claude's effort parameter controls how thoroughly the model works on a request before responding. By adjusting effort, you trade response thoroughness against token efficiency, all with a single model. The parameter is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.

Whether you're building a high-volume chat application that needs low latency or a deep-reasoning agent that needs maximum capability, the effort parameter lets you pick the trade-off that fits.

How Effort Works

By default, Claude uses high effort—spending as many tokens as needed for excellent results. You can raise it to max for absolute highest capability, or lower it to medium or low to be more conservative with token usage.

The effort parameter affects all tokens in the response, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This is a major advantage over older approaches like budget_tokens because:

It doesn't require thinking to be enabled
It can affect all token spend, including tool calls (lower effort tends to mean fewer tool calls)

Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels.
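Because effort is a signal rather than a cap, max_tokens remains the hard ceiling on output length. The sketch below assumes you want to notice when a call ran into that ceiling anyway, and checks the response's stop_reason. The helper name hit_token_ceiling is hypothetical, and the SimpleNamespace objects only stand in for real Messages API responses so the snippet runs offline.

```python
from types import SimpleNamespace


def hit_token_ceiling(response) -> bool:
    """Return True if the response was cut off by max_tokens.

    Effort only steers how many tokens Claude tends to spend; max_tokens is
    the hard limit. A truncated response reports stop_reason == "max_tokens".
    """
    return response.stop_reason == "max_tokens"


# Stand-in objects shaped like Messages API responses (illustration only).
complete = SimpleNamespace(stop_reason="end_turn", usage=SimpleNamespace(output_tokens=312))
truncated = SimpleNamespace(stop_reason="max_tokens", usage=SimpleNamespace(output_tokens=1024))

for name, resp in [("complete", complete), ("truncated", truncated)]:
    print(name, resp.usage.output_tokens, hit_token_ceiling(resp))
```

If truncation shows up often at a given effort level, raising max_tokens is the direct fix; effort alone will not guarantee a shorter answer.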

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no constraints on token spending	Deepest reasoning, most thorough analysis (Opus 4.7, Opus 4.6, Sonnet 4.6, Mythos Preview)
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only)
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing balance of speed, cost, and performance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat
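The availability rules in the table can be encoded as a small guard that builds the effort portion of a request. This is a sketch under stated assumptions: the function name effort_params and the model labels such as "sonnet-4.6" are hypothetical descriptive keys, not API model IDs, and the returned dict assumes effort is sent as output_config.effort in the request body.

```python
# Effort levels and model availability, transcribed from the table above.
# Model keys are hypothetical descriptive labels, not API model IDs.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

MAX_MODELS = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}
XHIGH_MODELS = {"opus-4.7"}
EFFORT_MODELS = MAX_MODELS | {"opus-4.5"}


def effort_params(model: str, effort: str = "high") -> dict:
    """Build the effort portion of a request, rejecting combinations
    the table says a model does not support. Defaults to high."""
    if model not in EFFORT_MODELS:
        raise ValueError(f"{model} does not support the effort parameter")
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    if effort == "xhigh" and model not in XHIGH_MODELS:
        raise ValueError("xhigh is only available on Opus 4.7")
    if effort == "max" and model not in MAX_MODELS:
        raise ValueError(f"max is not available on {model}")
    return {"output_config": {"effort": effort}}


print(effort_params("sonnet-4.6", "medium"))  # {'output_config': {'effort': 'medium'}}
```

With the SDK, the result would be unpacked into the call, for example `client.messages.create(model=..., max_tokens=..., messages=..., **effort_params("sonnet-4.6", "medium"))`.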

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort:

Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
Low: For high-volume or latency-sensitive workloads—chat and non-coding use cases.
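One way to make these recommendations explicit in code is a lookup from workload type to effort level, falling back to medium. This is a minimal sketch: the workload labels and the helpers sonnet_effort and sonnet_request are hypothetical, and the model ID "claude-sonnet-4-6" is an assumption to verify against the current models list.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def sonnet_effort(workload: str) -> str:
    """Return an explicit effort for Sonnet 4.6; medium is the fallback."""
    return SONNET_46_EFFORT.get(workload, "medium")


def sonnet_request(workload: str, prompt: str) -> dict:
    """Build messages.create kwargs with effort always set explicitly."""
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID; verify before use
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": sonnet_effort(workload)},
    }


print(sonnet_request("chat", "Hi!")["output_config"])  # {'effort': 'low'}
```

Routing every call through one helper means no Sonnet 4.6 request silently falls back to the high default.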

Code Examples

Python (with Anthropic SDK)

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort: fast and cheap
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6; verify the ID against the current models list
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},  # effort is a request-body field, not a header
)

# Medium effort: balanced (recommended default for Sonnet 4.6)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)

# Max effort: deepest reasoning
response = client.messages.create(
    model="claude-opus-4-6",  # Opus 4.6; verify the ID against the current models list
    max_tokens=4096,
    system="You are a research assistant.",
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography."}],
    output_config={"effort": "max"},
)

TypeScript (with Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Low effort
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Sonnet 4.6; verify the ID against the current models list
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' }, // effort goes in the request body, not a header
});

// Medium effort (recommended default for Sonnet 4.6)
const response2 = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  system: 'You are a coding assistant.',
  messages: [{ role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }],
  output_config: { effort: 'medium' },
});

// Max effort
const response3 = await client.messages.create({
  model: 'claude-opus-4-6', // Opus 4.6; verify the ID against the current models list
  max_tokens: 4096,
  system: 'You are a research assistant.',
  messages: [{ role: 'user', content: 'Analyze the implications of quantum computing on cryptography.' }],
  output_config: { effort: 'max' },
});

Using with Extended Thinking

With adaptive thinking enabled, Claude decides when to think deeply, and the effort level steers how much it spends:

import anthropic

client = anthropic.Anthropic()

# Adaptive thinking lets Claude decide when to think; effort steers how much.
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6; verify the ID against the current models list
    max_tokens=8192,
    thinking={"type": "adaptive"},
    system="You are a research assistant.",
    messages=[{"role": "user", "content": "Solve this complex math problem step by step."}],
    output_config={"effort": "high"},
)

Effort vs. budget_tokens

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Aspect	effort	budget_tokens
Scope	Affects all tokens (text, tools, thinking)	Only affects thinking tokens
Precision	Behavioral signal (not strict)	Strict token budget
Simplicity	5 levels (low/medium/high/xhigh/max)	Requires numeric value
Future-proof	✅ Recommended	❌ Deprecated
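A migration can be done mechanically on your request kwargs. The sketch below assumes the old style `thinking={"type": "enabled", "budget_tokens": N}` and swaps it for adaptive thinking plus an explicit effort level. The function name migrate_thinking_params is hypothetical, and the effort level is a parameter you choose: the old budget is dropped rather than converted, because there is no defined mapping from budgets to effort levels.

```python
def migrate_thinking_params(params: dict, effort: str = "medium") -> dict:
    """Return a copy of messages.create kwargs with budget_tokens-style
    thinking replaced by adaptive thinking plus an explicit effort level.

    The input dict is not modified; other output_config keys are kept.
    """
    migrated = dict(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        migrated["thinking"] = {"type": "adaptive"}  # drops budget_tokens
    output_config = dict(migrated.get("output_config", {}))
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


old = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Prove the statement."}],
}
print(migrate_thinking_params(old, effort="high")["thinking"])  # {'type': 'adaptive'}
```

Benchmark the chosen level afterwards, since the right effort for a workload is not implied by its previous budget.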

Best Practices

Start with medium for Sonnet 4.6 – Explicitly set effort to avoid unexpected latency.
Use low for high-volume chat – When speed and cost matter most, and tasks are simple.
Use max for complex reasoning – When you need Claude's deepest thinking (e.g., mathematical proofs, multi-step analysis).
Combine with adaptive thinking – For the best balance of depth and efficiency.
Test different levels – Run benchmarks with your specific use case to find the optimal effort level.
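A benchmarking loop for comparing levels could look like the sketch below. It assumes a hypothetical run(effort) callable that you write yourself, which would send your real prompt with output_config={"effort": effort} and return response.usage.output_tokens. The fake token counts are placeholders so the harness runs offline; they are not measurements.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def compare_effort_levels(
    run: Callable[[str], int],
    levels: Iterable[str] = ("low", "medium", "high"),
) -> Dict[str, Tuple[int, float]]:
    """Call run(effort) once per level; record (output_tokens, seconds)."""
    results = {}
    for effort in levels:
        start = time.perf_counter()
        tokens = run(effort)
        results[effort] = (tokens, time.perf_counter() - start)
    return results


# Offline stand-in for run(); placeholder numbers, not real measurements.
fake_tokens = {"low": 120, "medium": 340, "high": 910}
for effort, (tokens, secs) in compare_effort_levels(fake_tokens.__getitem__).items():
    print(f"{effort:>6}: {tokens} output tokens in {secs:.4f}s")
```

In a real comparison, repeat each level several times and score output quality alongside tokens and latency, since a single call is noisy.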

Limitations

Not a strict budget: At lower effort levels, Claude may still think on very difficult problems, though less than it would at higher levels.
Model availability: max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is only available on Opus 4.7.
No beta header required: effort is set in the request body (output_config.effort) and works on all supported models without special headers.

Key Takeaways

Effort controls token spend across all response types – text, tool calls, and extended thinking – giving you broad control over cost and latency.
Up to five levels, from low to max (xhigh on Opus 4.7 only), cover everything from simple chat to deep reasoning.
Explicitly set effort for Sonnet 4.6 – the default is high, which may be more than you need. Medium is recommended for most applications.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6 – migrate your code to use the new parameter before budget_tokens is removed.
Combine with adaptive thinking for the best experience, allowing Claude to decide when to think deeply while respecting your effort preference.