BeClaude
Guide · Beginner · Pricing · 2026-05-20

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across API calls, with practical code examples and recommended settings.

Quick Answer

Claude's effort parameter controls how eagerly the model spends tokens on a response. Set it to 'low' for fast, cheap answers on simple tasks, 'medium' for balanced performance, 'high' (the default) for complex reasoning, or 'max' for the deepest possible analysis; a fifth level, 'xhigh', is available on Opus 4.7 only. Effort applies to every part of a response: text, tool calls, and extended thinking.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost control

Introduction

Claude is incredibly powerful, but with great power comes... greater token consumption. If you've ever wished you could dial Claude's thoroughness up or down depending on the task, the effort parameter is exactly what you need. Introduced in the Claude API, this parameter gives you fine-grained control over how many tokens Claude spends on each response—without switching models.

Whether you're building a high-volume chat application that needs lightning-fast replies, or an agentic system that requires deep reasoning over millions of tokens, the effort parameter lets you optimize for speed, cost, or capability—all with a single model.

In this guide, you'll learn:

  • What the effort parameter is and how it works
  • The five effort levels and when to use each
  • How to combine effort with adaptive thinking
  • Practical code examples in Python and TypeScript
  • Best practices for different use cases

How the Effort Parameter Works

By default, Claude operates at high effort—spending as many tokens as needed to produce excellent results. The effort parameter lets you adjust this behavior:

  • Raise effort to max for the absolute highest capability on the hardest problems.
  • Lower effort to medium or low to be more conservative with token usage, optimizing for speed and cost.

Crucially, effort affects all tokens in the response, not just thinking tokens. This includes:

  • Text responses and explanations
  • Tool calls and function arguments
  • Extended thinking (when enabled)

This is a major advantage over the older budget_tokens parameter (now deprecated on Opus 4.6 and Sonnet 4.6). Effort gives you a single dial to control overall token spend, including tool call frequency. At lower effort levels, Claude will make fewer tool calls and provide shorter explanations.

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.
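
One way to see that effort touches every part of a response is to tally the content blocks that come back. The sketch below assumes the response content has been converted to a list of plain dictionaries with a `type` field (`text`, `tool_use`, or `thinking`). The helper name `tally_blocks` is hypothetical, and the two sample lists are hand-written stand-ins, not measured output.

```python
from collections import Counter


def tally_blocks(content):
    """Count response content blocks by their `type` field.

    `content` is assumed to be a list of dicts shaped like Messages API
    content blocks, e.g. {"type": "text", "text": "..."}.
    """
    return Counter(block["type"] for block in content)


# Hand-written sample data standing in for two responses to the same prompt.
# These are illustrative only and do not come from real API calls.
high_effort = [
    {"type": "thinking", "thinking": "..."},
    {"type": "tool_use", "name": "search", "input": {}},
    {"type": "tool_use", "name": "search", "input": {}},
    {"type": "text", "text": "Here is a detailed answer."},
]
low_effort = [
    {"type": "tool_use", "name": "search", "input": {}},
    {"type": "text", "text": "Short answer."},
]

print(tally_blocks(high_effort)["tool_use"])  # 2
print(tally_blocks(low_effort)["tool_use"])   # 1
```

Running the same tally over real responses at two effort levels is a quick check on whether tool call counts move the way you expect for your prompts.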

Effort Levels and Use Cases

| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic and coding tasks over 30 minutes with token budgets in the millions (Opus 4.7 only) |
| high | High capability (default behavior) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing a balance of speed, cost, and performance |
| low | Most efficient, significant token savings | Simple tasks, high-volume chat, subagents where speed and cost matter most |
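
Because some levels are listed only for certain models, it can help to validate a level before sending a request. The following is a minimal sketch that transcribes the restrictions from the table above. The names `EFFORT_LEVELS`, `RESTRICTED_LEVELS`, and `check_effort` are hypothetical, and the model strings are display names used for illustration, not API model IDs.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Restrictions as listed in the table above (a snapshot, not an API lookup).
# Levels missing from this dict are treated as unrestricted.
RESTRICTED_LEVELS = {
    "xhigh": {"Opus 4.7"},
    "max": {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"},
}


def check_effort(level, model_name):
    """Return `level` if the table allows it for `model_name`, else raise."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    allowed = RESTRICTED_LEVELS.get(level)
    if allowed is not None and model_name not in allowed:
        raise ValueError(f"{level!r} is not listed for {model_name}")
    return level


print(check_effort("medium", "Sonnet 4.6"))  # medium
```

Failing early in your own code tends to give a clearer error message than waiting for the API to reject the request.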

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. For most applications, explicitly set the effort level to avoid unexpected latency:

  • Medium effort (recommended default): Best balance of speed, cost, and performance for agentic coding, tool-heavy workflows, and code generation.
  • Low effort: For high-volume or latency-sensitive workloads like chat and non-coding use cases.
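
These recommendations can be captured in a small lookup so that every call site sets effort explicitly. This is only a sketch: the workload labels and the function name `sonnet_46_effort` are invented for illustration, and the fallback to medium simply mirrors the recommended default above.

```python
# Hypothetical workload labels mapped to the levels recommended above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def sonnet_46_effort(workload):
    """Return the suggested effort level for a coarse workload label.

    Unknown labels fall back to "medium", the recommended default.
    """
    return SONNET_46_EFFORT.get(workload, "medium")


print(sonnet_46_effort("chat"))            # low
print(sonnet_46_effort("agentic_coding"))  # medium
```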

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This allows Claude to dynamically decide how much thinking to apply based on the problem complexity, while the effort parameter sets the overall behavioral context.

# Python example: effort + adaptive thinking
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model from the Model Support list
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # or "low", "high", "max"
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)

print(response.content)

// TypeScript example: effort + adaptive thinking
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model from the Model Support list
  max_tokens: 8192,
  thinking: { type: 'adaptive' },
  output_config: { effort: 'medium' }, // or 'low', 'high', 'max'
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

console.log(response.content);

Practical Examples

Example 1: Low Effort for Simple Chat

For a customer support chatbot handling common questions, low effort keeps responses fast and cheap:

# Reuses the `client` created in the earlier Python example.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},  # favor speed and cost
    messages=[
        {"role": "user", "content": "What are your business hours?"}
    ],
)

Example 2: Medium Effort for Agentic Coding

For a coding assistant that needs to balance thoroughness with response time:

# Reuses the `client` created in the earlier Python example.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "medium"},  # balance thoroughness and latency
    tools=[
        {
            "name": "edit_file",
            "description": "Edit a file in the codebase",
            "input_schema": {
                "type": "object",
                "properties": {
                    "file_path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["file_path", "content"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Add input validation to the user registration endpoint."}
    ],
)

Example 3: Max Effort for Deep Reasoning

For complex mathematical proofs or multi-step analysis:

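# Reuses the `client` created in the earlier Python example.
response = client.messages.create(
    model="claude-opus-4-6",  # max is listed only for some models; see the levels table
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)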

Best Practices

  • Start with medium effort for most applications. It provides a strong balance of capability and efficiency.
  • Use low effort for subagents in multi-agent systems where each subagent handles simple, well-defined tasks.
  • Reserve max effort for the most challenging problems where you need Claude's absolute best reasoning.
  • Combine with adaptive thinking to let Claude dynamically allocate thinking tokens based on problem difficulty.
  • Monitor token usage across effort levels to find the sweet spot for your specific workload. Lower effort doesn't just reduce thinking tokens—it reduces all tokens, including tool calls.
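
To make the monitoring advice concrete, here is a minimal sketch that averages output tokens per effort level from records you log yourself, for example by storing `response.usage.output_tokens` next to the effort level used for each call. The function name `average_output_tokens` is hypothetical, and the numbers in the usage example are placeholders, not measurements.

```python
from collections import defaultdict


def average_output_tokens(records):
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs collected
    by your own logging after each API call.
    """
    by_level = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {level: sum(values) / len(values) for level, values in by_level.items()}


# Placeholder records for illustration only.
log = [("low", 100), ("low", 200), ("high", 900)]
print(average_output_tokens(log))  # {'low': 150.0, 'high': 900.0}
```

Comparing these averages against task success rates for the same prompts is one way to find the lowest effort level that still meets your quality bar.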

Model Support

The effort parameter is available on:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Opus 4.5

No beta header is required. For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth.
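
If you are moving existing code off budget_tokens, a small translation helper may ease the transition. The sketch below converts a legacy `{"type": "enabled", "budget_tokens": N}` thinking config into an adaptive thinking config plus an effort level. The function name `migrate_thinking` is hypothetical, and the budget thresholds are arbitrary placeholders chosen for this example, not an official mapping. Tune them against your own workload.

```python
# Placeholder thresholds: budgets up to each limit map to the paired level.
# These cut-offs are assumptions for illustration, not documented values.
DEFAULT_THRESHOLDS = ((4_000, "low"), (16_000, "medium"))


def migrate_thinking(legacy, thresholds=DEFAULT_THRESHOLDS):
    """Translate a legacy thinking config into (adaptive_config, effort)."""
    if legacy.get("type") != "enabled" or "budget_tokens" not in legacy:
        raise ValueError("expected {'type': 'enabled', 'budget_tokens': N}")
    budget = legacy["budget_tokens"]
    for limit, level in thresholds:
        if budget <= limit:
            return {"type": "adaptive"}, level
    # Budgets above every threshold keep the default effort level.
    return {"type": "adaptive"}, "high"


thinking, effort = migrate_thinking({"type": "enabled", "budget_tokens": 10_000})
print(thinking, effort)  # {'type': 'adaptive'} medium
```

Since effort is a behavioral signal and not a budget, the migrated requests will not spend the same number of thinking tokens as before, so it is worth re-measuring usage after switching.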

Key Takeaways

  • The effort parameter controls overall token spend across text, tool calls, and extended thinking—not just thinking tokens.
  • Five levels are available: low, medium, high (default), xhigh (Opus 4.7 only), and max.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience on supported models.
  • Lower effort reduces tool call frequency, making it ideal for high-volume or latency-sensitive applications.
  • Effort is a behavioral signal, not a strict budget—Claude will still think deeply on hard problems even at lower levels.