BeClaude
Guide, 2026-05-05

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to control token usage, response thoroughness, and cost. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.6.

Quick Answer

Claude's effort parameter lets you control how eagerly Claude spends tokens on responses, from max (deepest reasoning) to low (fastest, cheapest). It works with or without extended thinking and replaces budget_tokens on Opus 4.6 and Sonnet 4.6.

Tags: effort parameter, token efficiency, Claude API, cost optimization, extended thinking

If you've ever wished you could dial Claude's thinking up or down depending on the task, your wish has been granted. The effort parameter lets you steer how many tokens Claude spends on a response without switching models. This guide covers the effort levels, the models that support them, and how to set them, with practical code examples and best practices.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise it to max for the absolute highest capability, or lower it to low for maximum speed and cost savings.

Key advantages of the effort parameter:

  • No thinking required – Works with or without extended thinking enabled
  • Affects all tokens – Controls text, tool calls, and thinking tokens
  • Single model – No need to switch between different Claude models for different depth levels

Supported Models

The effort parameter is generally available on:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Opus 4.5
For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter as the recommended way to control thinking depth.

Effort Levels Explained

| Level | Description | Best Use Case |
| --- | --- | --- |
| max | Absolute maximum capability, no token constraints | Deepest reasoning, complex research, multi-step analysis |
| xhigh | Extended capability for long-horizon work (Opus 4.7 only) | Long-running agentic/coding tasks over 30 minutes |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—just less than at higher levels.
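
The levels in the table can be checked before a request is sent. The sketch below is a hypothetical helper, not part of the Anthropic SDK: the name `validate_effort` and the `claude-opus-4-7` model-name prefix are assumptions made for illustration, and the only rule it encodes is the one stated in the table, that xhigh is limited to Opus 4.7.

```python
# Hypothetical helper; the level names come from the table above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Assumed model-name prefix for Opus 4.7; check it against your own model IDs.
XHIGH_MODEL_PREFIX = "claude-opus-4-7"


def validate_effort(model: str, effort: str) -> str:
    """Return `effort` unchanged, or raise ValueError if it cannot be used."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == "xhigh" and not model.startswith(XHIGH_MODEL_PREFIX):
        raise ValueError("xhigh is listed as Opus 4.7 only")
    return effort
```

Failing early like this keeps a typo such as "hgih" from reaching the API at all.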

How to Use the Effort Parameter

Basic Usage (Python SDK)

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Effort is passed inside output_config. Options: low, medium, high, max
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.content[0].text)

With Extended Thinking

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Adaptive thinking pairs well with effort
    output_config={"effort": "medium"},  # effort is passed inside output_config
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ]
)
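
When thinking is enabled, the response content may begin with a thinking block rather than a text block, so indexing `content[0].text` is fragile. Here is a minimal sketch of a helper that collects only the text blocks. `extract_text` is a hypothetical name, and the stand-in objects merely imitate the `type` and `text` attributes of SDK content blocks.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join the text of every block whose type is "text", skipping the rest."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-in blocks for illustration; a real call would pass response.content.
fake_content = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="Use consistent hashing."),
]
print(extract_text(fake_content))  # prints: Use consistent hashing.
```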

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

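const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  output_config: { effort: 'high' }, // effort is set inside output_config
  messages: [
    { role: 'user', content: 'Write a Python script to analyze CSV data.' }
  ]
});

// content is a union of block types, so narrow to text blocks before reading .text
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}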

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which may introduce unexpected latency. Anthropic recommends explicitly setting effort:

  • Medium (recommended default) – Best balance for most applications: agentic coding, tool-heavy workflows, code generation
  • Low – For high-volume or latency-sensitive workloads: chat, non-coding use cases
  • High – For tasks requiring maximum capability from Sonnet
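
One way to make sure effort is always set explicitly is to route every request through a small builder. This is a sketch under assumptions: `build_request` is a hypothetical helper, the model ID `claude-sonnet-4-6` is assumed, and placing the effort value under `output_config` is an assumption that should be checked against the current API reference.

```python
DEFAULT_EFFORT = "medium"  # the recommended default for Sonnet 4.6 per the list above


def build_request(prompt: str, effort: str = DEFAULT_EFFORT,
                  model: str = "claude-sonnet-4-6", max_tokens: int = 4096) -> dict:
    """Build keyword arguments for client.messages.create with effort always set."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Assumed placement of the effort value in the request body.
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = build_request("Summarize this changelog.")
print(kwargs["output_config"])  # prints: {'effort': 'medium'}
```

The result can be unpacked into a call such as `client.messages.create(**kwargs)`. Latency-sensitive callers would pass `effort="low"`.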

Practical Scenarios

Scenario 1: Cost-Sensitive Subagents

If you're building a multi-agent system where subagents handle simple classification or extraction tasks, use low effort to minimize token spend:

def classify_document(text):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        output_config={"effort": "low"},  # effort is passed inside output_config
        messages=[
            {"role": "user", "content": f"Classify this document as 'urgent', 'normal', or 'low': {text}"}
        ]
    )
    return response.content[0].text

Scenario 2: Deep Research Tasks

For complex research or multi-step reasoning, use max effort with adaptive thinking:

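def deep_research(query):
    # Stream the request: a max_tokens value this large can exceed the
    # SDK's time limit for non-streaming calls.
    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=64000,
        thinking={"type": "adaptive"},
        output_config={"effort": "max"},  # effort is passed inside output_config
        messages=[
            {"role": "user", "content": f"Conduct a thorough analysis of: {query}"}
        ]
    ) as stream:
        response = stream.get_final_message()
    # With thinking enabled, the first block may be a thinking block,
    # so join the text blocks instead of reading content[0].
    return "".join(block.text for block in response.content if block.type == "text")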

Scenario 3: Balanced Agentic Workflows

For a coding agent that needs both speed and quality, use medium effort:

def code_review_agent(code_snippet):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": "medium"},  # effort is passed inside output_config
        tools=[
            {
                "name": "suggest_improvements",
                "description": "Suggest code improvements",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "suggestions": {"type": "array", "items": {"type": "string"}}
                    }
                }
            }
        ],
        messages=[
            {"role": "user", "content": f"Review this code: {code_snippet}"}
        ]
    )
    return response
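
Because `code_review_agent` returns the whole response, the caller still has to pull the tool call out of it. The following is a hedged sketch: `collect_suggestions` is a hypothetical helper that assumes content blocks expose `type`, `name`, and `input` attributes, as tool_use blocks do in the Python SDK. The stand-in objects only imitate that shape.

```python
from types import SimpleNamespace


def collect_suggestions(content) -> list:
    """Gather the `suggestions` arrays from every suggest_improvements tool call."""
    suggestions = []
    for block in content:
        if block.type == "tool_use" and block.name == "suggest_improvements":
            # The schema does not mark `suggestions` as required, so default to [].
            suggestions.extend(block.input.get("suggestions", []))
    return suggestions


# Stand-in response content; a real call would pass response.content.
fake_content = [
    SimpleNamespace(type="text", text="Here is my review."),
    SimpleNamespace(type="tool_use", name="suggest_improvements",
                    input={"suggestions": ["Add type hints", "Handle empty input"]}),
]
print(collect_suggestions(fake_content))
```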

Effort vs. budget_tokens

If you're migrating from budget_tokens on Opus 4.6 or Sonnet 4.6, here's what changed:

| Feature | budget_tokens (deprecated) | effort (recommended) |
| --- | --- | --- |
| Scope | Thinking tokens only | All tokens (text, tools, thinking) |
| Precision | Exact token budget | Behavioral signal |
| Flexibility | Requires thinking enabled | Works without thinking |
| Future-proof | Will be removed | Actively supported |
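
A migration can be sketched as a function that rewrites old request arguments. Everything here is an assumption for illustration: the helper name `migrate_request`, the choice to switch to adaptive thinking, the `output_config` placement, and the caller-supplied effort level. Because effort is a behavioral signal rather than a budget, there is no exact mapping from a token count to a level, so the sketch does not attempt one.

```python
def migrate_request(old_kwargs: dict, effort: str = "medium") -> dict:
    """Return a copy of old_kwargs with budget_tokens thinking replaced by effort."""
    new_kwargs = dict(old_kwargs)
    thinking = new_kwargs.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed budget and let the model decide how much to think.
        new_kwargs["thinking"] = {"type": "adaptive"}
    # Merge rather than overwrite, in case output_config already has other keys.
    output_config = dict(new_kwargs.get("output_config", {}))
    output_config["effort"] = effort
    new_kwargs["output_config"] = output_config
    return new_kwargs


old = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4000},
    "messages": [{"role": "user", "content": "Design a distributed caching system."}],
}
new = migrate_request(old)
print(new["thinking"], new["output_config"])
```

The function returns a copy and leaves the original arguments untouched, which makes it easy to run old and new requests side by side during a migration.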

Best Practices

  • Start with medium – For most applications, medium offers the best balance of speed, cost, and quality
  • Combine with adaptive thinking – thinking: {type: "adaptive"} pairs naturally with effort levels
  • Test with your workload – Run A/B tests to find the optimal effort level for your specific use case
  • Use low for subagents – Simple classification, extraction, or routing tasks don't need high effort
  • Reserve max for complex tasks – Only use max when you truly need the deepest reasoning
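
The "test with your workload" advice can be sketched as a small comparison harness. `compare_efforts` and `fake_run` are hypothetical names, and the token counts in the fake runner are made-up placeholders, not measurements. In real use, the runner would call the API and return `response.usage.output_tokens`.

```python
from statistics import mean


def compare_efforts(run_once, prompts, efforts=("low", "medium", "high")) -> dict:
    """Return the mean output-token count per effort level.

    run_once(prompt, effort) must return the number of output tokens used;
    in real use it would call the API and read response.usage.output_tokens.
    """
    return {
        effort: mean(run_once(prompt, effort) for prompt in prompts)
        for effort in efforts
    }


# Fake runner with made-up numbers, only to show the shape of the result.
def fake_run(prompt, effort):
    return {"low": 100, "medium": 250, "high": 600}[effort] + len(prompt)


report = compare_efforts(fake_run, ["abc", "abcde"])
print(report)
```

Token counts are only half of the comparison. Output quality at each level still needs to be judged against your own acceptance criteria.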

Common Pitfalls

  • Expecting strict budgets – Effort is a signal, not a hard limit. Claude may still spend significant tokens on genuinely hard problems at low effort.
  • Ignoring Sonnet defaults – Sonnet 4.6 defaults to high effort. Always set it explicitly to avoid unexpected latency.
  • Using max unnecessarily – max effort can dramatically increase token usage. Only use it when the task genuinely requires maximum capability.
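
Since effort is not a hard limit, max_tokens remains the only hard cap on output, and a response can still be cut off at any effort level. Here is a minimal sketch of a truncation check. `hit_token_cap` is a hypothetical helper, and the stand-in objects imitate only the `stop_reason` attribute of a Messages API response.

```python
from types import SimpleNamespace


def hit_token_cap(response) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return response.stop_reason == "max_tokens"


# Stand-in responses; real ones come from client.messages.create(...).
truncated = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")
print(hit_token_cap(truncated), hit_token_cap(finished))  # prints: True False
```

If the check fires often at a given effort level, raising max_tokens or lowering effort are the two options to try.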

Key Takeaways

  • Effort controls token spend across text, thinking, and tool calls—not just thinking tokens like the deprecated budget_tokens
  • Five levels available: low, medium, high (default), xhigh (Opus 4.7 only), and max
  • Works without extended thinking enabled, making it universally applicable
  • Sonnet 4.6 users should explicitly set effort to avoid unexpected latency from the high default
  • Combine with adaptive thinking for the best balance of depth and efficiency on complex tasks