Guide | 2026-05-03

Mastering Claude's Effort Parameter: Control Thinking Depth, Speed, and Cost

Learn how to use the effort parameter in the Claude API to balance reasoning depth, token efficiency, and latency. Includes code examples and recommended settings for Opus and Sonnet models.

Quick Answer

The effort parameter lets you control how thoroughly Claude thinks before responding, trading off between capability and token cost. Set it to 'low' for fast, cheap responses on simple tasks, or 'max' for deep reasoning on complex problems. Combine with adaptive thinking for best results.

Tags: effort parameter, token efficiency, Claude API, extended thinking, cost optimization

Introduction

Every Claude user knows the dilemma: you want deep, thoughtful responses for complex tasks, but you don't want to burn through tokens (and money) when asking simple questions. The effort parameter solves this by giving you granular control over how "eager" Claude is about spending tokens—all with a single model, no switching required.

This guide explains exactly how effort works, when to use each level, and how to combine it with adaptive thinking for optimal results.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how thoroughly to reason before responding. Unlike a strict token budget, effort is a soft guide: at lower levels, Claude will still think hard on genuinely difficult problems, but it will think less than it would at higher levels for the same task.

Key advantages:

Works without extended thinking – You can use effort even when thinking is disabled.
Affects all tokens – Including tool calls, function arguments, and text responses. Lower effort means fewer tool calls, giving you broader cost control.

Supported Models

Model	Effort Support	Notes
Claude Mythos Preview	✅ All levels	Max effort available
Claude Opus 4.7	✅ All levels	Includes `xhigh` for long-horizon tasks
Claude Opus 4.6	✅ All levels	Replaces `budget_tokens`
Claude Sonnet 4.6	✅ All levels	Recommended to set explicitly
Claude Opus 4.5	✅ All levels	Basic support

Important: For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter. While budget_tokens still works, it will be removed in a future release.
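A migration away from budget_tokens could be sketched as a small pure function that rewrites a request's keyword arguments. The helper name `migrate_to_effort` and the budget thresholds are hypothetical (this guide defines no budget-to-effort mapping), and carrying effort under `output_config` is an assumption about the request shape that should be confirmed against the current API reference.

```python
from copy import deepcopy

# Hypothetical ceilings: no official budget-to-effort mapping is given in this guide.
_BUDGET_TO_EFFORT = [(2_000, "low"), (8_000, "medium")]


def migrate_to_effort(request: dict) -> dict:
    """Return a copy of `request` with thinking.budget_tokens replaced by an effort level."""
    migrated = deepcopy(request)
    thinking = migrated.get("thinking") or {}
    budget = thinking.get("budget_tokens")
    if budget is None:
        return migrated  # nothing to migrate

    # Pick the first level whose (assumed) ceiling covers the old budget.
    effort = "high"
    for ceiling, level in _BUDGET_TO_EFFORT:
        if budget <= ceiling:
            effort = level
            break

    migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort lives in output_config; an explicitly set effort is kept.
    migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


old = {
    "model": "claude-opus-4-6",  # assumed model ID
    "max_tokens": 16_000,
    "thinking": {"type": "enabled", "budget_tokens": 4_096},
}
new = migrate_to_effort(old)
print(new["thinking"], new["output_config"])
```

Because the function copies its input, the original request stays untouched, which makes it easy to run old and new configurations side by side during a migration.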

Effort Levels Explained

Level	Description	Best Use Case
`max`	Absolute maximum capability, no token constraints	Deepest reasoning, research-grade analysis
`xhigh`	Extended capability for long-horizon work (Opus 4.7 only)	Agentic/coding tasks over 30 minutes with million+ token budgets
`high`	Default behavior, excellent results	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Default behavior: Omitting the effort parameter is identical to setting effort: "high".
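The level rules in this section can be captured in a small validation helper. This is a minimal sketch: `resolve_effort` is a hypothetical name, and it keys on the display names from the Supported Models table rather than on API model IDs.

```python
from typing import Optional

EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"  # omitting effort behaves like "high", per this guide


def resolve_effort(model_name: str, effort: Optional[str] = None) -> str:
    """Validate an effort level for a model, applying the default when omitted."""
    level = DEFAULT_EFFORT if effort is None else effort
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    # Per the levels table, xhigh is limited to Opus 4.7.
    if level == "xhigh" and model_name != "Claude Opus 4.7":
        raise ValueError("xhigh is only available on Claude Opus 4.7")
    return level


print(resolve_effort("Claude Sonnet 4.6"))           # default applied
print(resolve_effort("Claude Opus 4.7", "xhigh"))    # allowed on Opus 4.7
```

Validating before the request is sent surfaces a misconfigured level as a local error rather than a failed API call.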

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which may be overkill for many applications. Anthropic recommends explicitly setting effort to avoid unexpected latency:

Medium effort (recommended default): Best balance for most apps—agentic coding, tool-heavy workflows, code generation.
Low effort: High-volume or latency-sensitive workloads—chat, non-coding tasks where speed matters.
High effort: Only for tasks requiring maximum reasoning depth.

How to Use the Effort Parameter

Basic API Call (Python)

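import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6; verify against the models list
    max_tokens=1024,
    # Effort is passed inside output_config, not as a top-level argument.
    output_config={"effort": "medium"},  # control token spending
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
)
# With thinking disabled, the first content block is the text reply.
print(response.content[0].text)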

With Extended Thinking (TypeScript)

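import Anthropic from '@anthropic-ai/sdk';

// The client reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-6', // assumed ID for Opus 4.6; verify against the models list
  max_tokens: 2048,
  // Adaptive thinking takes no budget_tokens; effort guides thinking depth instead.
  thinking: { type: 'adaptive' },
  output_config: { effort: 'high' },
  messages: [
    { role: 'user', content: 'Design a distributed caching system.' }
  ]
});
// content may include thinking blocks ahead of the text block.
console.log(response.content);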

Effort with Tool Use

Lower effort reduces the number of tool calls Claude makes, saving tokens on multi-step tasks:

# Reuses the client created in the basic example.
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=1024,
    output_config={"effort": "low"},  # fewer tool calls, faster responses
    tools=[
        {
            "name": "search_web",
            "description": "Search the web for information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Find the latest news about AI regulation."}
    ]
)
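To check whether a lower effort level actually reduces tool use on your workload, one option is to count the tool_use blocks in each response. The sketch below assumes content blocks expose `type` and `name` attributes, as the SDK's tool-use blocks do; `count_tool_calls` is a hypothetical helper, and the sample blocks are stand-ins rather than real API output.

```python
from collections import Counter
from types import SimpleNamespace


def count_tool_calls(content_blocks) -> Counter:
    """Count tool_use blocks by tool name in a response's content list."""
    return Counter(
        block.name
        for block in content_blocks
        if getattr(block, "type", None) == "tool_use"
    )


# Stand-in blocks shaped like response.content; not real API output.
fake_content = [
    SimpleNamespace(type="text", text="Let me search for that."),
    SimpleNamespace(type="tool_use", name="search_web", input={"query": "AI regulation news"}),
    SimpleNamespace(type="tool_use", name="search_web", input={"query": "EU AI Act updates"}),
]
calls = count_tool_calls(fake_content)
print(calls["search_web"], sum(calls.values()))
```

In a multi-turn agent loop, summing these counters across turns for the same task at two effort levels gives a direct comparison of tool-call volume.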

Combining Effort with Adaptive Thinking

For the best experience, pair effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide how much to think based on task complexity, while effort sets how freely it spends tokens overall.

Example configuration:

response = client.messages.create(
    model="claude-opus-4-6",  # assumed ID for Opus 4.6
    max_tokens=4096,
    # No budget_tokens here: adaptive thinking sizes its own budget per task.
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # balanced token spending
    messages=[
        {"role": "user", "content": "Analyze this financial dataset and identify trends."}
    ]
)

Practical Scenarios

Scenario 1: Customer Support Chatbot

Effort: low
Why: Most queries are simple (order status, FAQs). Low effort gives fast, cheap responses. For complex issues, Claude will still think harder.

Scenario 2: Code Generation Agent

Effort: medium (Sonnet 4.6) or high (Opus 4.6)
Why: Code generation benefits from moderate reasoning. Medium effort on Sonnet balances speed and quality.

Scenario 3: Research Assistant

Effort: max
Why: Deep analysis, multi-step reasoning, and thoroughness are critical. Token cost is secondary.

Scenario 4: Long-Running Agent (30+ minutes)

Effort: xhigh (Opus 4.7 only)
Why: Extended tasks with million+ token budgets need capability sustained over a long horizon without premature token exhaustion.
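The four scenarios can be encoded as a lookup table so that application code picks an effort level by use case. This is an illustrative sketch: the scenario keys, `EffortPreset`, and `preset_for` are hypothetical names, and the model field holds display names from this guide, not API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortPreset:
    effort: str
    model_family: str  # display name from the guide, not an API model ID


# Effort values mirror the four scenarios; code generation uses the Sonnet 4.6 variant
# (the guide suggests high instead when running on Opus 4.6).
PRESETS = {
    "support_chat": EffortPreset("low", "any supported model"),
    "code_generation": EffortPreset("medium", "Claude Sonnet 4.6"),
    "research": EffortPreset("max", "any supported model"),
    "long_running_agent": EffortPreset("xhigh", "Claude Opus 4.7"),
}


def preset_for(scenario: str) -> EffortPreset:
    """Look up a preset, falling back to the default effort level (high)."""
    return PRESETS.get(scenario, EffortPreset("high", "any supported model"))


print(preset_for("support_chat").effort)
print(preset_for("unlisted_scenario").effort)
```

Keeping the mapping in one table makes it straightforward to retune a single scenario after measuring its token usage.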

Best Practices

Start with medium for Sonnet 4.6 – It's the new recommended default for most applications.
Use low for subagents – When Claude is part of a larger pipeline handling simple tasks, low effort saves tokens.
Pair with adaptive thinking – For maximum efficiency, combine effort with thinking: {type: "adaptive"}.
Monitor token usage – Lower effort doesn't guarantee a fixed token count; it's a behavioral signal. Test with your specific workloads.
Avoid budget_tokens on newer models – Use effort instead; it's more flexible and future-proof.
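Since effort is a behavioral signal rather than a fixed budget, a simple way to monitor it is to compare output token counts for the same prompts at different levels. The sketch below uses made-up numbers purely for illustration; in practice the counts would come from `response.usage.output_tokens`, and `token_savings` is a hypothetical helper.

```python
def token_savings(baseline_output_tokens: int, candidate_output_tokens: int) -> float:
    """Return the fraction of output tokens saved relative to the baseline run."""
    if baseline_output_tokens <= 0:
        raise ValueError("baseline must be positive")
    return 1 - candidate_output_tokens / baseline_output_tokens


# Illustrative numbers only (not measurements): total output tokens per effort level
# over the same prompt set.
observed = {"high": 1200, "medium": 900, "low": 600}
for level in ("medium", "low"):
    saved = token_savings(observed["high"], observed[level])
    print(f"{level}: {saved:.0%} fewer output tokens than high")
```

Tracking this ratio alongside a quality metric for your task shows whether a lower level is saving tokens without degrading results.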

Key Takeaways

The effort parameter controls token spending behavior across text, tool calls, and thinking—no need to switch models for different task complexities.
Five levels from low to max (xhigh on Opus 4.7 only) let you trade speed and cost against capability. The default is high.
Sonnet 4.6 users should explicitly set effort to avoid unexpected latency; medium is the recommended default.
Combine with adaptive thinking for the best balance of depth and efficiency.
effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6—migrate your code to stay compatible with future model releases.