BeClaude
Guide | Beginner | Pricing | 2026-05-22

Mastering Claude's Effort Parameter: Balance Cost, Speed, and Reasoning Depth

Learn how to control Claude's token spending with the effort parameter. Optimize for speed, cost, or deep reasoning across all models, including Sonnet 4.6 and Opus 4.7.

Quick Answer

The effort parameter lets you control how eagerly Claude spends tokens—from low (fast, cheap, simpler tasks) to max (deepest reasoning). It works across all response types, including tool calls and thinking, without requiring extended thinking to be enabled.

effort parameter, token optimization, Claude API, cost management, extended thinking

Introduction

When building with Claude, you often face a trade-off: do you want the deepest possible reasoning, or do you need fast, cost-effective responses? Historically, you had to choose between models or manually set budget_tokens for thinking depth. The effort parameter changes that.

Effort is a single, intuitive knob that controls how eagerly Claude spends tokens on any request. It works across all supported models, affects text, tool calls, and extended thinking, and doesn't require thinking to be enabled. This guide will show you exactly how to use it to optimize your application.

How Effort Works

By default, Claude uses high effort—spending as many tokens as needed for excellent results. You can raise it to max for absolute top capability, or lower it to medium or low to save tokens and reduce latency.

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on difficult problems, but it will think less than it would at higher levels for the same problem.

Effort Levels Overview

Level | Description | Typical Use Case
max | Absolute maximum capability, no token constraints | Deepest reasoning, complex analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6)
xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only)
high | Default behavior, excellent results | Complex reasoning, difficult coding, agentic tasks
medium | Balanced approach, moderate token savings | Agentic tasks needing speed/cost balance
low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat
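
The availability notes in the table can be encoded as a small lookup, so an unsupported level is caught before a request goes out. This is a minimal sketch, not part of any SDK: `check_effort` and the model labels are illustrative names, and the restrictions simply mirror the table above.

```python
# Illustrative sketch: these names are hypothetical, not part of the Anthropic SDK.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Levels the table ties to specific models; other levels carry no restriction here.
RESTRICTED_LEVELS = {
    "xhigh": {"Opus 4.7"},
    "max": {"Mythos", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"},
}


def check_effort(model_label: str, level: str) -> str:
    """Return `level` if the table lists it for `model_label`, else raise ValueError."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    allowed = RESTRICTED_LEVELS.get(level)
    if allowed is not None and model_label not in allowed:
        raise ValueError(f"{level!r} is not listed for {model_label}")
    return level
```

Calling `check_effort("Sonnet 4.6", "xhigh")` raises `ValueError`, while `check_effort("Opus 4.7", "xhigh")` returns `"xhigh"`.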

Setting Effort in the API

Effort is set via the effort field of the output_config object in the request body. It's available on all supported models with no beta header required.

Python Example

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Options: "low", "medium", "high", "xhigh", "max"
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)

print(response.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

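const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}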

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token consumption, explicitly set effort when using this model:

  • Medium (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low: For high-volume or latency-sensitive workloads. Suitable for chat and simple classification tasks.

Combining Effort with Extended Thinking

Effort works seamlessly with extended thinking. For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}).

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis... just kidding. Solve this complex optimization problem."}
    ]
)
Note: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
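
If you have existing requests that still pass budget_tokens, the move to effort can be sketched as a small rewrite of the request parameters. The helper below is hypothetical (`migrate_to_effort` is not an SDK function), and it assumes effort is sent as a field of an `output_config` object, as in Anthropic's published Messages API examples; if your SDK version exposes effort differently, adjust the single line that sets it. The effort level is chosen by you, since no fixed mapping from a token budget to an effort level is implied here.

```python
import copy


def migrate_to_effort(params: dict, effort: str) -> dict:
    """Return a copy of `params` with budget_tokens-style thinking replaced
    by adaptive thinking plus an explicit effort level."""
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort is carried inside `output_config`.
    migrated.setdefault("output_config", {})["effort"] = effort
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",  # assumed model ID; substitute your own
    "max_tokens": 16384,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Solve this complex optimization problem."}],
}
request = migrate_to_effort(legacy, "medium")
# `request` could then be passed along as client.messages.create(**request)
```

The original dictionary is left untouched, so you can send both versions side by side while comparing results.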

Practical Use Cases

1. High-Volume Chat (Low Effort)

Use effort="low" for customer support bots, FAQ systems, or any application where speed and cost matter more than deep reasoning.

2. Agentic Coding Assistants (Medium Effort)

Set effort="medium" for tools that generate code, refactor functions, or write tests. You get solid reasoning without excessive token burn.

3. Complex Analysis (High or Max Effort)

For legal document review, scientific research, or multi-step agentic workflows, use effort="high" or effort="max" to get Claude's best reasoning.

4. Long-Running Agents (Xhigh Effort, Opus 4.7 Only)

If you're building an agent that runs for over 30 minutes with token budgets in the millions, effort="xhigh" gives you extended capability for long-horizon work.
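
The four use cases above can be collapsed into a simple routing table. This is a rough sketch under stated assumptions: the task category names and `pick_effort` are made up for illustration, and falling back from xhigh to high on models other than Opus 4.7 is one possible design choice, not documented behavior.

```python
# Hypothetical task categories mapped to the effort levels suggested above.
EFFORT_BY_TASK = {
    "chat": "low",
    "coding": "medium",
    "analysis": "high",
    "long_running_agent": "xhigh",
}


def pick_effort(task: str, model_label: str) -> str:
    """Choose an effort level for a task category and model label."""
    # Unknown categories fall back to "high", the default level.
    level = EFFORT_BY_TASK.get(task, "high")
    # xhigh is listed for Opus 4.7 only, so step down elsewhere (a design choice).
    if level == "xhigh" and model_label != "Opus 4.7":
        return "high"
    return level
```

Keeping the mapping in one place makes it easy to adjust a single category after you measure cost and quality on your own workload.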

How Effort Affects Tool Calls

Effort doesn't just affect text responses; it also influences tool calls. At lower effort levels, Claude will make fewer tool calls and use simpler arguments. Because effort applies to tool use as well as thinking, it covers more of a request's token spend than budget_tokens, which only governed thinking depth.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "low"},  # Fewer, simpler tool calls
    tools=[
        {
            "name": "search_database",
            "description": "Search the product database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Find me the cheapest laptop with 16GB RAM."}
    ]
)
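
To see how effort changes tool usage on your own prompts, you can count the tool_use blocks in each response and compare the counts across levels. The helper below is a small sketch (`count_tool_calls` is an illustrative name); it accepts either SDK content-block objects or raw JSON dictionaries, and the sample data is hand-written rather than real model output.

```python
def count_tool_calls(content_blocks) -> int:
    """Count blocks whose type is "tool_use" in a response's content list."""
    count = 0
    for block in content_blocks:
        # SDK objects expose `.type`; raw JSON blocks are dicts with a "type" key.
        if isinstance(block, dict):
            block_type = block.get("type")
        else:
            block_type = getattr(block, "type", None)
        if block_type == "tool_use":
            count += 1
    return count


# Hand-written stand-in for `response.content`, not real model output.
sample_content = [
    {"type": "text", "text": "Let me search for that."},
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "search_database",
        "input": {"query": "laptop 16GB RAM"},
    },
]
print(count_tool_calls(sample_content))  # prints 1
```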

Best Practices

  • Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency.
  • Start with medium for most applications, then adjust based on observed performance and cost.
  • Use adaptive thinking (thinking: {type: "adaptive"}) alongside effort for the best balance of depth and efficiency.
  • Test with your actual workload—effort is a behavioral signal, so results vary by task complexity.
  • Monitor token usage across effort levels to find the sweet spot for your use case.
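
For the monitoring step, one simple approach is to log `response.usage.output_tokens` per request, grouped by effort level, and compare averages. The sketch below assumes you already collect those values; the function names are illustrative and the numbers are placeholders chosen for easy arithmetic, not measurements.

```python
from statistics import mean


def mean_output_tokens(usage_log: dict) -> dict:
    """Average logged output tokens per effort level, skipping empty levels."""
    return {level: mean(samples) for level, samples in usage_log.items() if samples}


def savings_vs(baseline: str, averages: dict) -> dict:
    """Fractional reduction in mean output tokens relative to `baseline`."""
    base = averages[baseline]
    return {
        level: 1 - avg / base
        for level, avg in averages.items()
        if level != baseline
    }


# Placeholder numbers for illustration only, not measurements.
usage_log = {"low": [100, 140], "medium": [200, 280], "high": [400, 400]}
averages = mean_output_tokens(usage_log)
savings = savings_vs("high", averages)
for level, fraction in savings.items():
    print(f"{level}: {fraction:.0%} fewer output tokens than high")
```

Pair these numbers with a quality check on the same prompts, since token savings alone don't tell you whether a lower level is still good enough for the task.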

Key Takeaways

  • Effort is a single parameter that controls token spending across text, tool calls, and thinking—no beta header required.
  • Lower effort saves tokens and reduces latency but may reduce capability on complex tasks; higher effort delivers deeper reasoning at higher cost.
  • For Sonnet 4.6, always set effort explicitly—the default is high, which may be overkill for simple tasks.
  • Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6; combine it with adaptive thinking for best results.
  • Use low for high-volume chat, medium for agentic coding, and high/max for complex analysis—xhigh is reserved for long-running Opus 4.7 agents.