BeClaude
Guide · Beginner · Pricing · 2026-05-22

Mastering the Effort Parameter in Claude API: Balance Cost, Speed, and Intelligence

Learn how to use Claude's effort parameter to control token spending, response thoroughness, and latency. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.7.

Quick Answer

The effort parameter lets you control how eagerly Claude spends tokens on a response. Set it from 'low' (fast, cheap, simpler tasks) to 'max' (deepest reasoning). It works across all response tokens—including tool calls and thinking—without requiring extended thinking mode.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Introduction

Every Claude API call is a trade-off between intelligence, speed, and cost. Sometimes you need Claude to reason deeply about a complex codebase; other times you just need a quick classification. Historically, developers had to juggle separate models, thinking budgets, and complex prompt engineering to achieve this balance.

Enter the effort parameter—a single, intuitive control that lets you dial Claude's "eagerness to spend tokens" up or down. Available on Claude Opus 4.5, Opus 4.6, Opus 4.7, Sonnet 4.6, and the new Mythos Preview, effort replaces the older budget_tokens approach and works seamlessly with or without extended thinking.

In this guide, you'll learn:

  • What the effort parameter does and how it differs from token budgets
  • The five effort levels and when to use each
  • Practical code examples in Python and TypeScript
  • Best practices for Sonnet 4.6 and Opus 4.7
  • How to combine effort with adaptive thinking for maximum efficiency

How the Effort Parameter Works

By default, Claude operates at high effort—spending as many tokens as needed for excellent results. The effort parameter lets you move up or down from this baseline:

  • Raise effort → deeper reasoning, more tool calls, longer responses, higher cost
  • Lower effort → faster responses, fewer tokens, lower cost, some reduction in capability

Crucially, effort affects all tokens in the response: text explanations, tool call arguments, and extended thinking tokens (when enabled). This gives you far more control than older approaches, which limited only thinking tokens.

Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think hard on sufficiently difficult problems; it just won't think as much as it would at higher effort for the same problem.
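Because effort is a signal rather than a cap, max_tokens remains the only hard limit on output length, and the savings from a lower level are best measured rather than assumed. A minimal sketch, assuming you have each raw response body as a dict (for example from response.model_dump() in the Python SDK); RunStats, run_stats, and savings are hypothetical helpers, not part of the SDK:

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Token usage for one request, tagged with the effort level it used."""
    effort: str
    output_tokens: int
    truncated: bool


def run_stats(effort: str, response: dict) -> RunStats:
    # Hypothetical helper: reads fields from a Messages API response body.
    return RunStats(
        effort=effort,
        # Output tokens include thinking and tool-call tokens, which effort also shapes.
        output_tokens=response["usage"]["output_tokens"],
        # Effort never cuts a response short; hitting max_tokens does.
        truncated=response["stop_reason"] == "max_tokens",
    )


def savings(baseline: RunStats, candidate: RunStats) -> float:
    """Fraction of output tokens saved by candidate relative to baseline."""
    if baseline.output_tokens == 0:
        return 0.0
    return 1 - candidate.output_tokens / baseline.output_tokens
```

Run the same prompt at two levels, pass each response body through run_stats, and compare. A truncated run means max_tokens, not effort, was the binding constraint.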

Effort Levels and When to Use Them

Level | Description | Best for
max | Absolute maximum capability, no token constraints | Deepest reasoning, research, complex math (Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6)
xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks (>30 min, millions of tokens); Opus 4.7 only
high | Default behavior, excellent results | Complex reasoning, difficult coding, agentic tasks
medium | Balanced approach with moderate savings | Agentic tasks needing a balance of speed, cost, and performance
low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat, latency-sensitive workloads
Note: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
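The table above can be turned into a small guard that keeps requests within what each model supports. A sketch under stated assumptions: the model keys are informal labels rather than API model IDs, the support sets simply mirror this table (low, medium, and high on every listed model), and clamping down to the nearest supported level is a design choice of this sketch, not API behavior:

```python
# Effort levels from the table, ordered from most to least token spend.
LEVELS = ("max", "xhigh", "high", "medium", "low")

# Support per model as listed in this guide. Keys are informal labels, not API model IDs.
SUPPORTED = {
    "opus-4.5": {"high", "medium", "low"},
    "opus-4.6": {"max", "high", "medium", "low"},
    "opus-4.7": {"max", "xhigh", "high", "medium", "low"},
    "sonnet-4.6": {"max", "high", "medium", "low"},
    "mythos-preview": {"max", "high", "medium", "low"},
}


def clamp_effort(model: str, requested: str) -> str:
    """Return the requested level if supported, else the next lower level the model supports."""
    if requested not in LEVELS:
        raise ValueError(f"unknown effort level: {requested!r}")
    if model not in SUPPORTED:
        raise ValueError(f"unknown model label: {model!r}")
    # Walk from the requested level toward "low" and stop at the first supported one.
    for level in LEVELS[LEVELS.index(requested):]:
        if level in SUPPORTED[model]:
            return level
    raise ValueError(f"{model!r} supports no level at or below {requested!r}")


def effort_config(model: str, requested: str) -> dict:
    """Build the output_config fragment used in this guide's examples."""
    return {"effort": clamp_effort(model, requested)}
```

Clamping downward means a request never spends more than the caller asked for; raising an error instead would be the stricter alternative.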

Code Examples

Python (using the Anthropic SDK)

import anthropic

client = anthropic.Anthropic()

# Low effort: fast, cheap classification (no extended thinking needed)
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6 model ID; confirm against the models overview
    max_tokens=1024,
    system="Classify the sentiment as positive, negative, or neutral.",
    messages=[
        {"role": "user", "content": "The product arrived broken and customer service was unhelpful."}
    ],
    output_config={"effort": "low"},  # effort is set inside output_config
)

# Without thinking enabled, the first content block is the text answer
print(response.content[0].text)

# Max effort: deep reasoning for complex code review
response = client.messages.create(
    model="claude-opus-4-6",  # max effort is listed for Opus 4.6 and later
    max_tokens=16000,  # thinking tokens count toward max_tokens
    system="You are a senior code reviewer. Find all bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": "Review this Python code..."}
    ],
    thinking={"type": "adaptive"},  # instead of the deprecated budget_tokens
    output_config={"effort": "max"},
)

# With thinking enabled, thinking blocks come first, so print only the text blocks
print("".join(block.text for block in response.content if block.type == "text"))

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

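// Medium effort: balanced for agentic coding
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Sonnet 4.6 model ID; confirm against the models overview
  max_tokens: 4096,
  system: 'You are a helpful coding assistant. Generate clean, well-documented code.',
  messages: [
    { role: 'user', content: 'Write a React component that fetches and displays user data.' },
  ],
  thinking: { type: 'adaptive' }, // instead of the deprecated budget_tokens
  output_config: { effort: 'medium' }, // effort is set inside output_config
});

// Thinking blocks precede the answer, so print only the text blocks
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}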

Combining Effort with Adaptive Thinking

For the best experience on Opus 4.6 and Sonnet 4.6, Anthropic recommends combining effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude dynamically decide how much thinking to do based on the problem complexity, while effort sets the overall eagerness level.

# Adaptive thinking + medium effort: a good default balance
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6 model ID; confirm against the models overview
    max_tokens=4096,
    messages=[{"role": "user", "content": "Explain quantum entanglement in simple terms."}],
    thinking={"type": "adaptive"},  # Claude decides whether and how much to think
    output_config={"effort": "medium"},
)

When using adaptive thinking, Claude may skip thinking entirely for simple problems at lower effort levels—saving you tokens and latency.
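Because adaptive thinking may produce no thinking block at all, code that indexes response.content by position can break when you change effort. A minimal sketch that reads blocks by type instead, assuming the content array is available as a list of dicts (for example via response.model_dump()["content"]); split_content is a hypothetical helper:

```python
def split_content(content: list[dict]) -> tuple[str, str]:
    """Return (thinking, answer) text from a Messages API content array.

    A missing thinking block (Claude chose not to think) yields an empty
    thinking string. Other block types, such as tool_use, are ignored.
    """
    thinking = "".join(b["thinking"] for b in content if b["type"] == "thinking")
    answer = "".join(b["text"] for b in content if b["type"] == "text")
    return thinking, answer
```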

Best Practices for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which can introduce unexpected latency. Anthropic recommends explicitly setting effort when using Sonnet 4.6:

  • Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.

# Recommended: explicitly set effort to avoid surprises
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6 model ID; confirm against the models overview
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a bash script to back up a PostgreSQL database."}],
    output_config={"effort": "medium"},  # explicitly set rather than relying on the high default
)
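One way to make the explicit setting hard to forget is to route request parameters through a wrapper that fills in medium only when the caller has not chosen a level. A sketch, assuming requests are built as keyword-argument dicts and use the output_config shape shown above; with_default_effort is a hypothetical name:

```python
import copy


def with_default_effort(params: dict, default: str = "medium") -> dict:
    """Return a copy of params with an effort level set, keeping any explicit choice."""
    out = copy.deepcopy(params)  # never mutate the caller's dict
    config = dict(out.get("output_config") or {})
    config.setdefault("effort", default)  # only fills the gap; an explicit level wins
    out["output_config"] = config
    return out
```

A call site would then look like client.messages.create(**with_default_effort(params)), so every Sonnet 4.6 request carries an effort level even when the caller forgets one.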

Effort and Tool Calls

One of the biggest advantages of the effort parameter is that it affects tool call behavior. At lower effort levels, Claude will make fewer tool calls and choose simpler tool combinations. This can dramatically reduce both latency and cost in agentic workflows.

# Low effort: Claude will be conservative with tool usage
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6 model ID; confirm against the models overview
    max_tokens=4096,
    tools=[
        {
            "name": "search_database",
            "description": "Search the product database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "get_product_details",
            "description": "Get full product details",
            "input_schema": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"}
                },
                "required": ["product_id"]
            }
        }
    ],
    messages=[{"role": "user", "content": "Find me the best laptop under $1000."}],
    output_config={"effort": "low"},  # effort is set inside output_config
)
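To check whether a lower effort level really trims tool usage in your own agent, you can tally tool_use blocks across every response in a run and compare runs made at different levels. A sketch assuming each response body is a dict from the Messages API; tool_calls_per_run is a hypothetical helper, and the tool names are the ones from the example above:

```python
from collections import Counter


def tool_calls_per_run(responses: list[dict]) -> Counter:
    """Count tool_use blocks by tool name across all assistant turns of one agent run."""
    counts: Counter = Counter()
    for response in responses:
        counts.update(
            block["name"] for block in response["content"] if block["type"] == "tool_use"
        )
    return counts
```

Comparing sum(counts.values()) for a low-effort run against a high-effort run of the same task shows whether the reduction described above holds for your tools.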

Migration from budget_tokens

If you're currently using budget_tokens on Opus 4.6 or Sonnet 4.6, Anthropic recommends migrating to the effort parameter. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Before (deprecated):

thinking={"type": "enabled", "budget_tokens": 2048}

After (recommended):

thinking={"type": "adaptive"},
output_config={"effort": "medium"}
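For codebases with many call sites, the Before/After change can be applied mechanically. A minimal sketch, assuming request parameters are kwargs dicts; migrate_thinking is a hypothetical helper, and it deliberately does not translate a budget size into an effort level (the target level is your choice, defaulting to medium as in the After example):

```python
import copy


def migrate_thinking(params: dict, effort: str = "medium") -> dict:
    """Swap a deprecated budget_tokens thinking config for adaptive thinking plus effort."""
    out = copy.deepcopy(params)
    thinking = out.get("thinking") or {}
    if thinking.get("type") == "enabled":
        # budget_tokens is dropped; adaptive thinking plus effort replaces the fixed budget.
        out["thinking"] = {"type": "adaptive"}
        config = dict(out.get("output_config") or {})
        config.setdefault("effort", effort)  # keep an effort level that was already set
        out["output_config"] = config
    return out
```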

Key Takeaways

  • Effort is a single, unified control that affects all response tokens—text, tool calls, and thinking—giving you fine-grained control over the cost-speed-intelligence trade-off.
  • Five levels from low (fastest, cheapest) to max (deepest reasoning) let you match Claude's behavior to your task complexity.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal efficiency on Opus 4.6 and Sonnet 4.6.
  • Always explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high setting.
  • Migrate from budget_tokens to effort + adaptive thinking for future-proof code on the models that support both.