BeClaude
Guide | Beginner | Pricing | 2026-05-16

Mastering Claude's Effort Parameter: Balance Performance and Cost in Your API Calls

Learn how to use Claude's effort parameter to control token spending, optimize response thoroughness, and reduce costs across all API interactions including tool calls and extended thinking.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn the five effort levels (low, medium, high, xhigh, max), how to implement them in API calls, and practical strategies for balancing performance with cost across different use cases.

Tags: effort parameter, token optimization, API best practices, cost management, Claude Sonnet 4.6

Introduction

When building applications with Claude, one of the most important decisions you'll make is how to balance response quality against token usage and cost. The effort parameter gives you fine-grained control over this trade-off, allowing you to dial Claude's thoroughness up or down with a single API parameter.

On Claude Opus 4.6 and Sonnet 4.6, effort replaces the older budget_tokens parameter as the recommended way to control thinking depth. It applies to every kind of output (text, tool calls, and extended thinking), so a single setting governs most of your token spend.

In this guide, you'll learn:

  • What the effort parameter does and how it differs from token budgets
  • The five effort levels and when to use each
  • How to implement effort in your API calls with code examples
  • Best practices for Sonnet 4.6 and other supported models

How the Effort Parameter Works

By default, Claude operates at high effort—spending as many tokens as needed to produce excellent results. The effort parameter lets you adjust this behavior:

  • Raise effort to max for the absolute highest capability
  • Lower effort to medium or low for faster, cheaper responses

Unlike a strict token budget, effort is a behavioral signal. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem. Effort therefore adapts to problem difficulty in a way a hard token cap cannot.

Key Advantages

  • No thinking required: Effort works even when extended thinking is disabled
  • Affects all token spend: Including tool calls—lower effort means fewer tool calls
  • Single parameter control: One setting influences text, thinking, and tool usage

Effort Levels Explained

Claude supports five effort levels, each suited to different use cases:

| Level | Description | Typical use case |
| --- | --- | --- |
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents |
| medium | Balanced approach with moderate token savings. | Agentic tasks needing speed/cost balance |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, coding, agentic tasks |
| xhigh | Extended capability for long-horizon work. | Long-running agentic/coding tasks (30+ min) |
| max | Absolute maximum capability with no constraints. | Deepest reasoning, most thorough analysis |

Availability note: max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is available only on Opus 4.7.
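
The availability note can be encoded as a small lookup so an unsupported level is caught before the request is sent. This is a sketch only: supported_levels and check_effort are hypothetical helpers, the keys are informal model names rather than API model IDs, and it assumes low, medium, and high are accepted wherever effort is supported.

```python
# Levels assumed to be accepted by every effort-capable model (an assumption,
# not something the availability note states).
BASE_LEVELS = frozenset({"low", "medium", "high"})

# Extra levels per model family, taken from the availability note.
# Keys are informal names, not API model IDs.
EXTRA_LEVELS = {
    "Mythos Preview": {"max"},
    "Opus 4.7": {"xhigh", "max"},
    "Opus 4.6": {"max"},
    "Sonnet 4.6": {"max"},
}


def supported_levels(model_name: str) -> set:
    """Return the effort levels this sketch considers valid for a model."""
    return set(BASE_LEVELS) | EXTRA_LEVELS.get(model_name, set())


def check_effort(model_name: str, effort: str) -> str:
    """Return effort unchanged, or raise ValueError if the model lacks it."""
    if effort not in supported_levels(model_name):
        raise ValueError(f"{effort!r} is not available on {model_name}")
    return effort
```

Calling check_effort("Sonnet 4.6", "xhigh") raises ValueError, while check_effort("Opus 4.7", "xhigh") returns "xhigh".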

Implementing Effort in Your API Calls

Basic Usage

Set effort inside the output_config object of your Messages API request. Here's an example using Python:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    output_config={"effort": "low"},  # Options: "low", "medium", "high", "xhigh", "max"
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)

print(response.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

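const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list of dictionaries.' },
  ],
});

// content is a union of block types, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}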

Combining with Extended Thinking

For maximum control, combine effort with adaptive thinking:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16384,
    thinking={"type": "adaptive"},  # Claude decides how much to think per request
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ]
)

Using Effort with Tool Calls

Effort affects how often Claude calls tools. At lower effort, Claude tends to make fewer, more targeted tool calls:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "low"},  # Fewer tool calls, faster responses
    tools=[
        {
            "name": "search_database",
            "description": "Search the company database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Find all customers who purchased in the last month."}
    ]
)

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort. Set the effort level explicitly rather than relying on that default:

  • Medium effort (recommended starting point): Best balance for most applications, including agentic coding, tool-heavy workflows, and code generation
  • Low effort: For high-volume or latency-sensitive workloads, such as chat and non-coding use cases where speed matters

# Recommended for most applications
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    output_config={"effort": "medium"},  # Explicitly set to avoid unexpected latency
    messages=[...]
)

Practical Strategies

1. Tier Your Effort by Task Complexity

Create a simple mapping based on task difficulty:

def get_effort_level(task_type):
    if task_type == "simple_chat":
        return "low"
    elif task_type == "code_generation":
        return "medium"
    elif task_type == "complex_reasoning":
        return "high"
    elif task_type == "deep_research":
        return "max"
    else:
        return "high"  # Default

2. Use Effort for Cost Optimization

For production systems, start with medium and monitor response quality. Only increase to high or max when you observe quality degradation.
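
One way to apply this idea per request, rather than through offline monitoring, is to retry at the next level only when a quality check fails. This is a sketch of that variation, not a prescribed pattern: answer_with_escalation, ask, and is_good_enough are hypothetical names, and the quality check is whatever your application defines.

```python
ESCALATION_ORDER = ("medium", "high", "max")


def answer_with_escalation(ask, is_good_enough, levels=ESCALATION_ORDER):
    """Try each effort level in order, stopping at the first acceptable answer.

    ask(level) sends the request at that effort level and returns the answer.
    is_good_enough(answer) is an application-defined quality check.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    answer = None
    for level in levels:
        answer = ask(level)
        if is_good_enough(answer):
            return level, answer
    # Nothing passed the check: return the highest-effort attempt.
    return levels[-1], answer
```

Each escalation adds a full extra request, so this only saves money when most requests pass at the first level.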

3. Combine with Adaptive Thinking

Adaptive thinking (thinking: {type: "adaptive"}) automatically adjusts thinking depth based on the problem. Combined with effort, you get two layers of optimization:

  • Effort controls overall token spend
  • Adaptive thinking fine-tunes thinking depth per request

Important Notes

  • Zero Data Retention: The effort feature is eligible for ZDR. When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
  • Deprecation: budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6. Use effort instead.
  • Behavioral signal: Effort is not a strict token budget. Claude may still think deeply on hard problems even at low effort.

Key Takeaways

  • Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6
  • Five levels available: low, medium, high, xhigh (Opus 4.7 only), and max
  • Affects all token spend: Text, thinking, and tool calls are all influenced by the effort setting
  • Explicitly set effort for Sonnet 4.6 to avoid unexpected latency—medium is recommended as the default
  • Combine with adaptive thinking for the best balance of performance and cost efficiency