Guide · Beginner · Pricing · 2026-05-22

Mastering Claude's Effort Parameter: Optimize Token Spend and Response Depth

Learn how to use Claude's effort parameter to control token spending, balance speed and capability, and optimize costs across API calls and agentic workflows.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels from low to max to trade off between speed, cost, and response thoroughness, with practical code examples for the API.

effort parameter, token optimization, Claude API, cost management, extended thinking

When building with Claude, the effort parameter is one of the most useful controls, and one of the easiest to overlook. It lets you adjust how many tokens Claude spends on a response, including thinking, text, and tool calls, so you can trade off token consumption, latency, and output quality without switching models.

In this guide, you'll learn what the effort parameter is, how it works across different Claude models, and how to use it effectively in your API calls to balance performance and cost.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding to requests. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise it to max for the absolute highest capability, or lower it to medium or low for faster, cheaper responses.

Key characteristics:

Available on all supported models without any beta header
Works with or without extended thinking enabled
Affects all tokens in the response: text, tool calls, and thinking tokens
Replaces the deprecated budget_tokens parameter on Opus 4.6 and Sonnet 4.6

Effort Levels Explained

Level	Description	Best For
`max`	Absolute maximum capability, no constraints on token spending	Deep reasoning, complex analysis, research-grade tasks
`xhigh`	Extended capability for long-horizon work (Opus 4.7 only)	Long-running agentic and coding tasks (>30 min)
`high`	Default behavior, excellent results	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost/performance balance
`low`	Most efficient, significant token savings	Simple tasks, high-volume chat, subagents

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems—it just thinks less than it would at higher levels for the same problem.
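As a rough illustration of the table above, the sketch below maps workload categories to effort levels. The category names and the `pick_effort` helper are hypothetical, invented for this example; only the effort level strings come from the table.

```python
# Hypothetical workload categories mapped to the effort levels in the table above.
EFFORT_BY_WORKLOAD = {
    "research": "max",
    "long_horizon_agent": "xhigh",  # the table lists xhigh as Opus 4.7 only
    "complex_coding": "high",
    "agentic_balanced": "medium",
    "simple_chat": "low",
    "subagent": "low",
}


def pick_effort(workload: str, default: str = "high") -> str:
    """Return the effort level for a workload, falling back to the default."""
    return EFFORT_BY_WORKLOAD.get(workload, default)


print(pick_effort("simple_chat"))
print(pick_effort("unknown_workload"))
```

Falling back to high mirrors the default described earlier, so an unrecognized workload behaves the same as a request with no explicit effort.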

How Effort Works Under the Hood

The effort parameter changes how much work Claude puts into a response. When thinking is enabled, Claude almost always thinks before responding at high and max effort. At lower levels, it may skip thinking for simpler problems and answer directly.

This affects:

Text responses: Shorter, more direct answers at low effort; longer, more thorough explanations at high effort
Tool calls: Fewer tool calls at low effort; more thorough tool usage at high effort
Extended thinking: Deeper reasoning chains at higher effort levels
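One way to observe the three effects listed above is to tally the content blocks in a response. The sketch below assumes the response content has already been converted to a list of plain dicts with a `type` field (`text`, `tool_use`, or `thinking`). The `summarize_blocks` helper and both sample lists are hand-written for illustration and are not real model output.

```python
from collections import Counter
from typing import Dict, List


def summarize_blocks(content: List[dict]) -> Dict[str, int]:
    """Count content blocks by type and total the visible text length."""
    counts = Counter(block["type"] for block in content)
    text_chars = sum(
        len(block.get("text", "")) for block in content if block["type"] == "text"
    )
    return {
        "text_blocks": counts.get("text", 0),
        "tool_calls": counts.get("tool_use", 0),
        "thinking_blocks": counts.get("thinking", 0),
        "text_chars": text_chars,
    }


# Hand-written stand-ins for a low-effort and a high-effort response.
low_effort = [{"type": "text", "text": "Short answer."}]
high_effort = [
    {"type": "thinking", "thinking": "..."},
    {"type": "tool_use", "name": "search", "input": {}},
    {"type": "text", "text": "A longer, more thorough answer."},
]

print(summarize_blocks(low_effort))
print(summarize_blocks(high_effort))
```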

Using Effort with the API

Basic Usage (Python)

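import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for a model that supports effort
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    # Effort is set in the request body under output_config, not as an HTTP header
    # (requires a recent SDK version)
    output_config={"effort": "low"},
)
print(response.content[0].text)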

Using Effort with Extended Thinking

For maximum capability, combine effort with adaptive thinking:

response = client.messages.create(
    model="claude-opus-4-6",  # assumed ID; use a model that supports adaptive thinking and max effort
    max_tokens=8192,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system for a global e-commerce platform."}
    ],
    # Effort is set in the request body under output_config
    output_config={"effort": "max"},
)

TypeScript Example

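import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function getResponse() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6', // assumed ID for a model that supports effort
    max_tokens: 4096,
    messages: [
      { role: 'user', content: 'Summarize this 50-page document.' }
    ],
    // Effort is set in the request body under output_config
    output_config: { effort: 'medium' }
  });

  // Narrow the content block type before reading .text
  const first = response.content[0];
  if (first?.type === 'text') {
    console.log(first.text);
  }
}

getResponse();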

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. For most applications, explicitly set effort to avoid unexpected latency:

Medium (recommended default): Best balance for agentic coding, tool-heavy workflows, and code generation
Low: For high-volume or latency-sensitive workloads like chat and simple Q&A

Practical Scenarios

Scenario 1: High-Volume Customer Support Chat

Use low effort for simple, repetitive queries where speed matters more than depth:

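def handle_support_query(user_message):
    """Answer a simple support query quickly using low effort."""
    response = client.messages.create(
        model="claude-sonnet-4-6",  # assumed ID for a model that supports effort
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
        # Effort is set in the request body under output_config
        output_config={"effort": "low"},
    )
    return response.content[0].text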

Scenario 2: Complex Code Review

Use max effort for thorough analysis:

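def review_code(code_snippet):
    """Review code thoroughly using max effort with adaptive thinking."""
    response = client.messages.create(
        model="claude-opus-4-6",  # assumed ID; use a model that supports adaptive thinking and max effort
        max_tokens=8192,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": f"Review this code:\n\n{code_snippet}"}],
        # Effort is set in the request body under output_config
        output_config={"effort": "max"},
    )
    # With thinking enabled, the first block may be a thinking block,
    # so collect only the text blocks
    return "".join(block.text for block in response.content if block.type == "text")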

Scenario 3: Multi-Agent System

Use different effort levels for different agents in a multi-agent setup:

# Orchestrator agent: high effort for planning
orchestrator_effort = "high"
# Research subagent: medium effort for balanced performance
research_effort = "medium"
# Simple data extraction subagent: low effort for speed
extraction_effort = "low"
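The per-agent settings above could also be kept in one place. The following is a minimal sketch under assumptions: the `AgentConfig` dataclass, the `AGENTS` registry, the `effort_plan` helper, and the `max_tokens` values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AgentConfig:
    """Per-agent settings; the max_tokens values below are placeholders."""
    name: str
    effort: str
    max_tokens: int


AGENTS = {
    "orchestrator": AgentConfig("orchestrator", "high", 8192),
    "research": AgentConfig("research", "medium", 4096),
    "extraction": AgentConfig("extraction", "low", 1024),
}


def effort_plan(pipeline: List[str]) -> List[Tuple[str, str]]:
    """Return (agent name, effort level) pairs in the order the agents run."""
    return [(AGENTS[name].name, AGENTS[name].effort) for name in pipeline]


print(effort_plan(["orchestrator", "research", "extraction"]))
```

Keeping effort next to the other per-agent settings makes it easier to adjust one agent's level without touching the call sites.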

Effort vs. Budget Tokens

If you're migrating from budget_tokens on Opus 4.6 or Sonnet 4.6, here's what changed:

Feature	budget_tokens (deprecated)	effort (recommended)
Control type	Hard token limit	Behavioral signal
Flexibility	Fixed budget per request	Adaptive to problem difficulty
Future support	Will be removed	Long-term supported
Works without thinking	No	Yes
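For a migration, a small helper can translate an old thinking configuration into an adaptive thinking configuration plus an effort level. This is only a sketch: the `migrate_thinking_config` function is hypothetical, and the token thresholds are arbitrary assumptions, not an official mapping from budgets to effort levels. Tune them against your own workload.

```python
from typing import Optional, Tuple


def migrate_thinking_config(thinking: Optional[dict]) -> Tuple[Optional[dict], str]:
    """Translate a budget_tokens-style config into (thinking config, effort level).

    The thresholds are illustrative assumptions, not an official mapping.
    """
    if not thinking or thinking.get("type") != "enabled":
        # Nothing to migrate: keep the config and use the default effort (high)
        return thinking, "high"

    budget = thinking.get("budget_tokens", 0)
    if budget < 4_000:
        effort = "low"
    elif budget < 16_000:
        effort = "medium"
    else:
        effort = "high"
    return {"type": "adaptive"}, effort


print(migrate_thinking_config({"type": "enabled", "budget_tokens": 8000}))
```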

Best Practices

Start with medium effort for most applications, then adjust based on observed performance and cost.
Use adaptive thinking alongside effort for the best experience on complex tasks.
Profile your workload: Run the same prompt at different effort levels to measure latency and quality differences.
Combine with max_tokens: Set a reasonable max_tokens limit as a safety net even at max effort.
Monitor token usage: Track input and output tokens to calculate cost savings when lowering effort.
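The profiling and monitoring advice above can be sketched as a small harness. It assumes you supply a `send(prompt, effort)` callable that performs the API call and returns a dict with `input_tokens` and `output_tokens`. The harness, the `fake_send` stub, its token counts, and the prices passed to `estimate_cost` are all placeholders for illustration, not real measurements or real pricing.

```python
import time
from typing import Callable, Dict, List, Sequence


def profile_effort_levels(
    send: Callable[[str, str], Dict[str, int]],
    prompt: str,
    levels: Sequence[str] = ("low", "medium", "high"),
) -> List[dict]:
    """Call send(prompt, effort) once per level, recording latency and token usage."""
    results = []
    for effort in levels:
        start = time.perf_counter()
        usage = send(prompt, effort)
        elapsed = time.perf_counter() - start
        results.append({"effort": effort, "seconds": elapsed, **usage})
    return results


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost in the pricing currency, given prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def fake_send(prompt: str, effort: str) -> Dict[str, int]:
    """Stub standing in for a real API call; the token counts are made up."""
    return {
        "input_tokens": 100,
        "output_tokens": {"low": 200, "medium": 500, "high": 1200}[effort],
    }


for row in profile_effort_levels(fake_send, "Explain caching."):
    print(row["effort"], row["output_tokens"])
```

Injecting `send` keeps the harness independent of any particular client setup, so the same loop works with a stub in tests and a real call in production.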

Limitations and Considerations

Not every legacy model supports effort. Check the model documentation for compatibility.
At low effort, Claude may skip thinking for simple problems, which can reduce quality on borderline-complex tasks.
The xhigh level is currently only available on Claude Opus 4.7.
Effort is a behavioral signal, so actual token savings may vary depending on problem difficulty.
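A client-side check can catch some of these cases before a request is sent. The sketch below is an assumption-laden illustration: `check_effort` is a hypothetical helper, and matching on the substring `opus-4-7` is a guess at how an Opus 4.7 model ID might look. Replace it with a check against the model documentation.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def check_effort(model: str, effort: str) -> str:
    """Return the effort level if it looks valid for the model, else raise ValueError."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"Unknown effort level: {effort!r}")
    # Assumption: Opus 4.7 model IDs contain "opus-4-7"
    if effort == "xhigh" and "opus-4-7" not in model:
        raise ValueError(f"xhigh is only available on Claude Opus 4.7, got {model!r}")
    return effort


print(check_effort("claude-opus-4-7", "xhigh"))
```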

Key Takeaways

The effort parameter controls how eagerly Claude spends tokens on a response, from low (fast/cheap) to max (deep/thorough).
Effort applies to all tokens in the response (text, tool calls, and extended thinking), giving you broad control over response behavior.
Medium effort is the recommended starting point for most applications, especially with Sonnet 4.6, balancing speed, cost, and capability. The API default remains high unless you set it explicitly.
Combine effort with adaptive thinking for optimal results on complex tasks requiring deep reasoning.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, offering more flexible, behavior-driven control without hard token limits.