Guide | Beginner | Pricing | 2026-05-22

Mastering Claude's Effort Parameter: Control Thinking Depth for Cost and Speed

Learn how to use Claude's effort parameter to control token spending, balance capability with efficiency, and optimize API costs across all supported models.

Quick Answer

This guide explains how to use Claude's effort parameter to control how eagerly the model spends tokens, enabling you to trade off between response thoroughness and efficiency across all supported models.

Tags: effort parameter, token optimization, extended thinking, API best practices, cost control

Claude's effort parameter controls how many tokens the model spends when responding to a request. Whether you're building a high-volume chatbot that needs fast replies or a deep reasoning agent tackling complex problems, effort lets you trade off capability, speed, and cost with a single model.

In this guide, you'll learn exactly how effort works, when to use each level, and how to implement it in your API calls with practical code examples.

What Is the Effort Parameter?

The effort parameter controls how "eager" Claude is about spending tokens when generating responses. It affects all tokens in the output, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This is a major advantage over older approaches like budget_tokens, which only affected thinking tokens. With effort, you get comprehensive control over the entire response.

Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels on the same problem.
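Because effort is only a signal, the hard ceiling on output length is still max_tokens. As a minimal sketch (the helper name was_truncated and the sample payloads are hypothetical), you can check the response's stop_reason to see whether a reply hit that ceiling regardless of the effort level you asked for:

```python
def was_truncated(response) -> bool:
    """Return True when generation stopped because it reached max_tokens."""
    # Accept either a plain dict parsed from the JSON response
    # or an SDK object that exposes stop_reason as an attribute.
    if isinstance(response, dict):
        stop_reason = response.get("stop_reason")
    else:
        stop_reason = getattr(response, "stop_reason", None)
    return stop_reason == "max_tokens"


# Hypothetical payloads, for illustration only.
truncated_reply = {"stop_reason": "max_tokens"}
complete_reply = {"stop_reason": "end_turn"}

print(was_truncated(truncated_reply))
print(was_truncated(complete_reply))
```

If truncation shows up often at a given effort level, raising max_tokens or lowering effort are the two levers to try.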

Supported Models

The effort parameter works without a beta header on every model that supports it. Currently supported models:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future release.
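A minimal migration sketch follows. It assumes the request shape in Anthropic's API reference, where effort is nested under an output_config object and adaptive thinking is selected with thinking={"type": "adaptive"}. The helper name migrate_thinking_config and the model ID are hypothetical, and the caller picks the effort level because this guide gives no mapping from a token budget to a level.

```python
def migrate_thinking_config(params: dict, effort: str = "high") -> dict:
    """Return a copy of request params that uses effort instead of budget_tokens."""
    migrated = dict(params)  # shallow copy; the caller's dict is left untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the deprecated fixed budget and let the model decide when to think.
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort is sent inside an output_config object.
    output_config = dict(migrated.get("output_config") or {})
    output_config.setdefault("effort", effort)
    migrated["output_config"] = output_config
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",  # assumed model ID; confirm against the models list
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}
print(migrate_thinking_config(legacy, effort="medium"))
```

The helper leaves an effort value that is already present untouched, so it is safe to run over requests that were partly migrated by hand.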

Effort Levels Explained

Claude offers five effort levels, each suited to different use cases:

Level	Description	Best For
`max`	Absolute maximum capability with no constraints on token spending	Deepest possible reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6)
`xhigh`	Extended capability for long-horizon work	Long-running agentic and coding tasks (over 30 min) with token budgets in the millions (Opus 4.7 only)
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding problems, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing a balance of speed, cost, and performance
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks needing best speed and lowest costs, such as subagents

Default behavior: If you omit the effort parameter entirely, Claude defaults to high effort.
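The table and default rule above can be encoded as a small client-side check. This is only a sketch: the model display names used as keys and the function name resolve_effort are hypothetical, and the availability sets are transcribed from the table rather than queried from the API.

```python
from typing import Optional

# Model restrictions per level, as listed in the table above.
# None means the table names no restriction for that level.
EFFORT_AVAILABILITY = {
    "low": None,
    "medium": None,
    "high": None,
    "xhigh": {"Claude Opus 4.7"},
    "max": {
        "Claude Mythos Preview",
        "Claude Opus 4.7",
        "Claude Opus 4.6",
        "Claude Sonnet 4.6",
    },
}

DEFAULT_EFFORT = "high"  # omitting the parameter behaves like "high"


def resolve_effort(model_name: str, effort: Optional[str] = None) -> str:
    """Return the effort level to send, or raise ValueError if it is not allowed."""
    if effort is None:
        return DEFAULT_EFFORT
    if effort not in EFFORT_AVAILABILITY:
        raise ValueError(f"Unknown effort level: {effort!r}")
    allowed_models = EFFORT_AVAILABILITY[effort]
    if allowed_models is not None and model_name not in allowed_models:
        raise ValueError(f"{effort!r} is not listed for {model_name}")
    return effort


print(resolve_effort("Claude Opus 4.5"))
print(resolve_effort("Claude Opus 4.7", "xhigh"))
```

Failing fast on the client keeps an unsupported combination from reaching the API, at the cost of having to update the table when new models ship.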

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which can introduce unexpected latency. For most applications, explicitly set the effort level:

Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters most.

How to Use the Effort Parameter in the API

Basic Usage

Here's how to set the effort parameter in a standard API call:

Python (using the Anthropic SDK):

import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # Assumed ID for Sonnet 4.6; confirm against the models list
    max_tokens=8192,
    output_config={"effort": "medium"},  # Effort is passed inside output_config
    messages=[
        {"role": "user", "content": "Explain the theory of relativity in simple terms."}
    ],
)
print(response.content[0].text)

TypeScript (using the Anthropic SDK):

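import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // Reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Assumed ID for Sonnet 4.6; confirm against the models list
  max_tokens: 8192,
  output_config: { effort: 'medium' }, // Effort is passed inside output_config
  messages: [
    { role: 'user', content: 'Explain the theory of relativity in simple terms.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);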

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide when to use extended thinking based on the problem complexity:

response = client.messages.create(
    model="claude-sonnet-4-6",  # Assumed ID for Sonnet 4.6
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Claude decides when to think
    output_config={"effort": "medium"},  # Effort guides how much it thinks
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ],
)

With adaptive thinking enabled, Claude automatically engages extended thinking for difficult problems and skips it for simpler ones, which saves tokens on easy requests.
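When thinking is enabled, the response content can begin with a thinking block, so reading the first block's text is not reliable. The sketch below collects only the text blocks; the helper name extract_text and the sample blocks are hypothetical, and it accepts either plain dicts or SDK objects with type and text attributes.

```python
def extract_text(content_blocks) -> str:
    """Concatenate the text of all text blocks, skipping thinking and tool blocks."""
    parts = []
    for block in content_blocks:
        is_dict = isinstance(block, dict)
        block_type = block.get("type") if is_dict else getattr(block, "type", None)
        if block_type == "text":
            parts.append(block["text"] if is_dict else block.text)
    return "".join(parts)


# Hypothetical content list shaped like a response with a thinking block first.
sample_content = [
    {"type": "thinking", "thinking": "Integrate by parts twice."},
    {"type": "text", "text": "-x^2 cos(x) + 2x sin(x) + 2 cos(x) + C"},
]
print(extract_text(sample_content))
```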

Using Effort with Tool Calls

Effort also controls how many tool calls Claude makes. Lower effort means fewer, more targeted tool calls:

response = client.messages.create(
    model="claude-sonnet-4-6",  # Assumed ID for Sonnet 4.6
    max_tokens=4096,
    output_config={"effort": "low"},  # Effort is passed inside output_config
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo and New York?"}
    ]
)

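At low effort, Claude tends to make fewer tool calls and move straight to action with little preamble. At high or max effort, it is more likely to make additional calls, explain its plan first, and summarize the results in detail. Because get_weather accepts a single location, answering for both cities most likely still takes one call per city at any level; effort mainly changes what Claude does around those calls.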
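One way to see how effort changes tool use on your own prompts is to count the tool_use blocks in each response. The sketch below works on content blocks represented as plain dicts; the function name tool_call_counts, the block IDs, and the sample content are hypothetical.

```python
from collections import Counter


def tool_call_counts(content_blocks) -> Counter:
    """Count tool_use blocks per tool name in one response's content list."""
    counts = Counter()
    for block in content_blocks:
        if block.get("type") == "tool_use":
            counts[block["name"]] += 1
    return counts


# Hypothetical response content: one short text block and two tool calls.
sample_content = [
    {"type": "text", "text": "Checking both cities."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"location": "Tokyo"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather",
     "input": {"location": "New York"}},
]
print(tool_call_counts(sample_content))
```

Running the same prompt at several effort levels and comparing these counts gives a concrete measure of the difference for your workload.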

Practical Use Cases

1. High-Volume Customer Support Chat

For a chatbot handling thousands of simple queries per minute, use low effort:

def handle_support_query(user_message):
    response = client.messages.create(
        model="claude-sonnet-4-6",  # Assumed ID for Sonnet 4.6
        max_tokens=1024,
        output_config={"effort": "low"},  # Favor speed and cost for simple queries
        system="You are a helpful customer support agent. Answer concisely.",
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text

2. Complex Code Generation Agent

For an agent that needs to reason deeply about architecture and edge cases, use max effort:

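def generate_complex_feature(request):
    response = client.messages.create(
        model="claude-opus-4-6",  # Assumed ID for Opus 4.6, which supports max effort
        max_tokens=16000,
        thinking={"type": "adaptive"},
        output_config={"effort": "max"},
        system="You are an expert software architect. Generate production-ready code with full error handling.",
        messages=[{"role": "user", "content": request}],
    )
    # With thinking enabled, the first content block may be a thinking block,
    # so collect only the text blocks.
    return "".join(block.text for block in response.content if block.type == "text")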

3. Balanced Subagent Orchestration

When building a multi-agent system, use medium effort for subagents to save costs while maintaining quality:

class SubAgent:
    def __init__(self, role):
        self.role = role

    def process(self, task):
        response = client.messages.create(
            model="claude-sonnet-4-6",  # Assumed ID for Sonnet 4.6
            max_tokens=4096,
            output_config={"effort": "medium"},  # Drop to "low" for simple subtasks
            system=f"You are a {self.role} subagent. Complete tasks efficiently.",
            messages=[{"role": "user", "content": task}],
        )
        return response.content[0].text

Best Practices

Start with medium for Sonnet 4.6: This gives you the best balance of speed, cost, and performance for most applications.

Use adaptive thinking alongside effort: The combination gives Claude the flexibility to think deeply when needed while respecting your effort preference.

Test different levels on your specific tasks: The optimal effort level depends on your use case. Run A/B tests to find the sweet spot.

Monitor token usage: Track your token consumption across effort levels to understand the cost implications.

Don't overspend on simple tasks: If your task is straightforward (e.g., translation, summarization), low effort is often sufficient.
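The testing and monitoring advice above can be made concrete with a small aggregation over logged requests. This is a sketch under stated assumptions: you record the effort level and the output token count reported in each response's usage field yourself, and the function name and sample numbers are hypothetical, not measurements.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens_by_effort(records):
    """Average output tokens per effort level from (effort, output_tokens) pairs."""
    by_effort = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_effort.items()}


# Hypothetical log entries, for illustration only.
logged = [
    ("low", 100),
    ("low", 140),
    ("medium", 300),
    ("medium", 340),
]
print(mean_output_tokens_by_effort(logged))
```

Pairing these averages with a quality score for the same requests shows whether a higher level is earning its extra tokens.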

Limitations and Considerations

Effort is not a hard budget: At lower levels, Claude may still spend significant tokens on genuinely hard problems.
xhigh is only available on Opus 4.7: For long-running agentic tasks, use Opus 4.7 with xhigh effort.
Zero Data Retention (ZDR): The effort parameter is eligible for ZDR. When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.

Key Takeaways

The effort parameter influences all token spend (text, tool calls, and thinking), giving you a single control for cost and speed.
Five effort levels (low, medium, high, xhigh, max) cover tasks from simple chatbots to deep reasoning agents; xhigh and max are limited to specific models.
Combine effort with adaptive thinking (thinking: {"type": "adaptive"}) so Claude can decide when to engage extended thinking.
For Sonnet 4.6, explicitly set effort to avoid unexpected latency; medium is the recommended default for most applications.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6. Migrate your code to the new parameter before budget_tokens is removed.