Guide | Beginner | Pricing | 2026-05-13

Mastering Claude's Effort Parameter: Control Thinking Depth, Cost, and Speed

Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and latency. Includes code examples, recommended levels, and best practices.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response quality and speed/cost, with practical API examples and recommended defaults for different use cases.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost control

Introduction

Claude is incredibly powerful, but with great power comes great token consumption. Every deep reasoning step, every tool call, and every carefully crafted explanation costs tokens—and therefore time and money. But what if you could dial Claude's "thinking effort" up or down depending on the task?

That's exactly what the effort parameter does. Introduced across Claude's latest models (including Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and the Mythos Preview), effort gives you fine-grained control over how many tokens Claude spends on a response. It's a single knob that affects everything from reasoning depth to tool call frequency.

In this guide, you'll learn:

How the effort parameter works under the hood
When to use each effort level (low, medium, high, xhigh, max)
How to set effort in the API with Python and TypeScript
Best practices for combining effort with adaptive thinking
Real-world trade-offs between speed, cost, and capability

How the Effort Parameter Works

By default, Claude uses high effort—spending as many tokens as needed for excellent results. Setting effort to "high" is identical to omitting the parameter entirely.

The effort parameter is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.

Crucially, effort affects all tokens in the response:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This gives you a much greater degree of control over efficiency compared to older approaches like budget_tokens, which only constrained thinking tokens.

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no constraints on token spending	Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6)
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks over 30 minutes with million-token budgets (Opus 4.7 only)
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing a balance of speed, cost, and performance
`low`	Most efficient, significant token savings	Simpler tasks, subagents, high-volume chat

Important: max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is currently exclusive to Opus 4.7.
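The availability rules above can be checked before a request is sent. The following is a minimal sketch, not an official API: the family labels ("opus-4.7", "sonnet-4.6", and so on) are illustrative keys rather than real model IDs, the helper name check_effort is hypothetical, and it assumes low, medium, and high are accepted on every model listed in this guide.

```python
# Availability as described in the table above. The family labels are
# illustrative keys for this sketch, not official API model IDs.
BASE_LEVELS = {"low", "medium", "high"}

SUPPORTED_LEVELS = {
    "mythos-preview": BASE_LEVELS | {"max"},
    "opus-4.7": BASE_LEVELS | {"xhigh", "max"},
    "opus-4.6": BASE_LEVELS | {"max"},
    "sonnet-4.6": BASE_LEVELS | {"max"},
}


def check_effort(family: str, effort: str) -> str:
    """Return `effort` unchanged if `family` supports it, else raise ValueError."""
    try:
        allowed = SUPPORTED_LEVELS[family]
    except KeyError:
        raise ValueError(f"unknown model family: {family!r}") from None
    if effort not in allowed:
        raise ValueError(
            f"{effort!r} is not available on {family}; "
            f"choose one of {sorted(allowed)}"
        )
    return effort
```

Failing fast in your own code gives a clearer error message than waiting for the API to reject an unsupported level.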

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set the effort level:

Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
Low: For high-volume or latency-sensitive workloads—chat, non-coding use cases where faster turnaround matters.
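One way to make sure the effort level is always explicit is to build request arguments through a small helper that defaults to medium. This is a sketch under stated assumptions: build_request is a hypothetical helper, the model ID "claude-sonnet-4-6" is assumed, and it assumes the Messages API takes the effort level under an output_config field. If your SDK version expects it elsewhere, adjust the key.

```python
VALID_EFFORT = {"low", "medium", "high", "xhigh", "max"}


def build_request(
    prompt: str,
    *,
    model: str = "claude-sonnet-4-6",  # assumed model ID
    effort: str = "medium",            # the recommended default from this guide
    max_tokens: int = 1024,
) -> dict:
    """Build keyword arguments for client.messages.create with explicit effort."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: effort is passed inside output_config.
        "output_config": {"effort": effort},
    }
```

With a client in hand, the result can be unpacked directly, for example client.messages.create(**build_request("Summarize this ticket", effort="low")).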

Setting Effort in the API

Python (using the Anthropic SDK)

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort for fast, cheap responses
response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    output_config={"effort": "low"},  # or "medium", "high", "max"
)
print(response.content[0].text)

TypeScript (using the Anthropic SDK)

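import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model that supports effort
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ],
  output_config: { effort: 'low' } // or 'medium', 'high', 'max'
});

// The first content block is only guaranteed to be text when thinking is off
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}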

Combining with Adaptive Thinking

For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide whether to think on each request, saving tokens on simple queries while still reasoning deeply on complex ones.

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports adaptive thinking
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Claude decides per request whether to think
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}
    ],
    output_config={"effort": "medium"},
)
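With adaptive thinking, a response may or may not contain thinking blocks, so reading response.content[0].text is not always safe. The sketch below shows one way to inspect the returned blocks. summarize_blocks is a hypothetical helper, and it assumes each block exposes a type attribute ("thinking" or "text") and that text blocks expose a text attribute.

```python
def summarize_blocks(content) -> dict:
    """Count content blocks by type and collect the visible text."""
    counts: dict[str, int] = {}
    text_parts: list[str] = []
    for block in content:
        counts[block.type] = counts.get(block.type, 0) + 1
        if block.type == "text":
            text_parts.append(block.text)
    return {
        "thought": counts.get("thinking", 0) > 0,  # did Claude choose to think?
        "counts": counts,
        "text": "".join(text_parts),
    }
```

Calling summarize_blocks(response.content) on a batch of requests gives a rough picture of how often Claude chooses to think at a given effort level.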

Practical Use Cases by Effort Level

Low Effort: High-Volume Chat & Subagents

When you're running a swarm of subagents or handling thousands of simple queries per minute, every token counts. Low effort can reduce token spend by 30-50% on straightforward tasks.

Example: A customer support bot answering FAQs.

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=256,
    messages=[{"role": "user", "content": "What is your return policy?"}],
    output_config={"effort": "low"},
)

Medium Effort: Agentic Coding & Tool-Heavy Workflows

Medium is the sweet spot for most production applications. Claude will still reason deeply when needed but won't over-analyze simple steps.

Example: A code generation agent that calls tools to read files, write code, and run tests.

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model that supports effort
    max_tokens=8192,
    tools=[
        {
            "name": "read_file",
            "description": "Read the contents of a file",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"}
                },
                "required": ["path"]
            }
        }
    ],
    messages=[{"role": "user", "content": "Refactor the main function in app.py to use async/await"}],
    output_config={"effort": "medium"},
)

High Effort: Complex Reasoning & Difficult Problems

Use high (or omit the parameter) when you need Claude's full reasoning capability—complex math, multi-step planning, nuanced analysis.

Example: Analyzing a legal contract for potential issues.

Max Effort: The Absolute Best Claude Can Do

Reserve max for your hardest problems where token cost is secondary to getting the right answer. This is ideal for research, advanced mathematics, or debugging elusive bugs.
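The use cases above can be captured in a simple lookup so that each part of an application picks its effort level in one place. This is only a sketch: the category names and the effort_for_task helper are hypothetical, and the mapping should be tuned against your own workload.

```python
# Hypothetical task categories mapped to the levels recommended in this guide.
EFFORT_BY_TASK = {
    "faq": "low",
    "subagent": "low",
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "complex_reasoning": "high",
    "research": "max",
}


def effort_for_task(category: str, default: str = "medium") -> str:
    """Return the effort level for a task category, falling back to `default`."""
    return EFFORT_BY_TASK.get(category, default)
```

Falling back to medium for unknown categories follows the guide's recommended default.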

Migration from budget_tokens

If you've been using budget_tokens on Opus 4.6 or Sonnet 4.6, it's time to switch. The budget_tokens parameter is deprecated and will be removed in a future model release. Replace it with effort:

# Old way (deprecated)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[...]
)

# New way (recommended)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[...],
    output_config={"effort": "medium"},
)
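If request arguments are built in many places, the migration can be applied mechanically. The sketch below rewrites a dictionary of keyword arguments. migrate_request is a hypothetical helper, and it assumes the effort level belongs under an output_config field. Choosing medium as the replacement level is a starting point, not an equivalence to any particular budget_tokens value.

```python
def migrate_request(kwargs: dict, effort: str = "medium") -> dict:
    """Return a copy of `kwargs` with budget_tokens thinking replaced by
    adaptive thinking plus an effort level. The input is not modified."""
    migrated = dict(kwargs)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        # Keep any existing output_config keys, including an explicit effort.
        output_config = dict(migrated.get("output_config") or {})
        output_config.setdefault("effort", effort)
        migrated["output_config"] = output_config
    return migrated
```

Requests that never used budget_tokens pass through unchanged, so the helper can wrap every call site during a gradual rollout.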

Best Practices

Start with medium for most applications—it's the best balance of speed, cost, and capability.
Use low for subagents and high-volume, simple tasks to maximize throughput.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token efficiency.
Explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high.
Monitor token usage in production—the effort parameter is a behavioral signal, so actual token spend may vary.
Test with your specific workload—the optimal effort level depends on task complexity and your latency/cost requirements.
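To test effort levels against your own workload, run the same prompts at each level and compare the output token counts the API reports (for example, response.usage.output_tokens). The helper below is a minimal sketch of that comparison. output_token_savings is a hypothetical name, and the counts you pass in must come from your own measurements.

```python
def output_token_savings(tokens_by_effort: dict, baseline: str = "high") -> dict:
    """Return the fraction of output tokens saved at each level relative to
    the baseline level (0.25 means 25% fewer tokens than the baseline)."""
    base = tokens_by_effort[baseline]
    if base <= 0:
        raise ValueError("baseline token count must be positive")
    return {
        level: round(1 - tokens / base, 3)
        for level, tokens in tokens_by_effort.items()
    }
```

Because effort is a behavioral signal, expect these fractions to vary by prompt. Averaging over a representative sample gives a more reliable picture than a single request.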

Limitations & Considerations

Effort is a behavioral signal, not a strict budget. At lower levels, Claude may still think deeply on hard problems.
The xhigh level is currently only available on Claude Opus 4.7.
Lower effort may reduce tool call frequency and response quality on complex tasks.
This feature is eligible for Zero Data Retention (ZDR)—data is not stored after the API response is returned.

Key Takeaways

The effort parameter controls token spending across all response types—text, tool calls, and thinking—giving you a single knob for cost/speed optimization.
Use medium as your default for most applications, low for high-volume simple tasks, and high or max for complex reasoning.
Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of depth and efficiency.
Migrate from budget_tokens to effort on Opus 4.6 and Sonnet 4.6—the old parameter is deprecated.
Always explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high setting.