Guide | Beginner | Pricing | 2026-05-20

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across API calls, with practical code examples and recommended settings.

Quick Answer

Claude's effort parameter controls how readily the model spends tokens on a response. Set it to 'low' for fast, cheap answers on simple tasks, 'medium' for balanced performance, 'high' (the default) for complex reasoning, 'xhigh' (Opus 4.7 only) for long-running agentic work, or 'max' for the deepest possible analysis. It applies to every part of the response: text, tool calls, and extended thinking.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost control

Introduction

Claude is incredibly powerful, but with great power comes... greater token consumption. If you've ever wished you could dial Claude's thoroughness up or down depending on the task, the effort parameter is exactly what you need. Available in the Claude API, it lets you steer how many tokens Claude spends on each response without switching models.

Whether you're building a high-volume chat application that needs lightning-fast replies, or an agentic system that requires deep reasoning over millions of tokens, the effort parameter lets you optimize for speed, cost, or capability—all with a single model.

In this guide, you'll learn:

What the effort parameter is and how it works
The five effort levels and when to use each
How to combine effort with adaptive thinking
Practical code examples in Python and TypeScript
Best practices for different use cases

How the Effort Parameter Works

By default, Claude operates at high effort—spending as many tokens as needed to produce excellent results. The effort parameter lets you adjust this behavior:

Raise effort to max for the absolute highest capability on the hardest problems.
Lower effort to medium or low to be more conservative with token usage, optimizing for speed and cost.

Crucially, effort affects all tokens in the response—not just thinking tokens. This includes:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This is a major advantage over the older budget_tokens parameter (now deprecated on Opus 4.6 and Sonnet 4.6). Effort gives you a single dial to control overall token spend, including tool call frequency. At lower effort levels, Claude will make fewer tool calls and provide shorter explanations.

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently difficult problems—but it will think less than it would at higher levels for the same problem.
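Because lower effort tends to reduce tool calls as well as prose, one simple check is to count the tool-use blocks in each response. The sketch below assumes the response content is available as a list of plain dicts in the Messages API JSON shape, where each block carries a `type` field and tool calls have type `tool_use`. The helper name `count_tool_calls` is hypothetical, and the sample blocks are invented for illustration.

```python
def count_tool_calls(content_blocks: list[dict]) -> int:
    """Count content blocks whose type is "tool_use"."""
    return sum(1 for block in content_blocks if block.get("type") == "tool_use")


if __name__ == "__main__":
    # Invented sample content: one text block followed by two tool calls.
    sample_content = [
        {"type": "text", "text": "I'll update the file."},
        {"type": "tool_use", "id": "call_1", "name": "edit_file", "input": {}},
        {"type": "tool_use", "id": "call_2", "name": "edit_file", "input": {}},
    ]
    print(count_tool_calls(sample_content))
```

Running the same prompt at two effort levels and comparing these counts gives a quick, if rough, view of how much the setting changes tool usage for a given workload.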

Effort Levels and Use Cases

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no constraints on token spending	Deepest reasoning, most thorough analysis (Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6)
`xhigh`	Extended capability for long-horizon work	Long-running agentic and coding tasks over 30 minutes with token budgets in the millions (Opus 4.7 only)
`high`	High capability (default behavior)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing a balance of speed, cost, and performance
`low`	Most efficient, significant token savings	Simple tasks, high-volume chat, subagents where speed and cost matter most
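
As a rough illustration of the table above, the mapping from workload to level can be written as a small lookup. This is a minimal sketch and not part of the Claude API: the task category names and the `pick_effort` helper are hypothetical, and the mapping simply restates the "Typical Use Case" column.

```python
# Hypothetical helper: maps a coarse task category to an effort level,
# following the "Typical Use Case" column of the table above.
# The category names are assumptions made for this sketch.
EFFORT_BY_TASK = {
    "simple_chat": "low",
    "subagent": "low",
    "agentic_balanced": "medium",
    "complex_reasoning": "high",
    "difficult_coding": "high",
    "long_horizon_agentic": "xhigh",  # Opus 4.7 only, per the table
    "deepest_analysis": "max",
}

DEFAULT_EFFORT = "high"  # the guide states that high is the default behavior


def pick_effort(task_category: str) -> str:
    """Return the effort level for a task category, falling back to the default."""
    return EFFORT_BY_TASK.get(task_category, DEFAULT_EFFORT)


if __name__ == "__main__":
    for category in ("simple_chat", "agentic_balanced", "unknown_task"):
        print(category, "->", pick_effort(category))
```

Keeping the mapping in one place makes it easy to adjust a level for a single workload after measuring its cost and quality.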

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. For most applications, explicitly set the effort level to avoid unexpected latency:

Medium effort (recommended default): Best balance of speed, cost, and performance for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads like chat and non-coding use cases.

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This allows Claude to dynamically decide how much thinking to apply based on the problem complexity, while the effort parameter sets the overall behavioral context.

# Python example: effort + adaptive thinking
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6, per the Model Support list
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # effort is passed inside output_config; or "low", "high", "max"
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
)
print(response.content)

// TypeScript example: effort + adaptive thinking
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Sonnet 4.6, per the Model Support list
  max_tokens: 8192,
  thinking: { type: 'adaptive' },
  output_config: { effort: 'medium' }, // effort is passed inside output_config
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});
console.log(response.content);

Practical Examples

Example 1: Low Effort for Simple Chat

For a customer support chatbot handling common questions, low effort keeps responses fast and cheap:

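import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6, per the Model Support list
    max_tokens=1024,
    output_config={"effort": "low"},  # effort is passed inside output_config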
    messages=[
        {"role": "user", "content": "What are your business hours?"}
    ]
)

Example 2: Medium Effort for Agentic Coding

For a coding assistant that needs to balance thoroughness with response time:

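import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6, per the Model Support list
    max_tokens=4096,
    output_config={"effort": "medium"},  # effort is passed inside output_config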
    tools=[
        {
            "name": "edit_file",
            "description": "Edit a file in the codebase",
            "input_schema": {
                "type": "object",
                "properties": {
                    "file_path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["file_path", "content"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Add input validation to the user registration endpoint."}
    ]
)

Example 3: Max Effort for Deep Reasoning

For complex mathematical proofs or multi-step analysis:

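import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-6",  # Opus 4.6 is among the models listed for max effort
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},  # effort is passed inside output_config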
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ]
)

Best Practices

Start with medium effort for most applications. It provides a strong balance of capability and efficiency.
Use low effort for subagents in multi-agent systems where each subagent handles simple, well-defined tasks.
Reserve max effort for the most challenging problems where you need Claude's absolute best reasoning.
Combine with adaptive thinking to let Claude dynamically allocate thinking tokens based on problem difficulty.
Monitor token usage across effort levels to find the sweet spot for your specific workload. Lower effort doesn't just reduce thinking tokens—it reduces all tokens, including tool calls.
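
One way to act on the monitoring advice above is to log output-token counts per effort level and compare averages. The sketch below is illustrative only: `UsageRecord` and `mean_output_tokens` are hypothetical names, and the sample numbers are placeholders that show the shape of the calculation, not measured results. In a real run, the counts would come from the usage data returned with each API response.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class UsageRecord:
    effort: str          # effort level used for the request
    output_tokens: int   # e.g. taken from response.usage.output_tokens


def mean_output_tokens(records: list[UsageRecord]) -> dict[str, float]:
    """Average output tokens per effort level."""
    by_level: dict[str, list[int]] = defaultdict(list)
    for record in records:
        by_level[record.effort].append(record.output_tokens)
    return {level: mean(counts) for level, counts in by_level.items()}


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real measurements.
    sample = [
        UsageRecord("low", 100),
        UsageRecord("low", 140),
        UsageRecord("medium", 300),
        UsageRecord("high", 650),
    ]
    print(mean_output_tokens(sample))
```

Comparing these averages alongside a quality check on the same prompts helps show whether a lower level is saving tokens without hurting the results you care about.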

Model Support

The effort parameter is available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

No beta header is required. For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth.
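
The model list and the levels table in this guide can be combined into a small pre-flight check that catches unsupported combinations before a request is sent. This is a sketch under stated assumptions: `check_effort` and `SUPPORTED_LEVELS` are hypothetical names, the keys are display names from this guide rather than API model IDs, and it assumes every listed model accepts low, medium, and high, which the guide implies but does not state per model.

```python
# Display names taken from the model list above; these are labels for this
# sketch, not API model IDs.
# Assumption: low, medium, and high are accepted by every listed model.
BASE_LEVELS = {"low", "medium", "high"}

SUPPORTED_LEVELS = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},  # xhigh is Opus 4.7 only
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
    "Claude Opus 4.5": BASE_LEVELS,  # not listed for max in the levels table
}


def check_effort(model_name: str, effort: str) -> None:
    """Raise ValueError if the guide does not list this effort level for the model."""
    levels = SUPPORTED_LEVELS.get(model_name)
    if levels is None:
        raise ValueError(f"{model_name!r} is not in the guide's list of supported models")
    if effort not in levels:
        raise ValueError(
            f"{effort!r} is not listed for {model_name}; options: {sorted(levels)}"
        )


if __name__ == "__main__":
    check_effort("Claude Opus 4.7", "xhigh")  # passes silently
    try:
        check_effort("Claude Sonnet 4.6", "xhigh")
    except ValueError as error:
        print(error)
```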

Key Takeaways

The effort parameter controls overall token spend across text, tool calls, and extended thinking—not just thinking tokens.
Five levels are available: low, medium, high (default), xhigh (Opus 4.7 only), and max.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience on supported models.
Lower effort reduces tool call frequency, making it ideal for high-volume or latency-sensitive applications.
Effort is a behavioral signal, not a strict budget—Claude will still think deeply on hard problems even at lower levels.