Guide · Beginner · 2026-05-06

Mastering Claude's Effort Parameter: Balance Token Efficiency and Reasoning Depth

Learn how to use Claude's effort parameter to control token spending, optimize costs, and balance reasoning depth across different use cases—from simple subagents to complex coding tasks.

Quick Answer

The effort parameter lets you control how eagerly Claude spends tokens on reasoning and responses. Use 'low' for fast, cheap subagents; 'medium' for balanced agentic coding; 'high' (default) for complex tasks; 'xhigh' for long-horizon work on Claude Opus 4.7; and 'max' for the deepest possible reasoning on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.

Introduction

When building with Claude, one of the most powerful levers you can pull is the effort parameter. Introduced as a replacement for the older budget_tokens approach, effort lets you steer how readily Claude spends tokens on reasoning, tool calls, and text generation, all with a single model.

Whether you're building a fast, low-cost subagent or a deep-thinking research assistant, the effort parameter lets you trade off between response thoroughness and token efficiency without switching models or writing complex logic.

In this guide, you'll learn:

How the effort parameter works under the hood
The five effort levels and when to use each
How to combine effort with adaptive thinking for optimal results
Practical code examples for Python and TypeScript
Best practices for different use cases

How the Effort Parameter Works

By default, Claude operates at high effort—spending as many tokens as needed for excellent results. The effort parameter changes this behavior by acting as a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.

The effort parameter affects all tokens in the response, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This gives you two major advantages over the old budget_tokens approach:

No need to enable thinking to use effort—it works with any request
Broader control—effort influences tool call frequency, not just thinking depth

For example, at lower effort levels Claude tends to make fewer tool calls, which saves tokens on the calls themselves and on the tool results that would otherwise be fed back into the conversation.

Effort Levels and Use Cases

Claude supports five effort levels, each designed for specific scenarios:

Level	Description	Typical Use Case
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, high-volume subagents, latency-sensitive chat
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing speed/cost balance, tool-heavy workflows
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding, agentic tasks
`xhigh`	Extended capability for long-horizon work. Available on Claude Opus 4.7.	Long-running agentic/coding tasks (>30 min) with million-token budgets
`max`	Absolute maximum capability with no constraints. Available on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6.	Deepest possible reasoning, most thorough analysis

Note: xhigh is currently only available on Claude Opus 4.7. max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
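
The availability rules in the table can be encoded as a small lookup so that an unsupported level is caught before a request is sent. This is a minimal sketch: check_effort is a hypothetical helper, and the model names are the display names used in this guide, not API model identifiers. Models that lack the effort parameter entirely are out of scope here.

```python
# Availability of the two restricted levels, copied from the table above.
# Keys are display names from this guide, not API model identifiers.
RESTRICTED_LEVELS = {
    "xhigh": {"Opus 4.7"},
    "max": {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"},
}
ALL_LEVELS = ("low", "medium", "high", "xhigh", "max")


def check_effort(level: str, model_name: str) -> bool:
    """Return True if this guide lists `level` as usable on `model_name`."""
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    allowed = RESTRICTED_LEVELS.get(level)
    # low, medium, and high carry no model restriction in the table,
    # so they are treated as unrestricted in this sketch.
    return allowed is None or model_name in allowed
```

Calling check_effort("xhigh", "Sonnet 4.6") returns False under these rules, which is a cue to fall back to high or to switch models.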

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set the effort parameter:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
High effort: For tasks requiring deep reasoning or complex problem-solving.
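
One way to apply these recommendations consistently is to keep them in a single mapping instead of scattering string literals through the codebase. The sketch below assumes hypothetical workload labels (agentic_coding, chat, and so on); only the effort levels themselves come from the recommendations above.

```python
# Workload labels are hypothetical; the levels follow the Sonnet 4.6
# recommendations listed above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "deep_reasoning": "high",
}


def recommended_effort(workload: str) -> str:
    """Look up a starting effort level for a workload label.

    Unknown workloads fall back to medium, the recommended default.
    """
    return SONNET_46_EFFORT.get(workload, "medium")
```

Treat the returned level as a starting point to be confirmed by testing on your own traffic.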

Combining Effort with Adaptive Thinking

For the best experience, combine the effort parameter with adaptive thinking (thinking: {type: "adaptive"}). This pairing lets Claude dynamically decide when to engage extended thinking based on the effort level you set.

At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens without sacrificing quality on genuinely hard questions.

Practical Code Examples

Python Example

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort: fast, cheap subagent
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Summarize this email in one sentence."}
    ],
    # Effort is a field in the request body, not an HTTP header
    output_config={"effort": "low"},
)
print(response.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Medium effort: balanced for agentic coding
const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 4096,
    system: 'You are a senior software engineer.',
    messages: [
        { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
    ],
    // Effort is a field in the request body, not an HTTP header
    output_config: { effort: 'medium' }
});
// content is a union of block types; narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);

Combining with Adaptive Thinking

import anthropic

client = anthropic.Anthropic()

# Max effort with adaptive thinking for deep reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis... just kidding. Explain the P vs NP problem in simple terms."}
    ],
    output_config={"effort": "max"},
)
# With thinking enabled, the first content block may be a thinking block,
# so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)

Best Practices

1. Start with Medium, Adjust Based on Results

For most applications, medium effort provides an excellent balance. If you notice Claude missing nuances or skipping tool calls the task needs, bump up to high. If responses feel overly verbose or Claude makes more tool calls than the task warrants, try low.
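
The adjust-from-medium rule can be written as a one-step move along a ladder of levels. This is a rough sketch: adjust_effort and the observation labels missing_nuance and too_verbose are hypothetical names, and xhigh and max are left off the ladder because they are limited to specific models.

```python
# Ordered from cheapest to most thorough. xhigh and max are omitted
# because they are only available on specific models.
LADDER = ("low", "medium", "high")


def adjust_effort(current: str, observation: str) -> str:
    """Move one step up or down the ladder based on an observed problem."""
    i = LADDER.index(current)  # raises ValueError for levels not on the ladder
    if observation == "missing_nuance":
        i = min(i + 1, len(LADDER) - 1)  # step up, capped at high
    elif observation == "too_verbose":
        i = max(i - 1, 0)  # step down, floored at low
    return LADDER[i]  # any other observation leaves the level unchanged
```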

2. Use Low Effort for Subagents

When building multi-agent systems, use low effort for subagents that handle simple, well-defined tasks (e.g., data extraction, formatting, classification). Reserve high or max for the orchestrator or agents tackling complex reasoning.

3. Combine with Adaptive Thinking for Cost Savings

Adaptive thinking + effort is a powerful combination. At low or medium effort, Claude may skip extended thinking for trivial requests, saving significant tokens while still thinking on hard problems.

4. Monitor Token Usage

Effort is a behavioral signal, not a hard budget. Always monitor your actual token usage and adjust effort levels accordingly. You may find that low effort on a Sonnet 4.6 model outperforms high effort on an older model at a fraction of the cost.
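
Because effort does not cap spending, it helps to record what each level actually costs. The sketch below averages output tokens per effort level; mean_output_tokens is a hypothetical helper, and it assumes you log an (effort, output_tokens) pair after each call, for example from response.usage.output_tokens in the Python SDK.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records):
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs logged
    after each request.
    """
    by_level = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {level: mean(values) for level, values in by_level.items()}
```

Comparing these averages across levels on the same kind of request shows whether a lower level is actually saving tokens for your workload.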

5. Test with Your Specific Workload

The optimal effort level depends on your exact use case, prompt complexity, and tool configuration. Run A/B tests with a representative sample of requests to find the sweet spot for your application.
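
For an A/B test, each request needs a stable assignment to one effort level so that retries and follow-up turns land in the same arm. A minimal sketch, assuming each request carries a string identifier; assign_arm and the default arms are illustrative, not part of any SDK.

```python
import hashlib


def assign_arm(request_id: str, arms=("low", "medium")) -> str:
    """Deterministically map a request ID to one effort arm."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]
```

Hashing the identifier rather than drawing a random number keeps the assignment reproducible, which makes it possible to join quality scores and token counts back to the arm afterwards.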

Migration from budget_tokens

If you're currently using budget_tokens with Claude Opus 4.6 or Sonnet 4.6, Anthropic recommends migrating to the effort parameter. While budget_tokens is still accepted on these models, it is deprecated and will be removed in a future model release.

To migrate:

Remove budget_tokens from your thinking configuration
Set your desired effort level in the request body: output_config: {effort: "medium"}
Optionally enable adaptive thinking: thinking: {type: "adaptive"}
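
The migration steps can be sketched as a function that rewrites a dictionary of request arguments. This is an illustration, not an official migration tool: migrate_request is a hypothetical helper, and it assumes the effort level is set through an output_config.effort request field, which should be confirmed against the API reference for your SDK version.

```python
import copy


def migrate_request(kwargs: dict, effort: str = "medium", adaptive: bool = True) -> dict:
    """Return a copy of `kwargs` with budget_tokens thinking replaced by effort."""
    new = copy.deepcopy(kwargs)
    old_thinking = new.pop("thinking", None)
    if adaptive:
        # Optional step: let Claude decide when to think.
        new["thinking"] = {"type": "adaptive"}
    elif old_thinking is not None and "budget_tokens" not in old_thinking:
        # Nothing to migrate in this thinking config; keep it unchanged.
        new["thinking"] = old_thinking
    # Otherwise the budget_tokens-based thinking config is dropped entirely.
    # Assumption: effort travels in the request body as output_config.effort.
    new.setdefault("output_config", {})["effort"] = effort
    return new
```

The function returns a new dictionary and leaves the input untouched, so the old and new request shapes can be compared side by side during the migration.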

Key Takeaways

The effort parameter controls token spending across all response types—text, tool calls, and thinking—giving you broader control than the old budget_tokens approach.
Five effort levels (low, medium, high, xhigh, max) let you trade off between speed/cost and reasoning depth without switching models.
Combine effort with adaptive thinking for optimal results: Claude will skip thinking on simple problems at lower effort levels, saving tokens.
Use medium as your default for most applications, especially with Sonnet 4.6, to avoid unexpected latency from the high default.
Migrate from budget_tokens to effort now, as the older parameter is deprecated and will be removed in future model releases.