Guide | Beginner | 2026-05-06

Mastering Claude's Effort Parameter: Balance Token Efficiency and Response Quality

Learn how to control Claude's token spending with the effort parameter. Optimize for speed, cost, or deep reasoning across models like Opus 4.6 and Sonnet 4.6.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels from 'low' to 'max' to balance speed, cost, and reasoning depth, with practical API examples for Sonnet 4.6 and Opus 4.6.

Tags: effort parameter, token optimization, Claude API, extended thinking, cost efficiency

When building applications with Claude, you often face a trade-off between response quality and token efficiency. Do you want Claude to think deeply and produce thorough answers, or do you need fast, cost-effective responses for high-volume tasks? The effort parameter gives you precise control over this balance—without switching models.

This guide explains what the effort parameter is, how it works, and how to use it effectively in your API calls. By the end, you'll know how to choose the right effort level for any task and optimize your Claude integration for speed, cost, or depth.

What Is the Effort Parameter?

The effort parameter controls how eager Claude is about spending tokens when responding to requests. It's a behavioral signal that influences all tokens in the response—including text, tool calls, and extended thinking. Unlike a strict token budget, effort adapts to the complexity of each request: Claude will still think deeply on hard problems, but it will use fewer tokens than it would at a higher effort level.

Key benefits:

Works without enabling extended thinking
Affects all token spend, including tool calls (fewer tool calls at lower effort)
Single model handles multiple efficiency levels

Supported Models

The effort parameter is generally available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
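
As a rough illustration of that migration, the sketch below rewrites a legacy request that used a fixed thinking budget. It assumes that effort travels in the request body as output_config.effort and that adaptive thinking is requested with thinking type "adaptive"; confirm both against the current API reference. The helper name migrate_thinking_config and the model ID in the sample are hypothetical choices made for this example.

```python
from typing import Any


def migrate_thinking_config(params: dict[str, Any], effort: str = "medium") -> dict[str, Any]:
    """Return a copy of request params with budget_tokens dropped and effort set.

    Hypothetical helper: assumes effort is sent as output_config["effort"].
    """
    migrated = dict(params)
    thinking = dict(migrated.get("thinking") or {})
    if "budget_tokens" in thinking:
        # Replace the deprecated fixed budget with adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
    # Merge into any existing output_config instead of overwriting it.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


legacy = {
    "model": "claude-sonnet-4-6",  # assumed model ID, for illustration only
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
}
print(migrate_thinking_config(legacy, effort="medium"))
```

The helper copies the input rather than mutating it, so the legacy configuration stays available for comparison while you test the new settings.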

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no token constraints	Deepest reasoning, most thorough analysis
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks (>30 min) with million-token budgets
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Note: max is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is available only on Opus 4.7.

Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
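
The availability rules above can be encoded as a small lookup so that an unsupported combination fails before a request is sent. This is only a sketch: the table is keyed by the display names used in this guide rather than API model IDs, it assumes low, medium, and high are accepted on every listed model, and check_effort is a hypothetical helper.

```python
# Availability as described in the note above; keys are display names, not API model IDs.
SUPPORTED_EFFORT = {
    "Claude Mythos Preview": {"low", "medium", "high", "max"},
    "Claude Opus 4.7": {"low", "medium", "high", "xhigh", "max"},
    "Claude Opus 4.6": {"low", "medium", "high", "max"},
    "Claude Sonnet 4.6": {"low", "medium", "high", "max"},
    "Claude Opus 4.5": {"low", "medium", "high"},
}


def check_effort(model_name: str, effort: str) -> str:
    """Return effort unchanged if the model supports it, else raise ValueError."""
    if model_name not in SUPPORTED_EFFORT:
        raise ValueError(f"Unknown model: {model_name!r}")
    if effort not in SUPPORTED_EFFORT[model_name]:
        allowed = ", ".join(sorted(SUPPORTED_EFFORT[model_name]))
        raise ValueError(f"{model_name} does not support effort {effort!r}; allowed: {allowed}")
    return effort


print(check_effort("Claude Opus 4.7", "xhigh"))
```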

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set the effort level:

Medium effort (recommended for most applications): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
High effort: For tasks requiring deep reasoning or complex analysis.
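
One way to apply these recommendations consistently is to map workload types to effort levels in a single place. The sketch below does that for Sonnet 4.6. The Workload categories and the recommended_effort function are hypothetical names invented for this example, and the mapping simply mirrors the three recommendations above.

```python
from enum import Enum


class Workload(Enum):
    """Hypothetical workload categories for this example."""
    CHAT = "chat"
    AGENTIC_CODING = "agentic_coding"
    DEEP_ANALYSIS = "deep_analysis"


# Mirrors the Sonnet 4.6 recommendations listed above.
_SONNET_46_EFFORT = {
    Workload.CHAT: "low",
    Workload.AGENTIC_CODING: "medium",
    Workload.DEEP_ANALYSIS: "high",
}


def recommended_effort(workload: Workload) -> str:
    """Return the suggested effort level, falling back to medium."""
    return _SONNET_46_EFFORT.get(workload, "medium")


for workload in Workload:
    print(workload.value, recommended_effort(workload))
```

Keeping the mapping in one table makes it easy to adjust after you have measured latency and cost for your own traffic.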

How to Use the Effort Parameter in the API

Basic Usage (Python)

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    # Effort is a request-body parameter under output_config, not an HTTP header
    output_config={"effort": "medium"},
)
print(response.content[0].text)

Basic Usage (TypeScript)

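import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: 'You are a helpful assistant.',
    messages: [
        { role: 'user', content: 'Explain quantum computing in simple terms.' }
    ],
    // Effort is a request-body parameter under output_config, not an HTTP header
    output_config: { effort: 'medium' }
});

// Narrow the content block type before reading its text
const first = response.content[0];
if (first.type === 'text') {
    console.log(first.text);
}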

Combining Effort with Extended Thinking

For maximum control, combine effort with adaptive thinking:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    # Adaptive thinking takes no budget_tokens; effort controls thinking depth
    thinking={"type": "adaptive"},
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Design a scalable microservices architecture for an e-commerce platform."}
    ],
    output_config={"effort": "max"},
)

# Thinking blocks can precede the answer, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)

Using Effort with Tool Calls

Lower effort levels reduce the number of tool calls Claude makes, saving tokens:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[
        {
            "name": "search_web",
            "description": "Search the web for information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "Find the latest news about AI regulation."}
    ],
    output_config={"effort": "low"},
)

# The reply may contain tool_use blocks instead of text, so inspect each block
for block in response.content:
    if block.type == "tool_use":
        print("Tool call:", block.name, block.input)
    elif block.type == "text":
        print(block.text)
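
To check how much a lower effort level actually reduces tool use in your own workflow, you can count the tool_use blocks in each response. The sketch below is a minimal, hypothetical helper. It only assumes that each content block exposes a type attribute, as the SDK's response blocks do, and it uses SimpleNamespace stand-ins instead of a real API response so it runs offline.

```python
from types import SimpleNamespace


def count_tool_calls(content) -> int:
    """Count the tool_use blocks in a response's content list."""
    return sum(1 for block in content if getattr(block, "type", None) == "tool_use")


# Stand-in blocks for illustration; a real call would pass response.content.
sample_content = [
    SimpleNamespace(type="text", text="Let me search for that."),
    SimpleNamespace(type="tool_use", name="search_web", input={"query": "AI regulation news"}),
]
print(count_tool_calls(sample_content))
```

Running the same prompt at several effort levels and comparing these counts gives a direct view of the tool-call savings described above.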

Practical Scenarios

Scenario 1: High-Volume Customer Support Chat

For a chatbot handling simple FAQs, use low effort to minimize latency and cost:

# Passed as a keyword argument to client.messages.create(...)
output_config={"effort": "low"}

Scenario 2: Complex Code Generation

For generating production-ready code with multiple files, use high effort (default) or max:

# Passed as a keyword argument to client.messages.create(...)
output_config={"effort": "high"}

Scenario 3: Long-Running Agentic Tasks

For tasks that run over 30 minutes with million-token budgets, use xhigh (Opus 4.7 only):

# Passed as a keyword argument to client.messages.create(...)
output_config={"effort": "xhigh"}

Scenario 4: Balanced Subagent

For a subagent that needs reasonable quality but must stay fast, use medium:

# Passed as a keyword argument to client.messages.create(...)
output_config={"effort": "medium"}

Best Practices

Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
Start with medium for most applications, then adjust based on observed performance and cost.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience—Claude will automatically decide when to think.
Monitor token usage across different effort levels to find the sweet spot for your use case.
Use low effort for subagents and simple tasks where speed matters more than depth.
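
For the monitoring step, a simple approach is to log the output token count of each response alongside the effort level used, then compare averages. The sketch below assumes you collect (effort, output_tokens) pairs yourself, for example from response.usage.output_tokens. The function name is hypothetical, and the sample numbers are placeholders rather than measurements.

```python
from collections import defaultdict
from statistics import mean


def summarize_output_tokens(records):
    """Average output tokens per effort level from (effort, output_tokens) pairs."""
    by_effort = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_effort.items()}


# Placeholder values for illustration only; substitute your own logged usage.
sample_records = [
    ("low", 100),
    ("low", 200),
    ("medium", 400),
]
for effort, average in summarize_output_tokens(sample_records).items():
    print(effort, average)
```

Comparing these averages against a quality metric for the same requests shows where a lower effort level stops being worth the savings.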

Limitations and Considerations

Effort is a behavioral signal, not a strict token budget. Claude may still think deeply on hard problems even at low effort.
At lower effort levels, expect some reduction in capability, especially for complex reasoning.
The xhigh level is only available on Claude Opus 4.7.
For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter.

Key Takeaways

The effort parameter gives you fine-grained control over Claude's token spending, trading off between response thoroughness and efficiency.
Five levels are available: low, medium, high (default), xhigh (Opus 4.7), and max.
Medium effort is recommended as a default for most applications, especially with Sonnet 4.6.
Combine effort with adaptive thinking for optimal results—Claude decides when to think based on task complexity.
Lower effort reduces all token spend, including tool calls, making it ideal for high-volume or latency-sensitive workloads.