Guide · 2026-05-06

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and cost across different models like Opus 4.6, Sonnet 4.6, and Mythos Preview.

Quick Answer

Claude's effort parameter lets you control how eagerly the model spends tokens on responses. Set it to 'low' for fast, cheap answers on simple tasks, 'medium' for balanced agentic work, 'high' for complex reasoning, or 'max' for maximum capability. It works whether or not thinking is enabled and affects all tokens, including tool calls.

Tags: effort parameter, token efficiency, Claude API, extended thinking, cost optimization

Introduction

Claude is highly capable, but not every request needs its full reasoning depth. Maybe you're building a high-volume chatbot where speed matters more than depth, or you're running a complex agent that needs to stay cost-efficient over long sessions. Enter the effort parameter: a simple but powerful control that lets you dial Claude's token spending up or down without switching models.

In this guide, you'll learn exactly how effort works, when to use each level, and how to combine it with adaptive thinking for the best results. We'll cover practical code examples, recommended settings for popular models, and common pitfalls to avoid.

What Is the Effort Parameter?

The effort parameter controls how eager Claude is about spending tokens when responding to requests. By default, Claude uses high effort, spending as many tokens as it needs for excellent results. You can lower it to save tokens (and money) or raise it to max for the highest capability.

Key advantages:

No thinking required: Effort works even when extended thinking is disabled.
Affects all tokens: Text responses, tool calls, function arguments, and thinking tokens all scale with effort.
Behavioral signal, not a strict budget: Claude will still think hard on difficult problems, just less than at higher levels.

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no constraints	Deepest reasoning, complex research
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks (>30 min)
`high` (default)	High capability, equivalent to omitting the parameter	Complex reasoning, difficult coding, agents
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Note: max is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. xhigh is currently only on Opus 4.7.
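
The availability note above can be turned into a small pre-flight check. The sketch below is illustrative only: the short model labels are hypothetical keys (not API model IDs), the helper names are invented for this example, and it assumes that low, medium, and high exist wherever effort is supported at all.

```python
# Availability of the two highest levels, transcribed from the note above.
# The model labels are hypothetical keys for this sketch, not API model IDs.
MAX_MODELS = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}
XHIGH_MODELS = {"opus-4.7"}
BASE_LEVELS = ("low", "medium", "high")

# Levels ordered from least to most token spend, following the table above.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def is_effort_available(model: str, effort: str) -> bool:
    """Return True if `effort` is usable on `model` under this sketch's assumptions."""
    if effort in BASE_LEVELS:
        # Assumption: the base levels exist wherever effort is supported.
        return True
    if effort == "xhigh":
        return model in XHIGH_MODELS
    if effort == "max":
        return model in MAX_MODELS
    raise ValueError(f"unknown effort level: {effort!r}")


def clamp_effort(model: str, effort: str) -> str:
    """Fall back to the nearest lower level that the model offers."""
    idx = EFFORT_ORDER.index(effort)  # raises ValueError for unknown levels
    while idx > 0 and not is_effort_available(model, EFFORT_ORDER[idx]):
        idx -= 1
    return EFFORT_ORDER[idx]
```

Falling back to a lower level rather than a higher one is a deliberate choice here: it keeps token spend from exceeding what the caller asked for.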

How to Use Effort in the API

Basic Usage (Python)

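import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Low effort for a quick, concise answer; effort is passed inside output_config
    output_config={"effort": "low"},
)

# The first content block holds the text reply when thinking is disabled
print(response.content[0].text)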

With Extended Thinking (TypeScript)

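import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 4096,
  system: 'You are a research assistant.',
  messages: [
    { role: 'user', content: 'Analyze the long-term economic impacts of AI automation.' }
  ],
  // Manual thinking budget: still accepted on Opus 4.6, but deprecated in favor of effort
  thinking: {
    type: 'enabled',
    budget_tokens: 2048
  },
  // High effort for thorough analysis; effort is passed inside output_config
  output_config: { effort: 'high' }
});

// With thinking enabled, content holds thinking blocks followed by text blocks
console.log(response.content);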

Combining with Adaptive Thinking

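For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Claude then decides how much to think based on the complexity of each problem, while effort signals how freely it should spend tokens overall. Effort remains a behavioral signal here, not a hard cap.

# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Solve this calculus problem step by step."}],
    # Claude decides per request whether and how much to think
    thinking={"type": "adaptive"},
    # Medium effort balances depth against token spend
    output_config={"effort": "medium"},
)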

Recommended Settings by Model

Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly:

medium (recommended default): Best balance for agentic coding, tool-heavy workflows, code generation.
low: High-volume or latency-sensitive workloads (chat, non-coding).
high: Tasks requiring deeper reasoning.
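
One way to make sure effort is always set explicitly is to route every Sonnet 4.6 request through a small builder. This is a minimal sketch: the workload labels and the `sonnet_request` helper are hypothetical, the model ID `claude-sonnet-4-6` is assumed, and placing effort under `output_config` is an assumption about the request shape that you should check against your SDK version.

```python
from typing import Any

# Workload labels are hypothetical; the effort values follow the list above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "latency_sensitive": "low",
    "deep_reasoning": "high",
}


def sonnet_request(workload: str, prompt: str, max_tokens: int = 1024) -> dict[str, Any]:
    """Build Messages API keyword arguments with effort always set explicitly."""
    # Unknown workloads fall back to medium, the recommended default.
    effort = SONNET_46_EFFORT.get(workload, "medium")
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},  # assumed parameter placement
    }
```

The returned dictionary can be unpacked into a call such as `client.messages.create(**sonnet_request("chat", "What are your business hours?"))`.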

Opus 4.6

Opus 4.6 also defaults to high. Use:

high for complex analysis and research.
max for the deepest possible reasoning (available on Opus 4.6+).

Deprecation note: budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6 but is deprecated. Use effort instead.
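
A migration away from budget_tokens might look like the sketch below. The `migrate_budget_tokens` helper is hypothetical, and it makes two assumptions: that a fixed thinking budget is replaced with adaptive thinking, and that effort is passed under `output_config`. The guide gives no mapping from a token budget to an effort level, so the caller picks the level.

```python
import copy
from typing import Any


def migrate_budget_tokens(request: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of `request` that uses effort instead of budget_tokens."""
    migrated = copy.deepcopy(request)  # leave the caller's dict untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Assumption: swap the fixed budget for adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort lives under output_config in the request.
    migrated.setdefault("output_config", {})["effort"] = effort
    return migrated
```

Requests that never set a thinking budget pass through unchanged apart from gaining an explicit effort level.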

Practical Scenarios

Scenario 1: High-Volume Customer Support Chat

Set effort to low for fast, cost-effective responses to common questions. Claude will skip unnecessary thinking and produce concise answers.

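# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "What are your business hours?"}],
    # Low effort keeps answers to routine questions short and cheap
    output_config={"effort": "low"},
)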

Scenario 2: Complex Code Generation Agent

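Use medium or high effort. At medium, the agent still makes the tool calls it needs and writes thorough code without overthinking simple steps. Move to high when the task needs deeper reasoning and the extra tokens are worth it.

# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Build a REST API for a todo app with Flask."}],
    # A single tool the agent can call to persist generated code
    tools=[
        {
            "name": "write_file",
            "description": "Write code to a file",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    ],
    # Effort also scales tool calls and their arguments, not only text
    output_config={"effort": "medium"},
)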

Scenario 3: Deep Research with Max Effort

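For tasks requiring the deepest possible reasoning, set effort to max and enable adaptive thinking, so Claude can think for as long as the problem warrants.

# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Compare the philosophical implications of quantum mechanics vs. general relativity."}],
    # Adaptive thinking replaces the deprecated budget_tokens setting
    thinking={"type": "adaptive"},
    # max is only available on some models; see the note under Effort Levels Explained
    output_config={"effort": "max"},
)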

Best Practices

Start with medium for most tasks: It's the sweet spot for cost and capability.
Use low for simple, high-volume workloads: Great for subagents and chat.
Reserve high/max for complex reasoning: Don't waste tokens on trivial requests.
Combine with adaptive thinking: Let Claude decide when to think deeply.
Always set effort explicitly on Sonnet 4.6: Avoid unexpected latency.

Common Pitfalls

Assuming low effort means no thinking: Claude will still think on difficult problems, just less.
Using max effort for simple tasks: You'll burn tokens unnecessarily.
Forgetting to set effort on Sonnet 4.6: It defaults to high, which may be slower than expected.
Mixing effort with budget_tokens: On Opus 4.6 and Sonnet 4.6, use effort instead of the deprecated budget_tokens.
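
Some of these pitfalls can be caught before a request is sent. The sketch below is a hypothetical lint helper, not part of any SDK: it assumes effort is passed under `output_config`, that Sonnet 4.6 model IDs contain `sonnet-4-6`, and that the caller knows whether the task is simple.

```python
from typing import Any


def effort_warnings(request: dict[str, Any], simple_task: bool = False) -> list[str]:
    """Return human-readable warnings for the pitfalls listed above."""
    warnings: list[str] = []
    effort = (request.get("output_config") or {}).get("effort")
    model = request.get("model", "")
    thinking = request.get("thinking") or {}

    # Pitfall: relying on the implicit default on Sonnet 4.6.
    if effort is None and "sonnet-4-6" in model:  # assumed model ID pattern
        warnings.append("effort not set on Sonnet 4.6; it will default to high")
    # Pitfall: mixing effort with the deprecated thinking budget.
    if effort is not None and "budget_tokens" in thinking:
        warnings.append("effort combined with deprecated budget_tokens")
    # Pitfall: spending max effort on a task the caller marked as simple.
    if effort == "max" and simple_task:
        warnings.append("max effort on a simple task will burn tokens")
    return warnings
```

An empty list means none of the checked pitfalls apply. The first pitfall, assuming low effort disables thinking, is a matter of expectation and cannot be checked from the request alone.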

Conclusion

The effort parameter is a versatile tool for controlling Claude's behavior. Whether you need lightning-fast responses for a chatbot or deep reasoning for research, you can tune it to match your exact needs. By combining effort with adaptive thinking and choosing the right level for your use case, you'll get the best performance and cost efficiency from Claude.

Key Takeaways

Effort controls token spend across all response types (text, tool calls, and thinking) without requiring extended thinking to be enabled.
Use low for speed/cost savings on simple tasks, medium for balanced agentic work, and high/max for deep reasoning.
Always set effort explicitly on Sonnet 4.6 to avoid defaulting to high and incurring unexpected latency.
Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of depth and efficiency.
budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6, so migrate to the effort parameter.