Guide · Beginner · Pricing · 2026-05-15

Mastering Claude’s Effort Parameter: Balance Speed, Cost, and Reasoning Depth

Learn how to use the effort parameter in the Claude API to control token spend, response thoroughness, and latency. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.7.

Quick Answer

This guide explains Claude’s effort parameter, which lets you dial token spending from low (fast, cheap) to max (deepest reasoning). You’ll learn how to set effort levels, combine with adaptive thinking, and choose the right level for your use case.

Tags: effort parameter, token efficiency, extended thinking, Claude API, cost optimization

Introduction

When building applications with Claude, you often face a trade-off: thoroughness vs. speed and cost. Do you want Claude to think deeply and produce the most accurate answer, or do you need a quick response that keeps your API bills low?

Claude’s effort parameter gives you a single dial to control exactly that. Instead of switching between models or manually managing token budgets, you can now tell Claude how eager it should be about spending tokens—all within the same model.

This guide covers everything you need to know: what effort levels mean, how to use them in code, and practical recommendations for different workloads.

What Is the Effort Parameter?

The effort parameter controls how many tokens Claude spends across an entire response, including text, tool calls, and extended thinking. On Claude Opus 4.6 and Sonnet 4.6 it replaces the older budget_tokens parameter, which is now deprecated on those models.

Key benefits:

Works without enabling extended thinking
Affects tool calls (e.g., fewer tool calls at lower effort)
Combines seamlessly with adaptive thinking (thinking: {type: "adaptive"})
Available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6

Note: At high (default) and max effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems.

Effort Levels Explained

Level	Description	Best For
`max`	Absolute maximum capability, no token constraints	Deep reasoning, complex analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6)
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks (>30 min, millions of tokens) – Opus 4.7 only
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced token savings	Agentic tasks needing speed/cost/performance balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on sufficiently hard problems—but it will think less than at higher levels.
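The availability rules in the table can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: `check_effort` and the constants are hypothetical names, and the model labels are the informal names used in this guide rather than API model IDs.

```python
# Support matrix transcribed from the table above. The labels are this
# guide's informal model names, not API model IDs (assumption for illustration).
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
EFFORT_MODELS = {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"}
XHIGH_MODELS = {"Opus 4.7"}


def check_effort(model: str, effort: str = "high") -> str:
    """Return `effort` if the table allows it for `model`, else raise ValueError."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if model not in EFFORT_MODELS:
        raise ValueError(f"{model} is not listed as supporting effort")
    if effort == "xhigh" and model not in XHIGH_MODELS:
        raise ValueError(f"xhigh is listed for Opus 4.7 only, not {model}")
    return effort


print(check_effort("Sonnet 4.6", "medium"))
print(check_effort("Opus 4.7", "xhigh"))
```

A check like this only catches configuration mistakes early. It does not change how the model behaves, since effort remains a behavioral signal rather than a budget.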

How to Use the Effort Parameter in Code

Python (Anthropic SDK)

import anthropic

# Reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

response = client.messages.create(
    # Use a model that supports effort (see the list above); verify the exact ID for your account
    model="claude-sonnet-4-6",
    max_tokens=8192,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set effort level in the request body via output_config
    output_config={"effort": "medium"},
)
print(response.content[0].text)

TypeScript (Anthropic SDK)

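import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment
const client = new Anthropic();

const response = await client.messages.create({
  // Use a model that supports effort (see the list above); verify the exact ID for your account
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum entanglement in simple terms.' }
  ],
  // Set effort level in the request body via output_config
  output_config: { effort: 'low' },
});

// content is a union of block types, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') console.log(first.text);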

With Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

# Reuses the client created in the Python example above
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Let Claude decide how much to think, guided by the effort level
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Design a scalable microservices architecture for an e-commerce platform."}
    ]
)
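When thinking is enabled, the response content can contain thinking blocks ahead of the text blocks, so reading `content[0].text` may not return the answer. The sketch below shows one way to collect only the text. `extract_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the SDK's content blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of every block whose type is "text", skipping other block types."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in for response.content: one thinking block followed by two text blocks.
example_blocks = [
    SimpleNamespace(type="thinking", thinking="(model reasoning)"),
    SimpleNamespace(type="text", text="Start with three services: "),
    SimpleNamespace(type="text", text="catalog, cart, and checkout."),
]

print(extract_text(example_blocks))
```

With a real response, the same helper would be called as `extract_text(response.content)`.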

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly when using this model.

Medium (recommended starting point): Best balance of speed, cost, and performance. Ideal for agentic coding, tool-heavy workflows, and code generation.
Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.

Practical Use Cases

1. High-Volume Customer Support Chat

Use low effort for simple FAQ-style queries. You’ll get fast responses and lower costs.

2. Complex Code Generation or Debugging

Use high or max effort. The extra token spend pays off in correctness and depth.

3. Multi-Step Agentic Workflows

Use low or medium effort for sub-agents that handle routine tasks, and high or max for the orchestrator that makes critical decisions.

4. Long-Running Research or Analysis

Use xhigh (Opus 4.7 only) for tasks that require millions of tokens and deep reasoning over 30+ minutes.
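The four use cases above can be captured as a small lookup table so that each part of an application requests a consistent effort level. This is an illustrative sketch: the use-case keys and `effort_for` are hypothetical names, and falling back from xhigh to high on other models is an assumption of this example, not documented behavior.

```python
# Effort level per use case, following the recommendations above.
# Where the guide offers two options (e.g. "high or max"), the lower one is used here.
USE_CASE_EFFORT = {
    "support_chat": "low",
    "code_debugging": "high",
    "subagent": "medium",
    "orchestrator": "high",
    "long_running_research": "xhigh",
}


def effort_for(use_case: str, model: str) -> str:
    """Pick an effort level for a use case, respecting the Opus 4.7-only rule for xhigh."""
    effort = USE_CASE_EFFORT[use_case]
    if effort == "xhigh" and model != "Opus 4.7":
        # Assumption for this sketch: fall back to the default level on other models.
        return "high"
    return effort


print(effort_for("support_chat", "Sonnet 4.6"))
print(effort_for("long_running_research", "Opus 4.7"))
print(effort_for("long_running_research", "Sonnet 4.6"))
```

Keeping the mapping in one place makes it easy to adjust a level after benchmarking without touching the call sites.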

Best Practices

Always set effort explicitly when using Sonnet 4.6 to avoid defaulting to high.
Combine with adaptive thinking for dynamic token allocation.
Test with your workload – effort is a signal, not a hard limit. Run benchmarks to find the sweet spot.
Use lower effort for sub-agents and higher effort for orchestrators in agentic systems.
Monitor token usage – lower effort reduces tool call frequency, which can significantly cut costs.
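Benchmarking effort levels comes down to running the same prompt at each level and comparing token usage. The sketch below assumes a caller-supplied `run(level)` function that returns `(input_tokens, output_tokens)`. In a real setup it would call the API and read `response.usage.input_tokens` and `response.usage.output_tokens`. The token counts and per-million-token prices shown are made-up placeholders, not actual pricing or measured results.

```python
from typing import Callable, Dict, Iterable, Tuple


def compare_effort_levels(
    run: Callable[[str], Tuple[int, int]],
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
    levels: Iterable[str] = ("low", "medium", "high"),
) -> Dict[str, Dict[str, float]]:
    """Call run(level) once per level and report token counts and estimated cost."""
    report = {}
    for level in levels:
        input_tokens, output_tokens = run(level)
        cost = (
            input_tokens * input_usd_per_mtok
            + output_tokens * output_usd_per_mtok
        ) / 1_000_000
        report[level] = {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost_usd": cost,
        }
    return report


# Made-up token counts standing in for real API calls.
fake_usage = {"low": (1_000, 200), "medium": (1_000, 600), "high": (1_000, 1_500)}

report = compare_effort_levels(
    lambda level: fake_usage[level],
    input_usd_per_mtok=1.0,   # placeholder price
    output_usd_per_mtok=5.0,  # placeholder price
)
for level, row in report.items():
    print(f"{level:>6}: {row['output_tokens']} output tokens, ${row['cost_usd']:.4f}")
```

Because effort is a signal rather than a hard limit, a single run per level is noisy. Averaging over several representative prompts gives a more reliable picture.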

Caveats

Effort is not available on all models. Check the supported models list.
budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6. Migrate to effort.
At low effort, Claude may skip thinking entirely for simple requests, which could reduce quality on edge cases.
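For code that still sends a thinking budget, one possible migration step is to rewrite the request parameters before they are sent. The sketch below is not an official migration tool: `migrate_thinking` is a hypothetical helper that swaps a `budget_tokens` thinking config for adaptive thinking and returns the removed budget so it can be logged. The effort level itself is then set as in the code examples earlier in this guide.

```python
import copy
from typing import Any, Dict, Optional, Tuple


def migrate_thinking(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[int]]:
    """Return a copy of `params` with a budget_tokens thinking config replaced by
    adaptive thinking, plus the budget that was removed (None if there was none)."""
    migrated = copy.deepcopy(params)  # leave the caller's dict untouched
    thinking = migrated.get("thinking")
    removed_budget = None
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        removed_budget = thinking["budget_tokens"]
        migrated["thinking"] = {"type": "adaptive"}
    return migrated, removed_budget


old_params = {
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}
new_params, removed = migrate_thinking(old_params)
print(new_params["thinking"], removed)
```

There is no fixed conversion from a token budget to an effort level, so the removed budget is only a hint. Choosing the level is best done by testing against the actual workload.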

Key Takeaways

The effort parameter lets you control token spending across text, tool calls, and thinking—all with a single model.
Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
Always set effort explicitly on Sonnet 4.6 to avoid unexpected latency.
Combine with adaptive thinking for optimal results.
Lower effort reduces tool call frequency, offering significant cost savings for high-volume or simple tasks.