BeClaude
Guide | 2026-05-04

Mastering Claude's Effort Parameter: Control Thinking Depth, Token Usage, and Cost

Learn how to use Claude's effort parameter to control token spending, thinking depth, and response thoroughness across models like Opus 4.6, Sonnet 4.6, and Mythos Preview.

Quick Answer

Claude's effort parameter lets you control how many tokens the model spends on reasoning and responses. Set it to 'low' for fast, cheap answers; 'high' for thorough analysis; or 'max' for the deepest reasoning. It works with or without extended thinking and affects all tokens including tool calls.

Tags: effort parameter, token efficiency, Claude API, extended thinking, cost optimization

Introduction

Claude is incredibly capable, but sometimes you don't need the full force of its reasoning engine. Maybe you're building a high-volume chatbot where speed matters more than depth, or perhaps you're running a complex agent that needs to think deeply about every step. Until recently, you had limited control over this trade-off. Enter the effort parameter.

The effort parameter is a new API feature that lets you dial Claude's token consumption up or down — controlling how "eager" the model is to spend tokens on reasoning, explanations, and tool calls. It's available on Claude Opus 4.5, Opus 4.6, Sonnet 4.6, Opus 4.7, and the Claude Mythos Preview model. This guide will show you exactly how to use it, when to use each level, and how to combine it with adaptive thinking for the best results.

How the Effort Parameter Works

By default, Claude operates at high effort — it spends as many tokens as needed to produce excellent results. The effort parameter gives you a sliding scale:

  • max: Absolute maximum capability, no constraints on token spending.
  • xhigh: Extended capability for long-horizon work (Opus 4.7 only).
  • high: Default behavior. Equivalent to omitting the parameter.
  • medium: Balanced approach with moderate token savings.
  • low: Most efficient. Significant token savings with some capability reduction.
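The level list above can be turned into a small request helper that refuses levels a model does not offer. This is a sketch, not part of the SDK. The model-family keys and the support table are hypothetical. They are built only from the availability notes in this guide: max on Mythos Preview, Opus 4.7, Opus 4.6 and Sonnet 4.6, and xhigh on Opus 4.7 only. Opus 4.5 and Mythos Preview are assumed to accept the standard lower levels. The helper also assumes effort is sent as `output_config={"effort": ...}`, which is how Anthropic's current docs show it, so verify both against the API reference.

```python
from typing import Any

# Hypothetical support table assembled from the availability notes in this guide.
# Verify against Anthropic's current documentation before relying on it.
SUPPORTED_EFFORT: dict[str, set[str]] = {
    "opus-4.5": {"low", "medium", "high"},
    "opus-4.6": {"low", "medium", "high", "max"},
    "sonnet-4.6": {"low", "medium", "high", "max"},
    "opus-4.7": {"low", "medium", "high", "xhigh", "max"},
    "mythos-preview": {"low", "medium", "high", "max"},
}


def effort_kwargs(model_family: str, effort: str) -> dict[str, Any]:
    """Return extra keyword arguments for messages.create() for one effort level.

    Assumes the API takes effort as output_config={"effort": ...}.
    """
    allowed = SUPPORTED_EFFORT.get(model_family)
    if allowed is None:
        raise ValueError(f"unknown model family: {model_family!r}")
    if effort not in allowed:
        raise ValueError(f"effort {effort!r} is not available on {model_family}")
    if effort == "high":
        # High is the default, so omitting the parameter is equivalent.
        return {}
    return {"output_config": {"effort": effort}}
```

You would splat the result into a call, for example `client.messages.create(model=..., max_tokens=..., messages=..., **effort_kwargs("sonnet-4.6", "low"))`. Returning an empty dict for high keeps default-effort requests identical to ones written before the parameter existed.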
Crucially, effort affects all tokens in the response — not just thinking tokens. This includes:
  • Text responses and explanations
  • Tool calls and function arguments
  • Extended thinking (when enabled)
This means lower effort can also reduce the number of tool calls Claude makes, giving you broader control over efficiency than the old budget_tokens parameter, which only capped thinking tokens.
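Because effort shapes tool calls and text as well as thinking, it helps to measure all of them when you compare levels. The sketch below tallies the content blocks in a Messages API response by type and reads the billed output tokens from `usage.output_tokens`. The function name and the layout of the returned dictionary are illustrative choices, not SDK features.

```python
from collections import Counter
from typing import Any


def summarize_usage(response: Any) -> dict[str, int]:
    """Tally content blocks by type and report output tokens for one response.

    Works with an SDK Message object, or anything exposing .content blocks that
    have a .type attribute plus a .usage.output_tokens field.
    """
    block_types = Counter(block.type for block in response.content)
    return {
        "output_tokens": response.usage.output_tokens,
        "tool_calls": block_types.get("tool_use", 0),
        "text_blocks": block_types.get("text", 0),
        "thinking_blocks": block_types.get("thinking", 0),
    }
```

Running the same prompt at low and at high effort and comparing the two dictionaries shows where the savings come from; fewer `tool_use` blocks at low effort would match the behavior described above, though results vary by prompt.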

When to Use Each Effort Level

Low Effort — Speed and Cost Optimization

Use low effort for:

  • High-volume chat applications
  • Simple Q&A or FAQ bots
  • Subagents that handle straightforward tasks
  • Latency-sensitive workloads
Trade-off: You'll get faster, cheaper responses, but Claude may skip reasoning steps and produce less thorough answers for complex problems.

Medium Effort — The Sweet Spot

Although the API default is high, medium effort is the recommended starting point for most production applications. It's ideal for:

  • Agentic coding tasks
  • Tool-heavy workflows
  • Code generation
  • General-purpose assistants
You get a good balance of speed, cost, and performance. Claude will still think on sufficiently difficult problems, but it will think less than at high effort.

High Effort — Default Thoroughness

High effort is the default and works well for:

  • Complex reasoning tasks
  • Difficult coding problems
  • Agentic tasks that require careful planning
  • Any task where quality is more important than speed

Max Effort — Deepest Reasoning

Max effort is available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. Use it for:

  • Tasks requiring the deepest possible reasoning
  • Scientific analysis
  • Complex multi-step problem solving
  • Research and analysis

XHigh Effort — Long-Horizon Work (Opus 4.7 Only)

Xhigh is exclusive to Claude Opus 4.7. Use it for:

  • Long-running agentic tasks (over 30 minutes)
  • Coding tasks with token budgets in the millions
  • Extended autonomous workflows
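When one application sends several kinds of requests, the guidance in this section can be condensed into a routing table. This is an illustrative sketch: the task labels, the model-family names, and the fallback rule are hypothetical and simply encode the bullets above plus the availability notes for max and xhigh. Unknown tasks fall back to medium, the starting point this guide recommends.

```python
# Hypothetical task labels mapped to the effort levels suggested in this section.
TASK_EFFORT: dict[str, str] = {
    "faq_bot": "low",
    "high_volume_chat": "low",
    "simple_subagent": "low",
    "agentic_coding": "medium",
    "code_generation": "medium",
    "general_assistant": "medium",
    "complex_reasoning": "high",
    "agent_planning": "high",
    "scientific_analysis": "max",
    "research": "max",
    "long_horizon_agent": "xhigh",
}

# Availability per the notes in this guide; family names are hypothetical labels.
XHIGH_MODELS = {"opus-4.7"}
MAX_MODELS = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}


def choose_effort(task: str, model_family: str) -> str:
    """Pick an effort level for a task, stepping down when the model lacks it."""
    level = TASK_EFFORT.get(task, "medium")  # unknown tasks start at medium
    if level == "xhigh" and model_family not in XHIGH_MODELS:
        # Step down instead of up to max, which sits above xhigh in this guide.
        level = "high"
    if level == "max" and model_family not in MAX_MODELS:
        level = "high"
    return level
```

Stepping down rather than up is a deliberate choice: a missing level never silently increases token spend beyond what the table asked for.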

Code Examples

Python SDK

import anthropic

client = anthropic.Anthropic()

# Effort is passed inside output_config. Model IDs are assumed; use models
# that support effort (see the list above).

# Low effort: fast and cheap
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Explain quantum computing in one paragraph."}],
)

# Medium effort: balanced
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists."}],
)

# Max effort: deepest reasoning (not available on Opus 4.5)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    output_config={"effort": "max"},
    messages=[{"role": "user", "content": "Prove the Riemann Hypothesis."}],
)

TypeScript SDK

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort
const lowResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  output_config: { effort: 'low' },
  messages: [{ role: 'user', content: 'Summarize this article in 50 words.' }],
});

// Medium effort
const mediumResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [{ role: 'user', content: 'Debug this code and explain the fix.' }],
});

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking. Adaptive thinking lets Claude decide when to use extended thinking based on the complexity of the task. When you set effort to a lower level, Claude will still use thinking for hard problems, but it will think less than at higher effort levels.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ]
)

This combination gives you the best of both worlds: Claude uses thinking only when necessary, and the effort level controls how deeply it thinks.
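With adaptive thinking, whether a response contains any thinking is Claude's call, so client code should not assume a thinking block is present. A minimal sketch, assuming the SDK's content block shapes (`thinking` blocks carry a `.thinking` string and `text` blocks a `.text` string); the helper name is illustrative.

```python
from typing import Any


def split_thinking(response: Any) -> tuple[str, str]:
    """Return (thinking, answer) text from a Messages API response.

    thinking is an empty string when Claude chose not to think, which adaptive
    thinking allows on simple prompts.
    """
    thinking_parts: list[str] = []
    answer_parts: list[str] = []
    for block in response.content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)
        # Other block types (tool_use, redacted_thinking, ...) are ignored here.
    return "\n".join(thinking_parts), "\n".join(answer_parts)
```

A typical use is `thinking, answer = split_thinking(response)`, then logging how often `thinking` is empty at each effort level to see how much Claude is actually thinking on your traffic.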

Effort vs. budget_tokens (Deprecated)

If you've been using budget_tokens to control thinking depth on Opus 4.6 or Sonnet 4.6, it's time to migrate. The effort parameter replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted on these models, it is deprecated and will be removed in a future model release.

Why effort is better:
  • It affects all tokens, not just thinking tokens
  • It's a behavioral signal, not a strict budget — Claude adapts to problem difficulty
  • It works with or without extended thinking enabled
  • It gives finer control over tool call frequency
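At the request level, a migration can often be mechanical: drop the fixed thinking budget, switch to adaptive thinking, and pick an effort level. The helper below is a hypothetical sketch that rewrites a request dictionary. It assumes the request shapes used in this guide: `thinking={"type": "enabled", "budget_tokens": N}` before, and `thinking={"type": "adaptive"}` plus `output_config={"effort": ...}` after. It asks you to choose the effort level instead of deriving one from the old budget, because effort is a behavioral signal and this guide gives no budget-to-level conversion.

```python
import copy
from typing import Any


def migrate_to_effort(request: dict[str, Any], effort: str = "medium") -> dict[str, Any]:
    """Return a copy of messages.create() kwargs that uses effort instead of budget_tokens.

    Requests without a budget-based thinking config come back unchanged (as a copy).
    An effort level already present in output_config is kept.
    """
    migrated = copy.deepcopy(request)  # never mutate the caller's dict
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        output_config = migrated.setdefault("output_config", {})
        output_config.setdefault("effort", effort)
    return migrated
```

Defaulting to medium mirrors the starting point recommended in this guide; pass `effort="high"` if the old budget was large and you want to keep behavior close to the default.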

Best Practices

  • Start with medium effort for most applications. It's the best balance of speed, cost, and performance.
  • Use low effort for subagents that handle simple, well-defined tasks.
  • Use high or max effort for the main agent in complex workflows.
  • Combine with adaptive thinking to let Claude decide when to think deeply.
  • Monitor token usage and adjust effort based on your cost and latency requirements.
  • Test different effort levels with your specific use case — the optimal setting depends on your task complexity.
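Monitoring and testing can be combined in a small comparison harness: send the same representative prompts at several effort levels and compare average output tokens. In this sketch the `send` callable is a hypothetical injection point so the harness can run without network access; a real version might wrap `client.messages.create(...)` and return `response.usage.output_tokens`, as shown in the docstring (model ID and `output_config` placement are assumptions). Answer quality still has to be judged separately, for example with your own evals.

```python
from statistics import mean
from typing import Callable, Iterable


def compare_effort_levels(
    prompts: list[str],
    levels: Iterable[str],
    send: Callable[[str, str], int],
) -> dict[str, float]:
    """Return the mean output tokens per prompt for each effort level.

    send(prompt, effort) should make one API call and return its output token
    count, for example (assumed model ID and parameter placement):
        lambda p, e: client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            output_config={"effort": e},
            messages=[{"role": "user", "content": p}],
        ).usage.output_tokens
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    return {level: mean(send(p, level) for p in prompts) for level in levels}
```

Timing each `send` call as well would cover the latency side of the trade-off.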

Key Takeaways

  • The effort parameter controls token spending across all response types — text, tool calls, and thinking — giving you fine-grained control over cost and speed.
  • Medium effort is the recommended starting point for most production applications, balancing capability with efficiency; the API default remains high.
  • Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6; budget_tokens is deprecated and will be removed.
  • Combine effort with adaptive thinking for the best experience — Claude will think deeply on hard problems and skip thinking on simple ones.
  • Different effort levels suit different use cases: low for speed, medium for balance, high for quality, max for deepest reasoning, and xhigh for long-horizon tasks on Opus 4.7.