BeClaude
Guide · Beginner · Pricing · 2026-05-22

Mastering Claude’s Effort Parameter: Control Thinking Depth, Speed, and Cost

Learn how to use Claude's effort parameter to balance response thoroughness, latency, and token spend across supported models, from simple subagents to deep reasoning tasks.

Quick Answer

The effort parameter lets you control how eagerly Claude spends tokens on a response, from low (fast/cheap) to max (deepest reasoning). It works with or without extended thinking and affects text, tool calls, and thinking tokens. This guide explains each level, when to use it, and how to combine it with adaptive thinking for optimal results.


Introduction

Claude is incredibly capable, but sometimes you don’t need its full reasoning power. A quick chat, a simple data extraction, or a subagent handling a narrow task doesn’t require the same depth as a complex code review or a multi-step research analysis. That’s where the effort parameter comes in.

Effort gives you fine-grained control over how many tokens Claude spends on a response—without switching models. You can dial up to max for the deepest reasoning, or dial down to low for speed and cost savings. Best of all, it works whether or not you have extended thinking enabled.

In this guide, you’ll learn:

  • What the effort parameter is and how it differs from budget_tokens
  • Each effort level and when to use it
  • How to combine effort with adaptive thinking
  • Practical code examples for Python and TypeScript
  • Tips for optimizing cost and latency

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how thoroughly it should approach a request. At high (the default), Claude spends as many tokens as needed for excellent results. At max, it goes even further—ideal for the hardest problems. At low, it conserves tokens, skipping unnecessary reasoning and making fewer tool calls.

Important: Effort is not a strict token budget. Claude will still think deeply on difficult problems even at lower levels—it just won’t think as much as it would at higher levels.
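Effort is a behavioral signal rather than a budget, so the hard cap on output length still comes from max_tokens. The sketch below shows how to detect that a reply was cut off at that cap. It relies on the stop_reason field that the Messages API returns. The helper name `was_truncated` and the offline stand-in responses are hypothetical.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the reply hit the hard max_tokens cap.

    Effort only steers how many tokens Claude spends; max_tokens is the
    limit that actually cuts a response off, reported via stop_reason.
    """
    return response.stop_reason == "max_tokens"


# Hypothetical offline stand-ins for real API responses.
finished = SimpleNamespace(stop_reason="end_turn")
cut_off = SimpleNamespace(stop_reason="max_tokens")

print(was_truncated(finished))  # False
print(was_truncated(cut_off))   # True
```

If truncation happens often at a given effort level, raise max_tokens. Lowering effort alone does not guarantee that a response fits.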

Supported Models

| Model | Effort levels | Notes |
|---|---|---|
| Claude Mythos Preview | max, high, medium, low | Full support |
| Claude Opus 4.7 | max, xhigh, high, medium, low | xhigh for long-horizon tasks |
| Claude Opus 4.6 | max, high, medium, low | Replaces budget_tokens |
| Claude Sonnet 4.6 | max, high, medium, low | Replaces budget_tokens |
| Claude Opus 4.5 | high, medium, low | No max or xhigh |

Deprecation note: budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6 but will be removed in a future release. Use effort instead.
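You can encode the supported-models table as a lookup to catch an unsupported level before a request goes out. This is a minimal sketch. The names `SUPPORTED_EFFORT` and `check_effort` are hypothetical. The keys are the display names from the table, not API model IDs.

```python
# Effort levels per model, transcribed from the supported-models table.
# Keys are display names, not API model IDs.
SUPPORTED_EFFORT = {
    "Claude Mythos Preview": {"max", "high", "medium", "low"},
    "Claude Opus 4.7": {"max", "xhigh", "high", "medium", "low"},
    "Claude Opus 4.6": {"max", "high", "medium", "low"},
    "Claude Sonnet 4.6": {"max", "high", "medium", "low"},
    "Claude Opus 4.5": {"high", "medium", "low"},
}


def check_effort(model: str, effort: str) -> None:
    """Raise ValueError if the model is unknown or lacks this effort level."""
    levels = SUPPORTED_EFFORT.get(model)
    if levels is None:
        raise ValueError(f"{model!r} is not in the supported-models table")
    if effort not in levels:
        raise ValueError(f"{model!r} supports {sorted(levels)}, not {effort!r}")


check_effort("Claude Opus 4.7", "xhigh")  # passes silently
```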

Effort Levels Explained

low – Maximum Efficiency

  • Best for: Simple tasks, high-volume chat, subagents, non-coding use cases
  • Behavior: Significant token savings. Claude may skip thinking entirely for straightforward problems.
  • Trade-off: Some capability reduction. Not suitable for complex reasoning.

medium – Balanced

  • Best for: Agentic tasks that need a balance of speed, cost, and performance
  • Behavior: Moderate token savings. Claude still thinks on difficult problems, but less than at higher levels.
  • Recommended default for Sonnet 4.6: Best balance for most applications.

high – Default Capability

  • Best for: Complex reasoning, difficult coding, agentic tasks
  • Behavior: Equivalent to omitting the parameter. Claude spends as many tokens as needed.
  • Trade-off: No cost optimization, but full capability.

xhigh – Extended Capability (Opus 4.7 only)

  • Best for: Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions
  • Behavior: Designed for sustained, deep reasoning over very long contexts.

max – Absolute Maximum

  • Best for: The hardest problems requiring deepest possible reasoning
  • Behavior: No constraints on token spending. Available on Mythos, Opus 4.7, Opus 4.6, and Sonnet 4.6.
  • Trade-off: Highest cost and latency.
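The levels form an ordered scale from low to max, so an application that targets several models can fall back gracefully when a level is missing. For example, xhigh exists only on Opus 4.7, and Opus 4.5 has no max. The sketch below implements one such fallback policy. The policy is our own assumption, not something the API does for you, and `EFFORT_ORDER` and `clamp_effort` are hypothetical names.

```python
# Effort levels from lowest to highest token spend, as described above.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def clamp_effort(requested: str, supported: set[str]) -> str:
    """Return the highest supported level that does not exceed `requested`.

    Hypothetical fallback policy: asking for xhigh on a model without it
    degrades to high rather than jumping up to max.
    """
    if requested not in EFFORT_ORDER:
        raise ValueError(f"unknown effort level {requested!r}")
    ceiling = EFFORT_ORDER.index(requested)
    for level in reversed(EFFORT_ORDER[: ceiling + 1]):
        if level in supported:
            return level
    raise ValueError(f"no supported level at or below {requested!r}")


print(clamp_effort("xhigh", {"max", "high", "medium", "low"}))  # high
print(clamp_effort("max", {"high", "medium", "low"}))           # high
```

This sketch rounds down rather than up, so a fallback never increases cost or latency beyond what the caller asked for.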

How Effort Affects All Tokens

Unlike budget_tokens, which only controlled thinking tokens, effort affects every token in the response:

  • Text responses and explanations – Less verbose at lower levels
  • Tool calls and function arguments – Fewer tool calls at lower levels
  • Extended thinking – Less thinking depth when enabled
This gives you much greater control over total token spend.

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide how much thinking to use based on the problem, while effort signals how much it should invest overall.

Example: With effort: "low" and adaptive thinking, Claude will think only when it judges thinking necessary, and even then minimally. With effort: "max", it will almost always think, and think deeply.
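The sketch below assembles the keyword arguments for `client.messages.create()` with effort and adaptive thinking combined. It assumes that effort is passed as `output_config={"effort": ...}`, as in current Messages API versions, and that "claude-sonnet-4-6" is the Sonnet 4.6 model ID. Check both against your SDK version. The helper `build_request` is hypothetical.

```python
def build_request(prompt: str, effort: str, adaptive: bool = True,
                  model: str = "claude-sonnet-4-6",
                  max_tokens: int = 8192) -> dict:
    """Assemble keyword arguments for client.messages.create().

    Assumption: effort sits inside output_config and adaptive thinking is
    enabled with thinking={"type": "adaptive"}.
    """
    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }
    if adaptive:
        # Claude decides whether to think; effort steers how much it invests.
        params["thinking"] = {"type": "adaptive"}
    return params


quick = build_request("Summarize this ticket in one line.", "low")
deep = build_request("Prove the lemma step by step.", "max")
# With a real client: response = client.messages.create(**deep)
```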

Practical Code Examples

Python (using the Anthropic SDK)

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # a model that supports effort
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set effort to low for a quick, concise answer.
    # The Messages API takes effort inside output_config.
    output_config={"effort": "low"},
)

print(response.content[0].text)

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

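async function main() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6', // a model that supports effort
    max_tokens: 4096,
    system: 'You are a helpful assistant.',
    messages: [
      { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
    ],
    // Use medium effort for a balanced response.
    // Effort goes inside output_config (requires an SDK version that includes it).
    output_config: { effort: 'medium' }
  });

  // Print only text blocks; content can also hold thinking or tool_use blocks.
  for (const block of response.content) {
    if (block.type === 'text') {
      console.log(block.text);
    }
  }
}

main();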

With Adaptive Thinking

# Reuses the client created in the first Python example.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},  # Deepest reasoning with adaptive thinking
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step..."}
    ]
)
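With thinking enabled, `response.content` can begin with a thinking block, so reading `response.content[0].text` is not safe here. The sketch below keeps only the text blocks. The block type values "thinking" and "text" follow the Messages API content-block shapes. The helper `response_text` and the offline fake response are hypothetical.

```python
from types import SimpleNamespace


def response_text(response) -> str:
    """Join only the text blocks, skipping thinking and tool_use blocks."""
    return "".join(
        block.text for block in response.content if block.type == "text"
    )


# Hypothetical offline stand-in for a response that includes a thinking block.
fake = SimpleNamespace(content=[
    SimpleNamespace(type="thinking", thinking="Let me work through it..."),
    SimpleNamespace(type="text", text="The answer is 12."),
])
print(response_text(fake))  # The answer is 12.
```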

Recommended Configurations

For Sonnet 4.6

| Use case | Effort level | Why |
|---|---|---|
| Chat / Q&A | low | Fast, cheap, good enough |
| Agentic coding | medium | Best balance |
| Complex code generation | high | Full capability |
| Hardest problems | max | No compromises |

For Opus 4.7

| Use case | Effort level | Why |
|---|---|---|
| Quick research | medium | Balanced depth |
| Multi-hour coding session | xhigh | Sustained reasoning |
| Scientific analysis | max | Deepest thinking |
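You can turn the two recommendation tables into a small router that picks an effort level per request. This is a sketch only. The model and use-case keys are hypothetical labels, not API identifiers. The "medium" fallback for unlisted cases is our own choice, borrowed from the Sonnet 4.6 default recommendation.

```python
# (model, use case) -> effort level, transcribed from the two tables above.
# Keys are hypothetical labels, not API identifiers.
RECOMMENDED_EFFORT = {
    ("sonnet-4.6", "chat"): "low",
    ("sonnet-4.6", "agentic-coding"): "medium",
    ("sonnet-4.6", "complex-codegen"): "high",
    ("sonnet-4.6", "hardest"): "max",
    ("opus-4.7", "quick-research"): "medium",
    ("opus-4.7", "multi-hour-coding"): "xhigh",
    ("opus-4.7", "scientific-analysis"): "max",
}


def pick_effort(model: str, use_case: str, default: str = "medium") -> str:
    """Look up a recommended level, falling back to `default` if unlisted."""
    return RECOMMENDED_EFFORT.get((model, use_case), default)


print(pick_effort("opus-4.7", "multi-hour-coding"))  # xhigh
```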

Tips for Optimizing Cost and Latency

  • Start with medium for Sonnet 4.6 – It’s the recommended default and avoids unexpected latency.
  • Use low for subagents – Subagents handling narrow tasks don’t need deep reasoning.
  • Reserve max for the hardest 10% of requests – It’s powerful but expensive.
  • Combine with adaptive thinking – Let Claude decide when to think, while you control the ceiling.
  • Monitor token usage – Effort affects all tokens, so track total spend per request.
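The sketch below tracks spend per effort level. It reads the `usage.input_tokens` and `usage.output_tokens` fields that Messages API responses carry. Thinking tokens are billed as output tokens, so they fall under output here. The class name `EffortSpendTracker` and the offline stand-in responses are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


class EffortSpendTracker:
    """Accumulate per-request token usage, grouped by effort level."""

    def __init__(self) -> None:
        self.totals = defaultdict(lambda: {"requests": 0, "input": 0, "output": 0})

    def record(self, effort: str, response) -> None:
        bucket = self.totals[effort]
        bucket["requests"] += 1
        bucket["input"] += response.usage.input_tokens
        bucket["output"] += response.usage.output_tokens

    def avg_output(self, effort: str) -> float:
        """Average output tokens per request at this level (0.0 if none)."""
        bucket = self.totals.get(effort)
        if not bucket or not bucket["requests"]:
            return 0.0
        return bucket["output"] / bucket["requests"]


def fake_response(input_tokens: int, output_tokens: int):
    # Hypothetical offline stand-in; pass real API responses in practice.
    return SimpleNamespace(usage=SimpleNamespace(
        input_tokens=input_tokens, output_tokens=output_tokens))


tracker = EffortSpendTracker()
tracker.record("low", fake_response(50, 120))
tracker.record("low", fake_response(60, 80))
print(tracker.avg_output("low"))  # 100.0
```

Comparing `avg_output` across levels on your own traffic shows whether a lower setting actually saves enough to justify any quality trade-off.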

Conclusion

The effort parameter is a powerful tool for fine-tuning Claude’s behavior. Whether you’re building a high-volume chatbot, a deep research agent, or anything in between, you now have a single dial to control thoroughness, speed, and cost—without switching models.

By combining effort with adaptive thinking, you get the best of both worlds: Claude decides when to think, and you decide how much.

Key Takeaways

  • Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, and is also available on Opus 4.5, Opus 4.7, and Mythos Preview.
  • Effort affects all tokens – text, tool calls, and thinking – giving you broad control over spend.
  • Use medium as your default for Sonnet 4.6 to balance speed, cost, and performance.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal results.
  • Reserve max and xhigh for the most demanding tasks; use low for simple or high-volume workloads.