Guide | Beginner | Pricing | 2026-05-22

Mastering Claude’s Effort Parameter: Control Thinking Depth, Speed, and Cost

Learn how to use Claude's effort parameter to balance response thoroughness, latency, and token spend across the models that support it, from simple subagents to deep reasoning tasks.

Quick Answer

The effort parameter lets you control how eagerly Claude spends tokens on a response, from low (fast/cheap) to max (deepest reasoning). It works with or without extended thinking and affects text, tool calls, and thinking tokens. This guide explains each level, when to use it, and how to combine it with adaptive thinking for optimal results.

Tags: effort parameter, extended thinking, token optimization, Claude API, cost control

Introduction

Claude is incredibly capable, but sometimes you don’t need its full reasoning power. A quick chat, a simple data extraction, or a subagent handling a narrow task doesn’t require the same depth as a complex code review or a multi-step research analysis. That’s where the effort parameter comes in.

Effort gives you fine-grained control over how many tokens Claude spends on a response—without switching models. You can dial up to max for the deepest reasoning, or dial down to low for speed and cost savings. Best of all, it works whether or not you have extended thinking enabled.

In this guide, you’ll learn:

What the effort parameter is and how it differs from budget_tokens
Each effort level and when to use it
How to combine effort with adaptive thinking
Practical code examples for Python and TypeScript
Tips for optimizing cost and latency

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how thoroughly it should approach a request. At high (the default), Claude spends as many tokens as needed for excellent results. At max, it goes even further—ideal for the hardest problems. At low, it conserves tokens, skipping unnecessary reasoning and making fewer tool calls.

Important: Effort is not a strict token budget. Claude will still think deeply on difficult problems even at lower levels—it just won’t think as much as it would at higher levels.

Supported Models

Model	Effort Levels	Notes
Claude Mythos Preview	max, high, medium, low	Full support
Claude Opus 4.7	max, xhigh, high, medium, low	xhigh for long-horizon tasks
Claude Opus 4.6	max, high, medium, low	Replaces `budget_tokens`
Claude Sonnet 4.6	max, high, medium, low	Replaces `budget_tokens`
Claude Opus 4.5	high, medium, low	No max or xhigh

Deprecation note: budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6 but will be removed in a future release. Use effort instead.
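
Because the available levels differ by model, it can help to check a level before sending a request. The following is a minimal sketch that transcribes the table above into a lookup. The `SUPPORTED_EFFORT` dictionary and `validate_effort` helper are hypothetical names for this guide, not part of any SDK, and the keys are display names rather than API model IDs.

```python
# Effort levels per model, transcribed from the table above.
# Keys are display names, not API model IDs (an assumption made for
# readability; map them to real model IDs in your own code).
SUPPORTED_EFFORT = {
    "Claude Mythos Preview": ("low", "medium", "high", "max"),
    "Claude Opus 4.7": ("low", "medium", "high", "xhigh", "max"),
    "Claude Opus 4.6": ("low", "medium", "high", "max"),
    "Claude Sonnet 4.6": ("low", "medium", "high", "max"),
    "Claude Opus 4.5": ("low", "medium", "high"),
}


def validate_effort(model: str, effort: str) -> str:
    """Return `effort` unchanged if the table lists it for `model`, else raise ValueError."""
    try:
        levels = SUPPORTED_EFFORT[model]
    except KeyError:
        raise ValueError(f"No effort data for model {model!r}") from None
    if effort not in levels:
        raise ValueError(f"{model} does not list effort {effort!r}; choose from {levels}")
    return effort
```

Calling `validate_effort("Claude Opus 4.5", "max")` raises a `ValueError`, which surfaces the mismatch in your own code before the request is sent.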

Effort Levels Explained

`low` – Maximum Efficiency

Best for: Simple tasks, high-volume chat, subagents, non-coding use cases
Behavior: Significant token savings. Claude may skip thinking entirely for straightforward problems.
Trade-off: Some capability reduction. Not suitable for complex reasoning.

`medium` – Balanced

Best for: Agentic tasks that need a balance of speed, cost, and performance
Behavior: Moderate token savings. Claude still thinks on difficult problems, but less than at higher levels.
Recommended setting for Sonnet 4.6: Best balance for most applications. Set it explicitly, because omitting the parameter gives you high.

`high` – Default Capability

Best for: Complex reasoning, difficult coding, agentic tasks
Behavior: Equivalent to omitting the parameter. Claude spends as many tokens as needed.
Trade-off: No cost optimization, but full capability.

`xhigh` – Extended Capability (Opus 4.7 only)

Best for: Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions
Behavior: Designed for sustained, deep reasoning over very long contexts.

`max` – Absolute Maximum

Best for: The hardest problems requiring deepest possible reasoning
Behavior: No effort-related constraints on token spending, although the request's max_tokens limit still applies. Available on Mythos, Opus 4.7, Opus 4.6, and Sonnet 4.6.
Trade-off: Highest cost and latency.

How Effort Affects All Tokens

Unlike budget_tokens, which only controlled thinking tokens, effort affects every token in the response:

Text responses and explanations – Less verbose at lower levels
Tool calls and function arguments – Fewer tool calls at lower levels
Extended thinking – Less thinking depth when enabled

This gives you much greater control over total token spend.
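
Since effort touches text, tool calls, and thinking alike, one simple way to see its effect is to compare total output tokens per effort level across your own logged requests. The sketch below assumes you record the `usage.output_tokens` value that the Messages API returns with each response (thinking tokens are counted there as output). `UsageRecord` and `mean_output_tokens` are hypothetical names for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One logged request. Field names are hypothetical, chosen for this sketch."""
    effort: str         # effort level the request was sent with
    output_tokens: int  # copied from the response's usage.output_tokens


def mean_output_tokens(records: list[UsageRecord]) -> dict[str, float]:
    """Average output tokens per effort level across the logged requests."""
    by_effort: dict[str, list[int]] = defaultdict(list)
    for record in records:
        by_effort[record.effort].append(record.output_tokens)
    # Plain division keeps the result a float for every level.
    return {effort: sum(tokens) / len(tokens) for effort, tokens in by_effort.items()}
```

Comparing the averages for the same kind of prompt at `low` and `high` gives a rough picture of what a lower level saves on your workload.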

Combining Effort with Adaptive Thinking

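For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide how much thinking to use based on the problem, while effort steers how generous that decision is overall. Because effort is a behavioral signal rather than a hard cap, Claude can still think on a difficult problem at a low level.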

Example: With effort: "low" and adaptive thinking, Claude will think only when absolutely necessary, and even then, minimally. With effort: "max", it will think deeply on every request.

Practical Code Examples

Python (using the Anthropic SDK)

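import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    # Sonnet 4.6 supports effort (see the table above).
    # The model ID is assumed here; check the current models list.
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    # Set effort to low for a quick, concise answer.
    # Effort is passed as a plain string inside output_config.
    output_config={"effort": "low"},
)

# Without thinking enabled, the first content block is the text answer
print(response.content[0].text)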

TypeScript (using the Anthropic SDK)

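import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

async function main() {
  const response = await client.messages.create({
    // Model ID is assumed here; check the current models list
    model: 'claude-sonnet-4-6',
    max_tokens: 4096,
    system: 'You are a helpful assistant.',
    messages: [
      { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
    ],
    // Use medium effort for a balanced response.
    // Effort is passed as a plain string inside output_config.
    output_config: { effort: 'medium' },
  });

  // Content blocks are a union type, so narrow to a text block before reading .text
  const first = response.content[0];
  if (first?.type === 'text') {
    console.log(first.text);
  }
}

main();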

With Adaptive Thinking

# Reuses the client created in the Python example above
response = client.messages.create(
    model="claude-sonnet-4-6",  # model ID assumed; check the current models list
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},  # Deepest reasoning with adaptive thinking
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step..."}
    ]
)

Recommended Configurations

For Sonnet 4.6

Use Case	Effort Level	Why
Chat / Q&A	`low`	Fast, cheap, good enough
Agentic coding	`medium`	Best balance
Complex code generation	`high`	Full capability
Hardest problems	`max`	No compromises

For Opus 4.7

Use Case	Effort Level	Why
Quick research	`medium`	Balanced depth
Multi-hour coding session	`xhigh`	Sustained reasoning
Scientific analysis	`max`	Deepest thinking
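
The two tables above can be kept in code as a small lookup so that every call site picks its level the same way. This is only a sketch: the use-case keys (`"chat"`, `"agentic_coding"`, and so on) are hypothetical labels invented here, and `recommended_effort` is not an SDK function.

```python
# Recommendations transcribed from the two tables above.
# The use-case keys are hypothetical labels invented for this sketch.
RECOMMENDED_EFFORT = {
    "Sonnet 4.6": {
        "chat": "low",
        "agentic_coding": "medium",
        "complex_codegen": "high",
        "hardest": "max",
    },
    "Opus 4.7": {
        "quick_research": "medium",
        "multi_hour_coding": "xhigh",
        "scientific_analysis": "max",
    },
}


def recommended_effort(model: str, use_case: str, fallback: str = "high") -> str:
    """Look up the table's recommendation, falling back to `high`,
    which this guide describes as the default level."""
    return RECOMMENDED_EFFORT.get(model, {}).get(use_case, fallback)
```

Unknown models or use cases fall back to `high`, which mirrors what happens when the parameter is omitted.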

Tips for Optimizing Cost and Latency

Start with medium for Sonnet 4.6 – It’s the recommended setting and avoids unexpected latency. Set it explicitly, because the parameter defaults to high when omitted.
Use low for subagents – Subagents handling narrow tasks don’t need deep reasoning.
Reserve max for the hardest 10% of requests – It’s powerful but expensive.
Combine with adaptive thinking – Let Claude decide when to think, while effort steers how much.
Monitor token usage – Effort affects all tokens, so track total spend per request.
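
One possible way to act on these tips is to start cheap and step up only when a response fails your own quality check, so that max is reached only by the requests that need it. The sketch below shows just the stepping logic. The `escalate` helper and the retry policy are assumptions of this example, not behavior built into the API.

```python
# Ordered from cheapest to most thorough, following the levels in this guide.
EFFORT_LADDER = ("low", "medium", "high", "xhigh", "max")


def escalate(current: str, supported: tuple[str, ...]) -> str:
    """Return the next higher effort level the model supports,
    or `current` if it is already the highest supported level."""
    start = EFFORT_LADDER.index(current)
    for level in EFFORT_LADDER[start + 1:]:
        if level in supported:
            return level
    return current
```

For a model without xhigh, `escalate("high", ("low", "medium", "high", "max"))` skips straight to `"max"`. Each retry is a new request with its own token cost, so this pattern pays off only when most requests succeed at the lower level.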

Conclusion

The effort parameter is a powerful tool for fine-tuning Claude’s behavior. Whether you’re building a high-volume chatbot, a deep research agent, or anything in between, you now have a single dial to control thoroughness, speed, and cost—without switching models.

By combining effort with adaptive thinking, you get the best of both worlds: Claude decides when to think, and you decide how much.

Key Takeaways

Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6, and is also available on Mythos Preview, Opus 4.7, and Opus 4.5.
Effort affects all tokens – text, tool calls, and thinking – giving you broad control over spend.
Use medium as your default for Sonnet 4.6 to balance speed, cost, and performance.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal results.
Reserve max and xhigh for the most demanding tasks; use low for simple or high-volume workloads.