Guide · Beginner · Pricing · 2026-05-19

Mastering Claude's Effort Parameter: Control Thinking Depth, Speed, and Cost

Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and latency. Includes code examples, effort levels, and best practices for Opus and Sonnet models.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn the five effort levels (low, medium, high, xhigh, max), how to set them in API calls, and when to use each for optimal speed, cost, and capability.

Tags: effort parameter, extended thinking, token efficiency, Claude API, cost optimization

Introduction

Claude is incredibly powerful, but sometimes you don't need it to think deeply about every request. Maybe you're building a high-volume chatbot where speed matters more than philosophical depth, or perhaps you're running a complex agent that needs maximum reasoning for hours-long tasks. The effort parameter gives you precise control over this trade-off.

Introduced as the recommended replacement for budget_tokens on Claude Opus 4.6 and Sonnet 4.6, effort is a behavioral signal that tells Claude how thoroughly to approach a problem. It affects all tokens in the response—including text, tool calls, and extended thinking—making it a powerful lever for optimizing both cost and performance.

In this guide, you'll learn:

What the effort parameter is and how it works
The five effort levels and when to use each
How to set effort in API calls with code examples
Best practices for combining effort with adaptive thinking
Practical recommendations for Sonnet 4.6

How Effort Works

The effort parameter is available on all supported models (Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6, and Opus 4.5) with no beta header required. The default is high, so omitting the parameter behaves the same as setting effort to high.

Here's the key insight: effort is not a strict token budget. It's a behavioral signal. At lower effort levels, Claude may still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem. This means you get intelligent scaling without hard cutoffs.

What Effort Affects

Text responses and explanations – shorter, more direct answers at low effort
Tool calls and function arguments – fewer tool calls at lower effort
Extended thinking – less thinking depth when effort is reduced

This is the main advantage over budget_tokens, which only applied to thinking tokens: you don't need to enable thinking to use effort, and it influences all token spend, including tool calls.
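The distinction between a hard cap and a behavioral signal can be sketched as follows. This is a minimal illustration, not the SDK's own code: `build_request` is a hypothetical helper, the model ID is a placeholder, and nesting effort under an `output_config` field is an assumption to verify against the current API reference. Note that the request carries no `thinking` block, and that `max_tokens` remains the only hard limit.

```python
from typing import Any, Optional

EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(
    prompt: str,
    model: str,
    max_tokens: int,
    effort: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for a Messages API call (hypothetical helper)."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")

    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,  # hard cap on output tokens
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is passed inside output_config.
        # Leaving it out means the default (high) applies.
        request["output_config"] = {"effort": effort}
    return request


request = build_request(
    "Summarize this email in one sentence.",
    "your-model-id",  # placeholder, not a real model ID
    1024,
    effort="low",
)
```

The resulting dictionary could then be passed along as `client.messages.create(**request)`.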

The Five Effort Levels

Level	Description	Typical Use Case
`low`	Most efficient. Significant token savings with some capability reduction.	High-volume chat, simple subagents, latency-sensitive workloads
`medium`	Balanced approach with moderate token savings.	Agentic tasks that need a balance of speed, cost, and performance
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding, standard agentic tasks
`xhigh`	Extended capability for long-horizon work. Available on Opus 4.7 only.	Long-running coding or agentic tasks (30+ minutes, millions of tokens)
`max`	Absolute maximum capability with no constraints. Available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.	Deepest possible reasoning, most thorough analysis
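The availability rules in the table can be encoded as a small lookup. This is a sketch under stated assumptions: the family labels are hypothetical keys chosen for readability (not API model IDs), and falling back to the nearest lower level is one possible policy, picked here so that a fallback never costs more than what was requested.

```python
ORDER = ("low", "medium", "high", "xhigh", "max")
BASE_LEVELS = ("low", "medium", "high")

# Extra levels per model family, as listed in the table above.
# Keys are hypothetical labels, not API model IDs.
EXTRA_LEVELS = {
    "mythos-preview": ("max",),
    "opus-4.7": ("xhigh", "max"),
    "opus-4.6": ("max",),
    "sonnet-4.6": ("max",),
    "opus-4.5": (),
}


def supported_levels(family: str) -> tuple[str, ...]:
    """Return the effort levels available for a model family."""
    if family not in EXTRA_LEVELS:
        raise KeyError(f"unknown model family: {family!r}")
    return BASE_LEVELS + EXTRA_LEVELS[family]


def clamp_effort(family: str, requested: str) -> str:
    """Return the requested level if supported, else the nearest lower one."""
    if requested not in ORDER:
        raise ValueError(f"unknown effort level: {requested!r}")
    allowed = supported_levels(family)
    # Walk downward from the requested level until a supported one is found.
    for level in reversed(ORDER[: ORDER.index(requested) + 1]):
        if level in allowed:
            return level
    raise ValueError("no supported level found")  # unreachable: low is always allowed
```

For example, `clamp_effort("sonnet-4.6", "xhigh")` falls back to `high`, because the table lists `xhigh` for Opus 4.7 only.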

Code Examples

Setting Effort in Python

import anthropic

client = anthropic.Anthropic()

# effort is passed inside output_config.
# Model IDs below are assumed aliases for Sonnet 4.6 and Opus 4.6;
# confirm them against the current models list.

# Low effort for fast, cheap responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Summarize this email in one sentence."}
    ]
)

# High effort for complex reasoning (same as omitting the parameter)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Debug this multi-threaded Python application..."}
    ]
)

# Max effort for maximum capability
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis..."}
    ]
)

Setting Effort in TypeScript

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// effort is passed inside output_config.
// Model IDs below are assumed aliases; confirm them against the current models list.

// Medium effort for balanced agentic tasks
const balanced = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python script to scrape this website...' }
  ]
});

// xhigh effort for long-running tasks (Opus 4.7 only).
// With a max_tokens this large the request is streamed, since long
// non-streaming requests risk timing out.
const longRunning = await client.messages
  .stream({
    model: 'claude-opus-4-7',
    max_tokens: 65536,
    output_config: { effort: 'xhigh' },
    messages: [
      { role: 'user', content: 'Refactor this entire codebase...' }
    ]
  })
  .finalMessage();

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

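# Reuses the client created in the Python example above.
# The model ID is an assumed alias for Opus 4.6.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},       # Claude decides whether to think
    output_config={"effort": "high"},    # how thoroughly to work when it does
    messages=[
        {"role": "user", "content": "Design a distributed caching system..."}
    ]
)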

Adaptive thinking lets Claude decide when to think, while effort controls how much it thinks when it does.

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which can introduce unexpected latency. Anthropic recommends explicitly setting effort to avoid surprises:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
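The two recommendations above can be captured as a lookup so that every Sonnet 4.6 call site sets effort explicitly. This is only a sketch: the workload labels and the `recommended_effort` helper are hypothetical names introduced for illustration.

```python
# Workload labels are hypothetical; the levels follow the recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def recommended_effort(workload: str, default: str = "medium") -> str:
    """Return the suggested Sonnet 4.6 effort level for a workload label.

    Unknown workloads fall back to medium, the recommended default.
    """
    return SONNET_46_EFFORT.get(workload, default)
```

Falling back to `medium` rather than leaving effort unset avoids silently inheriting the `high` default.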

Best Practices

1. Start with Medium, Adjust Based on Feedback

If you're unsure, start with medium effort. Monitor response quality and latency, then increase to high or max if the results aren't thorough enough, or decrease to low if you need faster responses.
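One way to make this adjustment loop explicit is a one-step ladder. The sketch below is an illustration, not an official tuning procedure: `adjust_effort` is a hypothetical helper, the two boolean inputs stand in for whatever quality and latency checks you run, and the default ladder leaves out `xhigh` because it is limited to Opus 4.7.

```python
LADDER = ("low", "medium", "high", "max")


def adjust_effort(
    current: str,
    quality_ok: bool,
    latency_ok: bool,
    ladder: tuple[str, ...] = LADDER,
) -> str:
    """Move one step along the ladder based on observed results.

    Quality problems take priority: if results are not thorough enough,
    step up even when latency is also a concern.
    """
    i = ladder.index(current)  # raises ValueError for an unknown level
    if not quality_ok:
        return ladder[min(i + 1, len(ladder) - 1)]
    if not latency_ok:
        return ladder[max(i - 1, 0)]
    return current
```

Moving a single step at a time keeps each change easy to attribute when you compare quality and latency before and after.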

2. Use Low Effort for Subagents

In multi-agent systems, subagents often handle simple, well-defined tasks. Setting their effort to low can dramatically reduce costs without sacrificing overall system quality.

3. Reserve Max Effort for Critical Tasks

Max effort is the most expensive option. Use it sparingly—only for tasks that genuinely require the deepest reasoning, such as complex mathematical proofs, architectural decisions, or high-stakes analysis.

4. Combine with Adaptive Thinking

Adaptive thinking (thinking: {"type": "adaptive"}) allows Claude to decide whether thinking is needed for each request. Combined with effort, you get fine-grained control: adaptive thinking decides when to think at all, while effort steers how deeply Claude thinks when it does. Because effort is a behavioral signal, it guides thinking depth rather than imposing a hard ceiling.

5. Monitor Token Usage

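Effort is a behavioral signal, not a hard budget. Always monitor your actual token usage and adjust effort levels based on real-world data. The token counting endpoint estimates input tokens before you send a request; the output and thinking tokens that effort influences are reported in the usage field of each response, so measure those on a sample of real traffic before scaling.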
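A simple way to act on this is to log each call and compare averages per effort level. The sketch below assumes you record the effort you requested together with the token counts from each response's usage data and your own latency measurement; `CallRecord` and `summarize_by_effort` are hypothetical names, and the sample values are made up purely to show the shape of the data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class CallRecord:
    effort: str
    input_tokens: int
    output_tokens: int
    latency_s: float


def summarize_by_effort(records: list[CallRecord]) -> dict[str, dict[str, float]]:
    """Group logged calls by effort level and average their output size and latency."""
    grouped: dict[str, list[CallRecord]] = defaultdict(list)
    for record in records:
        grouped[record.effort].append(record)
    return {
        effort: {
            "calls": len(group),
            "mean_output_tokens": mean(r.output_tokens for r in group),
            "mean_latency_s": mean(r.latency_s for r in group),
        }
        for effort, group in grouped.items()
    }


# Made-up numbers, not measurements.
sample = [
    CallRecord("low", 500, 100, 0.5),
    CallRecord("low", 500, 300, 1.5),
    CallRecord("high", 500, 900, 4.0),
]
summary = summarize_by_effort(sample)
```

Comparing `mean_output_tokens` across levels on your own traffic shows what each level actually costs for your prompts, which a fixed rule of thumb cannot tell you.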

Common Pitfalls

Assuming low effort means no thinking: Claude may still think on difficult problems even at low effort. It just thinks less.
Using max effort for everything: This wastes tokens and increases latency. Most tasks don't need maximum capability.
Forgetting to set effort on Sonnet 4.6: The default is high, which may be slower than expected for simple tasks.
Mixing effort with deprecated budget_tokens: On Opus 4.6 and Sonnet 4.6, use effort instead of budget_tokens. The latter is deprecated and will be removed.
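Several of these pitfalls can be caught before a request is sent. The following is a rough sketch of such a check: `check_request` is a hypothetical helper, and it assumes effort is carried in an `output_config` field and that a thinking budget appears as `budget_tokens` inside the `thinking` block.

```python
from typing import Any


def check_request(request: dict[str, Any]) -> list[str]:
    """Return warnings for the pitfalls listed above (empty list if none apply)."""
    warnings: list[str] = []
    thinking = request.get("thinking") or {}
    # Assumption: effort lives under output_config.
    effort = (request.get("output_config") or {}).get("effort")

    if effort is None:
        warnings.append("effort not set: the default (high) applies")
    if effort == "max":
        warnings.append("effort is max: confirm the task needs maximum capability")
    if effort is not None and "budget_tokens" in thinking:
        warnings.append("budget_tokens mixed with effort: prefer effort alone")
    return warnings
```

A check like this fits naturally in a request wrapper or a test suite, where a warning can be logged without blocking the call.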

Conclusion

The effort parameter is a powerful tool for optimizing Claude's behavior to match your specific needs. Whether you're building a fast chatbot, a cost-sensitive subagent, or a deep-thinking research assistant, effort gives you granular control over the speed-capability trade-off.

By understanding the five effort levels and following the best practices outlined here, you can build more efficient, cost-effective applications that still deliver excellent results when it matters most.

Key Takeaways

Effort controls how eagerly Claude spends tokens on responses, affecting text, tool calls, and extended thinking.
Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
Medium effort is the recommended default for Sonnet 4.6 to balance speed, cost, and performance.
Combine effort with adaptive thinking for the best experience—adaptive thinking decides when to think, effort controls how much.
Effort is a behavioral signal, not a strict budget; Claude may still think deeply on hard problems even at lower effort levels.