BeClaude
Guide | Beginner | Pricing | 2026-05-15

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes practical code examples and recommended settings for Sonnet 4.6 and Opus models.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from 'low' to 'max' to trade off between thoroughness and efficiency, with practical API examples and recommended defaults for Sonnet 4.6.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Introduction

When building applications with Claude, you often face a trade-off: do you want the deepest possible reasoning, or do you need fast, cost-effective responses? Traditionally, you'd need to switch between different models to achieve this balance. With Claude's effort parameter, you can control this behavior using a single model.

The effort parameter controls how many tokens Claude is willing to spend on a response. It affects not only extended thinking but also tool calls and text generation, which gives you fine-grained control over token consumption and response quality.

In this guide, you'll learn:

  • What the effort parameter is and how it works
  • The available effort levels and when to use each
  • How to implement effort in your API calls (with code examples)
  • Recommended settings for Claude Sonnet 4.6
  • How effort compares to the legacy budget_tokens parameter

How the Effort Parameter Works

The effort parameter is a behavioral signal that tells Claude how thoroughly it should process your request. It's available on all supported models without any beta header—just add it to your API request.

Key points:
  • By default, Claude uses high effort, spending as many tokens as needed for excellent results.
  • Setting effort to "high" produces exactly the same behavior as omitting the parameter.
  • The parameter affects all tokens in the response: text, tool calls, function arguments, and extended thinking.
  • Lower effort means Claude makes fewer tool calls and provides shorter, more direct responses.
  • Effort is not a strict token budget—it's a behavioral guide. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels.
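The rule that "high" behaves the same as omitting the parameter can be illustrated with a small request-builder. This is a minimal sketch rather than SDK code: `build_effort_kwargs` and `KNOWN_LEVELS` are hypothetical names, and the sketch assumes the effort level is sent in the request's `output_config` field.

```python
from typing import Optional

# Effort levels named in this guide; which ones a given model accepts varies.
KNOWN_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_effort_kwargs(effort: Optional[str] = None) -> dict:
    """Return extra keyword arguments for a Messages API call.

    Hypothetical helper. Assumes effort is sent as output_config.effort.
    """
    if effort is None:
        # Omitting the parameter leaves the default behavior (high effort).
        return {}
    if effort not in KNOWN_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {"output_config": {"effort": effort}}
```

The result can be spread into a call, for example `client.messages.create(model=..., max_tokens=..., messages=..., **build_effort_kwargs("low"))`.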

Supported Models

The effort parameter is supported by:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future release.

Effort Levels and Use Cases

| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending. | Tasks requiring the deepest possible reasoning (e.g., complex mathematical proofs, multi-step strategic planning). Available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6. |
| xhigh | Extended capability for long-horizon work. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions. Available on Opus 4.7. |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks. |
| medium | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance. |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks, high-volume chat, subagents, and latency-sensitive workloads. |
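Because max and xhigh are limited to certain models, it can help to check a level before sending a request. The sketch below encodes the availability notes from the table above. The dictionary keys are informal model names used only here (not API model IDs), the helper names are hypothetical, and it assumes low, medium, and high are accepted by every model in the supported list.

```python
# Levels assumed to be available on every supported model.
BASE_LEVELS = {"low", "medium", "high"}

# Extra levels per model, as listed in the table above.
EXTRA_LEVELS = {
    "Mythos Preview": {"max"},
    "Opus 4.7": {"max", "xhigh"},
    "Opus 4.6": {"max"},
    "Sonnet 4.6": {"max"},
    "Opus 4.5": set(),
}


def supported_levels(model_name: str) -> set:
    """Return the effort levels this sketch considers valid for a model."""
    if model_name not in EXTRA_LEVELS:
        raise KeyError(f"model not in the supported list: {model_name!r}")
    return BASE_LEVELS | EXTRA_LEVELS[model_name]


def is_supported(model_name: str, level: str) -> bool:
    """True if the level is listed for the model."""
    return level in supported_levels(model_name)
```

A caller might fall back to "high" when `is_supported(model, "max")` is false, rather than letting the API reject the request.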

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort. If you don't explicitly set the parameter, you'll get the full reasoning depth by default. For most applications, Anthropic recommends:

  • Medium effort (recommended default): Best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.

Important: Always set the effort parameter explicitly when using Sonnet 4.6. The implicit default is high, which can add latency you did not plan for.
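One way to follow the "always set it explicitly" advice is to normalize request parameters before every call. The helper below is a hypothetical sketch: `ensure_explicit_effort` is not part of the SDK, and it assumes effort is carried in `output_config`.

```python
def ensure_explicit_effort(params: dict, default: str = "medium") -> dict:
    """Return a copy of params that always carries an explicit effort level.

    An effort level already present is kept; otherwise `default` is used.
    The input dict is not modified.
    """
    updated = dict(params)
    output_config = dict(updated.get("output_config") or {})
    output_config.setdefault("effort", default)
    updated["output_config"] = output_config
    return updated
```

Using medium as the fallback mirrors the recommendation above; a latency-sensitive service might pass `default="low"` instead.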

Using Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

{
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "medium"
  }
}

Adaptive thinking allows Claude to decide when to use extended thinking based on the complexity of the task. When combined with effort, you get a powerful system that automatically adjusts both thinking depth and overall token spend.
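As a rough illustration of how the two settings travel together, the sketch below builds a complete request and extracts the final answer. The helper names are hypothetical, the model ID `claude-sonnet-4-6` is an assumption, and effort is assumed to live under `output_config`. Because a response with thinking enabled may contain thinking blocks before the text, `final_text` filters by block type instead of reading `content[0]`.

```python
from types import SimpleNamespace


def adaptive_request(prompt: str, effort: str = "medium",
                     model: str = "claude-sonnet-4-6",  # assumed model ID
                     max_tokens: int = 2048) -> dict:
    """Build Messages API parameters with adaptive thinking plus effort."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


def final_text(content_blocks) -> str:
    """Join the text blocks, skipping thinking and tool_use blocks."""
    return "".join(b.text for b in content_blocks if b.type == "text")


# Stand-in for response.content, so the sketch runs without an API key.
fake_content = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="Paris."),
]
print(final_text(fake_content))  # prints: Paris.
```

With a real client, the same helpers would be used as `response = client.messages.create(**adaptive_request("..."))` followed by `final_text(response.content)`.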

Code Examples

Python (using the Anthropic SDK)

import anthropic

client = anthropic.Anthropic()

# Low effort for fast, cost-effective responses.
# Effort is passed inside output_config in the Messages API.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.content[0].text)

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

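// Medium effort for balanced performance.
// Effort is passed inside output_config in the Messages API.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  output_config: { effort: 'medium' },
  messages: [
    {
      role: 'user',
      content: 'Write a Python function to sort a list of dictionaries by a key.',
    },
  ],
});

// content is a union of block types, so narrow to text before reading .text
for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}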

Using Effort with Tool Calls

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    output_config={"effort": "low"},  # lower effort tends to mean fewer tool calls
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Paris and London?"}
    ],
)

# With low effort, Claude may make fewer tool calls or combine them.

Best Practices

  • Start with medium for Sonnet 4.6: This gives you the best balance for most applications. Only increase to high or max when you need deeper reasoning.
  • Use low for high-volume or latency-sensitive workloads: If you're building a chatbot or handling many concurrent requests, low effort can significantly reduce costs and response times.
  • Combine with adaptive thinking: For maximum flexibility, use thinking: {type: "adaptive"} alongside your chosen effort level. This lets Claude decide when to engage extended thinking.
  • Test with your specific use case: The optimal effort level depends on your application. Run A/B tests to find the sweet spot between quality and cost.
  • Monitor token usage: Lower effort levels should reduce token consumption. Track your usage to validate that the parameter is having the desired effect.
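To check that a lower effort level is actually saving tokens, one option is to log the output token count of each response alongside the effort level used, then compare averages. The sketch below assumes you record `(effort_level, output_tokens)` pairs yourself, for example from `response.usage.output_tokens`. The function name is hypothetical and the sample numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records) -> dict:
    """Average output tokens per effort level.

    `records` is an iterable of (effort_level, output_tokens) pairs.
    """
    by_level = defaultdict(list)
    for level, tokens in records:
        by_level[level].append(tokens)
    return {level: mean(tokens) for level, tokens in by_level.items()}


# Made-up sample data, not measurements.
sample = [("low", 100), ("low", 200), ("medium", 400)]
print(mean_output_tokens(sample))
```

Comparing these averages on the same set of prompts, together with a quality check, gives a simple basis for the A/B test suggested above.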

Effort vs. budget_tokens

If you're migrating from budget_tokens (used with Opus 4.6 and Sonnet 4.6), here's what you need to know:

| Aspect | budget_tokens | effort |
|---|---|---|
| Control type | Strict token budget | Behavioral signal |
| Requires thinking | Yes | No |
| Affects tool calls | Indirectly | Directly |
| Status | Deprecated | Recommended |

Effort is more flexible because it doesn't require thinking to be enabled, and it directly influences tool call behavior. For example, lower effort means Claude will make fewer tool calls, giving you greater control over efficiency.
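A migration can be sketched as a transformation of the request parameters. The helper below is hypothetical. It assumes the legacy request uses `thinking: {"type": "enabled", "budget_tokens": N}`, that the replacement is adaptive thinking, and that effort goes under `output_config`. The source does not define how a token budget maps to an effort level, so the caller chooses the level.

```python
import copy


def migrate_to_effort(params: dict, effort: str) -> dict:
    """Return a copy of params that uses effort instead of budget_tokens."""
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed budget and let the model decide when to think.
        migrated["thinking"] = {"type": "adaptive"}
    migrated.setdefault("output_config", {})["effort"] = effort
    return migrated
```

Requests that never enabled thinking pass through unchanged apart from the added effort level, which matches the point that effort does not require thinking to be enabled.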

Conclusion

The effort parameter is a powerful tool for optimizing Claude's behavior in production applications. By choosing the right effort level, you can balance response quality, speed, and cost without switching between different models. Whether you're building a high-volume chatbot or a deep reasoning agent, effort gives you the control you need.

Key Takeaways

  • Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6—use it instead of the deprecated parameter.
  • Lower effort reduces all token spend, including tool calls, not just thinking tokens.
  • Medium effort is the recommended default for Sonnet 4.6, balancing speed, cost, and performance.
  • Combine effort with adaptive thinking for the most flexible and efficient configuration.
  • Effort is a behavioral signal, not a strict budget—Claude will still think deeply on hard problems even at low effort levels.