BeClaude
Guide | Beginner | Pricing | 2026-05-23

Mastering Claude's Effort Parameter: Control Token Spend Without Sacrificing Quality

Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across all API calls, with practical code examples and recommended settings.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from low to max, combine it with adaptive thinking, and optimize for speed, cost, or capability across different use cases.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Introduction

Every Claude API call is a balancing act. You want the best possible response, but you also care about speed and cost. Traditionally, you had to choose between different models or manually set budget_tokens to control thinking depth. The effort parameter replaces both workarounds with a single setting.

Effort gives you a single, intuitive dial to control how eagerly Claude spends tokens on any request—whether it's generating text, making tool calls, or performing extended thinking. This guide covers everything you need to know to use effort effectively, including recommended settings for Claude Sonnet 4.6 and Opus 4.7.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how much token spend is appropriate for a given request. It's not a strict budget—Claude will still think deeply on hard problems even at lower effort levels—but it strongly influences how thorough the response is.

Key benefits:
  • Works without enabling extended thinking
  • Affects all token spend, including tool calls and function arguments
  • Single model can serve both quick chat and deep reasoning tasks
  • Generally available on the models listed under Supported Models, with no beta header required

Supported Models

Model                 | Effort Levels                 | Notes
Claude Mythos Preview | low, medium, high, max        | Full support
Claude Opus 4.7       | low, medium, high, xhigh, max | xhigh for long-horizon tasks
Claude Opus 4.6       | low, medium, high, max        | Replaces budget_tokens
Claude Sonnet 4.6     | low, medium, high, max        | Replaces budget_tokens
Claude Opus 4.5       | low, medium, high             | Basic support

Note: For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
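
Because the available levels differ by model, it can help to reject an unsupported combination before a request is sent. The following is a minimal sketch, not part of any SDK: the dictionary keys are hypothetical short labels rather than API model IDs, and the level lists are transcribed from this guide.

```python
# Effort levels per model, transcribed from this guide.
# The keys are hypothetical short labels, not API model IDs.
SUPPORTED_EFFORT = {
    "mythos-preview": ("low", "medium", "high", "max"),
    "opus-4.7": ("low", "medium", "high", "xhigh", "max"),
    "opus-4.6": ("low", "medium", "high", "max"),
    "sonnet-4.6": ("low", "medium", "high", "max"),
    # Opus 4.5 is listed without "max" to match the "Available on" list
    # given in the description of the max level.
    "opus-4.5": ("low", "medium", "high"),
}


def is_supported(model_key: str, effort: str) -> bool:
    """Return True if this guide lists `effort` as available for `model_key`."""
    return effort in SUPPORTED_EFFORT.get(model_key, ())


def check_effort(model_key: str, effort: str) -> str:
    """Return `effort` unchanged, or raise ValueError before any API call."""
    if not is_supported(model_key, effort):
        allowed = SUPPORTED_EFFORT.get(model_key, ())
        raise ValueError(
            f"{effort!r} is not listed for {model_key!r}; allowed: {allowed}"
        )
    return effort
```

Calling check_effort("sonnet-4.6", "xhigh") raises ValueError, since xhigh is listed only for Opus 4.7.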

Effort Levels Explained

max

  • Description: Absolute maximum capability with no constraints on token spending.
  • Use case: Tasks requiring the deepest possible reasoning and most thorough analysis.
  • Available on: Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6.

xhigh (Opus 4.7 only)

  • Description: Extended capability for long-horizon work.
  • Use case: Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions.

high (default)

  • Description: High capability. Equivalent to not setting the parameter.
  • Use case: Complex reasoning, difficult coding problems, agentic tasks.

medium

  • Description: Balanced approach with moderate token savings.
  • Use case: Agentic tasks that require a balance of speed, cost, and performance.

low

  • Description: Most efficient. Significant token savings with some capability reduction.
  • Use case: Simpler tasks that need the best speed and lowest costs, such as subagents.

How Effort Works in Practice

When you set effort to high, Claude behaves exactly as if you omitted the parameter entirely. At max, it will think more and potentially make more tool calls. At low, it will be more conservative—skipping thinking for simple problems and making fewer tool calls.

This is powerful because it affects all tokens in the response:

  • Text responses and explanations
  • Tool calls and function arguments
  • Extended thinking (when enabled)
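
Since high is described as equivalent to omitting the parameter, a request builder can treat "no effort given" and "high" as the same case. The sketch below only assembles keyword arguments and makes no network call. It assumes the effort value travels inside an output_config object, which is how Anthropic's API reference documents it; build_request is a hypothetical helper name.

```python
from typing import Any, Optional


def build_request(
    model: str,
    prompt: str,
    effort: Optional[str] = None,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Assemble keyword arguments for a Messages API call.

    When `effort` is None the setting is left out entirely, which this
    guide describes as equivalent to effort "high".
    """
    kwargs: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is sent under output_config.
        kwargs["output_config"] = {"effort": effort}
    return kwargs
```

The result can be unpacked into the SDK call, for example client.messages.create(**build_request(model_id, "Hello", effort="low")), where model_id is whichever supported model you use.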

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token consumption, Anthropic recommends explicitly setting effort:

  • Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases.
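
One way to apply these recommendations consistently is a small lookup that every Sonnet 4.6 call site goes through. This is an illustrative sketch: the workload labels are hypothetical names chosen here, and the mapping simply restates the two recommendations above.

```python
# Workload labels are hypothetical; the levels follow the recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def sonnet_46_effort(workload: str) -> str:
    """Return the recommended effort for a workload.

    Unknown workloads fall back to "medium", the recommended default,
    rather than the model's built-in default of "high".
    """
    return SONNET_46_EFFORT.get(workload, "medium")
```

Falling back to medium for unknown workloads keeps a new call site from silently inheriting the high default.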

Code Examples

Python SDK

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort for simple, fast responses.
# The effort level is passed inside output_config.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# High effort for complex reasoning (same as omitting the setting)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on information theory."}
    ],
)

# Max effort for deepest analysis
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Design a complete architecture for a distributed database system."}
    ],
)

TypeScript SDK

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Medium effort for balanced performance.
// The effort level is passed inside output_config.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// With adaptive thinking (recommended)
const responseWithThinking = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 4096,
  output_config: { effort: 'high' },
  thinking: { type: 'adaptive' },
  messages: [
    { role: 'user', content: 'Solve this complex math problem step by step.' }
  ]
});

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking by setting thinking: {type: "adaptive"}. This allows Claude to dynamically decide how much thinking is needed based on the problem complexity and the effort level you've set.

At high and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
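
To see whether Claude actually chose to think on a given request, you can inspect the content blocks of the response. The sketch below assumes the content is available as a list of dictionaries in the raw JSON shape, where thinking blocks carry "type": "thinking". Both helper names are hypothetical, and the request builder assumes effort is sent under output_config.

```python
from typing import Any, Iterable


def adaptive_request(
    model: str, prompt: str, effort: str = "medium", max_tokens: int = 4096
) -> dict[str, Any]:
    """Keyword arguments for a call that combines effort with adaptive thinking."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},  # assumption: effort lives here
        "messages": [{"role": "user", "content": prompt}],
    }


def used_thinking(content: Iterable[dict[str, Any]]) -> bool:
    """Return True if any content block in a response is a thinking block."""
    return any(block.get("type") == "thinking" for block in content)
```

Logging used_thinking per request at each effort level gives a rough picture of how often lower settings skip thinking for your traffic.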

Practical Use Cases

1. Multi-tier Agent Systems

Use different effort levels for different agents in a system:

# Orchestrator agent: high effort for planning
orchestrator_effort = "high"

# Sub-agent for simple lookups: low effort
sub_agent_effort = "low"

# Code generation agent: medium effort
code_agent_effort = "medium"

2. Cost-Sensitive Applications

For high-volume chat applications, use low effort to reduce token consumption while maintaining acceptable quality for simple queries.
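
Whether low effort pays off is best judged from measured token counts, for instance the output_tokens value reported in each response's usage data. The helpers below are a small sketch of that arithmetic. The function names are hypothetical, and the price is a parameter you supply from the current pricing page rather than a figure taken from this guide.

```python
def token_savings(baseline_output_tokens: int, candidate_output_tokens: int) -> float:
    """Fraction of output tokens saved relative to the baseline.

    A negative result means the candidate setting used more tokens.
    """
    if baseline_output_tokens <= 0:
        raise ValueError("baseline_output_tokens must be positive")
    return (baseline_output_tokens - candidate_output_tokens) / baseline_output_tokens


def output_cost(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of `output_tokens` at a caller-supplied price."""
    return output_tokens / 1_000_000 * usd_per_million_tokens
```

For example, if the same sample of queries averages 1,000 output tokens at high and 600 at low (made-up numbers for illustration), token_savings(1000, 600) reports a 40% reduction.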

3. Deep Research Tasks

For research or analysis tasks requiring maximum thoroughness, use max effort on Opus 4.7 or Mythos Preview.

Best Practices

  • Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
  • Start with medium for most applications and adjust based on observed quality and cost.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token efficiency.
  • Use low for sub-agents and simple classification tasks to minimize costs.
  • Reserve max for complex reasoning where you need the absolute best quality.

Limitations and Considerations

  • Effort is a behavioral signal, not a strict token budget. Claude may still think deeply on hard problems even at low effort.
  • The xhigh level is currently only available on Claude Opus 4.7.
  • Lower effort levels may reduce quality on complex tasks—always test with your specific use case.
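
Testing with your own use case can be as simple as trying levels from cheapest to most expensive and keeping the first one whose output passes a check you define. The sketch below is one hedged way to structure that. The run and passes callables are hypothetical stand-ins for your API call and your quality check, and fake_run replaces a real request so the example runs offline.

```python
from typing import Callable, Optional, Sequence


def lowest_passing_effort(
    run: Callable[[str], str],
    passes: Callable[[str], bool],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> Optional[str]:
    """Return the first effort level whose output passes the check, else None.

    `levels` is tried in the order given, so list cheaper levels first.
    """
    for level in levels:
        if passes(run(level)):
            return level
    return None


def fake_run(effort: str) -> str:
    """Stand-in for a real API call: only medium and above answer correctly."""
    return "42" if effort != "low" else "unsure"


print(lowest_passing_effort(fake_run, lambda out: out == "42"))  # prints: medium
```

In practice a single prompt is too small a sample. Running the same search over a representative set of prompts and requiring a pass rate gives a more reliable answer.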

Key Takeaways

  • Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.
  • Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
  • Affects all token spend, including text, tool calls, and extended thinking.
  • Combine with adaptive thinking for the best balance of quality and efficiency.
  • Set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.