BeClaude
Guide · Beginner · Pricing · 2026-05-22

Mastering Claude's Effort Parameter: Balance Performance, Speed, and Cost

Learn how to use Claude's effort parameter to control token spending, response thoroughness, and latency across different models for optimal API performance.

Quick Answer

This guide explains how to use Claude's effort parameter to control token spending and response depth. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to combine effort with adaptive thinking for optimal API performance.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Introduction

Claude's effort parameter lets you control how many tokens the model spends on a request, including how much "thinking" it does before responding. By adjusting effort, you can trade response thoroughness against token efficiency with a single model, without switching to a smaller or larger one.

This makes effort a practical lever for developers who want to reduce cost and latency while keeping output quality high. Whether you're building a simple chatbot or a complex agentic system, understanding effort helps you match token spend to the task.

What Is the Effort Parameter?

The effort parameter controls how eager Claude is about spending tokens when responding to requests. It affects all tokens in the response, including:

  • Text responses and explanations
  • Tool calls and function arguments
  • Extended thinking (when enabled)

This is different from a strict token budget. Effort is a behavioral signal: at lower levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.

Supported Models

The effort parameter is available on:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Opus 4.5

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Effort Levels Explained

| Level | Description | Typical Use Case |
| --- | --- | --- |
| max | Absolute maximum capability with no constraints on token spending | Deepest possible reasoning, most thorough analysis |
| xhigh | Extended capability for long-horizon work | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks that require a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |

Note: max is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. xhigh is available only on Claude Opus 4.7.
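
The availability rules above can be checked before a request is sent. The following is a minimal sketch that encodes only what this guide states; the display names used as keys and the helper names `allowed_efforts` and `check_effort` are illustrative assumptions, not part of any SDK.

```python
# Availability as described in this guide (not fetched from the API).
ALL_MODELS = {
    "Claude Mythos Preview",
    "Claude Opus 4.7",
    "Claude Opus 4.6",
    "Claude Sonnet 4.6",
    "Claude Opus 4.5",
}
MAX_MODELS = ALL_MODELS - {"Claude Opus 4.5"}  # max: every listed model except Opus 4.5
XHIGH_MODELS = {"Claude Opus 4.7"}             # xhigh: Opus 4.7 only


def allowed_efforts(model_name: str) -> list[str]:
    """Return the effort levels this guide lists for a model, lowest first."""
    if model_name not in ALL_MODELS:
        return []
    levels = ["low", "medium", "high"]
    if model_name in XHIGH_MODELS:
        levels.append("xhigh")
    if model_name in MAX_MODELS:
        levels.append("max")
    return levels


def check_effort(model_name: str, effort: str) -> str:
    """Return effort unchanged, or raise ValueError if the guide does not list it."""
    levels = allowed_efforts(model_name)
    if effort not in levels:
        raise ValueError(
            f"{effort!r} is not listed for {model_name!r}; allowed: {levels}"
        )
    return effort
```

Failing early in your own code is usually easier to debug than an API error returned from inside an agent loop.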

How to Use Effort in the API

Basic Usage (Python)

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model from the supported list; confirm the exact ID in the models list
    max_tokens=8192,
    output_config={"effort": "medium"},  # effort is passed inside output_config
    messages=[
        {"role": "user", "content": "Write a detailed analysis of quantum computing's impact on cryptography."}
    ],
)

print(response.content[0].text)

Basic Usage (TypeScript)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

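const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // use a model from the supported list; confirm the exact ID in the models list
  max_tokens: 8192,
  output_config: { effort: 'medium' }, // effort is passed inside output_config
  messages: [
    { role: 'user', content: "Write a detailed analysis of quantum computing's impact on cryptography." }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text.
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}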

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide how much thinking to use based on the problem complexity, while respecting your effort preference.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model from the supported list; confirm the exact ID in the models list
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Enable adaptive thinking
    output_config={"effort": "medium"},  # Control overall token spend
    messages=[
        {"role": "user", "content": "Debug this complex Python code and explain the fix..."}
    ],
)
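
If several call sites share these settings, it can help to build the request in one place. The sketch below is one possible shape, assuming the effort value travels in the request's `output_config` field and that response content blocks expose `type` and `text` attributes; `build_request` and `text_of` are hypothetical helper names, not SDK functions.

```python
from typing import Any, Iterable


def build_request(
    prompt: str,
    model: str,
    effort: str = "medium",
    adaptive: bool = True,
    max_tokens: int = 8192,
) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create (no network call)."""
    params: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},  # assumed location of the effort setting
        "messages": [{"role": "user", "content": prompt}],
    }
    if adaptive:
        # Let the model decide how much to think, within the effort preference.
        params["thinking"] = {"type": "adaptive"}
    return params


def text_of(content_blocks: Iterable[Any]) -> str:
    """Join only the text blocks; with thinking on, other block types may come first."""
    return "".join(
        block.text for block in content_blocks if getattr(block, "type", None) == "text"
    )
```

A call would then look like `response = client.messages.create(**build_request(prompt, model=...))`, followed by `text_of(response.content)` rather than indexing `content[0]` directly.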

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:

  • Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
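
The two recommendations above could be captured in a small lookup so the choice is made in one place. This is only a sketch: the workload labels are hypothetical names invented for the example, and the mapping simply restates the guidance in this section.

```python
# Hypothetical workload labels; rename to match your own application.
LOW_EFFORT_WORKLOADS = {"chat", "high_volume", "latency_sensitive"}
MEDIUM_EFFORT_WORKLOADS = {"agentic_coding", "tool_heavy", "code_generation"}


def recommended_effort(workload: str) -> str:
    """Pick an explicit effort level for Sonnet 4.6 based on the workload type."""
    if workload in LOW_EFFORT_WORKLOADS:
        return "low"
    # Medium is the recommended default, so unknown workloads fall back to it
    # instead of silently inheriting the model's high default.
    return "medium"
```

Returning a value for every workload means the request always carries an explicit effort setting.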

Practical Scenarios

Scenario 1: High-Volume Customer Support Chat

For a chatbot handling simple FAQs, use low effort to minimize latency and cost:

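response = client.messages.create(
    model="claude-sonnet-4-6",  # use a model from the supported list; confirm the exact ID in the models list
    max_tokens=1024,
    output_config={"effort": "low"},  # effort is passed inside output_config
    messages=[
        {"role": "user", "content": "What are your business hours?"}
    ]
)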

Scenario 2: Complex Code Review Agent

For a code review agent that needs deep analysis, use high or max effort:

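response = client.messages.create(
    model="claude-opus-4-6",  # use a model from the supported list; confirm the exact ID in the models list
    max_tokens=16384,
    output_config={"effort": "high"},  # effort is passed inside output_config
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Review this pull request for security vulnerabilities and performance issues..."}
    ]
)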

Scenario 3: Long-Running Agentic Task

For tasks that run over 30 minutes with token budgets in the millions, use xhigh (Opus 4.7 only):

response = client.messages.create(
    model="claude-opus-4-7",  # xhigh requires Opus 4.7; confirm the exact ID in the models list
    max_tokens=200000,
    output_config={"effort": "xhigh"},  # effort is passed inside output_config
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Refactor this entire codebase to use TypeScript with proper types and tests..."}
    ]
)

Effort vs. Token Budget

| Aspect | Effort | budget_tokens |
| --- | --- | --- |
| Type | Behavioral signal | Strict token limit |
| Flexibility | Adapts to problem difficulty | Fixed cap |
| Thinking required | No | Yes |
| Tool calls affected | Yes | No |
| Future support | Active development | Deprecated |

Effort is the recommended approach because it:
  • Doesn't require thinking to be enabled
  • Affects all token spend, including tool calls
  • Adapts intelligently to problem difficulty
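
For code that still sends budget_tokens, the switch can be sketched as a small transformation of the request parameters. The example assumes the legacy request uses `thinking={"type": "enabled", "budget_tokens": ...}` and that effort is carried in `output_config`; `migrate_to_effort` is a hypothetical helper, and the effort level is chosen by the caller rather than derived from the old budget.

```python
import copy
from typing import Any


def migrate_to_effort(params: dict[str, Any], effort: str = "medium") -> dict[str, Any]:
    """Return a copy of request params that uses effort instead of budget_tokens."""
    migrated = copy.deepcopy(params)  # leave the caller's dict untouched

    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed cap and let the model adapt its thinking depth.
        migrated["thinking"] = {"type": "adaptive"}

    # Preserve any other output_config keys the caller already set.
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated
```

Requests that never enabled thinking keep it off and only gain the effort setting, which matches the point that effort does not require thinking.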

Best Practices

  • Start with medium effort for most applications, then adjust based on observed performance and cost.
  • Combine effort with adaptive thinking so Claude decides when to think deeply.
  • Set effort explicitly when using Sonnet 4.6 to avoid the extra latency of its default high setting.
  • Monitor token usage across different effort levels to find your sweet spot.
  • Use low effort for subagents and simple tasks where speed matters more than depth.
  • Reserve max effort for the most challenging problems that require absolute best performance.
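
To compare token usage across effort levels, one simple approach is to log the output token count of each response next to the effort level used, then average per level. The sketch below assumes you collect `(effort, output_tokens)` pairs yourself, for example from `response.usage.output_tokens`; `average_output_tokens` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable


def average_output_tokens(records: Iterable[tuple[str, int]]) -> dict[str, float]:
    """Average output tokens per effort level from (effort, output_tokens) pairs."""
    by_effort: dict[str, list[int]] = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(values) for effort, values in by_effort.items()}
```

Running the same set of prompts at two or three effort levels and comparing the averages, alongside a quality check of the answers, gives a concrete basis for choosing a default.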

Key Takeaways

  • Effort controls token spending across all response types (text, tools, thinking) without requiring thinking to be enabled.
  • Five levels available: low, medium, high, xhigh (Opus 4.7), and max — each offering a different trade-off between capability and efficiency.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of performance and cost.
  • Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
  • Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.