Guide | Beginner | Pricing | 2026-05-22

Mastering Claude's Effort Parameter: Balance Performance, Speed, and Cost

Learn how to use Claude's effort parameter to control token spending, response thoroughness, and latency across different models for optimal API performance.

Quick Answer

This guide explains how to use Claude's effort parameter to control token spending and response depth. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to combine effort with adaptive thinking for optimal API performance.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Introduction

Claude's effort parameter controls how many tokens the model spends on a response, including any thinking it does before answering. By adjusting effort, you can trade response thoroughness against token efficiency with a single model, without switching to a smaller or larger one.

This lets developers tune cost and latency while keeping output quality high where it matters. Whether you're building a simple chatbot or a complex agentic system, understanding effort helps you get the most out of Claude.

What Is the Effort Parameter?

The effort parameter controls how eager Claude is about spending tokens when responding to requests. It affects all tokens in the response, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This is different from a strict token budget. Effort is a behavioral signal: at lower levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.

Supported Models

The effort parameter is available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability with no constraints on token spending	Deepest possible reasoning, most thorough analysis
`xhigh`	Extended capability for long-horizon work	Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding problems, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks that require a balance of speed, cost, and performance
`low`	Most efficient. Significant token savings with some capability reduction.	Simpler tasks that need the best speed and lowest costs, such as subagents

Note: max is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. xhigh is available only on Claude Opus 4.7.
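
The availability rules in the table and note above can be encoded as a small lookup, so that an unsupported level is caught before a request is sent. This is a minimal sketch: the function name check_effort is hypothetical, the keys are the display names used in this guide rather than API model IDs, and it assumes Claude Opus 4.5 supports only low, medium, and high.

```python
# Effort levels per model, transcribed from the table and note above.
# Keys are display names, not API model IDs (an illustrative choice).
BASE_LEVELS = {"low", "medium", "high"}

SUPPORTED_EFFORT = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
    "Claude Opus 4.5": BASE_LEVELS,  # assumption: no max or xhigh
}


def check_effort(model_name: str, effort: str) -> str:
    """Return effort unchanged if the model supports it, else raise ValueError."""
    levels = SUPPORTED_EFFORT.get(model_name)
    if levels is None:
        raise ValueError(f"{model_name!r} is not listed as supporting effort")
    if effort not in levels:
        raise ValueError(f"{model_name!r} does not support effort={effort!r}")
    return effort
```

Calling check_effort("Claude Opus 4.5", "xhigh") raises ValueError, while check_effort("Claude Opus 4.7", "xhigh") returns the level unchanged.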

How to Use Effort in the API

Basic Usage (Python)

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6; use any model from the supported list
    max_tokens=8192,
    output_config={"effort": "medium"},  # Control token spending
    messages=[
        {"role": "user", "content": "Write a detailed analysis of quantum computing's impact on cryptography."}
    ]
)
print(response.content[0].text)

Basic Usage (TypeScript)

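import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // assumed ID for Sonnet 4.6; use any model from the supported list
  max_tokens: 8192,
  output_config: { effort: 'medium' }, // Control token spending
  messages: [
    { role: 'user', content: "Write a detailed analysis of quantum computing's impact on cryptography." }
  ]
});
// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first?.type === 'text') {
  console.log(first.text);
}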

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking. This allows Claude to dynamically decide how much thinking to use based on the problem complexity, while respecting your effort preference.

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Enable adaptive thinking
    output_config={"effort": "medium"},  # Control overall token spend
    messages=[
        {"role": "user", "content": "Debug this complex Python code and explain the fix..."}
    ]
)

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
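
The recommendation above can be wrapped in a small helper so that every Sonnet 4.6 request sets effort explicitly. This is a minimal sketch, not an official pattern: the helper name sonnet_request, the model ID claude-sonnet-4-6, and placing effort inside an output_config dictionary are assumptions to check against the current API reference.

```python
from typing import Any, Dict

SONNET_MODEL = "claude-sonnet-4-6"  # assumed model ID for Sonnet 4.6


def sonnet_request(prompt: str, latency_sensitive: bool = False,
                   max_tokens: int = 4096) -> Dict[str, Any]:
    """Build keyword arguments for a Sonnet 4.6 request with explicit effort.

    Medium is the recommended default; low is for high-volume or
    latency-sensitive workloads.
    """
    effort = "low" if latency_sensitive else "medium"
    return {
        "model": SONNET_MODEL,
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},  # assumed parameter shape
        "messages": [{"role": "user", "content": prompt}],
    }
```

The returned dictionary would then be unpacked into the call, for example client.messages.create(**sonnet_request("What are your business hours?", latency_sensitive=True)).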

Practical Scenarios

Scenario 1: High-Volume Customer Support Chat

For a chatbot handling simple FAQs, use low effort to minimize latency and cost:

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=1024,
    output_config={"effort": "low"},  # Minimize latency and cost
    messages=[
        {"role": "user", "content": "What are your business hours?"}
    ]
)

Scenario 2: Complex Code Review Agent

For a code review agent that needs deep analysis, use high or max effort:

response = client.messages.create(
    model="claude-opus-4-6",  # assumed ID for Opus 4.6
    max_tokens=16384,
    output_config={"effort": "high"},  # or "max" for the deepest analysis
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Review this pull request for security vulnerabilities and performance issues..."}
    ]
)

Scenario 3: Long-Running Agentic Task

For tasks that run over 30 minutes with token budgets in the millions, use xhigh (Opus 4.7 only):

response = client.messages.create(
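    model="claude-opus-4-7",  # assumed ID for Opus 4.7, the only model with xhigh
    max_tokens=16384,  # very large values typically require a streaming request
    output_config={"effort": "xhigh"},
    thinking={"type": "adaptive"},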
    messages=[
        {"role": "user", "content": "Refactor this entire codebase to use TypeScript with proper types and tests..."}
    ]
)

Effort vs. Token Budget

Aspect	Effort	budget_tokens
Type	Behavioral signal	Strict token limit
Flexibility	Adapts to problem difficulty	Fixed cap
Thinking required	No	Yes
Tool calls affected	Yes	No
Future support	Active development	Deprecated

Effort is the recommended approach because it:

Doesn't require thinking to be enabled
Affects all token spend, including tool calls
Adapts intelligently to problem difficulty
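
Moving an existing request from budget_tokens to effort can be sketched as a small transformation of the request arguments. The function name migrate_to_effort is hypothetical, the effort level is left to the caller because this guide gives no mapping from token budgets to levels, and the output_config shape is an assumption to verify against the API reference.

```python
import copy
from typing import Any, Dict


def migrate_to_effort(params: Dict[str, Any], effort: str = "medium") -> Dict[str, Any]:
    """Return a copy of request kwargs that uses adaptive thinking plus effort.

    The input dictionary is left unmodified.
    """
    new_params = copy.deepcopy(params)
    thinking = new_params.get("thinking")
    # Replace a fixed thinking budget with adaptive thinking.
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        new_params["thinking"] = {"type": "adaptive"}
    # Preserve any other output_config keys and set the effort level.
    output_config = dict(new_params.get("output_config") or {})
    output_config["effort"] = effort  # assumed parameter shape
    new_params["output_config"] = output_config
    return new_params
```

Requests that never enabled thinking keep it disabled; only the effort level is added, which matches the point that effort does not require thinking.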

Best Practices

Start with medium effort for most applications, then adjust based on observed performance and cost.
Combine with adaptive thinking for optimal results — let Claude decide when to think deeply.
Set effort explicitly when using Sonnet 4.6 to avoid the extra latency of the default high setting.
Monitor token usage across different effort levels to find your sweet spot.
Use low effort for subagents and simple tasks where speed matters more than depth.
Reserve max effort for the most challenging problems that require absolute best performance.
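
The advice to monitor token usage across effort levels can be made concrete with a small comparison harness. This is a sketch under stated assumptions: send is a caller-supplied function (hypothetical) that runs one request at the given effort level and returns the output token count it observed, and both function names are illustrative.

```python
from typing import Callable, Dict, Iterable


def measure_output_tokens(send: Callable[[str], int],
                          levels: Iterable[str] = ("low", "medium", "high")) -> Dict[str, int]:
    """Call send(level) once per effort level and record the tokens it reports."""
    return {level: send(level) for level in levels}


def savings_vs_baseline(usage: Dict[str, int], baseline: str = "high") -> Dict[str, float]:
    """Fraction of output tokens saved at each level relative to the baseline."""
    base = usage[baseline]
    if base <= 0:
        raise ValueError("baseline usage must be positive")
    return {level: 1 - tokens / base for level, tokens in usage.items()}
```

In practice, send might wrap a request for a fixed prompt and return response.usage.output_tokens. A single prompt is a noisy measurement, so averaging over a representative sample of real traffic gives a more reliable picture before settling on a level.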

Key Takeaways

Effort controls token spending across all response types (text, tools, thinking) without requiring thinking to be enabled.
Five levels available: low, medium, high, xhigh (Opus 4.7), and max — each offering a different trade-off between capability and efficiency.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of performance and cost.
Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.