Guide | Beginner | Pricing | 2026-05-23

Mastering Claude's Effort Parameter: Control Token Spend Without Sacrificing Quality

Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across all API calls, with practical code examples and recommended settings.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from low to max, combine it with adaptive thinking, and optimize for speed, cost, or capability across different use cases.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Introduction

Every Claude API call is a balancing act. You want the best possible response, but you also care about speed and cost. Traditionally, you had to choose between different models or manually set budget_tokens to control thinking depth. The effort parameter changes that entirely.

Effort gives you a single, intuitive dial to control how eagerly Claude spends tokens on any request—whether it's generating text, making tool calls, or performing extended thinking. This guide covers everything you need to know to use effort effectively, including recommended settings for Claude Sonnet 4.6 and Opus 4.7.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how much token spend is appropriate for a given request. It's not a strict budget—Claude will still think deeply on hard problems even at lower effort levels—but it strongly influences how thorough the response is.

Key benefits:

Works without enabling extended thinking
Affects all token spend, including tool calls and function arguments
Single model can serve both quick chat and deep reasoning tasks
Available on all supported models with no beta header required

Supported Models

Model	Effort Levels	Notes
Claude Mythos Preview	low, medium, high, max	Full support
Claude Opus 4.7	low, medium, high, xhigh, max	xhigh for long-horizon tasks
Claude Opus 4.6	low, medium, high, max	Replaces `budget_tokens`
Claude Sonnet 4.6	low, medium, high, max	Replaces `budget_tokens`
Claude Opus 4.5	low, medium, high	Basic support; max is not listed for this model

Note: For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Effort Levels Explained

max

Description: Absolute maximum capability with no constraints on token spending.
Use case: Tasks requiring the deepest possible reasoning and most thorough analysis.
Available on: Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6.

xhigh (Opus 4.7 only)

Description: Extended capability for long-horizon work.
Use case: Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions.

high (default)

Description: High capability. Equivalent to not setting the parameter.
Use case: Complex reasoning, difficult coding problems, agentic tasks.

medium

Description: Balanced approach with moderate token savings.
Use case: Agentic tasks that require a balance of speed, cost, and performance.

low

Description: Most efficient. Significant token savings with some capability reduction.
Use case: Simpler tasks that need the best speed and lowest costs, such as subagents.
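
The availability rules in this section can be sketched as a small lookup that validates a requested level before a call is made. This is only an illustration: the model keys below are hypothetical short names (not API model IDs), the max list follows the "Available on" line above, and falling back to the next lower level is an assumed policy, not documented API behavior.

```python
# Level names in ascending order, as listed in this guide.
EFFORT_ORDER = ("low", "medium", "high", "xhigh", "max")

# Hypothetical short model keys (not API model IDs).
MODELS_WITH_MAX = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"}
MODELS_WITH_XHIGH = {"opus-4.7"}
KNOWN_MODELS = MODELS_WITH_MAX | {"opus-4.5"}


def supported_levels(model: str) -> tuple:
    """Return the effort levels this guide lists for a model, lowest first."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model key: {model!r}")
    levels = ["low", "medium", "high"]
    if model in MODELS_WITH_XHIGH:
        levels.append("xhigh")
    if model in MODELS_WITH_MAX:
        levels.append("max")
    return tuple(levels)


def clamp_effort(model: str, requested: str) -> str:
    """Return `requested` if the model supports it, else the next lower level."""
    if requested not in EFFORT_ORDER:
        raise ValueError(f"unknown effort level: {requested!r}")
    allowed = supported_levels(model)
    position = EFFORT_ORDER.index(requested)
    # Walk downward from the requested level; "low" is supported everywhere.
    for level in reversed(EFFORT_ORDER[: position + 1]):
        if level in allowed:
            return level
    raise AssertionError("unreachable: 'low' is supported everywhere")
```

With this sketch, clamp_effort("sonnet-4.6", "xhigh") falls back to "high", because xhigh is listed for Opus 4.7 only.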

How Effort Works in Practice

When you set effort to high, Claude behaves exactly as if you omitted the parameter entirely. At max, it will think more and potentially make more tool calls. At low, it will be more conservative—skipping thinking for simple problems and making fewer tool calls.

This is powerful because it affects all tokens in the response:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

Recommended Settings for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and token consumption, Anthropic recommends explicitly setting effort:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases.
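
One way to make the explicit setting hard to forget is to resolve it from a workload label in one place. The sketch below assumes hypothetical workload labels; only the effort values come from the recommendations above, and unknown workloads fall back to medium as the recommended default.

```python
# Workload labels are hypothetical; the effort values follow the
# Sonnet 4.6 recommendations in this section.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def sonnet_46_effort(workload: str) -> str:
    """Pick an explicit effort level; unknown workloads get the recommended default."""
    return SONNET_46_EFFORT.get(workload, "medium")
```

Because the function always returns a value, every Sonnet 4.6 request built through it carries an explicit effort level instead of inheriting the high default.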

Code Examples

Python SDK

import anthropic

client = anthropic.Anthropic()

# Low effort for simple, fast responses.
# Effort is passed inside output_config, not as a top-level argument.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# High effort for complex reasoning (same as omitting the parameter)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on information theory."}
    ]
)

# Max effort for deepest analysis
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Design a complete architecture for a distributed database system."}
    ]
)

TypeScript SDK

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Medium effort for balanced performance.
// Effort is passed inside output_config, not as a top-level field.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// With adaptive thinking (recommended)
const responseWithThinking = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 4096,
  output_config: { effort: 'high' },
  thinking: { type: 'adaptive' },
  messages: [
    { role: 'user', content: 'Solve this complex math problem step by step.' }
  ]
});

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking by setting thinking: {type: "adaptive"}. This allows Claude to dynamically decide how much thinking is needed based on the problem complexity and the effort level you've set.

At high and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
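
The combination described above can be sketched as a helper that assembles the request arguments in one place. It assumes that effort is carried in an output_config object in the Messages API (verify this against your SDK version); build_request and its parameters are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Optional


def build_request(
    model: str,
    prompt: str,
    effort: Optional[str] = None,
    adaptive_thinking: bool = True,
    max_tokens: int = 4096,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Messages API call (hypothetical helper)."""
    request: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is nested under output_config.
        request["output_config"] = {"effort": effort}
    if adaptive_thinking:
        # Let Claude decide how much thinking the problem needs.
        request["thinking"] = {"type": "adaptive"}
    return request
```

The resulting dictionary could then be passed as client.messages.create(**build_request(...)). Leaving effort as None omits the field, which this guide describes as equivalent to high.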

Practical Use Cases

1. Multi-tier Agent Systems

Use different effort levels for different agents in a system:

# Orchestrator agent: high effort for planning
orchestrator_effort = "high"

# Sub-agent for simple lookups: low effort
sub_agent_effort = "low"

# Code generation agent: medium effort
code_agent_effort = "medium"
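
The per-agent settings above could also live in a single lookup, so that a dispatcher attaches the right effort level to each task. This is a rough sketch: the role names and the assign_effort helper are hypothetical, and nesting effort under output_config is an assumption about the request shape.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical role names; effort values mirror the three agents above.
EFFORT_BY_ROLE = {
    "orchestrator": "high",
    "lookup": "low",
    "codegen": "medium",
}


def assign_effort(tasks: Iterable[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Attach an effort setting to each (role, prompt) pair."""
    plan = []
    for role, prompt in tasks:
        if role not in EFFORT_BY_ROLE:
            raise ValueError(f"no effort level configured for role {role!r}")
        plan.append(
            {
                "role": role,
                "prompt": prompt,
                # Assumption: effort is nested under output_config.
                "output_config": {"effort": EFFORT_BY_ROLE[role]},
            }
        )
    return plan
```

Rejecting unknown roles, rather than silently defaulting, keeps a new agent from inheriting an effort level nobody chose for it.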

2. Cost-Sensitive Applications

For high-volume chat applications, use low effort to reduce token consumption while maintaining acceptable quality for simple queries.

3. Deep Research Tasks

For research or analysis tasks requiring maximum thoroughness, use max effort on Opus 4.7 or Mythos Preview.

Best Practices

Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
Start with medium for most applications and adjust based on observed quality and cost.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for optimal token efficiency.
Use low for sub-agents and simple classification tasks to minimize costs.
Reserve max for complex reasoning where you need the absolute best quality.
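
To "adjust based on observed quality and cost", it helps to measure token use per level on your own prompts. The sketch below is an assumed measurement harness, not an official tool: run is a hypothetical callable that you supply, which sends one representative request at the given effort level and returns its output token count.

```python
from typing import Callable, Dict, Sequence


def compare_effort_levels(
    run: Callable[[str], int],
    levels: Sequence[str] = ("low", "medium", "high"),
    baseline: str = "high",
) -> Dict[str, Dict[str, float]]:
    """Report output tokens per effort level and savings relative to a baseline."""
    if baseline not in levels:
        raise ValueError("baseline must be one of the measured levels")
    tokens = {level: run(level) for level in levels}
    base = tokens[baseline]
    if base <= 0:
        raise ValueError("baseline run produced no output tokens")
    return {
        level: {
            "output_tokens": count,
            # Fraction of baseline tokens saved; negative means more tokens used.
            "savings_vs_baseline": (base - count) / base,
        }
        for level, count in tokens.items()
    }
```

In practice, run might wrap client.messages.create and return response.usage.output_tokens. Token counts say nothing about answer quality, so the outputs still need to be reviewed separately.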

Limitations and Considerations

Effort is a behavioral signal, not a strict token budget. Claude may still think deeply on hard problems even at low effort.
The xhigh level is currently only available on Claude Opus 4.7.
Lower effort levels may reduce quality on complex tasks—always test with your specific use case.
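
Because effort is a signal rather than a cap, max_tokens remains the hard limit on response length. A minimal sketch of a truncation check follows, assuming the response object exposes a stop_reason field whose value is "max_tokens" when the cap was reached; hit_token_cap is a hypothetical helper name.

```python
from types import SimpleNamespace


def hit_token_cap(response) -> bool:
    """Return True if the response stopped because it reached max_tokens."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Stand-in objects for illustration; a real call returns an SDK response.
truncated = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")
print(hit_token_cap(truncated), hit_token_cap(finished))  # prints: True False
```

If this check fires often at higher effort levels, raising max_tokens or lowering effort are the two obvious adjustments to test.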

Key Takeaways

Effort replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.
Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
Affects all token spend, including text, tool calls, and extended thinking.
Combine with adaptive thinking for the best balance of quality and efficiency.
Set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.