Guide | Beginner | Pricing | 2026-05-17

Mastering Claude's Effort Parameter: Balance Performance and Cost

Learn how to use Claude's effort parameter to control token spending, optimize response thoroughness, and reduce costs across all API interactions.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn the five effort levels (low, medium, high, xhigh, max), when to use each, and how to implement them in your API calls to balance performance and cost.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

When building applications with Claude, one of the most powerful yet underutilized controls is the effort parameter. It lets you steer how many tokens, and therefore how much cost and latency, Claude dedicates to each request. Whether you're building a high-volume chatbot or a deep reasoning agent, understanding effort is key to optimizing both performance and budget.

What Is the Effort Parameter?

The effort parameter controls how eager Claude is about spending tokens when responding to requests. It influences all tokens in the response, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

Unlike a strict token budget, effort is a behavioral signal. At lower levels, Claude will still think deeply on sufficiently difficult problems, but it will think less than it would at higher levels for the same problem.

Supported Models

The effort parameter is available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
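
A minimal sketch of what moving a request from budget_tokens to effort might look like. The helper name migrate_to_effort is hypothetical, the model ID is an assumption, and the sketch assumes effort is sent inside an output_config object; check the current API reference before relying on either. The caller picks the effort level, because this guide gives no mapping from a token budget to a level.

```python
from copy import deepcopy


def migrate_to_effort(request: dict, effort: str) -> dict:
    """Return a copy of `request` that uses effort instead of budget_tokens.

    Hypothetical helper for illustration, not part of any SDK.
    """
    migrated = deepcopy(request)
    thinking = migrated.get("thinking") or {}
    if "budget_tokens" in thinking:
        # Drop the deprecated budget and let Claude decide when to think.
        migrated["thinking"] = {"type": "adaptive"}
    # Assumption: effort travels inside output_config.
    migrated.setdefault("output_config", {})["effort"] = effort
    return migrated


old_request = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Plan a database migration."}],
}
new_request = migrate_to_effort(old_request, effort="medium")
print(new_request["thinking"], new_request["output_config"])
```

The original request is left untouched, so the old and new payloads can be compared side by side during a migration.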

Effort Levels Explained

There are five effort levels, each suited to different use cases:

Level	Description	Typical Use Case
max	Absolute maximum capability with no constraints on token spending	Deepest possible reasoning, most thorough analysis
xhigh	Extended capability for long-horizon work (Opus 4.7 only)	Long-running agentic and coding tasks (30+ minutes) with token budgets in the millions
high	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding problems, agentic tasks
medium	Balanced approach with moderate token savings	Agentic tasks requiring a balance of speed, cost, and performance
low	Most efficient. Significant token savings with some capability reduction.	Simpler tasks needing best speed and lowest costs, such as subagents

Important: Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
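
The rules in the table can be captured in a small validator. This is a sketch under stated assumptions: resolve_effort is a hypothetical helper, and it identifies models by the display names used in this guide rather than by API model IDs.

```python
from typing import Optional

EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
XHIGH_MODELS = {"Claude Opus 4.7"}  # per the table: xhigh is Opus 4.7 only


def resolve_effort(level: Optional[str], model_name: str) -> str:
    """Validate an effort level and return the level that will be in force."""
    if level is None:
        return "high"  # omitting the parameter behaves like "high"
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    if level == "xhigh" and model_name not in XHIGH_MODELS:
        raise ValueError(f"xhigh is not available on {model_name}")
    return level


print(resolve_effort(None, "Claude Sonnet 4.6"))
print(resolve_effort("xhigh", "Claude Opus 4.7"))
```

Validating on the client side surfaces a typo or an unsupported level before a request is sent.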

How Effort Works Under the Hood

The effort parameter affects all token spend, including tool calls. For example, at lower effort levels, Claude will make fewer tool calls. This gives you broader control over efficiency than budget_tokens, which limited thinking tokens only.

At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
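
One way to see the effect is to run the same prompt at two effort levels and compare the output token counts reported in each response's usage field. The sketch below uses a stand-in Usage class and placeholder numbers so that it runs offline; output_savings is a hypothetical helper, and the figures are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Stand-in for the `usage` object on a Messages API response."""
    input_tokens: int
    output_tokens: int


def output_savings(baseline: Usage, candidate: Usage) -> float:
    """Fraction of output tokens saved relative to the baseline run."""
    if baseline.output_tokens == 0:
        return 0.0
    return 1 - candidate.output_tokens / baseline.output_tokens


# Placeholder numbers for illustration only, not measurements.
high_run = Usage(input_tokens=500, output_tokens=2000)
low_run = Usage(input_tokens=500, output_tokens=800)
print(f"{output_savings(high_run, low_run):.0%} fewer output tokens")
```

In a real comparison, high_run and low_run would be the usage objects from two actual responses, ideally averaged over a representative set of prompts.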

Practical Implementation

Basic Usage in Python

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Low effort for high-volume, simple tasks
response = client.messages.create(
    model="claude-sonnet-4-6",  # must be a model that supports effort
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    output_config={"effort": "low"},  # fast and cheap for simple questions
)
print(response.content[0].text)

Medium Effort for Balanced Performance

# Medium effort for agentic coding tasks
response = client.messages.create(
    model="claude-sonnet-4-6",  # must be a model that supports effort
    max_tokens=4096,
    system="You are a senior software engineer.",
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
    output_config={"effort": "medium"},
)

Max Effort for Deep Reasoning

# Max effort for complex analysis
response = client.messages.create(
    model="claude-opus-4-6",  # must be a model that supports effort
    max_tokens=8192,
    system="You are a research scientist.",
    messages=[
        {"role": "user", "content": "Analyze the implications of quantum computing on current encryption standards."}
    ],
    output_config={"effort": "max"},
)

TypeScript Example

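import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function getResponse() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6', // must be a model that supports effort
    max_tokens: 1024,
    system: 'You are a helpful assistant.',
    messages: [
      { role: 'user', content: 'Summarize this article.' }
    ],
    output_config: { effort: 'low' }
  });
  // Content blocks are a union type; narrow to a text block before reading .text
  const block = response.content[0];
  if (block.type === 'text') console.log(block.text);
}

getResponse();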

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

response = client.messages.create(
    model="claude-sonnet-4-6",  # must be a model that supports effort
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Claude decides when to think
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ],
    output_config={"effort": "medium"},
)

Adaptive thinking allows Claude to dynamically decide when to engage extended thinking, while effort guides how much it thinks and how many tokens it spends overall. Effort remains a behavioral signal rather than a hard cap. This combination is particularly powerful for applications that handle a mix of simple and complex queries.
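
When thinking is enabled, the first content block in a response may be a thinking block rather than text, so reading content[0].text is not always safe. The sketch below builds a request that pairs adaptive thinking with effort and collects only the text blocks from a response. Both helper names are hypothetical, the model ID is an assumption, and the sketch assumes effort is sent inside output_config; the sample blocks are stand-ins shaped like SDK content blocks.

```python
from types import SimpleNamespace


def build_request(prompt: str, effort: str = "medium", adaptive: bool = True) -> dict:
    """Assemble keyword arguments for client.messages.create(**request)."""
    request = {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},  # assumed parameter location
    }
    if adaptive:
        request["thinking"] = {"type": "adaptive"}
    return request


def collect_text(content) -> str:
    """Join the text blocks of a response, skipping thinking and tool blocks."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-in blocks for illustration; a real response supplies these.
sample_content = [
    SimpleNamespace(type="thinking", thinking="..."),
    SimpleNamespace(type="text", text="x = 4"),
]
request = build_request("Solve 2x = 8.")
print(request["thinking"], request["output_config"])
print(collect_text(sample_content))
```

With a real client, the call would be client.messages.create(**request), followed by collect_text(response.content).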

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, explicitly set effort when using this model:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.

When to Use Each Level

Low Effort

Best for: Simple Q&A, basic classification, high-volume chatbots, subagents
Benefits: Fastest response times, lowest cost
Trade-off: Reduced capability on complex problems

Medium Effort

Best for: Agentic tasks, coding assistance, tool-heavy workflows
Benefits: Good balance of speed, cost, and performance
Trade-off: May not be sufficient for the most complex reasoning

High Effort (Default)

Best for: Complex reasoning, difficult coding problems, detailed analysis
Benefits: High capability without constraints
Trade-off: Higher token usage and cost

XHigh Effort (Opus 4.7 Only)

Best for: Long-running agentic tasks (30+ minutes), tasks with token budgets in the millions
Benefits: Extended capability for sustained reasoning
Trade-off: Very high token usage, exceeded only by max

Max Effort

Best for: Deepest possible reasoning, most thorough analysis
Benefits: Absolute maximum capability
Trade-off: No constraints on token spending
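
The guidance above can be expressed as a lookup table. This is a sketch, not an official mapping: the task labels and the choose_effort helper are hypothetical, models are identified by the display names used in this guide, and stepping down from xhigh to high on other models is a design choice made for the example.

```python
# Task labels are illustrative names derived from the "Best for" lists above.
TASK_EFFORT = {
    "simple_qa": "low",
    "classification": "low",
    "subagent": "low",
    "agentic": "medium",
    "coding_assistance": "medium",
    "tool_heavy": "medium",
    "complex_reasoning": "high",
    "difficult_coding": "high",
    "long_running_agent": "xhigh",
    "deepest_reasoning": "max",
}


def choose_effort(task: str, model_name: str) -> str:
    """Pick an effort level for a task, respecting the xhigh restriction."""
    level = TASK_EFFORT.get(task, "high")  # unknown tasks keep the default
    if level == "xhigh" and model_name != "Claude Opus 4.7":
        return "high"  # assumption: step down rather than fail
    return level


print(choose_effort("classification", "Claude Sonnet 4.6"))
print(choose_effort("long_running_agent", "Claude Opus 4.7"))
print(choose_effort("long_running_agent", "Claude Opus 4.6"))
```

Keeping the mapping in one table makes it easy to adjust a single task's level after measuring quality and cost on real traffic.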

Common Pitfalls to Avoid

Using high effort for simple tasks: This wastes tokens and increases latency without meaningful quality gains.
Not setting effort explicitly on Sonnet 4.6: The default is high, which may cause unexpected latency.
Assuming effort is a strict budget: Effort is a behavioral signal, not a hard limit. Claude may still think deeply on difficult problems even at low effort.
Forgetting to combine with adaptive thinking: For maximum efficiency, use thinking: {"type": "adaptive"} alongside effort.
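
Some of these pitfalls can be caught mechanically before a request is sent. The following is a minimal sketch of such a check. The lint_request helper is hypothetical, the substring test on the model ID assumes Sonnet 4.6 IDs contain "sonnet-4-6", and the sketch assumes effort is sent inside output_config.

```python
def lint_request(request: dict) -> list:
    """Return warnings for the effort pitfalls that are visible in a request."""
    warnings = []
    effort = (request.get("output_config") or {}).get("effort")
    thinking = request.get("thinking") or {}
    # Assumption: Sonnet 4.6 model IDs contain "sonnet-4-6".
    if effort is None and "sonnet-4-6" in request.get("model", ""):
        warnings.append("effort not set: Sonnet 4.6 defaults to high")
    if "budget_tokens" in thinking:
        warnings.append("budget_tokens is deprecated: prefer effort")
    if effort is not None and thinking.get("type") != "adaptive":
        warnings.append("effort set without adaptive thinking")
    return warnings


risky = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Hello"}],
}
clean = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 1024,
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "medium"},
    "messages": [{"role": "user", "content": "Hello"}],
}
print(lint_request(risky))
print(lint_request(clean))
```

The check cannot tell whether a task is simple enough for low effort; that judgment still depends on measuring quality for the workload.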

Key Takeaways

Effort controls token spend across all response types—text, tool calls, and extended thinking—giving you fine-grained control over cost and performance.
Five levels exist: low, medium, high, xhigh (Opus 4.7 only), and max, each suited to different use cases.
Combine with adaptive thinking (thinking: {"type": "adaptive"}) for the best balance of capability and efficiency.
Explicitly set effort on Sonnet 4.6 to avoid unexpected latency from the default high setting.
Effort is a behavioral signal, not a strict budget—Claude will still think deeply on hard problems even at lower levels, but it will think less than at higher levels.