Guide · 2026-05-02

Mastering Claude’s Effort Parameter: Control Token Spend Without Sacrificing Intelligence

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost across models like Opus 4.6, Sonnet 4.6, and Mythos Preview.

Quick Answer

This guide explains Claude’s effort parameter—a behavioral signal that controls how eagerly Claude spends tokens. You’ll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response quality and speed/cost, with practical API examples and recommended defaults for Sonnet 4.6.

effort parameter, token efficiency, Claude API, extended thinking, cost optimization

Introduction

When building applications with Claude, you often face a classic trade-off: response quality versus speed and cost. Do you let Claude think deeply and spend more tokens, or do you push for faster, cheaper responses? Previously, you had to switch models or manually set token budgets. With the effort parameter, you can tune this balance on a single model.

Effort is a behavioral signal that tells Claude how eager it should be about spending tokens. It affects everything from text responses and tool calls to extended thinking. This guide will walk you through how effort works, when to use each level, and how to combine it with adaptive thinking for the best results.

How the Effort Parameter Works

By default, Claude uses high effort, spending as many tokens as needed for excellent results. On models that support them, you can raise the level to xhigh (Opus 4.7 only) or max for more capability, or lower it to medium or low to save tokens and reduce latency.

Key points:

Effort affects all tokens in the response, including text, tool calls, and extended thinking.
It does not require thinking to be enabled.
Lower effort means Claude will make fewer tool calls and write shorter responses.
Effort is a behavioral signal, not a strict token budget. On difficult problems, Claude will still think—just less than at higher levels.

Effort Levels Overview

Level	Description	Typical Use Case
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, subagents, high-volume chat
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing speed and cost balance
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, coding, agentic tasks
`xhigh`	Extended capability for long-horizon work. Available on Opus 4.7 only.	Long-running agentic/coding tasks (>30 min)
`max`	Absolute maximum capability with no constraints. Available on select models.	Deepest reasoning, most thorough analysis

When to Use Each Effort Level

Low Effort

Use low when you need the fastest possible responses and can accept some reduction in quality. Ideal for:

High-volume chat applications
Simple Q&A or classification tasks
Subagents that handle straightforward subtasks

Medium Effort

medium is the sweet spot for most production applications. It offers a good balance of speed, cost, and performance. Recommended for:

Agentic coding workflows
Tool-heavy applications
Code generation where latency matters

High Effort (Default)

Stick with high when you need Claude’s full reasoning power. This is the default behavior, so you don’t need to set it explicitly. Use for:

Complex problem-solving
Difficult coding tasks
Multi-step agentic workflows

Xhigh Effort (Opus 4.7 Only)

xhigh is designed for long-running tasks that require sustained deep reasoning over millions of tokens. Available only on Claude Opus 4.7.

Max Effort

max removes the effort-level constraints on token spending, giving you Claude’s absolute best performance. The max_tokens limit on the request still applies. Available on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
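The availability rules in this section can be encoded as a small guard that runs before a request is built. This is a minimal sketch, not part of any SDK: the model labels are plain strings chosen for illustration (they are not API model IDs), the table only restates what this guide says, and it assumes low, medium, and high are available on every model listed. Falling back to the nearest lower level is one possible design choice; raising an error is equally reasonable.

```python
# Effort levels described in this guide, ordered from least to most token spend.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]

# ASSUMPTION: these three levels are available on every model label below.
BASE_LEVELS = {"low", "medium", "high"}

# Hypothetical labels (not API model IDs); extra levels as stated in this guide.
EXTRA_LEVELS = {
    "opus-4.7": {"xhigh", "max"},
    "opus-4.6": {"max"},
    "sonnet-4.6": {"max"},
    "mythos-preview": {"max"},
}


def supported_levels(model_label: str) -> set:
    """Return the set of effort levels this guide lists for a model label."""
    if model_label not in EXTRA_LEVELS:
        raise ValueError(f"unknown model label: {model_label!r}")
    return BASE_LEVELS | EXTRA_LEVELS[model_label]


def resolve_effort(model_label: str, requested: str) -> str:
    """Return `requested` if supported, else the nearest lower supported level."""
    if requested not in EFFORT_ORDER:
        raise ValueError(f"unknown effort level: {requested!r}")
    allowed = supported_levels(model_label)
    # Walk downward from the requested level until a supported one is found.
    candidates = EFFORT_ORDER[: EFFORT_ORDER.index(requested) + 1]
    for level in reversed(candidates):
        if level in allowed:
            return level
    raise ValueError(f"no supported effort level for {model_label!r}")
```

For example, `resolve_effort("sonnet-4.6", "xhigh")` falls back to `"high"`, while `resolve_effort("opus-4.7", "xhigh")` returns `"xhigh"` unchanged.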

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly when using this model:

Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
Low: For high-volume or latency-sensitive workloads—chat, non-coding use cases.
High: For tasks requiring maximum reasoning depth.
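One way to apply these recommendations consistently is a lookup keyed by workload type. The sketch below is illustrative only: the workload names are hypothetical labels invented for this example, and the mapping just restates the three recommendations above, with medium as the fallback because it is the recommended default.

```python
# Hypothetical workload labels mapped to the effort levels recommended above
# for Sonnet 4.6. Adjust the keys to match your own application's categories.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "deep_reasoning": "high",
}


def pick_effort(workload: str) -> str:
    """Return the recommended effort for a workload, defaulting to medium."""
    return SONNET_46_EFFORT.get(workload, "medium")
```

Keeping the mapping in one place makes it easy to change a level after measuring latency and quality, instead of hunting for hard-coded strings across call sites.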

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking allows Claude to dynamically decide how much to think based on the complexity of the request. When paired with effort, you get fine-grained control:

At high or max effort, Claude will almost always think.
At lower effort levels, Claude may skip thinking for simpler problems, saving tokens.

Note: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future release.
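A migration away from budget_tokens can be sketched as a pure function over request parameters. This is an assumption-laden illustration, not an official migration tool: `migrate_thinking_config` is a hypothetical helper, the effort level must be chosen by you (there is no documented conversion from a token budget to an effort level), and the sketch assumes effort is sent under an `output_config` field, so adjust that key to whatever your SDK version expects.

```python
import copy


def migrate_thinking_config(request: dict, effort: str) -> dict:
    """Return a copy of `request` using adaptive thinking plus effort.

    Requests that do not set thinking.budget_tokens are returned unchanged.
    """
    migrated = copy.deepcopy(request)  # never mutate the caller's dict
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Replace the fixed thinking budget with adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
        # ASSUMPTION: effort lives under "output_config" in the request.
        migrated.setdefault("output_config", {})["effort"] = effort
    return migrated


old_request = {
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "Hello"}],
}
new_request = migrate_thinking_config(old_request, effort="medium")
```

Because the function returns a new dict, the old and new requests can be sent side by side while you compare token usage and output quality.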

Practical API Examples

Python Example: Setting Effort

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6 model ID; confirm against the current models list
    max_tokens=1024,
    output_config={"effort": "medium"},  # effort is set inside output_config
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)
print(response.content[0].text)

TypeScript Example: Setting Effort

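import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Sonnet 4.6 model ID; confirm against the current models list
  max_tokens: 1024,
  output_config: { effort: 'medium' }, // effort is set inside output_config
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text.
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}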

Example with Adaptive Thinking

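import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6 model ID; confirm against the current models list
    max_tokens=2048,
    output_config={"effort": "high"},  # effort is set inside output_config
    thinking={"type": "adaptive"},  # let Claude decide how much to think per request
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step: ..."}
    ],
)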
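When thinking is enabled, the response content can contain thinking blocks ahead of the text, so reading `response.content[0].text` is not reliable. The helper below is a minimal sketch of a safer pattern: `collect_text` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for SDK content blocks so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_text(content_blocks) -> str:
    """Join the text of every block whose type is "text", skipping the rest."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in blocks that mimic the shape of a response with a thinking block first.
example_blocks = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="The answer is 4."),
]
answer = collect_text(example_blocks)
```

With a real response, the call would be `collect_text(response.content)`.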

Best Practices

Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency.
Start with medium for most production applications, then adjust based on observed performance.
Use low effort for subagents that handle simple, well-defined tasks.
Combine with adaptive thinking for optimal token efficiency on mixed-complexity workloads.
Monitor token usage across effort levels to find the right balance for your use case.
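Monitoring can start with something as simple as averaging output tokens per effort level. The sketch below assumes you log one `(effort, output_tokens)` pair per request, for instance from `response.usage.output_tokens`; the function name and the record shape are choices made for this example.

```python
from collections import defaultdict
from statistics import mean


def summarize_output_tokens(records) -> dict:
    """Map each effort level to the mean output tokens across its requests.

    `records` is an iterable of (effort, output_tokens) pairs.
    """
    by_effort = defaultdict(list)
    for effort, output_tokens in records:
        by_effort[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in by_effort.items()}
```

Running the same prompt set at two or three effort levels and comparing these averages alongside a quality check gives a concrete basis for choosing a default.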

Limitations and Considerations

Effort is a behavioral signal, not a hard budget. On difficult problems Claude may still think at low effort, though less than it would at higher levels.
xhigh is only available on Claude Opus 4.7.
max is not available on all models—check the documentation for your specific model.
Effort does not replace the need for max_tokens—always set a reasonable token limit.
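Because max_tokens remains the hard cap, it is worth checking whether a response was cut off, especially at higher effort levels where Claude spends more tokens. The sketch below relies on the Messages API reporting a `stop_reason` of `"max_tokens"` when the limit is hit; `hit_token_limit` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real responses so the example runs offline.

```python
from types import SimpleNamespace


def hit_token_limit(response) -> bool:
    """Return True when the response stopped because it reached max_tokens."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Stand-ins for API responses, used only to show the check.
truncated = SimpleNamespace(stop_reason="max_tokens")
complete = SimpleNamespace(stop_reason="end_turn")
```

If truncation shows up often at a given effort level, either raise max_tokens or lower the effort level for that workload.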

Key Takeaways

Effort controls token spend across all response types (text, tool calls, thinking) without changing models.
Five levels exist: low, medium, high, xhigh, and max, each suited to different use cases. xhigh is limited to Opus 4.7, and max to select models.
Always set effort explicitly on Sonnet 4.6 to avoid defaulting to high effort unexpectedly.
Combine with adaptive thinking for the best balance of depth and efficiency.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6—migrate your code to use effort instead.