BeClaude
Guide | Beginner | Pricing | 2026-05-20

Mastering Claude's Effort Parameter: Optimize Token Spend Without Sacrificing Quality

Learn how to use Claude's effort parameter to control thinking depth, reduce costs, and speed up responses on supported Claude models, from simple chats to complex agentic tasks.

Quick Answer

This guide explains Claude's effort parameter—a behavioral signal that lets you trade off between response thoroughness and token efficiency. You'll learn how to set effort levels (low, medium, high, xhigh, max) for different use cases, combine it with adaptive thinking, and see practical API code examples.

Tags: effort parameter, token optimization, extended thinking, Claude API, cost control

Introduction

Every Claude API call consumes tokens, and every token costs money. But not every task requires Claude's full reasoning power. A simple Q&A about your company's vacation policy doesn't need the same depth as debugging a multi-file codebase. That's where the effort parameter comes in.

Available on Claude Opus 4.5, 4.6, and 4.7, Sonnet 4.6, and Mythos Preview, the effort parameter lets you dial Claude's "eagerness" to spend tokens up or down without switching models. This gives you fine-grained control over speed, cost, and capability through a single request setting.

In this guide, you'll learn:

  • What effort is and how it differs from a token budget
  • The five effort levels and when to use each
  • How to combine effort with adaptive thinking for maximum efficiency
  • Practical code examples in Python and TypeScript
  • Best practices for Sonnet 4.6, Opus 4.7, and beyond

How Effort Works

By default, Claude operates at high effort—spending as many tokens as needed for excellent results. The effort parameter is a behavioral signal, not a strict token budget. At lower levels, Claude may still think deeply on hard problems, but it will think less than it would at higher levels.

Key points:

  • Effort affects all tokens: text responses, tool calls, function arguments, and extended thinking.
  • It works without enabling extended thinking. You can use effort on any supported model, even if you don't use thinking.
  • Lower effort means fewer tool calls, shorter explanations, and faster responses.
  • Setting effort: "high" is identical to omitting the parameter entirely.
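The last point above can be sketched as a small request builder. This is a minimal illustration, not the SDK's own API: `build_request` is a hypothetical helper, and it assumes effort is passed under an `output_config` field, as in Anthropic's published API examples.

```python
from typing import Any, Optional

VALID_EFFORTS = {"low", "medium", "high", "xhigh", "max"}


def build_request(prompt: str, model: str, effort: Optional[str] = None,
                  max_tokens: int = 4096) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create (hypothetical helper)."""
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")
    params: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # "high" matches the default behavior, so the field is only sent
    # when it would actually change how Claude responds.
    if effort not in (None, "high"):
        # Assumption: effort is nested under output_config.
        params["output_config"] = {"effort": effort}
    return params
```

Calling `client.messages.create(**build_request(...))` with `effort="high"` and with no effort at all then sends the same request body.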

Effort Levels and When to Use Them

Level | Description | Best For
low | Most efficient. Significant token savings with some capability reduction. | Simple Q&A, high-volume chat, non-coding subagents, latency-sensitive workloads.
medium | Balanced approach with moderate token savings. | Agentic tasks needing speed/cost balance, tool-heavy workflows, code generation.
high | High capability. Default behavior. | Complex reasoning, difficult coding, standard agentic tasks.
xhigh | Extended capability for long-horizon work. Available on Opus 4.7 only. | Long-running coding sessions (30+ min), multi-million token budgets.
max | Absolute maximum capability with no constraints. Available on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6. | Deepest possible reasoning, most thorough analysis, research-grade tasks.
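The availability notes in the table can be encoded as a guard that runs before a request is sent. The sketch below is illustrative only: the model names are display labels rather than API model IDs, and `effort_available` is a hypothetical helper whose data comes solely from the table and the introduction.

```python
# Display names, not API model IDs (assumption for illustration).
SUPPORTED_MODELS = {"Opus 4.5", "Opus 4.6", "Opus 4.7", "Sonnet 4.6", "Mythos Preview"}

# Levels offered on every supported model.
BASE_LEVELS = {"low", "medium", "high"}

# Levels the table restricts to specific models.
RESTRICTED_LEVELS = {
    "xhigh": {"Opus 4.7"},
    "max": {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"},
}


def effort_available(model_name: str, effort: str) -> bool:
    """Return True if the table lists this effort level for this model."""
    if model_name not in SUPPORTED_MODELS:
        return False
    if effort in BASE_LEVELS:
        return True
    return model_name in RESTRICTED_LEVELS.get(effort, set())
```

A check like this lets an application fall back to `high` instead of sending a level the chosen model does not offer.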

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. If you don't set effort explicitly, responses may take longer than you expect. Anthropic recommends:

  • Medium effort as your default for most applications: best balance of speed, cost, and performance. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low effort for high-volume or latency-sensitive workloads: chat, non-coding use cases.
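One way to keep these recommendations in a single place is a lookup keyed by workload. The workload labels and the `sonnet_default_effort` helper are hypothetical names chosen for this sketch; only the medium and low recommendations come from the text above.

```python
# Hypothetical workload labels mapped to the recommendations above.
SONNET_46_DEFAULTS = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def sonnet_default_effort(workload: str) -> str:
    """Return the suggested effort level, falling back to medium."""
    return SONNET_46_DEFAULTS.get(workload, "medium")
```

Falling back to medium for unknown workloads mirrors the advice to treat it as the general default.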

Combining Effort with Adaptive Thinking

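For the best experience, pair effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide dynamically how much to think based on the problem's complexity, while effort guides how readily it spends tokens overall. Because effort is a behavioral signal, it shifts that decision up or down rather than imposing a hard cap.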

{
  "model": "claude-sonnet-4-6",
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "medium"
  }
}

At high and max effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems.
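To see whether Claude actually chose to think on a given request, you can inspect the content blocks in the response. The helper names below are hypothetical; the sketch assumes thinking output arrives as blocks whose type is "thinking", and it accepts either SDK objects or plain dicts.

```python
from collections import Counter
from typing import Any, Iterable


def block_types(content: Iterable[Any]) -> Counter:
    """Count content blocks by their type field."""
    counts: Counter = Counter()
    for block in content:
        if isinstance(block, dict):
            block_type = block.get("type", "unknown")
        else:
            block_type = getattr(block, "type", "unknown")
        counts[block_type] += 1
    return counts


def used_thinking(content: Iterable[Any]) -> bool:
    """Return True if at least one thinking block is present."""
    return block_types(content)["thinking"] > 0
```

Logging `used_thinking(response.content)` at each effort level shows how often lower levels skip thinking on your own prompts.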

Practical Code Examples

Python Example: Setting Effort with the Messages API

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    # Effort goes inside output_config: "low", "medium", "high", or "max"
    output_config={"effort": "low"},
    messages=[
        {
            "role": "user",
            "content": "Explain the difference between a list and a tuple in Python.",
        }
    ],
)

print(response.content[0].text)

TypeScript Example: Effort with Tool Calls

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function runAgent() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 4096,
    // Effort goes inside output_config
    output_config: { effort: 'medium' },
    tools: [
      {
        name: 'get_weather',
        description: 'Get current weather for a city',
        input_schema: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city']
        }
      }
    ],
    messages: [
      { role: 'user', content: "What's the weather in Tokyo?" }
    ]
  });

  // The content may include a tool_use block requesting get_weather
  console.log(response.content);
}

runAgent().catch(console.error);

Python Example: Effort + Adaptive Thinking + Tools

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    # Let Claude decide how much to think; effort guides overall token spend
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    tools=[
        {
            "name": "search_docs",
            "description": "Search internal documentation",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Find the API key rotation policy and summarize it.",
        }
    ],
)

print(response.content)

Migration from budget_tokens

If you previously used budget_tokens on Opus 4.6 or Sonnet 4.6, note that it is now deprecated and will be removed in a future model release. Replace it with the effort parameter:

Before (deprecated):

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 16000
  }
}

After (recommended):

{
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "high"
  }
}
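If many call sites still build requests with a fixed budget, the rewrite can be applied programmatically. The following is a rough sketch: `migrate_thinking` is a hypothetical helper, it assumes effort lives under `output_config`, and it does not try to derive an effort level from the old budget because no such mapping is given here.

```python
import copy
from typing import Any


def migrate_thinking(params: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Return a copy of params with budget_tokens replaced by adaptive thinking."""
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        # Keep an effort level the caller already chose; otherwise use the argument.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated
```

Requests without `budget_tokens` pass through unchanged, so the helper is safe to apply to every call during a gradual migration.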

Best Practices

  • Start with medium effort for most agentic and coding tasks. It gives you a great balance without surprises.
  • Use low effort for high-volume chat or subagents that handle simple routing or data extraction.
  • Reserve max effort for tasks where quality is paramount and cost is secondary—like research analysis or complex debugging.
  • Always pair effort with adaptive thinking when you need extended reasoning. The combination is more efficient than a fixed token budget.
  • Test across effort levels during development. A task that works well at high may work just as well at medium while using fewer tokens.
  • Monitor token usage with the API response's usage field to validate your effort choices.
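The last two practices can be combined into a small comparison harness. This is a sketch under assumptions: `compare_effort_levels` is a hypothetical helper, and `send` is a function you supply that issues the same request at the given effort level and returns the API response, whose `usage` field carries `input_tokens` and `output_tokens`.

```python
from typing import Any, Callable, Iterable


def compare_effort_levels(
    send: Callable[[str], Any],
    levels: Iterable[str] = ("low", "medium", "high"),
) -> dict[str, dict[str, int]]:
    """Run the same request at each effort level and collect token usage."""
    results: dict[str, dict[str, int]] = {}
    for level in levels:
        response = send(level)  # caller-supplied: performs the actual API call
        usage = response.usage
        results[level] = {
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
        }
    return results
```

Token counts alone do not measure quality, so review the outputs at each level alongside the numbers before lowering effort in production.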

Common Pitfalls

  • Assuming lower effort always reduces quality. For simple tasks, low effort often produces identical results to high—just faster and cheaper.
  • Forgetting to set effort on Sonnet 4.6. It defaults to high, which may be overkill for chat applications.
  • Using effort without adaptive thinking. While effort works standalone, combining it with adaptive thinking gives Claude more flexibility to allocate thinking depth where it matters.

Conclusion

The effort parameter is a powerful tool for optimizing your Claude API usage. By matching effort level to task complexity, you can reduce costs, improve latency, and maintain quality—all with a single model. Start with medium effort as your default, experiment with low for simple tasks, and save max for when you need Claude's absolute best.

Key Takeaways

  • Effort is a behavioral signal, not a strict token budget. It controls how eagerly Claude spends tokens across text, tools, and thinking.
  • Five levels (low, medium, high, xhigh, max) let you trade off between speed/cost and capability without switching models.
  • Combine effort with adaptive thinking for the best results, especially on Opus 4.7 and Sonnet 4.6.
  • Sonnet 4.6 defaults to high effort—explicitly set it to medium or low for latency-sensitive applications.
  • budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6. Migrate to effort + adaptive thinking.