BeClaude
Guide · 2026-05-05

Mastering Claude’s Effort Parameter: Control Token Spend Without Sacrificing Intelligence

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes code examples, recommended levels, and best practices for Opus 4.6, Sonnet 4.6, and more.

Quick Answer

This guide explains Claude’s effort parameter, which lets you control how eagerly Claude spends tokens. You’ll learn how to set effort levels (low, medium, high, xhigh, max), how to combine effort with adaptive thinking, and how to apply both in practical API examples that optimize speed and cost for your use case.

Tags: effort parameter, token efficiency, Claude API, adaptive thinking, cost optimization

Introduction

Every Claude API call is a trade-off between thoroughness and efficiency. Do you want Claude to think deeply and produce the most complete answer possible? Or do you need a fast, low-cost response for a high-volume task? Historically, you had to choose between different models or fiddle with token budgets. Now, with the effort parameter, you can control this balance using a single model.

Effort is a behavioral signal that tells Claude how eager it should be about spending tokens. It works across all response types—text, tool calls, and extended thinking—and it’s available on Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6, and Opus 4.5. This guide will show you exactly how to use it, when to choose each level, and how to combine it with adaptive thinking for the best results.

How the Effort Parameter Works

By default, Claude uses high effort, meaning it will spend as many tokens as needed to produce excellent results. You can lower the effort to save tokens and speed up responses, or raise it to max for the absolute highest capability.

Key points:

  • Effort affects all tokens in the response: text, tool calls, and thinking (when enabled).
  • It does not require extended thinking to be enabled.
  • Lower effort means Claude may skip thinking for simple problems and make fewer tool calls.
  • Setting effort: "high" is identical to omitting the parameter entirely.

Note: For Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.
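
The key points above can be sketched as a small request builder. This is a minimal sketch, not the SDK's own API: build_request is a hypothetical helper, the claude-sonnet-4-6 model alias is an assumption, and the code assumes the effort level travels in the request's output_config field. It leaves the field out when no level is given, since omitting effort is described above as identical to "high".

```python
from typing import Any, Optional

# The five level names used in this guide.
VALID_EFFORT = {"low", "medium", "high", "xhigh", "max"}


def build_request(prompt: str,
                  effort: Optional[str] = None,
                  model: str = "claude-sonnet-4-6",  # assumed model alias
                  max_tokens: int = 1024) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create().

    Hypothetical helper. Assumes effort is sent as output_config.effort.
    """
    if effort is not None and effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")

    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Omitting effort is described as equivalent to "high", so the field
    # is only attached when the caller picks a level explicitly.
    if effort is not None:
        request["output_config"] = {"effort": effort}
    return request
```

The resulting dictionary could then be unpacked into a call such as client.messages.create(**build_request("Summarize this email.", effort="low")).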

Effort Levels and Use Cases

| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no token constraints | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability, default behavior | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed, cost, and performance balance |
| low | Most efficient, significant token savings | Simpler tasks, subagents, high-volume chat |
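
Because max and xhigh are not available on every model, it can help to check a requested level before sending it. The sketch below transcribes the availability notes from the table; the family labels are informal names rather than API model IDs, and clamp_effort is a hypothetical helper whose fall-back-to-the-next-lower-level behavior is a design choice of this example, not something the API does for you.

```python
# Levels ordered from cheapest to most thorough.
ORDER = ["low", "medium", "high", "xhigh", "max"]

# Levels listed without a model restriction in the table.
COMMON_LEVELS = {"low", "medium", "high"}

# Extra levels per model family, transcribed from the table.
# Keys are informal labels, not API model IDs.
EXTRA_LEVELS = {
    "Mythos Preview": {"max"},
    "Opus 4.7": {"max", "xhigh"},
    "Opus 4.6": {"max"},
    "Sonnet 4.6": {"max"},
    "Opus 4.5": set(),
}


def supported_levels(family: str) -> set[str]:
    """Return every effort level the table lists for a model family."""
    if family not in EXTRA_LEVELS:
        raise ValueError(f"unknown model family: {family!r}")
    return COMMON_LEVELS | EXTRA_LEVELS[family]


def clamp_effort(family: str, requested: str) -> str:
    """Return the requested level, or the nearest lower supported one."""
    if requested not in ORDER:
        raise ValueError(f"unknown effort level: {requested!r}")
    allowed = supported_levels(family)
    # Walk downward from the requested level until a supported one is found.
    for level in reversed(ORDER[: ORDER.index(requested) + 1]):
        if level in allowed:
            return level
    raise AssertionError("unreachable: 'low' is always supported")
```

For instance, asking for xhigh on Opus 4.6 falls back to high, while the same request on Opus 4.7 passes through unchanged.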

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always set effort explicitly:

  • Medium (recommended default): Best balance for most applications—agentic coding, tool-heavy workflows, code generation.
  • Low: For high-volume or latency-sensitive workloads—chat, non-coding tasks where speed matters most.
  • High: For tasks that need maximum quality from Sonnet 4.6.
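
The recommendations above can be written down as a lookup table so that every call site sets effort explicitly. This is only an illustrative sketch: the Workload categories and the sonnet_effort helper are hypothetical names that restate the three bullets, not part of any SDK.

```python
from enum import Enum
from typing import Optional


class Workload(Enum):
    """Hypothetical workload categories taken from the bullets above."""
    AGENTIC_CODING = "agentic coding"
    TOOL_HEAVY = "tool-heavy workflow"
    CODE_GENERATION = "code generation"
    HIGH_VOLUME_CHAT = "high-volume chat"
    LATENCY_SENSITIVE = "latency-sensitive"
    MAX_QUALITY = "maximum quality"


SONNET_46_EFFORT = {
    Workload.AGENTIC_CODING: "medium",
    Workload.TOOL_HEAVY: "medium",
    Workload.CODE_GENERATION: "medium",
    Workload.HIGH_VOLUME_CHAT: "low",
    Workload.LATENCY_SENSITIVE: "low",
    Workload.MAX_QUALITY: "high",
}


def sonnet_effort(workload: Optional[Workload] = None) -> str:
    """Pick an explicit effort level for Sonnet 4.6.

    Falls back to "medium", the recommended default, when the
    workload is not specified.
    """
    if workload is None:
        return "medium"
    return SONNET_46_EFFORT[workload]
```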

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking by setting thinking: {type: "adaptive"}. This lets Claude dynamically decide when to think based on the problem difficulty and your effort level.

  • At high and max effort, Claude will almost always think.
  • At lower effort levels, Claude may skip thinking for simpler problems, saving tokens.

This combination gives you fine-grained control: you set the overall token-spending appetite with effort, and adaptive thinking decides when thinking is actually needed.
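
One way to see this behavior in practice is to check whether a response actually contains a thinking block. The sketch below assumes the effort level is sent in output_config and that the claude-opus-4-6 alias is valid; adaptive_request and used_thinking are hypothetical helpers, and the SimpleNamespace object is an offline stand-in for a real response, used only so the snippet runs without a network call.

```python
from types import SimpleNamespace
from typing import Any


def adaptive_request(prompt: str, effort: str,
                     model: str = "claude-opus-4-6",  # assumed model alias
                     max_tokens: int = 8192) -> dict[str, Any]:
    """Build create() arguments that pair adaptive thinking with effort."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},  # assumed location of effort
        "messages": [{"role": "user", "content": prompt}],
    }


def used_thinking(response: Any) -> bool:
    """Report whether the response contains at least one thinking block."""
    return any(getattr(block, "type", None) == "thinking"
               for block in response.content)


def text_of(response: Any) -> str:
    """Join the text blocks, skipping thinking and tool-use blocks."""
    return "".join(block.text for block in response.content
                   if getattr(block, "type", None) == "text")


# Offline stand-in for a real response object, for illustration only.
fake = SimpleNamespace(content=[
    SimpleNamespace(type="thinking", thinking="(reasoning elided)"),
    SimpleNamespace(type="text", text="Final answer."),
])
print(used_thinking(fake), text_of(fake))
```

Logging used_thinking per request at each effort level is a simple way to confirm that lower levels skip thinking on your easier prompts.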

Practical API Examples

Python (using the Anthropic SDK)

import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Use a model that supports effort (see the list in the Introduction).
# Effort is passed inside output_config, not as a top-level argument.

# Low effort for fast, cheap responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Summarize this email in one sentence."}
    ],
)
print(response.content[0].text)

# High effort for complex reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Debug this Python code and explain the fix..."}
    ],
)

# Max effort with adaptive thinking
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis..."}
    ],
)

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

// Medium effort for balanced agentic tasks.
// Effort is passed inside output_config, not as a top-level field.
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a function to fetch and parse JSON from an API.' }
  ]
});

// content is a union of block types, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}

// Low effort for high-volume chat
const fastResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 512,
  output_config: { effort: 'low' },
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

REST API (raw curl)

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "output_config": {"effort": "medium"},
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

Best Practices

  • Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
  • Start with medium for most applications—it gives the best balance of speed, cost, and quality.
  • Use low for subagents or high-volume pipelines where each call must be fast and cheap.
  • Reserve max for the hardest problems—deep mathematical proofs, complex multi-step reasoning, or tasks where you need Claude’s absolute best.
  • Combine with adaptive thinking (thinking: {type: "adaptive"}) to let Claude decide when to engage extended thinking, saving even more tokens on simple queries.
  • Monitor token usage—lower effort levels can significantly reduce your bill, especially for tool-heavy workflows where Claude makes fewer tool calls.
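
For the monitoring advice above, a small in-memory log is often enough to compare effort levels on your own traffic. This is a minimal sketch: EffortUsageLog and fake_response are hypothetical helpers, and the only SDK detail relied on is that a response exposes usage.output_tokens.

```python
from collections import defaultdict
from statistics import mean
from types import SimpleNamespace
from typing import Any


class EffortUsageLog:
    """Collect output-token counts per effort level (hypothetical helper)."""

    def __init__(self) -> None:
        self._output_tokens: dict[str, list[int]] = defaultdict(list)

    def record(self, effort: str, response: Any) -> None:
        """Store the output-token count of one response."""
        self._output_tokens[effort].append(response.usage.output_tokens)

    def average_output_tokens(self) -> dict[str, float]:
        """Return the mean output tokens observed for each effort level."""
        return {effort: mean(counts)
                for effort, counts in self._output_tokens.items()}


def fake_response(output_tokens: int) -> SimpleNamespace:
    """Offline stand-in exposing only usage.output_tokens, for dry runs."""
    return SimpleNamespace(usage=SimpleNamespace(output_tokens=output_tokens))
```

In a real pipeline you would call log.record(effort, response) after each client.messages.create() call and review average_output_tokens() alongside your quality metrics before lowering effort further.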

Limitations and Considerations

  • Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than at higher levels.
  • The xhigh level is currently only available on Claude Opus 4.7.
  • Effort affects all tokens, including tool calls. Lower effort means fewer tool calls, which may reduce the quality of multi-step agentic tasks.
  • Zero Data Retention (ZDR) is supported—data sent with effort is not stored after the API response is returned.
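
Since effort is a signal rather than a budget, the hard ceiling on output still comes from max_tokens. The sketch below shows one way to detect a truncated response and suggest a larger cap for a retry; it relies on the Messages API reporting stop_reason == "max_tokens" when the cap is hit, while next_max_tokens, its doubling strategy, and the 16384 ceiling are assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Optional


def hit_token_cap(response: Any) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return response.stop_reason == "max_tokens"


def next_max_tokens(response: Any, current: int,
                    ceiling: int = 16384) -> Optional[int]:
    """Suggest a larger max_tokens for a retry.

    Returns None when the response was not truncated or when the
    (arbitrary, illustrative) ceiling has already been reached.
    """
    if not hit_token_cap(response) or current >= ceiling:
        return None
    return min(current * 2, ceiling)


# Offline stand-in for a truncated response, for illustration only.
truncated = SimpleNamespace(stop_reason="max_tokens")
print(next_max_tokens(truncated, current=4096))
```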

Key Takeaways

  • Effort lets you control token spend across text, tool calls, and thinking with a single parameter—no need to switch models.
  • Five levels (low, medium, high, xhigh, max) give you fine-grained control from fastest/cheapest to most thorough.
  • Combine with adaptive thinking for optimal efficiency—Claude decides when to think based on problem difficulty and your effort setting.
  • Always set effort explicitly with Sonnet 4.6 to avoid unexpected latency from the default high setting.
  • Start with medium for most use cases, and only go to max for the hardest problems or low for high-volume, simple tasks.