BeClaude
Guide · Beginner · Pricing · 2026-05-22

Mastering Claude’s Effort Parameter: Control Token Spend Without Sacrificing Intelligence

Learn how to use Claude's effort parameter to balance response thoroughness and token efficiency across all supported models, with practical API examples and recommended settings.

Quick Answer

This guide explains Claude’s effort parameter, which lets you control how eagerly Claude spends tokens on responses. You’ll learn the five effort levels (low, medium, high, xhigh, max), how to combine effort with adaptive thinking, and which defaults are recommended for Sonnet 4.6 to balance speed, cost, and capability.

Tags: effort parameter, token efficiency, Claude API, extended thinking, cost optimization

Introduction

If you’ve ever wished you could dial Claude’s thoroughness up or down, spending fewer tokens on simple tasks while reserving maximum reasoning for the hardest problems, the effort parameter is what you need. Available across Claude’s latest models, effort lets you steer how many tokens Claude invests in each response, and it works whether or not extended thinking is enabled.

This guide covers everything you need to know to start using effort effectively: how it works, the five effort levels, recommended defaults for different models, and practical API examples in Python and TypeScript.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding to requests. It affects all tokens in the response—including text explanations, tool calls, function arguments, and extended thinking (when enabled).

Key advantages:

  • No need to enable extended thinking – effort works independently.
  • Controls tool call volume – lower effort means fewer tool calls, giving you greater efficiency gains.
  • Single model, multiple behaviors – you can switch between low, medium, high, xhigh, and max effort without changing models, as long as the model supports the level you choose.

Important: For Claude Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter. budget_tokens is still accepted on these models, but it will be removed in a future release.
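
As a rough illustration of that migration, the sketch below rewrites a request dictionary that uses a fixed thinking budget into one that uses effort. It assumes effort is sent as output_config={"effort": ...} and that adaptive thinking is the intended replacement for a fixed budget; the helper name migrate_to_effort and the model ID are illustrative assumptions, so check them against the current API reference before relying on them.

```python
from copy import deepcopy


def migrate_to_effort(params: dict, effort: str = "medium") -> dict:
    """Return a copy of `params` with a fixed thinking budget replaced by effort.

    Hypothetical helper. Assumes effort is passed as output_config={"effort": ...}
    and that thinking={"type": "adaptive"} replaces a budget_tokens setting.
    """
    migrated = deepcopy(params)  # leave the caller's dictionary untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
    output_config = dict(migrated.get("output_config") or {})
    output_config["effort"] = effort
    migrated["output_config"] = output_config
    return migrated


old_request = {
    "model": "claude-sonnet-4-6",  # assumed model ID; confirm against your model list
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Summarize this changelog."}],
}
new_request = migrate_to_effort(old_request, effort="medium")
```

The result can then be passed along as client.messages.create(**new_request). The effort level is taken as an argument instead of being derived from the old budget, because the source gives no mapping between budget sizes and effort levels.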

How Effort Levels Work

There are five effort levels, each suited to different use cases:

| Level | Description | Typical Use Case |
|---|---|---|
| low | Most efficient. Significant token savings with some capability reduction. | Simple tasks needing best speed and lowest cost (e.g., subagents, chat). |
| medium | Balanced approach with moderate token savings. | Agentic tasks requiring a balance of speed, cost, and performance. |
| high | High capability. Equivalent to omitting the parameter. | Complex reasoning, difficult coding problems, agentic tasks. |
| xhigh | Extended capability for long-horizon work. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions. |
| max | Absolute maximum capability with no constraints on token spending. | Tasks requiring the deepest possible reasoning and most thorough analysis. |

Note: xhigh is available only on Claude Opus 4.7. max is available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6.
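
One way to keep unsupported combinations out of your requests is a small client-side check built from the availability note above. This is only a sketch: the function name check_effort and the short model labels are hypothetical (they are not API model IDs), and levels without a stated restriction are treated as unrestricted, which is an assumption.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Availability as stated in the note above. The keys on the right are
# illustrative labels for model families, not API model IDs.
RESTRICTED_LEVELS = {
    "xhigh": {"opus-4.7"},
    "max": {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"},
}


def check_effort(model: str, effort: str) -> str:
    """Return `effort` if it is a known level that `model` is listed as supporting."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    allowed = RESTRICTED_LEVELS.get(effort)
    # Levels with no entry (low, medium, high) are assumed to be unrestricted.
    if allowed is not None and model not in allowed:
        raise ValueError(f"{effort!r} is not listed as available on {model!r}")
    return effort


effort = check_effort("opus-4.7", "xhigh")
```

Failing early like this avoids spending a round trip on a request the API would reject.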

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and costs, Anthropic recommends explicitly setting effort:

  • Medium (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
  • Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed is critical.
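
The recommendation above could be encoded as a small request builder so that effort is always set explicitly. The workload labels, function names, and model ID below are hypothetical, and the sketch assumes effort is passed as output_config={"effort": ...}.

```python
# Hypothetical workload labels; the mapping follows the recommendations above.
LOW_EFFORT_WORKLOADS = {"chat", "high_volume", "latency_sensitive"}


def sonnet_effort_for(workload: str) -> str:
    """Pick low for latency-sensitive work and medium for everything else."""
    return "low" if workload in LOW_EFFORT_WORKLOADS else "medium"


def build_request(prompt: str, workload: str) -> dict:
    """Build keyword arguments for client.messages.create with explicit effort."""
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID; confirm before use
        "max_tokens": 2048,
        "output_config": {"effort": sonnet_effort_for(workload)},
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_request("Refactor this function to remove global state.", "agentic_coding")
```

Setting the level in one place makes it easy to change the default later without touching every call site.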

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide dynamically how much thinking to apply based on the problem, while effort signals how freely it should spend tokens overall.

At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems, saving tokens without sacrificing quality on easy tasks.
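
Because Claude may or may not think on a given request, the response can begin with a thinking block instead of a text block. The sketch below builds an adaptive-thinking request and reads the answer by block type instead of by position. The helper names and model ID are hypothetical, and the sample content uses stand-in objects in place of a real API response.

```python
from types import SimpleNamespace


def build_adaptive_request(prompt: str, effort: str = "high") -> dict:
    """Build keyword arguments that combine adaptive thinking with an effort level."""
    return {
        "model": "claude-opus-4-6",  # assumed model ID; confirm before use
        "max_tokens": 4096,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},  # assumed location of the effort setting
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_text(content_blocks) -> str:
    """Join the text blocks of a response, skipping thinking and tool blocks."""
    return "".join(
        block.text for block in content_blocks if getattr(block, "type", None) == "text"
    )


# Stand-in for response.content when Claude chose to think first.
sample_content = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="Paris is the capital of France."),
]
answer = extract_text(sample_content)
```

With a real client, the same helper would be called as extract_text(response.content), which works whether or not a thinking block is present.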

Practical API Examples

Python (using the Anthropic SDK)

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Effort is passed inside output_config. The model IDs below are examples;
# use an ID for a model that supports the effort level you request.

# Low effort – fast, cheap responses for simple tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
print(response.content[0].text)

# High effort – thorough reasoning for complex problems
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on modern cryptography."}
    ],
)
print(response.content[0].text)

# Max effort – absolute maximum capability
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    output_config={"effort": "max"},
    messages=[
        {"role": "user", "content": "Design a novel algorithm for distributed consensus that tolerates Byzantine faults with minimal latency."}
    ],
)
print(response.content[0].text)

TypeScript (using the Anthropic SDK)

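import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Medium effort – balanced for agentic workflows
const response = await client.messages.create({
  model: "claude-sonnet-4-6", // example ID; use a model that supports effort
  max_tokens: 2048,
  output_config: { effort: "medium" },
  messages: [
    { role: "user", content: "Write a Python script to scrape a website and extract all image URLs." }
  ]
});
// Content blocks are a union type, so narrow to text blocks before reading .text
for (const block of response.content) {
  if (block.type === "text") console.log(block.text);
}

// Combining effort with adaptive thinking
const thinkingResponse = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096,
  output_config: { effort: "high" },
  thinking: { type: "adaptive" },
  messages: [
    { role: "user", content: "Solve this complex math problem step by step: ∫(x^2 * e^x) dx" }
  ]
});
// The first block may be a thinking block, so print only the text blocks
for (const block of thinkingResponse.content) {
  if (block.type === "text") console.log(block.text);
}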

Using effort with tool calls

Effort also controls how many tool calls Claude makes. Lower effort means fewer tool calls, which can significantly reduce latency and cost in agentic workflows.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # example ID; use a model that supports effort
    max_tokens=2048,
    output_config={"effort": "low"},  # fewer tool calls, faster responses
    tools=[
        {
            "name": "search_web",
            "description": "Search the web for information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"],
            },
        }
    ],
    messages=[
        {"role": "user", "content": "Find the latest news about AI regulation."}
    ],
)
# The content list may contain text blocks and tool_use blocks
print(response.content)
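
To see whether a lower effort level reduces tool calls for your workload, you can count the tool_use blocks in each response. The helper below is a hypothetical sketch; the sample list uses stand-in objects with the same type field that response content blocks carry.

```python
from types import SimpleNamespace


def count_tool_calls(content_blocks) -> int:
    """Count the tool_use blocks in a response's content list."""
    return sum(
        1 for block in content_blocks if getattr(block, "type", None) == "tool_use"
    )


# Stand-in for response.content from a single turn.
sample_content = [
    SimpleNamespace(type="text", text="Let me look that up."),
    SimpleNamespace(type="tool_use", name="search_web", input={"query": "AI regulation news"}),
    SimpleNamespace(type="tool_use", name="search_web", input={"query": "AI regulation 2026"}),
]
calls = count_tool_calls(sample_content)
```

In a multi-turn agent loop, summing this count over every turn gives the total number of tool calls for a task, which you can then compare across effort levels.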

Best Practices

  • Start with medium for Sonnet 4.6 – It provides the best balance for most applications. Only increase to high or max when you need deeper reasoning.
  • Use low for high-volume subagents – If you’re running many parallel agents (e.g., for data extraction or classification), low effort saves tokens without significant quality loss.
  • Combine with adaptive thinking – For models that support it, thinking: {type: "adaptive"} lets Claude decide when to think, while effort signals how freely it should spend tokens.
  • Test with your specific workload – Effort is a behavioral signal, not a strict budget. Run A/B tests to find the optimal level for your use case.
  • Monitor token usage – Use the API’s usage statistics to compare token spend across effort levels and adjust accordingly.
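
The testing and monitoring advice above could be supported by a small aggregation over recorded usage. The sketch assumes you log the effort level alongside response.usage.output_tokens for each request; the function name is hypothetical and the sample numbers are placeholders, not measurements.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(samples):
    """Average output tokens per effort level from (effort, output_tokens) pairs."""
    by_effort = defaultdict(list)
    for effort, output_tokens in samples:
        by_effort[effort].append(output_tokens)
    return {effort: mean(values) for effort, values in by_effort.items()}


# Placeholder values for illustration only; record real ones from response.usage.
placeholder_samples = [
    ("low", 120),
    ("low", 80),
    ("medium", 300),
    ("medium", 500),
]
summary = mean_output_tokens(placeholder_samples)
```

Pairing these averages with a quality metric of your own shows whether the tokens saved at a lower level cost you anything in output quality.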

Limitations and Considerations

  • Not a strict token budget – Claude may still think deeply on hard problems even at low effort. The parameter is a signal, not a hard cap.
  • Model availability – xhigh is only available on Claude Opus 4.7. max is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
  • Deprecation of budget_tokens – If you’re using budget_tokens on Opus 4.6 or Sonnet 4.6, migrate to effort as soon as possible.
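
Since effort is a signal and not a cap, max_tokens remains the hard limit on output length. A minimal sketch of detecting a response that ran into that limit follows; the helper name is hypothetical, and the sample objects stand in for real API responses.

```python
from types import SimpleNamespace


def hit_token_cap(response) -> bool:
    """Return True if the response stopped because it reached max_tokens."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Stand-ins for API responses with different stop reasons.
truncated = SimpleNamespace(stop_reason="max_tokens")
complete = SimpleNamespace(stop_reason="end_turn")

needs_retry = hit_token_cap(truncated)
```

If this check fires often at a given effort level, raising max_tokens or lowering effort are the two levers to try.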

Key Takeaways

  • The effort parameter lets you control token spend across all response types – including text, tool calls, and extended thinking – without needing thinking mode enabled.
  • Five levels (low, medium, high, xhigh, max) give you fine-grained control, with medium recommended as the default for Sonnet 4.6.
  • Combine effort with adaptive thinking for the best balance of capability and efficiency.
  • Lower effort reduces tool call volume, making it ideal for high-throughput agentic workflows.
  • Always test with your specific workload to find the optimal effort level for your application’s speed, cost, and quality requirements.