Guide | Beginner | Pricing | 2026-05-21

Mastering Claude's Effort Parameter: Optimize Token Spend and Response Quality

Learn how to use Claude's effort parameter to control token spending, balance speed and capability, and optimize costs across all API responses including thinking and tool calls.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn the five effort levels (low, medium, high, xhigh, max), how to set them in API calls, and practical strategies for balancing response quality, speed, and cost.

Tags: effort parameter, token optimization, API best practices, Claude Sonnet 4.6, cost management

Introduction

Every Claude API call is a trade-off between capability and cost. Do you want the deepest possible reasoning, or do you need a fast, cheap answer? Until recently, developers had to choose between models or fiddle with budget_tokens to control thinking depth. The effort parameter replaces both workarounds with a single setting.

Effort gives you a single, intuitive dial to control how eagerly Claude spends tokens on your requests. Set it to max for the absolute best reasoning, or dial it down to low for high-speed, low-cost responses. And the best part? It works across all response types—text, tool calls, and extended thinking—without requiring thinking to be enabled.

In this guide, you'll learn exactly what the effort parameter does, how to use it in your API calls, and practical strategies for choosing the right level for your use case.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how much token spend is appropriate for a given request. It's not a hard budget—Claude will still think deeply on difficult problems even at low effort—but it strongly influences how thorough the response is.

Key benefits:

Works on all supported models (Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5)
No beta header required
Affects all tokens: text, tool calls, and extended thinking
Replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6
Combines seamlessly with adaptive thinking (thinking: {type: "adaptive"})

Effort Levels Explained

There are five effort levels, each suited to different use cases:

Level	Description	Typical Use Case
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, high-volume chat, subagents, latency-sensitive workloads
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing speed/cost balance, tool-heavy workflows, code generation
`high`	High capability. Equivalent to not setting the parameter.	Complex reasoning, difficult coding, agentic tasks (default behavior)
`xhigh`	Extended capability for long-horizon work. Available on Opus 4.7 only.	Long-running agentic/coding tasks (30+ minutes) with million-token budgets
`max`	Absolute maximum capability with no constraints. Available on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6.	Deepest possible reasoning, most thorough analysis

Important: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely. If you want the default, you can leave it out.
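The availability rules in the table can be encoded as a small lookup so that an unsupported combination is caught before a request is sent. This is only a sketch: the family labels (such as "opus-4.7") and both function names are hypothetical shorthand, not real model IDs or SDK calls, and the availability sets simply mirror the table above.

```python
from typing import Optional

# Availability mirrors the table above. The family labels are hypothetical
# shorthand for this sketch, not real API model IDs.
ALL_FAMILIES = {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6", "opus-4.5"}

EFFORT_AVAILABILITY = {
    "low": ALL_FAMILIES,
    "medium": ALL_FAMILIES,
    "high": ALL_FAMILIES,
    "xhigh": {"opus-4.7"},
    "max": {"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"},
}


def is_effort_supported(level: str, family: str) -> bool:
    """Return True if `level` is listed as available for `family`."""
    if level not in EFFORT_AVAILABILITY:
        raise ValueError(f"unknown effort level: {level!r}")
    return family in EFFORT_AVAILABILITY[level]


def resolve_effort(level: Optional[str], family: str) -> str:
    """Treat an omitted level as "high" and reject unavailable combinations."""
    effective = level or "high"
    if not is_effort_supported(effective, family):
        raise ValueError(f"{effective!r} is not available on {family!r}")
    return effective
```

Treating a missing level as "high" matches the note above that omitting the parameter and setting it to "high" behave the same way.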

How to Use Effort in Your API Calls

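Using the effort parameter is straightforward. Add it to the output_config object in your Messages API request. It sits alongside the thinking configuration, not inside it.

Python Example

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    },
    output_config={"effort": "medium"},  # <-- Set effort here
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
)

# With thinking enabled, the first block may be a thinking block,
# so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  thinking: {
    type: 'enabled',
    budget_tokens: 2048
  },
  output_config: { effort: 'medium' }, // <-- Set effort here
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// With thinking enabled, the first block may be a thinking block,
// so print only the text blocks.
for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  }
}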

Using Effort Without Thinking

One of the biggest advantages of the effort parameter is that it works even when thinking is not enabled. This means you can control token spend on tool calls and regular text responses too.

# Effort works without thinking enabled
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    thinking={"type": "disabled"},
    output_config={"effort": "low"},  # Still controls token spend on text/tool calls
    messages=[
        {"role": "user", "content": "What's the capital of France?"}
    ]
)

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which can lead to unexpected latency if you're not careful. Anthropic recommends explicitly setting effort for predictable behavior:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters more than depth.

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking. Adaptive thinking lets Claude dynamically decide how much to think based on the problem difficulty, while effort sets the overall token-spending posture.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},       # Claude decides how much to think
    output_config={"effort": "medium"},  # overall token-spending posture
    messages=[
        {"role": "user", "content": "Design a distributed rate limiter."}
    ]
)
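Because adaptive thinking lets Claude decide whether to think at all, a response may or may not begin with a thinking block, so code that reads `response.content[0].text` directly can fail. A minimal sketch of a more defensive reader follows. `split_blocks` is a hypothetical helper name, and the sketch assumes each content block exposes a `type` of "thinking" or "text" with a matching `thinking` or `text` field; the stand-in blocks are built with `SimpleNamespace` so it runs without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from answer text in a list of content blocks."""
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Other block types (for example tool_use) are ignored here.
    return "".join(thinking_parts), "".join(text_parts)


# Stand-in blocks so the sketch runs without an API call.
with_thinking = [
    SimpleNamespace(type="thinking", thinking="Consider token buckets..."),
    SimpleNamespace(type="text", text="Use a token bucket per client."),
]
without_thinking = [SimpleNamespace(type="text", text="Paris.")]

print(split_blocks(with_thinking)[1])           # Use a token bucket per client.
print(split_blocks(without_thinking)[0] == "")  # True
```

In a real call you would pass `response.content` in place of the stand-in lists. An empty first element means Claude skipped thinking for that request.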

Practical Strategies for Choosing Effort Levels

1. Match Effort to Task Complexity

Simple Q&A or data extraction: Use low effort. You'll get fast, cheap answers with minimal thinking overhead.
Agentic workflows with tool calls: Use medium effort. This reduces unnecessary tool calls while maintaining good reasoning.
Complex coding or analysis: Use high effort (or omit the parameter). You want Claude to think deeply.
Long-running agents (30+ min): Use xhigh effort on Opus 4.7 for million-token budgets.
Mission-critical reasoning: Use max effort when you need the absolute best answer regardless of cost.
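The list above amounts to a lookup from task type to effort level, which can live in one place in an application instead of being scattered across call sites. In this sketch the task labels and the `effort_for` helper are hypothetical names chosen for illustration; only the effort values come from the recommendations above.

```python
# Task categories are hypothetical labels for this sketch; the effort values
# follow the recommendations in the list above.
EFFORT_BY_TASK = {
    "simple_qa": "low",
    "data_extraction": "low",
    "agentic_tools": "medium",
    "complex_coding": "high",
    "long_running_agent": "xhigh",  # Opus 4.7 only
    "mission_critical": "max",
}


def effort_for(task: str) -> str:
    """Look up the suggested effort; fall back to "high", the default."""
    return EFFORT_BY_TASK.get(task, "high")


print(effort_for("data_extraction"))  # low
print(effort_for("something_else"))   # high
```

Falling back to "high" for unrecognized tasks keeps the behavior identical to omitting the parameter.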

2. Start High, Then Dial Down

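When building a new application, start with high effort to establish a quality baseline. Then reduce effort one level at a time while monitoring output quality on a fixed set of test prompts. Medium effort often preserves most of the quality at a noticeably lower cost, but the exact trade-off depends on your workload, so measure it rather than assuming a fixed ratio.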
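One way to make the dial-down procedure repeatable is to score each level against the same evaluation and keep the lowest level that stays close to the high-effort baseline. The sketch below assumes you supply your own `score` callable that runs a test set at a given effort level and returns a quality number. That callable, the `tolerance` default, and the sample scores are all hypothetical.

```python
from typing import Callable

# Ordered from most to least token spend; "max" and "xhigh" are left out
# because this strategy starts from the "high" baseline.
DIAL_DOWN_ORDER = ["high", "medium", "low"]


def lowest_acceptable_effort(
    score: Callable[[str], float],
    tolerance: float = 0.05,
) -> str:
    """Return the lowest effort whose score stays within `tolerance` of "high".

    `score` is a caller-supplied evaluation (hypothetical here): it runs your
    own test set at the given effort level and returns a quality score.
    """
    baseline = score("high")
    chosen = "high"
    for level in DIAL_DOWN_ORDER[1:]:
        if score(level) >= baseline - tolerance:
            chosen = level
        else:
            break  # stop at the first level that degrades too much
    return chosen


# Made-up scores purely to exercise the function.
fake_scores = {"high": 0.92, "medium": 0.90, "low": 0.78}
print(lowest_acceptable_effort(fake_scores.__getitem__))  # medium
```

Stopping at the first level that falls short avoids picking a lower level that happens to score well by chance after an intermediate level has already failed.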

3. Use Effort to Control Tool Call Frequency

Lower effort doesn't just reduce thinking—it also reduces the number of tool calls Claude makes. This is a powerful lever for cost control in agentic systems where tool calls can be expensive.
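To check this effect on your own workload, you can count the tool calls in each response and compare the totals across effort levels. The sketch below assumes tool calls appear in the response content as blocks with `type` equal to "tool_use" and a `name` field; `count_tool_calls` is a hypothetical helper, and the sample blocks are stand-ins rather than real API output.

```python
from collections import Counter
from types import SimpleNamespace


def count_tool_calls(content) -> Counter:
    """Count tool_use blocks per tool name in one response's content list."""
    return Counter(block.name for block in content if block.type == "tool_use")


# Stand-in content blocks; a real run would pass response.content instead.
sample = [
    SimpleNamespace(type="text", text="Checking two sources."),
    SimpleNamespace(type="tool_use", name="search", input={"q": "a"}),
    SimpleNamespace(type="tool_use", name="search", input={"q": "b"}),
    SimpleNamespace(type="tool_use", name="fetch", input={"url": "https://example.com"}),
]
calls = count_tool_calls(sample)
print(sum(calls.values()), dict(calls))  # 3 {'search': 2, 'fetch': 1}
```

Summing these counters over a full agent run at each effort level gives a direct comparison of tool usage.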

4. Combine with Prompt Caching for Maximum Efficiency

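Effort and prompt caching reduce different costs: caching lowers the price of repeated input tokens, while effort limits how many output tokens Claude generates. Use low or medium effort on cached prompts to get fast, cheap responses for common queries.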
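To see the two levers separately, it helps to read both from the `usage` object on each response: cached input on one side, output tokens (where effort takes effect) on the other. The sketch below assumes the usage fields `input_tokens`, `output_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`. `summarize_usage` is a hypothetical helper name and the numbers are made up.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Report the cache-hit share of the input and the output token count."""
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    total_input = usage.input_tokens + cache_read + cache_write
    return {
        "total_input_tokens": total_input,
        "cache_hit_ratio": cache_read / total_input if total_input else 0.0,
        "output_tokens": usage.output_tokens,
    }


# Stand-in usage object with made-up numbers.
usage = SimpleNamespace(
    input_tokens=200,
    cache_read_input_tokens=1800,
    cache_creation_input_tokens=0,
    output_tokens=150,
)
print(summarize_usage(usage))
# {'total_input_tokens': 2000, 'cache_hit_ratio': 0.9, 'output_tokens': 150}
```

Logging this summary per request makes it easier to tell whether a cost change came from a cache miss or from a different effort level.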

Migration from budget_tokens

If you're currently using budget_tokens on Opus 4.6 or Sonnet 4.6, you should migrate to the effort parameter. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Before (deprecated):

thinking={
    "type": "enabled",
    "budget_tokens": 2048
}

After (recommended):

thinking={"type": "adaptive"},
output_config={"effort": "medium"}

Important Caveats

Effort is a behavioral signal, not a strict budget. At lower effort levels, Claude will still think deeply on sufficiently difficult problems—just less than it would at higher effort.
At high and max effort, Claude will almost always think. At lower levels, it may skip thinking entirely for simple problems.
Not all effort levels are available on all models. xhigh is Opus 4.7 only. max requires Mythos Preview, Opus 4.7, Opus 4.6, or Sonnet 4.6.
Zero Data Retention (ZDR) eligible: When your organization has a ZDR arrangement, data sent with effort parameters is not stored after the API response is returned.

Key Takeaways

The effort parameter gives you fine-grained control over token spend across text, tool calls, and extended thinking, without requiring thinking to be enabled.
Five effort levels (low, medium, high, xhigh, max) let you trade off between speed/cost and capability, with high being the default behavior.
For Sonnet 4.6, explicitly set effort to avoid unexpected latency—use medium as a recommended default for most applications.
Combine effort with adaptive thinking for the best balance of dynamic depth and cost control.
Migrate from budget_tokens to effort on Opus 4.6 and Sonnet 4.6, as budget_tokens is deprecated and will be removed in a future release.