Guide · Beginner · Pricing · 2026-05-12

Mastering Claude's Effort Parameter: Balance Performance and Cost

Learn how to control Claude's token spending with the effort parameter. Optimize for speed, cost, or deep reasoning across all API responses, including tool calls and extended thinking.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response thoroughness and efficiency, with practical code examples for the API.

Tags: effort parameter, token optimization, Claude API, cost management, extended thinking

Introduction

Claude is incredibly powerful, but with great power comes great token consumption. Every response, every tool call, every chain of thought costs tokens — and that translates to latency and expense. Enter the effort parameter: a single API setting that gives you fine-grained control over how "eager" Claude is about spending tokens.

Introduced for Claude Opus 4.6 and Sonnet 4.6, and now available across the latest models including Claude Mythos Preview and Claude Opus 4.7, effort replaces the older budget_tokens parameter as the recommended way to control thinking depth. It works with or without extended thinking enabled, and it affects all tokens — text, tool calls, and thinking blocks.

In this guide, you'll learn:

How the effort parameter works under the hood
When to use each effort level
How to implement effort in your API calls (Python and TypeScript)
Best practices for combining effort with adaptive thinking

How Effort Works

By default, Claude operates at high effort — spending as many tokens as needed for excellent results. The effort parameter lets you dial this up or down:

Higher effort → More tokens spent → Deeper reasoning, better quality, higher cost and latency
Lower effort → Fewer tokens spent → Faster responses, lower cost, some capability reduction

Crucially, effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think hard on sufficiently difficult problems — just less than it would at higher levels for the same problem.

What Effort Affects

Unlike some controls that only affect thinking tokens, effort influences all tokens in the response:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

This means lower effort can also reduce the number of tool calls Claude makes, giving you greater control over efficiency.
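One way to see this effect is to tally the content blocks in a response by type and compare the tallies across effort levels. The sketch below assumes the response content has already been converted to a list of plain dicts with a "type" key ("text", "tool_use", "thinking"); the helper name summarize_blocks and the sample data are hypothetical, written by hand for illustration.

```python
from collections import Counter


def summarize_blocks(content):
    """Count response content blocks by their "type" field.

    `content` is assumed to be a list of dicts, each with a "type" key
    such as "text", "tool_use", or "thinking".
    """
    return Counter(block["type"] for block in content)


# Hand-written sample content, not output from a real API call.
sample_content = [
    {"type": "thinking", "thinking": "..."},
    {"type": "text", "text": "Let me look that up."},
    {"type": "tool_use", "id": "toolu_example_1", "name": "search", "input": {"q": "effort"}},
    {"type": "tool_use", "id": "toolu_example_2", "name": "search", "input": {"q": "tokens"}},
]

counts = summarize_blocks(sample_content)
print(dict(counts))
```

Running the same prompt at two effort levels and comparing the "tool_use" counts gives a rough picture of how much lower effort trims tool activity for a given workload.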

Effort Levels and When to Use Them

Level	Description	Best For
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, subagents, high-volume chat, latency-sensitive workloads
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing speed/cost balance, tool-heavy workflows, code generation
`high`	High capability. Equivalent to omitting the parameter.	Complex reasoning, difficult coding, agentic tasks
`xhigh`	Extended capability for long-horizon work. (Opus 4.7 only)	Long-running agentic/coding tasks (30+ min) with million-token budgets
`max`	Absolute maximum capability with no constraints. (Mythos, Opus 4.6+, Sonnet 4.6)	Deepest possible reasoning, most thorough analysis

Note: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
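The availability rules in the table can be checked before a request is sent. The following is a minimal sketch: the family names are labels copied from the table (not API model IDs), "Opus 4.6+" is read as covering Opus 4.6 and Opus 4.7, and resolve_effort is a hypothetical helper rather than part of any SDK.

```python
BASE_LEVELS = {"low", "medium", "high"}

# Extra levels per model family, transcribed from the table above.
# These keys are labels for this sketch, not API model IDs.
EXTRA_LEVELS = {
    "Mythos Preview": {"max"},
    "Opus 4.7": {"xhigh", "max"},
    "Opus 4.6": {"max"},
    "Sonnet 4.6": {"max"},
}


def resolve_effort(family, effort=None):
    """Return the effective effort level, or raise if the family lacks it."""
    if effort is None:
        # Omitting the parameter behaves the same as "high".
        return "high"
    allowed = BASE_LEVELS | EXTRA_LEVELS.get(family, set())
    if effort not in allowed:
        raise ValueError(f"effort {effort!r} is not available on {family}")
    return effort


print(resolve_effort("Sonnet 4.6"))         # parameter omitted
print(resolve_effort("Opus 4.7", "xhigh"))  # allowed on Opus 4.7 only
```

Asking for "xhigh" on "Sonnet 4.6" raises a ValueError in this sketch, which surfaces a misconfiguration locally instead of at request time.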

Recommended Defaults for Sonnet 4.6

Sonnet 4.6 defaults to high effort. Anthropic recommends explicitly setting effort to avoid unexpected latency:

Medium effort — Best balance for most applications (agentic coding, tool-heavy workflows, code generation)
Low effort — For high-volume or latency-sensitive workloads (chat, non-coding use cases)
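These recommendations can be captured in a small lookup so that every Sonnet 4.6 call sets effort explicitly. This is only a sketch: the workload labels and the function name sonnet_effort are hypothetical, and unknown workloads fall back to medium because the guide describes it as the best balance for most applications.

```python
# Hypothetical workload labels, grouped per the recommendations above.
LOW_EFFORT_WORKLOADS = {"chat", "high_volume", "latency_sensitive"}
MEDIUM_EFFORT_WORKLOADS = {"agentic_coding", "tool_heavy", "code_generation"}


def sonnet_effort(workload):
    """Pick an explicit effort level for Sonnet 4.6 from a workload label."""
    if workload in LOW_EFFORT_WORKLOADS:
        return "low"
    # Medium covers the listed coding and tool-heavy workloads and also
    # serves as the fallback for anything unrecognized.
    return "medium"


print(sonnet_effort("chat"))
print(sonnet_effort("agentic_coding"))
```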

Combining Effort with Adaptive Thinking

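For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). Adaptive thinking lets Claude decide, per request, whether and how much to think based on problem complexity, while effort sets how liberally it spends tokens overall. As noted earlier, effort is a behavioral signal rather than a hard token budget.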

At high (default) and max effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems — saving tokens without sacrificing quality on hard tasks.
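The combination can be sketched as a helper that assembles request parameters in one place. It assumes effort is passed under output_config, as in Anthropic's API reference; build_request is a hypothetical name, and the model string is a placeholder to be replaced with a real model ID.

```python
def build_request(model, messages, effort="high", adaptive=True, max_tokens=4096):
    """Assemble keyword arguments for a Messages API call.

    Assumption: effort is sent as output_config={"effort": ...}.
    """
    request = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        "output_config": {"effort": effort},
    }
    if adaptive:
        # Let Claude decide per request whether to think.
        request["thinking"] = {"type": "adaptive"}
    return request


params = build_request(
    model="your-model-id",  # placeholder, not a real model ID
    messages=[{"role": "user", "content": "Summarize this changelog."}],
    effort="medium",
)
print(params["output_config"], params["thinking"])
```

With the SDK installed, the resulting dict could be passed along as client.messages.create(**params).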

Code Examples

Python (using the Anthropic SDK)

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Assumed model IDs for Sonnet 4.6 and Opus 4.6; confirm them against the
# current models list before use.
SONNET_MODEL = "claude-sonnet-4-6"
OPUS_MODEL = "claude-opus-4-6"

# Low effort: fast and cheap for simple tasks.
# Effort is passed under output_config here, following Anthropic's API
# reference; check the reference for your SDK version.
response = client.messages.create(
    model=SONNET_MODEL,
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    output_config={"effort": "low"},
)
print(response.content[0].text)

# Medium effort: balanced for agentic workflows.
response = client.messages.create(
    model=SONNET_MODEL,
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
    output_config={"effort": "medium"},
)

# Max effort: for the hardest problems, combined with adaptive thinking.
response = client.messages.create(
    model=OPUS_MODEL,
    max_tokens=8192,
    system="You are a research mathematician.",
    messages=[
        {"role": "user", "content": "Prove the Riemann Hypothesis."}
    ],
    output_config={"effort": "max"},
    thinking={"type": "adaptive"},
)

TypeScript (using the Anthropic SDK)

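import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

// Assumed model ID for Sonnet 4.6; confirm it against the current models list.
const MODEL = 'claude-sonnet-4-6';

// Low effort for simple queries.
// Effort is passed under output_config here, following Anthropic's API
// reference; check the reference for your SDK version.
const quickResponse = await client.messages.create({
  model: MODEL,
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'What is 2 + 2?' }
  ],
  output_config: { effort: 'low' },
});

// Content blocks are a union type, so narrow to a text block before reading .text.
const firstBlock = quickResponse.content[0];
if (firstBlock.type === 'text') {
  console.log(firstBlock.text);
}

// Medium effort with adaptive thinking.
// A distinct variable name avoids redeclaring a const in the same scope.
const codingResponse = await client.messages.create({
  model: MODEL,
  max_tokens: 4096,
  system: 'You are a coding assistant.',
  messages: [
    { role: 'user', content: 'Refactor this code for performance.' }
  ],
  output_config: { effort: 'medium' },
  thinking: { type: 'adaptive' },
});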

Best Practices

Start with medium effort for most applications. It provides the best balance of speed, cost, and quality.
Use low effort for subagents or simple classification tasks where speed matters more than depth.
Reserve max effort for the hardest problems — complex math, deep research, or critical decision-making.
Combine with adaptive thinking to let Claude decide when to think, saving tokens on simple queries.
Monitor token usage across effort levels to find the sweet spot for your use case.
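For the monitoring step, a small aggregator is often enough to compare levels. The sketch below assumes each call has been logged as an (effort, output_tokens) pair, with output_tokens taken from the usage field of the response; the function name and the sample records are hypothetical, and the numbers are invented purely to exercise the code.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens(records):
    """Average output tokens per effort level.

    `records` is an iterable of (effort, output_tokens) pairs.
    """
    by_level = defaultdict(list)
    for effort, output_tokens in records:
        by_level[effort].append(output_tokens)
    return {level: mean(values) for level, values in by_level.items()}


# Invented numbers for illustration only; these are not measurements.
records = [
    ("low", 100),
    ("low", 140),
    ("medium", 300),
    ("medium", 340),
]

averages = mean_output_tokens(records)
for level, value in averages.items():
    print(level, value)
```

Pairing these averages with a quality check on the same prompts shows where a lower level stops being good enough for a given use case.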

Migration from budget_tokens

If you're using budget_tokens with Opus 4.6 or Sonnet 4.6, switch to effort now. The budget_tokens parameter is deprecated and will be removed in a future model release.

Before:

thinking={"type": "enabled", "budget_tokens": 16000}

After:

output_config={"effort": "high"},
thinking={"type": "adaptive"}
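For codebases with many call sites, the rewrite can be applied to request parameters programmatically. This is a sketch under two assumptions: effort is sent under output_config, and the caller chooses the effort level, since the guide does not define a mapping from budget_tokens values to effort levels. The name migrate_thinking is hypothetical.

```python
import copy


def migrate_thinking(params, effort="high"):
    """Return a copy of `params` with budget_tokens thinking replaced.

    Requests that use thinking={"type": "enabled", "budget_tokens": N} are
    switched to adaptive thinking plus an explicit effort level. Other
    requests are returned unchanged (as a copy).
    """
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") == "enabled" and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        # Keep an effort level the caller already set, if any.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


old_params = {
    "model": "your-model-id",  # placeholder, not a real model ID
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 16000},
}
new_params = migrate_thinking(old_params)
print(new_params["thinking"], new_params["output_config"])
```

The original dict is left untouched, so the old and new parameters can be compared side by side during a staged rollout.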

Key Takeaways

The effort parameter controls token spending across all response types (text, tool calls, thinking) — not just thinking tokens.
Five levels are available: low, medium, high, xhigh (Opus 4.7), and max (selected models).
Medium effort is the recommended starting point for Sonnet 4.6 (the API default remains high), balancing speed, cost, and performance.
Combine with adaptive thinking (thinking: {type: "adaptive"}) for the best experience — Claude thinks only when needed.
Effort replaces budget_tokens on Opus 4.6 and Sonnet 4.6 — migrate your code to avoid future breakage.