BeClaude
Guide · Beginner · Pricing · 2026-05-22

Mastering Claude’s Effort Parameter: Control Thinking Depth for Speed & Cost

Learn how to use Claude's effort parameter to balance response thoroughness, token efficiency, and latency. Includes code examples, effort levels, and best practices for Opus 4.6, Sonnet 4.6, and Mythos Preview.

Quick Answer

The effort parameter lets you control how eagerly Claude spends tokens on a response. Set it to 'low' for fast, cheap answers on simple tasks, or 'max' for the deepest reasoning on complex problems. It works across all response tokens, including tool calls and thinking.

Tags: effort parameter, extended thinking, token efficiency, Claude API, cost optimization

Introduction

Claude’s effort parameter is a powerful new way to fine-tune the trade-off between response quality and token efficiency. Instead of switching between different models or manually setting token budgets, you can now tell a single Claude model how hard it should think — from a quick, cost-effective answer to a deep, multi-step reasoning marathon.

This guide explains how effort works, when to use each level, and how to integrate it into your API calls. By the end, you’ll know exactly how to dial in the right balance for your use case.

How Effort Works

The effort parameter is a behavioral signal, not a strict token budget. When you set effort to high (the default), Claude spends as many tokens as needed for excellent results. Lower levels tell Claude to be more conservative — it may skip thinking for simple problems, but it will still think deeply when the problem truly requires it.

Effort affects all tokens in the response:

  • Text responses and explanations
  • Tool calls and function arguments
  • Extended thinking (when enabled)

This is a major advantage over older approaches like budget_tokens, which only controlled thinking tokens. With effort, you get a unified control that also reduces the number of tool calls Claude makes at lower levels.
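
One way to observe this effect is to tally the content blocks in a response by type and compare the tallies across effort levels. The sketch below is a minimal illustration, not part of the API: tally_block_types is a hypothetical helper, and the sample list is a stand-in for response.content, assuming each block exposes a type field such as text, tool_use, or thinking.

```python
from collections import Counter
from types import SimpleNamespace


def tally_block_types(content_blocks):
    """Count content blocks by their `type` attribute (hypothetical helper)."""
    return Counter(block.type for block in content_blocks)


# Stand-in for `response.content`; real responses carry richer block objects.
sample_content = [
    SimpleNamespace(type="thinking"),
    SimpleNamespace(type="tool_use"),
    SimpleNamespace(type="tool_use"),
    SimpleNamespace(type="text"),
]

counts = tally_block_types(sample_content)
print(dict(counts))
```

Running the same prompt at two effort levels and comparing the tool_use counts gives a rough view of how much tool activity each level produces for your workload.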

Note: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. budget_tokens is deprecated and will be removed in a future model release.

Effort Levels

| Level | Description | Best For |
|-------|-------------|----------|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic and coding tasks (>30 min) with token budgets in the millions (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing a balance of speed, cost, and performance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume or latency-sensitive workloads |
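
Because max and xhigh are limited to certain models, it can help to check a level before sending a request. The following is a minimal sketch based only on the availability listed in the table. The model names are this guide's display names rather than API model IDs, supported_levels and check_effort are hypothetical helpers, and treating low, medium, and high as available on every effort-capable model is an assumption.

```python
# Availability as listed in the effort-level table; display names, not API IDs.
BASE_LEVELS = {"low", "medium", "high"}  # assumed available on every effort-capable model
MAX_CAPABLE = {"Mythos Preview", "Opus 4.7", "Opus 4.6", "Sonnet 4.6"}
XHIGH_CAPABLE = {"Opus 4.7"}


def supported_levels(model_name: str) -> set:
    """Return the effort levels the table lists for a model (hypothetical helper)."""
    levels = set(BASE_LEVELS)
    if model_name in MAX_CAPABLE:
        levels.add("max")
    if model_name in XHIGH_CAPABLE:
        levels.add("xhigh")
    return levels


def check_effort(model_name: str, level: str) -> str:
    """Return `level` unchanged, or raise if the table does not list it for the model."""
    if level not in supported_levels(model_name):
        raise ValueError(f"{level!r} is not listed for {model_name}")
    return level


print(sorted(supported_levels("Opus 4.7")))
print(check_effort("Sonnet 4.6", "max"))
```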

Recommended Default for Sonnet 4.6

Sonnet 4.6 defaults to high effort, so requests that leave effort unset may take longer and use more tokens than you expect. Set it explicitly; the recommended settings are:

  • Medium effort — Best balance of speed, cost, and performance for most applications (agentic coding, tool-heavy workflows, code generation).
  • Low effort — For high-volume or latency-sensitive workloads (chat, non-coding use cases).

Using Effort in the API

Effort is available on all supported models with no beta header required. You can set it directly in the request body.

Python Example

import anthropic

client = anthropic.Anthropic()

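response = client.messages.create(
    # Swap in the ID of the effort-capable model you are targeting (for example, Sonnet 4.6).
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    # Effort is passed inside output_config; other values include "low", "high", and "max".
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)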

print(response.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

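const response = await client.messages.create({
  // Swap in the ID of the effort-capable model you are targeting (for example, Sonnet 4.6).
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  // Effort is passed inside output_config.
  output_config: { effort: 'medium' },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text.
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}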

Combining with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This lets Claude dynamically decide how much thinking to use based on the problem complexity, while still respecting your effort signal.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    # Let Claude decide how much to think for each request.
    thinking={"type": "adaptive"},
    # Effort is passed inside output_config.
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum computing on cryptography."}
    ],
)
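
When thinking is enabled, the first content block may be a thinking block rather than text, so reading response.content[0].text directly can fail. A minimal sketch of a safer pattern follows. extract_text is a hypothetical helper, and the sample list is a stand-in for response.content, assuming text blocks expose type == "text" and a text attribute.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of every text block, skipping thinking and tool blocks."""
    return "".join(block.text for block in content_blocks if block.type == "text")


# Stand-in for `response.content` from a request with thinking enabled.
sample_content = [
    SimpleNamespace(type="thinking", thinking="(reasoning omitted)"),
    SimpleNamespace(type="text", text="Answer text."),
]

print(extract_text(sample_content))
```

The same helper works unchanged when Claude skips thinking, because it filters by block type instead of relying on position.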

When to Use Each Effort Level

Low Effort

  • Use for: Simple Q&A, classification, summarization, chat, subagents that don’t need deep reasoning.
  • Benefits: Fastest response times, lowest token cost.
  • Trade-off: Reduced capability on complex problems. Claude may skip thinking entirely for easy tasks.

Medium Effort

  • Use for: Agentic coding, tool-heavy workflows, code generation, multi-step tasks that need some reasoning but not maximum depth.
  • Benefits: Good balance of speed, cost, and performance. Recommended default for Sonnet 4.6.
  • Trade-off: Slightly less thorough than high on very difficult problems.

High Effort (Default)

  • Use for: Complex reasoning, difficult coding problems, agentic tasks where quality is paramount.
  • Benefits: Excellent results, Claude spends as many tokens as needed.
  • Trade-off: Higher latency and cost compared to lower levels.

Max Effort

  • Use for: The absolute hardest problems — mathematical proofs, deep scientific analysis, multi-hour agentic tasks.
  • Benefits: No constraints on token spending, maximum capability.
  • Trade-off: Highest cost and latency. Available only on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.

XHigh Effort (Opus 4.7 only)

  • Use for: Long-running agentic and coding tasks lasting over 30 minutes, with token budgets in the millions.
  • Benefits: Extended capability for sustained, complex workflows.
  • Trade-off: Very high token consumption.
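
The guidance in the sections above can be sketched as a small lookup that maps a task category to an effort level. This is an illustrative sketch only: the category names and the pick_effort helper are hypothetical, and falling back to high for unknown categories is an assumption that mirrors the stated default.

```python
# Hypothetical task categories mapped to the levels suggested in this guide.
TASK_EFFORT = {
    "classification": "low",
    "summarization": "low",
    "simple_qa": "low",
    "code_generation": "medium",
    "agentic_coding": "medium",
    "complex_reasoning": "high",
    "mathematical_proof": "max",
    "long_running_agent": "xhigh",  # Opus 4.7 only, per this guide
}


def pick_effort(task_category: str) -> str:
    """Return the suggested effort level, falling back to the default of high."""
    return TASK_EFFORT.get(task_category, "high")


for category in ("classification", "agentic_coding", "unknown_task"):
    print(category, "->", pick_effort(category))
```

Keeping the mapping in one table makes it easy to adjust after you have measured real latency and token spend for each category.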

Practical Tips

  • Start with medium for Sonnet 4.6 — It’s the best default for most applications. Explicitly set effort to avoid unexpected latency.
  • Use low for high-volume pipelines — If you’re processing thousands of simple requests (e.g., classification, extraction), low effort can dramatically reduce costs.
  • Reserve max for the hardest 5% of problems — Don’t use it for everyday tasks. The token cost can be 10x or more compared to medium.
  • Combine with adaptive thinking — This gives Claude the flexibility to think only when necessary, while still respecting your effort level.
  • Monitor token usage — Effort is a behavioral signal, not a hard budget. Always track your actual token spend and adjust accordingly.
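
To act on the monitoring tip, one option is to log the output token count of each response alongside the effort level used, then compare averages. The sketch below assumes you record (effort, output_tokens) pairs yourself, for example from the usage field of each API response. mean_output_tokens_by_effort is a hypothetical helper, and the numbers are made up purely to exercise it.

```python
from collections import defaultdict
from statistics import mean


def mean_output_tokens_by_effort(records):
    """Average output tokens per effort level from (effort, output_tokens) pairs."""
    grouped = defaultdict(list)
    for effort, output_tokens in records:
        grouped[effort].append(output_tokens)
    return {effort: mean(tokens) for effort, tokens in grouped.items()}


# Made-up values for illustration only; these are not measurements.
usage_log = [("low", 120), ("low", 80), ("medium", 400), ("medium", 600)]

averages = mean_output_tokens_by_effort(usage_log)
print(averages)
```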

Key Takeaways

  • Effort replaces budget_tokens for Opus 4.6 and Sonnet 4.6. It controls all response tokens, not just thinking.
  • Five levels: low, medium, high (the default), xhigh (Opus 4.7 only), and max. Together they let you dial in the balance of speed, cost, and capability.
  • No beta header required — Effort works on all supported models out of the box.
  • Combine with adaptive thinking for the best experience, especially on mixed workloads.
  • Explicitly set effort for Sonnet 4.6 to avoid unexpected latency; medium is the recommended default for most applications.