Guide · 2026-05-05

Mastering Claude's Effort Parameter: Control Token Spend and Response Depth

Learn how to use Claude's effort parameter to control token usage, response thoroughness, and cost. Includes code examples, effort levels, and best practices for Sonnet 4.6 and Opus 4.6.

Quick Answer

Claude's effort parameter lets you control how eagerly Claude spends tokens on responses, from max (deepest reasoning) to low (fastest, cheapest). It works with or without extended thinking and replaces budget_tokens on Opus 4.6 and Sonnet 4.6.

effort parameter, token efficiency, Claude API, cost optimization, extended thinking

The effort parameter lets you dial Claude's thinking up or down depending on the task. It controls how many tokens Claude spends on a response, without requiring you to switch models. This guide explains how to use effort effectively, with practical code examples and best practices.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eager it should be about spending tokens when responding. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise it to max for the absolute highest capability, or lower it to low for maximum speed and cost savings.

Key advantages of the effort parameter:

No thinking required – Works with or without extended thinking enabled
Affects all tokens – Controls text, tool calls, and thinking tokens
Single model – No need to switch between different Claude models for different depth levels

Supported Models

The effort parameter is generally available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter as the recommended way to control thinking depth.

Effort Levels Explained

Level	Description	Best Use Case
`max`	Absolute maximum capability, no token constraints	Deepest reasoning, complex research, multi-step analysis
`xhigh`	Extended capability for long-horizon work (Opus 4.7 only)	Long-running agentic/coding tasks over 30 minutes
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems, just less than it would at higher levels.
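The table above can be folded into a small lookup that picks a level per task category. This is a minimal sketch: the category names and the `pick_effort` helper are hypothetical, and the mapping only restates the "Best Use Case" column.

```python
# Hypothetical task categories mapped to the effort levels from the table.
# The category names are illustrative assumptions, not part of the API.
EFFORT_BY_TASK = {
    "classification": "low",
    "subagent": "low",
    "chat": "low",
    "agentic": "medium",
    "coding": "high",
    "research": "max",
}

# Level names as listed in this guide.
VALID_LEVELS = ("low", "medium", "high", "xhigh", "max")


def pick_effort(task_kind: str, default: str = "high") -> str:
    """Return an effort level for a task category, falling back to `default`."""
    level = EFFORT_BY_TASK.get(task_kind, default)
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    return level
```

Keeping the mapping in one place makes it easy to adjust levels after measuring real workloads, rather than hard-coding a level at every call site.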

How to Use the Effort Parameter

Basic Usage (Python SDK)

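import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6; confirm the ID against your models list
    max_tokens=8192,
    # Effort is nested under output_config rather than passed as a top-level argument
    output_config={"effort": "low"},  # Options: low, medium, high, max
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

# Without thinking enabled, the first content block is the text reply
print(response.content[0].text)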

With Extended Thinking

# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},  # Adaptive thinking pairs well with effort
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system."}
    ]
)

TypeScript Example

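import Anthropic from '@anthropic-ai/sdk';

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 8192,
  output_config: { effort: 'high' },
  messages: [
    { role: 'user', content: 'Write a Python script to analyze CSV data.' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = response.content[0];
if (first.type === 'text') {
  console.log(first.text);
}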

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which may introduce unexpected latency. Anthropic recommends explicitly setting effort:

Medium (recommended default) – Best balance for most applications: agentic coding, tool-heavy workflows, code generation
Low – For high-volume or latency-sensitive workloads: chat, non-coding use cases
High – For tasks requiring maximum capability from Sonnet
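One way to make sure effort is never left at the default is to route every Sonnet request through a small builder. The sketch below is illustrative only: `sonnet_request` and its workload labels are hypothetical, the model ID is assumed, and it assumes effort is passed inside `output_config`. Adjust the field placement if your SDK version differs.

```python
def sonnet_request(prompt: str, workload: str = "default") -> dict:
    """Build Messages API keyword arguments with effort always set explicitly."""
    # Hypothetical workload labels mapped to the levels recommended above.
    effort_by_workload = {
        "default": "medium",           # agentic coding, tool-heavy workflows
        "latency_sensitive": "low",    # high-volume chat, non-coding use cases
        "max_capability": "high",      # hardest tasks for Sonnet
    }
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID; check your models list
        "max_tokens": 4096,
        # Assumed location of the effort field in the request body.
        "output_config": {"effort": effort_by_workload[workload]},
        "messages": [{"role": "user", "content": prompt}],
    }
```

The returned dictionary can be unpacked into the SDK call, for example `client.messages.create(**sonnet_request("Summarize this ticket."))`.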

Practical Scenarios

Scenario 1: Cost-Sensitive Subagents

If you're building a multi-agent system where subagents handle simple classification or extraction tasks, use low effort to minimize token spend:

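def classify_document(text):
    # Low effort keeps simple classification fast and cheap
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        output_config={"effort": "low"},
        messages=[
            {"role": "user", "content": f"Classify this document as 'urgent', 'normal', or 'low': {text}"}
        ]
    )
    # Strip surrounding whitespace so the label can be compared directly
    return response.content[0].text.strip()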

Scenario 2: Deep Research Tasks

For complex research or multi-step reasoning, use max effort with adaptive thinking:

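def deep_research(query):
    # Stream the request: a generation this long can exceed the SDK's
    # time limit for non-streaming calls
    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=64000,
        thinking={"type": "adaptive"},
        output_config={"effort": "max"},
        messages=[
            {"role": "user", "content": f"Conduct a thorough analysis of: {query}"}
        ]
    ) as stream:
        response = stream.get_final_message()
    # Thinking blocks can precede the answer, so collect only the text blocks
    return "".join(block.text for block in response.content if block.type == "text")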

Scenario 3: Balanced Agentic Workflows

For a coding agent that needs both speed and quality, use medium effort:

def code_review_agent(code_snippet):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": "medium"},
        tools=[
            {
                "name": "suggest_improvements",
                "description": "Suggest code improvements",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "suggestions": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["suggestions"]
                }
            }
        ],
        messages=[
            {"role": "user", "content": f"Review this code: {code_snippet}"}
        ]
    )
    # Return the full response: it may contain tool_use blocks as well as text
    return response

Effort vs. budget_tokens

If you're migrating from budget_tokens on Opus 4.6 or Sonnet 4.6, here's what changed:

Feature	budget_tokens (deprecated)	effort (recommended)
Scope	Thinking tokens only	All tokens (text, tools, thinking)
Precision	Explicit token ceiling for thinking	Behavioral signal
Flexibility	Requires thinking enabled	Works without thinking
Future-proof	Will be removed	Actively supported

Best Practices

Start with medium – For most applications, medium offers the best balance of speed, cost, and quality
Combine with adaptive thinking – thinking: {type: "adaptive"} pairs naturally with effort levels
Test with your workload – Run A/B tests to find the optimal effort level for your specific use case
Use low for subagents – Simple classification, extraction, or routing tasks don't need high effort
Reserve max for complex tasks – Only use max when you truly need the deepest reasoning
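The "test with your workload" advice can be sketched as a small harness that runs the same prompt at several effort levels and records output tokens and latency. The `send` callable is a hypothetical wrapper you would write around your own API call. The only assumption about its return value is that it exposes `usage.output_tokens`, as Messages API responses do.

```python
import time
from typing import Callable


def compare_effort_levels(
    send: Callable[[str], object],
    levels: tuple = ("low", "medium", "high"),
) -> dict:
    """Call `send(level)` once per effort level and collect tokens and latency."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        response = send(level)  # hypothetical wrapper around the API call
        elapsed = time.perf_counter() - start
        results[level] = {
            "output_tokens": response.usage.output_tokens,
            "seconds": elapsed,
        }
    return results
```

A single run per level is noisy. In practice you would repeat each level over a representative set of prompts and compare averages alongside a quality check of the outputs.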

Common Pitfalls

Expecting strict budgets – Effort is a signal, not a hard limit. Claude may still spend significant tokens on genuinely hard problems at low effort.
Ignoring Sonnet defaults – Sonnet 4.6 defaults to high effort. Always set it explicitly to avoid unexpected latency.
Using max unnecessarily – max effort can dramatically increase token usage. Only use it when the task genuinely requires maximum capability.
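Because effort is not a hard limit, `max_tokens` remains the only strict cap, and it is worth checking whether a response was cut off by it. The helpers below are a minimal sketch with hypothetical names. They rely on the `stop_reason` field and the typed content blocks that Messages API responses carry.

```python
def was_truncated(response) -> bool:
    """True if generation stopped at the max_tokens cap instead of finishing."""
    return getattr(response, "stop_reason", None) == "max_tokens"


def visible_text(response) -> str:
    """Join only the text blocks, skipping thinking and tool_use blocks."""
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
```

If `was_truncated` returns true often at a given effort level, raising `max_tokens` or lowering effort are the two levers to try.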

Key Takeaways

Effort controls token spend across text, thinking, and tool calls, whereas the deprecated budget_tokens covered thinking tokens only
Five levels available: low, medium, high (default), xhigh (Opus 4.7 only), and max
Works with or without extended thinking enabled on every supported model
Sonnet 4.6 users should explicitly set effort to avoid unexpected latency from the high default
Combine with adaptive thinking for the best balance of depth and efficiency on complex tasks