Guide | 2026-05-06

Mastering Claude's Effort Parameter: Balance Performance, Speed, and Cost

Learn how to use Claude's effort parameter to control token spending, optimize response thoroughness, and reduce costs across all supported models.

Quick Answer

The effort parameter lets you control how many tokens Claude spends on responses, from max (deepest reasoning) to low (fastest, cheapest). Use it to trade off capability for speed and cost without switching models.

Tags: effort parameter, token optimization, Claude API, cost control, extended thinking

Claude's effort parameter gives you fine-grained control over how many tokens the model spends when responding to a request. Instead of switching between Claude models to balance capability and cost, you can adjust a single parameter to move between fast, inexpensive answers and deep, multi-step reasoning.

This guide covers how the effort parameter works, when to use each level, and how to set it in code.

What Is the Effort Parameter?

The effort parameter (effort) controls how "eager" Claude is about spending tokens when generating responses. It affects all tokens in the output, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking tokens (when enabled)

This is a major advantage over older approaches like budget_tokens, which only controlled thinking depth. The effort parameter gives you a single, unified dial for token efficiency across your entire application.

Important: For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. While budget_tokens is still accepted, it is deprecated and will be removed in a future model release.

Supported Models

The effort parameter is generally available on all supported models with no beta header required. Currently supported models include:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

Effort Levels Explained

Claude offers five effort levels, each suited to different use cases:

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no token constraints	Deepest reasoning, most thorough analysis
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks (30+ min)
`high`	High capability (default behavior)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Important: Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems—it will just think less than it would at higher levels for the same problem.
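
One way to keep level names consistent across a codebase is a small request builder that validates the level before any API call is made. The sketch below is illustrative only: `build_request` and `EFFORT_LEVELS` are hypothetical names, the model ID strings are assumed aliases, and it assumes effort is passed in the `output_config` field of the Messages API. The `xhigh` check mirrors the note under Limitations and Considerations.

```python
# Ordered from least to most effort, matching the table above.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(model: str, prompt: str, effort: str = "high", max_tokens: int = 1024) -> dict:
    """Return keyword arguments for client.messages.create() with a validated effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")
    # Per this guide, xhigh is currently limited to Claude Opus 4.7.
    # The substring check on the model ID is an assumption about naming.
    if effort == "xhigh" and "opus-4-7" not in model:
        raise ValueError("xhigh is currently only available on Claude Opus 4.7")
    return {
        "model": model,
        "max_tokens": max_tokens,  # still the hard cap; effort is only a behavioral signal
        "output_config": {"effort": effort},  # assumed request shape
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    print(build_request("claude-sonnet-4-6", "Summarize this ticket.", effort="low"))
```

The returned dictionary can be unpacked into the SDK call, for example `client.messages.create(**build_request(...))`. Validating up front turns a typo in the level name into a local error rather than a failed request.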

How to Use the Effort Parameter

Basic API Usage (Python)

import anthropic

client = anthropic.Anthropic()

# Low effort: fastest, cheapest
response = client.messages.create(
    model="claude-sonnet-4-6",  # must be one of the supported models listed above
    max_tokens=1024,
    # Effort is set inside output_config.
    # Options: "low", "medium", "high", "xhigh", "max" (xhigh is currently Opus 4.7 only)
    output_config={"effort": "low"},
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)
print(response.content[0].text)

TypeScript / Node.js Example

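import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function main() {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6', // must be one of the supported models listed above
    max_tokens: 1024,
    output_config: { effort: 'medium' }, // effort is set inside output_config
    messages: [
      { role: 'user', content: 'Write a Python script to scrape a website.' }
    ]
  });

  // Content blocks are a union type, so narrow to a text block before reading .text
  const first = response.content[0];
  if (first.type === 'text') {
    console.log(first.text);
  }
}

main();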

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking (thinking: {type: "adaptive"}). This allows Claude to dynamically decide when to use extended thinking based on the complexity of the task.

# Reuses the client created in the basic example
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    output_config={"effort": "high"},  # effort is set inside output_config
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ]
)
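
When thinking is enabled, the response's content list can contain thinking blocks ahead of the final text, so indexing `content[0].text` may fail. A minimal sketch of a safer accessor follows. `extract_text` is a hypothetical helper, and the stand-in blocks in the usage lines only mimic the `type` and `text` attributes of SDK content blocks.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Join the text of every block whose type is "text", skipping thinking and tool blocks."""
    return "".join(
        block.text for block in content_blocks if getattr(block, "type", None) == "text"
    )


if __name__ == "__main__":
    # Stand-in content list: one thinking block followed by the final text block.
    fake_content = [
        SimpleNamespace(type="thinking", thinking="Work through the algebra first."),
        SimpleNamespace(type="text", text="x = 4"),
    ]
    print(extract_text(fake_content))  # prints: x = 4
```

With a real response, the call would be `extract_text(response.content)`.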

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency, always explicitly set the effort level when using this model:

Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
High effort: For tasks requiring maximum capability, such as complex reasoning or difficult coding problems.
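
The recommendations above can be captured as a lookup table so that every call site sets effort explicitly. This is a sketch under assumptions: the workload labels and the names `SONNET_46_EFFORT` and `effort_for` are hypothetical, chosen only to mirror the three bullets.

```python
# Hypothetical workload labels mapped to the effort levels recommended above for Sonnet 4.6.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "complex_reasoning": "high",
    "difficult_coding": "high",
}


def effort_for(workload: str, default: str = "medium") -> str:
    """Look up the effort level for a workload label, falling back to the recommended default."""
    return SONNET_46_EFFORT.get(workload, default)


if __name__ == "__main__":
    for label in ("chat", "agentic_coding", "complex_reasoning", "unlabelled"):
        print(label, "->", effort_for(label))
```

Unknown labels fall back to medium, which matches the recommended default rather than the model's own high default.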

Practical Use Cases

1. Cost-Optimized Customer Support Chat

Use low effort for routine queries and medium for escalated issues:

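import anthropic

client = anthropic.Anthropic()

def handle_support_request(query, is_complex=False):
    # Routine queries run at low effort; escalated issues get medium
    effort_level = "medium" if is_complex else "low"

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        output_config={"effort": effort_level},  # effort is set inside output_config
        messages=[{"role": "user", "content": query}]
    )
    return response.content[0].text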

2. Multi-Agent Systems

Route sub-tasks to low-effort agents and complex reasoning to high-effort agents:

import anthropic

client = anthropic.Anthropic()

class Agent:
    def __init__(self, role, effort_level):
        self.role = role
        self.effort = effort_level

    def process(self, task):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            output_config={"effort": self.effort},  # effort is set inside output_config
            system=f"You are a {self.role} agent.",
            messages=[{"role": "user", "content": task}]
        )
        return response.content[0].text

# Create agents with different effort levels
fast_agent = Agent("data collector", "low")
reasoning_agent = Agent("analyst", "high")

# Use accordingly
raw_data = fast_agent.process("Fetch latest stock prices")
analysis = reasoning_agent.process(f"Analyze this data: {raw_data}")

3. Token Budget Management

Track token usage across effort levels to optimize costs:

import anthropic

client = anthropic.Anthropic()

def query_with_effort(prompt, effort_level):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        output_config={"effort": effort_level},  # effort is set inside output_config
        messages=[{"role": "user", "content": prompt}]
    )

    # Report token counts so runs at different effort levels can be compared
    usage = response.usage
    print(f"Effort: {effort_level}")
    print(f"Input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")

    return response.content[0].text
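
Token counts are easier to compare once they are converted to an estimated cost. The sketch below shows the arithmetic only. `UsageRecord`, `estimate_cost`, and `cheapest` are hypothetical names, and both the per-million-token prices and the token counts in the usage lines are placeholders, not real pricing or measurements.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    effort: str
    input_tokens: int
    output_tokens: int


def estimate_cost(record: UsageRecord, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate the cost of one request from its token counts and per-million-token prices."""
    total = (
        record.input_tokens * input_price_per_mtok
        + record.output_tokens * output_price_per_mtok
    )
    return total / 1_000_000


def cheapest(records, input_price_per_mtok: float, output_price_per_mtok: float) -> UsageRecord:
    """Return the record with the lowest estimated cost."""
    return min(
        records,
        key=lambda r: estimate_cost(r, input_price_per_mtok, output_price_per_mtok),
    )


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    runs = [
        UsageRecord("low", input_tokens=1_200, output_tokens=300),
        UsageRecord("high", input_tokens=1_200, output_tokens=2_400),
    ]
    for run in runs:
        print(run.effort, estimate_cost(run, input_price_per_mtok=3.0, output_price_per_mtok=15.0))
```

In practice the records would be filled from `response.usage` after each call, and cost would be weighed against answer quality rather than minimized on its own.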

Best Practices

Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency.
Combine with adaptive thinking for optimal results—Claude will decide when to think deeply.
Start with medium for most applications, then adjust up or down based on observed performance.
Use low for high-volume, latency-sensitive tasks like chat or simple Q&A.
Reserve max for tasks that genuinely require the deepest reasoning and xhigh for long-horizon agentic or coding work.

Limitations and Considerations

Effort is a behavioral signal, not a strict token budget. Claude may still use significant tokens for difficult problems even at low effort.
At max effort, token usage can be very high. Monitor your costs carefully.
The xhigh level is currently only available on Claude Opus 4.7.
When using low effort, expect some reduction in response quality for complex tasks.
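
Because low effort can reduce quality on harder inputs, one possible pattern is to start low and retry at a higher level only when an answer fails a check you define. The following is a sketch, not a recommended default. `answer_with_escalation` is a hypothetical helper, `ask` stands in for whatever function makes the API call at a given effort level, and `is_acceptable` is an application-specific check.

```python
from typing import Callable, Sequence, Tuple


def answer_with_escalation(
    ask: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> Tuple[str, str]:
    """Try each effort level in order and return (level, answer) for the first acceptable answer.

    If no answer passes the check, the answer from the last level is returned.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    level, answer = levels[0], ""
    for level in levels:
        answer = ask(level)
        if is_acceptable(answer):
            break
    return level, answer


if __name__ == "__main__":
    # Stand-in for an API call: only the "high" attempt returns a long enough answer.
    def fake_ask(effort: str) -> str:
        return "detailed answer" if effort == "high" else "short"

    print(answer_with_escalation(fake_ask, lambda text: len(text) > 10))
```

Each failed attempt still consumes tokens, so this only saves money when most requests succeed at the first level.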

Key Takeaways

The effort parameter gives you granular control over Claude's token spending, affecting all output types including text, tool calls, and thinking tokens.
Five effort levels let you trade off between capability and efficiency: low, medium, high, xhigh, and max.
Always set effort explicitly when using Sonnet 4.6 to avoid unexpected latency from the default high setting.
Combine effort with adaptive thinking for the best balance of performance and cost.
Use lower effort levels for simple tasks and higher levels for complex reasoning to optimize your token budget without switching models.