BeClaude
Guide · Beginner · Pricing · 2026-05-20

Mastering Claude's Effort Parameter: Balance Performance and Cost

Learn how to use Claude's effort parameter to control token spending, response thoroughness, and API costs. Includes code examples and best practices for all effort levels.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from 'low' to 'max' to balance speed, cost, and capability for different use cases.

effort parameter · token optimization · API best practices · Claude Sonnet 4.6 · cost management

Introduction

Claude's effort parameter lets you control how readily the model spends tokens when responding to requests. Whether you're building a high-volume chat application, a complex agentic system, or a cost-sensitive tool, understanding effort helps you trade off quality, latency, and cost deliberately.

This guide covers everything you need to know: how effort works, the different levels available, practical code examples, and recommended configurations for common scenarios.

What Is the Effort Parameter?

The effort parameter controls how "eager" Claude is about spending tokens when generating responses. By default, Claude uses high effort, which means it will spend as many tokens as needed for excellent results. You can raise this to max for the absolute highest capability, or lower it to low for maximum speed and cost savings.

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on difficult problems—just less than it would at higher levels.

Supported Models

The effort parameter is available on:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter as the recommended way to control thinking depth.

How Effort Affects Responses

The effort parameter influences all tokens in the response, including:

  • Text responses and explanations
  • Tool calls and function arguments
  • Extended thinking (when enabled)

This means lower effort can also reduce the number of tool calls Claude makes, so it affects efficiency more broadly than a thinking-only control such as budget_tokens.
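
One way to observe this is to tally content blocks by type for the same prompt at different effort levels. The sketch below is a minimal illustration that works on plain dicts shaped like Messages API content blocks (with a `type` of `text`, `tool_use`, or `thinking`). The helper name `count_blocks` and the sample data are hypothetical, not real API output.

```python
from collections import Counter


def count_blocks(content):
    """Count content blocks by their "type" field."""
    return Counter(block["type"] for block in content)


# Hypothetical sample data for illustration only.
high_effort = [
    {"type": "thinking", "thinking": "..."},
    {"type": "text", "text": "Let me check two files."},
    {"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "a.py"}},
    {"type": "tool_use", "id": "t2", "name": "read_file", "input": {"path": "b.py"}},
]
low_effort = [
    {"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "a.py"}},
]

print(count_blocks(high_effort)["tool_use"])  # 2
print(count_blocks(low_effort)["tool_use"])   # 1
```

With the Python SDK, content blocks are objects rather than dicts, so you would read `block.type` instead of `block["type"]`.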

Effort Levels Explained

| Level | Description | Typical Use Case |
|---|---|---|
| max | Absolute maximum capability, no constraints on token spending | Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6) |
| xhigh | Extended capability for long-horizon work | Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only) |
| high | High capability (default) | Complex reasoning, difficult coding, agentic tasks |
| medium | Balanced approach with moderate token savings | Agentic tasks needing speed/cost/performance balance |
| low | Most efficient, significant token savings | Simple tasks, subagents, high-volume chat |

Note: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
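
Because not every level is available on every model, it can help to validate a level before sending a request. The following is a minimal sketch that transcribes the availability notes from the table above. It assumes that low, medium, and high work on every supported model, and the names `allowed_levels` and `check_effort` are hypothetical helpers, not part of any SDK.

```python
BASE_LEVELS = {"low", "medium", "high"}

# Extra levels per model, transcribed from the table above.
# Keys are display names, not API model IDs.
EXTRA_LEVELS = {
    "Claude Mythos Preview": {"max"},
    "Claude Opus 4.7": {"xhigh", "max"},
    "Claude Opus 4.6": {"max"},
    "Claude Sonnet 4.6": {"max"},
    "Claude Opus 4.5": set(),
}


def allowed_levels(model_name):
    """Return the set of effort levels listed for a supported model."""
    if model_name not in EXTRA_LEVELS:
        raise ValueError(f"{model_name!r} is not listed as supporting effort")
    return BASE_LEVELS | EXTRA_LEVELS[model_name]


def check_effort(model_name, level):
    """Return True if the level is listed for the given model."""
    return level in allowed_levels(model_name)


print(check_effort("Claude Opus 4.7", "xhigh"))    # True
print(check_effort("Claude Sonnet 4.6", "xhigh"))  # False
```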

Recommended Effort for Sonnet 4.6

Sonnet 4.6 defaults to high effort. For most applications, you should explicitly set the effort level to avoid unexpected latency:

  • Medium effort (recommended default): Best balance of speed, cost, and performance for agentic coding, tool-heavy workflows, and code generation.
  • Low effort: For high-volume or latency-sensitive workloads like chat and non-coding use cases.
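
These recommendations could be captured in a small lookup so the choice is made in one place. This is only a sketch: the workload labels and the `pick_effort` helper are hypothetical, and the mapping simply restates the two bullets above.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 recommendations above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
}


def pick_effort(workload, default="medium"):
    """Return the recommended effort for a workload, or the default if unknown."""
    return SONNET_46_EFFORT.get(workload, default)


print(pick_effort("chat"))            # low
print(pick_effort("agentic_coding"))  # medium
```

Falling back to medium for unknown workloads follows the guide's recommended starting point. Adjust it if your own measurements suggest otherwise.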

Code Examples

Python (using the Anthropic SDK)

import anthropic

client = anthropic.Anthropic()

# Low effort for simple, fast responses
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},  # effort is passed inside output_config
)

print(response.content[0].text)

# Medium effort for balanced performance
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)

# Max effort for deep reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    system="You are a research scientist.",
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography."}],
    output_config={"effort": "max"},
)

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort for fast, simple responses
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' }, // effort is passed inside output_config
});

// The first content block is expected to be text here; narrow the type before reading it
const first = response.content[0];
if (first.type === 'text') console.log(first.text);

// Medium effort for balanced performance
const codingResponse = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  system: 'You are a coding assistant.',
  messages: [{ role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }],
  output_config: { effort: 'medium' },
});

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Enable adaptive thinking
    messages=[{"role": "user", "content": "Solve this complex math problem..."}],
    output_config={"effort": "medium"},  # effort is passed inside output_config
)

Adaptive thinking allows Claude to decide how much thinking to do based on the problem difficulty, while effort acts as soft guidance for how much it tends to think overall. It is not a hard ceiling.
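
If you make this combination in many places, a small builder can keep the request shape consistent. The sketch below only assembles keyword arguments and does not call the API. The helper name `build_request` is hypothetical, and it assumes effort is nested under `output_config`, which is where Anthropic's Messages API reference places it. Check this against your SDK version.

```python
def build_request(model, prompt, effort="medium", adaptive=True, max_tokens=4096):
    """Return keyword arguments intended for client.messages.create()."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: effort is nested under output_config.
        "output_config": {"effort": effort},
    }
    if adaptive:
        # Let the model decide how much to think for each request.
        kwargs["thinking"] = {"type": "adaptive"}
    return kwargs


request = build_request("claude-sonnet-4-6", "Solve this complex math problem...")
print(request["output_config"])  # {'effort': 'medium'}
print(request["thinking"])       # {'type': 'adaptive'}
```

You would then call `client.messages.create(**request)` with a configured client.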

Practical Use Cases

1. High-Volume Customer Support Chat

Use low effort for simple FAQ responses where speed is critical:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "What are your business hours?"}],
    output_config={"effort": "low"},
)

2. Agentic Coding Assistant

Use medium effort as your default for tool-heavy coding workflows:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[...],  # Placeholder: replace with your tool definitions
    messages=[{"role": "user", "content": "Refactor this module to use async/await."}],
    output_config={"effort": "medium"},
)

3. Deep Research Analysis

Use max effort on Opus 4.7 for the most thorough reasoning:

response = client.messages.create(
    model="claude-opus-4-7",  # Opus 4.7, as described above
    max_tokens=16384,
    messages=[{"role": "user", "content": "Compare and contrast the economic policies of..."}],
    output_config={"effort": "max"},
)

Best Practices

  • Always set effort explicitly with Sonnet 4.6 to avoid unexpected latency.
  • Start with medium for most agentic and coding tasks, then adjust based on observed performance.
  • Use low effort for subagents in multi-agent systems where each subagent handles simple, well-defined tasks.
  • Combine with adaptive thinking for optimal cost-performance tradeoffs.
  • Monitor token usage across effort levels to understand your cost profile.
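
To monitor usage as suggested in the last point, you could log token counts per call and average them by effort level. The following is a minimal sketch. `summarize_usage` is a hypothetical helper, the records are made-up numbers, and in practice the counts would come from `response.usage.input_tokens` and `response.usage.output_tokens`.

```python
from collections import defaultdict


def summarize_usage(records):
    """Average input and output tokens per effort level.

    Each record is a tuple of (effort, input_tokens, output_tokens).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # input sum, output sum, call count
    for effort, input_tokens, output_tokens in records:
        entry = totals[effort]
        entry[0] += input_tokens
        entry[1] += output_tokens
        entry[2] += 1
    return {
        effort: {
            "avg_input": entry[0] / entry[2],
            "avg_output": entry[1] / entry[2],
            "calls": entry[2],
        }
        for effort, entry in totals.items()
    }


# Hypothetical numbers for illustration only.
records = [
    ("low", 120, 80),
    ("low", 100, 120),
    ("medium", 120, 400),
]
summary = summarize_usage(records)
print(summary["low"]["avg_output"])  # 100.0
print(summary["medium"]["calls"])    # 1
```

Multiplying the averages by your model's per-token prices gives a rough cost per call at each level.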

Key Takeaways

  • The effort parameter controls token spending across all response types (text, tool calls, thinking).
  • Five levels are available: low, medium, high (default), xhigh (Opus 4.7 only), and max.
  • Lower effort trades some capability for faster, cheaper responses; higher effort does the opposite.
  • For Sonnet 4.6, start with medium for most applications.
  • Combine effort with adaptive thinking for the best balance of performance and efficiency.
  • Effort replaces the deprecated budget_tokens parameter on Opus 4.6 and Sonnet 4.6.