Guide | Beginner | Pricing | 2026-05-20

Mastering Claude's Effort Parameter: Balance Performance and Cost

Learn how to use Claude's effort parameter to control token spending, response thoroughness, and API costs. Includes code examples and best practices for all effort levels.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn how to set effort levels from 'low' to 'max' to balance speed, cost, and capability for different use cases.

effort parameter, token optimization, API best practices, Claude Sonnet 4.6, cost management

Introduction

Claude's effort parameter lets you adjust how readily the model spends tokens when responding to a request. Whether you're building a high-volume chat application, a complex agentic system, or a cost-sensitive tool, understanding effort helps you trade off thoroughness against speed and cost.

This guide covers everything you need to know: how effort works, the different levels available, practical code examples, and recommended configurations for common scenarios.

What Is the Effort Parameter?

The effort parameter controls how "eager" Claude is to spend tokens when generating a response. By default, Claude uses high effort and spends as many tokens as it needs for an excellent result. On models that support it, you can raise effort to max for the highest capability, or lower it to low for the greatest speed and cost savings.

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think deeply on difficult problems—just less than it would at higher levels.

Supported Models

The effort parameter is available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter as the recommended way to control thinking depth.

How Effort Affects Responses

The effort parameter influences all tokens in the response, including:

Text responses and explanations
Tool calls and function arguments
Extended thinking (when enabled)

Because tool calls are included, lower effort can reduce the number of tool calls Claude makes. Effort therefore affects the cost of a whole agentic workflow, not just the length of the final answer.
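One way to observe this is to count the tool calls in responses produced at different effort levels. The sketch below assumes each response's content has already been converted to a list of plain dicts with a `type` field, where tool calls appear as `tool_use` blocks. The helper name `count_tool_calls` and the sample data are hypothetical, made up only to illustrate the comparison.

```python
from typing import Any


def count_tool_calls(content_blocks: list[dict[str, Any]]) -> int:
    """Count the tool_use blocks in one response's content list."""
    return sum(1 for block in content_blocks if block.get("type") == "tool_use")


# Hypothetical content lists, invented for illustration (not real API output)
high_effort_content = [
    {"type": "text", "text": "Let me check both files."},
    {"type": "tool_use", "name": "read_file", "input": {"path": "a.py"}},
    {"type": "tool_use", "name": "read_file", "input": {"path": "b.py"}},
]
low_effort_content = [
    {"type": "tool_use", "name": "read_file", "input": {"path": "a.py"}},
]

print(count_tool_calls(high_effort_content), count_tool_calls(low_effort_content))
```

Running the same task at two effort levels and comparing these counts gives a rough picture of how much effort changes tool usage for your workload.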

Effort Levels Explained

Level	Description	Typical Use Case
`max`	Absolute maximum capability, no constraints on token spending	Deepest reasoning, most thorough analysis (Mythos, Opus 4.7, Opus 4.6, Sonnet 4.6)
`xhigh`	Extended capability for long-horizon work	Long-running agentic/coding tasks over 30 minutes (Opus 4.7 only)
`high`	High capability (default)	Complex reasoning, difficult coding, agentic tasks
`medium`	Balanced approach with moderate token savings	Agentic tasks needing speed/cost/performance balance
`low`	Most efficient, significant token savings	Simple tasks, subagents, high-volume chat

Note: Setting effort to "high" produces exactly the same behavior as omitting the parameter entirely.
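Since not every level is available on every model, it can help to validate the combination before sending a request. The sketch below transcribes the availability described above into a lookup table. The dictionary keys are display names rather than API model IDs, "Mythos" is assumed to mean Claude Mythos Preview, and `check_effort` is a hypothetical helper, not part of any SDK.

```python
# Availability transcribed from the supported-models list and the levels table.
# Keys are display names (hypothetical lookup keys), not API model IDs.
ALL_MODELS = {
    "Claude Mythos Preview",
    "Claude Opus 4.7",
    "Claude Opus 4.6",
    "Claude Sonnet 4.6",
    "Claude Opus 4.5",
}
LEVEL_AVAILABILITY = {
    "low": ALL_MODELS,
    "medium": ALL_MODELS,
    "high": ALL_MODELS,
    "xhigh": {"Claude Opus 4.7"},
    "max": ALL_MODELS - {"Claude Opus 4.5"},
}


def check_effort(model_name: str, level: str) -> None:
    """Raise ValueError if the level is unknown or not listed for the model."""
    if level not in LEVEL_AVAILABILITY:
        raise ValueError(f"Unknown effort level: {level!r}")
    if model_name not in LEVEL_AVAILABILITY[level]:
        raise ValueError(f"{level!r} is not listed for {model_name}")


check_effort("Claude Opus 4.7", "xhigh")  # passes silently
```

Failing early in your own code is usually easier to debug than an error returned by the API.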

Recommended Effort for Sonnet 4.6

Sonnet 4.6 defaults to high effort. For most applications, you should explicitly set the effort level to avoid unexpected latency:

Medium effort (recommended default): Best balance of speed, cost, and performance for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads like chat and non-coding use cases.
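These recommendations can be captured in a small routing helper so the choice is made in one place. This is a minimal sketch: the workload labels and the function name `sonnet_effort_for` are hypothetical, and the fallback to medium simply follows the recommended default above.

```python
def sonnet_effort_for(workload: str) -> str:
    """Map a workload label to the effort level recommended for Sonnet 4.6."""
    # Hypothetical labels; rename them to match your own application.
    medium_workloads = {"agentic_coding", "tool_heavy", "code_generation"}
    low_workloads = {"chat", "high_volume", "latency_sensitive"}
    if workload in medium_workloads:
        return "medium"
    if workload in low_workloads:
        return "low"
    # Unlisted workloads fall back to the recommended default.
    return "medium"


print(sonnet_effort_for("chat"))
```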

Code Examples

Python (using the Anthropic SDK)

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Low effort for simple, fast responses.
# Effort is passed inside output_config rather than as a top-level argument
# (requires a recent SDK version).
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6; confirm against the models list
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},
)
print(response.content[0].text)

# Medium effort for balanced performance
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)

# Max effort for deep reasoning (only on models that support max)
response = client.messages.create(
    model="claude-opus-4-6",  # assumed ID for Opus 4.6
    max_tokens=8192,
    system="You are a research scientist.",
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography."}],
    output_config={"effort": "max"},
)

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Low effort for fast, simple responses.
// Effort is passed inside output_config rather than as a top-level field
// (requires a recent SDK version).
const quickResponse = await client.messages.create({
  model: 'claude-sonnet-4-6', // assumed ID for Sonnet 4.6; confirm against the models list
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' },
});
// Content blocks are a union type, so narrow to a text block before reading .text.
const firstBlock = quickResponse.content[0];
if (firstBlock.type === 'text') {
  console.log(firstBlock.text);
}

// Medium effort for balanced performance
const codingResponse = await client.messages.create({
  model: 'claude-sonnet-4-6', // assumed ID for Sonnet 4.6
  max_tokens: 4096,
  system: 'You are a coding assistant.',
  messages: [{ role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }],
  output_config: { effort: 'medium' },
});

Combining Effort with Adaptive Thinking

For the best experience, combine effort with adaptive thinking:

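# Assumes `client` was created as in the Python example above.
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Enable adaptive thinking
    messages=[{"role": "user", "content": "Solve this complex math problem..."}],
    output_config={"effort": "medium"},  # effort is passed inside output_config
)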

Adaptive thinking lets Claude decide how much to think based on the difficulty of the problem, while effort guides how readily it spends tokens overall. As noted earlier, effort is a behavioral signal rather than a hard ceiling.

Practical Use Cases

1. High-Volume Customer Support Chat

Use low effort for simple FAQ responses where speed is critical:

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=512,
    messages=[{"role": "user", "content": "What are your business hours?"}],
    output_config={"effort": "low"},  # effort is passed inside output_config
)

2. Agentic Coding Assistant

Use medium effort as your default for tool-heavy coding workflows:

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed ID for Sonnet 4.6
    max_tokens=4096,
    tools=[...],  # Placeholder: replace with your tool definitions
    messages=[{"role": "user", "content": "Refactor this module to use async/await."}],
    output_config={"effort": "medium"},  # effort is passed inside output_config
)

3. Deep Research Analysis

Use max effort on Opus 4.7 for the most thorough reasoning:

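response = client.messages.create(
    model="claude-opus-4-7",  # assumed ID for Opus 4.7; confirm against the models list
    max_tokens=16384,
    messages=[{"role": "user", "content": "Compare and contrast the economic policies of..."}],
    output_config={"effort": "max"},  # effort is passed inside output_config
)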

Best Practices

Always set effort explicitly with Sonnet 4.6 to avoid unexpected latency.
Start with medium for most agentic and coding tasks, then adjust based on observed performance.
Use low effort for subagents in multi-agent systems where each subagent handles simple, well-defined tasks.
Combine with adaptive thinking for optimal cost-performance tradeoffs.
Monitor token usage across effort levels to understand your cost profile.
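For the monitoring step, a simple starting point is to log the output token count of each response alongside the effort level used, then average per level. The sketch below assumes you have collected such records yourself, for example from the `usage.output_tokens` field of each response. The record shape, the function name, and the sample numbers are all hypothetical placeholders, not measurements.

```python
from collections import defaultdict
from statistics import mean


def average_output_tokens(records: list[dict]) -> dict[str, float]:
    """Average output tokens per effort level.

    Each record is a dict with 'effort' and 'output_tokens' keys.
    """
    by_level = defaultdict(list)
    for record in records:
        by_level[record["effort"]].append(record["output_tokens"])
    return {level: mean(counts) for level, counts in by_level.items()}


# Placeholder numbers for illustration only, not real measurements
sample = [
    {"effort": "low", "output_tokens": 100},
    {"effort": "low", "output_tokens": 140},
    {"effort": "medium", "output_tokens": 300},
]
print(average_output_tokens(sample))
```

Multiplying these averages by your per-token price and request volume gives a first estimate of what each effort level costs.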

Key Takeaways

The effort parameter controls token spending across all response types (text, tool calls, thinking).
Five levels are available: low, medium, high (default), xhigh (Opus 4.7 only), and max.
Lower effort trades some thoroughness for faster, cheaper responses; higher effort does the reverse.
For Sonnet 4.6, use medium as your recommended default for most applications.
Combine effort with adaptive thinking for the best balance of performance and efficiency.
Effort replaces the deprecated budget_tokens parameter on Opus 4.6 and Sonnet 4.6.