Guide | Beginner | Pricing | 2026-05-12

Mastering Claude’s Effort Parameter: Smarter Token Control for Cost and Performance

Learn how to use Claude's effort parameter to control thinking depth, reduce token spend, and balance speed vs. capability across the Claude models that support it.

Quick Answer

This guide explains Claude's effort parameter—a behavioral signal that controls how eagerly Claude spends tokens. You'll learn how to set effort levels (low, medium, high, xhigh, max) to trade off between response thoroughness and efficiency, with practical code examples for Python and TypeScript.

Tags: effort parameter, token optimization, API best practices, Claude Sonnet 4.6, cost efficiency

Claude is incredibly capable, but that capability comes at a cost—literally. Every token Claude generates, whether in reasoning, tool calls, or final text, adds to your API bill. Until recently, developers had limited control over this spending. You could set a budget_tokens ceiling for extended thinking, but that only applied to thinking tokens, not to the full response.

Enter the effort parameter. Introduced across Claude’s latest models, effort gives you a single dial for how much Claude “tries” on a request. It affects every token in the response (text, tool calls, and extended thinking), so one setting governs the balance between speed, cost, and quality.

In this guide, you’ll learn exactly how effort works, when to use each level, and how to implement it in your API calls.

What Is the Effort Parameter?

The effort parameter is a behavioral signal that tells Claude how eagerly it should spend tokens when responding. It’s not a hard budget—Claude can still think deeply on hard problems at low effort—but it strongly influences how much reasoning, how many tool calls, and how much elaboration Claude produces.

Key facts:

Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
No beta header required.
Replaces budget_tokens as the recommended way to control thinking depth on Opus 4.6 and Sonnet 4.6.
Works with or without extended thinking enabled.
Affects all tokens: text, tool calls, and thinking.

Effort Levels and When to Use Them

There are five effort levels, each suited to different use cases.

| Level | Description | Best for |
| --- | --- | --- |
| `low` | Most efficient. Significant token savings with some capability reduction. | Simple tasks, high-volume chat, subagents, latency-sensitive workloads. |
| `medium` | Balanced approach with moderate token savings. | Agentic tasks needing a balance of speed, cost, and performance. Recommended default for Sonnet 4.6. |
| `high` | High capability. Equivalent to omitting the parameter entirely. | Complex reasoning, difficult coding, agentic tasks. Default behavior. |
| `xhigh` | Extended capability for long-horizon work. Available on Opus 4.7. | Long-running agentic and coding tasks (30+ minutes) with token budgets in the millions. |
| `max` | Absolute maximum capability with no constraints. | Tasks requiring the deepest possible reasoning and most thorough analysis. |

Note: xhigh is only available on Claude Opus 4.7. max is available on Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6.
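
The availability rules above can be checked before a request is sent. The following is a minimal sketch, assuming the model and level lists stated in this guide; the dictionary keys are informal labels chosen for illustration (not API model IDs), and `check_effort` is a hypothetical helper, not part of any SDK.

```python
# Availability as stated in this guide. Keys are informal labels chosen
# for this sketch, not API model IDs.
BASE_LEVELS = {"low", "medium", "high"}

EFFORT_SUPPORT = {
    "Claude Mythos Preview": BASE_LEVELS | {"max"},
    "Claude Opus 4.7": BASE_LEVELS | {"xhigh", "max"},
    "Claude Opus 4.6": BASE_LEVELS | {"max"},
    "Claude Sonnet 4.6": BASE_LEVELS | {"max"},
    "Claude Opus 4.5": BASE_LEVELS,
}


def check_effort(model_label: str, effort: str) -> str:
    """Return `effort` if this guide lists it for `model_label`, else raise ValueError."""
    try:
        supported = EFFORT_SUPPORT[model_label]
    except KeyError:
        raise ValueError(f"effort is not listed for {model_label!r}") from None
    if effort not in supported:
        raise ValueError(
            f"{effort!r} is not listed for {model_label}; "
            f"choose one of {sorted(supported)}"
        )
    return effort
```

Calling `check_effort("Claude Sonnet 4.6", "xhigh")` raises a `ValueError`, which surfaces a misconfiguration locally rather than as an API error.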

How Effort Interacts with Thinking

If you enable extended thinking (via thinking: {type: "enabled", budget_tokens: N}), effort still controls the overall response. But there’s a better approach: adaptive thinking.

Adaptive thinking (thinking: {type: "adaptive"}) lets Claude decide how much thinking is needed based on the effort level you set. This combination gives you the best of both worlds—deep thinking when needed, token savings when not.

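# Recommended: effort + adaptive thinking
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # effort is nested under output_config; or "low", "high", "max"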
    messages=[{"role": "user", "content": "Explain quantum entanglement in simple terms."}]
)

Practical Code Examples

Python: Setting Effort in a Basic Request

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6
    max_tokens=4096,
    output_config={"effort": "low"},  # Fast and cheap
    messages=[
        {"role": "user", "content": "Summarize this article in three bullet points."}
    ]
)
print(response.content[0].text)

TypeScript: Effort with Tool Calls

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', // Sonnet 4.6
  max_tokens: 8192,
  output_config: { effort: 'medium' },
  tools: [{
    name: 'get_weather',
    description: 'Get the current weather for a location',
    input_schema: {
      type: 'object',
      properties: {
        location: { type: 'string' }
      },
      required: ['location']
    }
  }],
  messages: [
    { role: 'user', content: 'What\'s the weather in Tokyo and Paris?' }
  ]
});
console.log(response.content);

At low effort, Claude might make fewer tool calls or skip unnecessary ones. At high, it might call tools more liberally to gather comprehensive data.
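
One way to observe this difference is to count the `tool_use` blocks in each response and compare the counts across effort levels. The sketch below is illustrative only: `tool_calls_by_name` is a hypothetical helper, and `sample_content` is a hand-built stand-in for `response.content`, not real model output.

```python
from collections import Counter
from types import SimpleNamespace


def tool_calls_by_name(content_blocks) -> Counter:
    """Count tool_use blocks per tool name in one response's content list."""
    counts = Counter()
    for block in content_blocks:
        # SDK responses expose blocks as objects; raw JSON gives dicts.
        if isinstance(block, dict):
            block_type, name = block.get("type"), block.get("name")
        else:
            block_type = getattr(block, "type", None)
            name = getattr(block, "name", None)
        if block_type == "tool_use":
            counts[name] += 1
    return counts


# Hand-built stand-in for response.content from the two-city weather request.
sample_content = [
    SimpleNamespace(type="text", text="Checking both cities."),
    SimpleNamespace(type="tool_use", name="get_weather", input={"location": "Tokyo"}),
    {"type": "tool_use", "name": "get_weather", "input": {"location": "Paris"}},
]
print(tool_calls_by_name(sample_content))  # Counter({'get_weather': 2})
```

Running the same prompt at several effort levels and logging these counts gives a concrete view of how tool usage shifts for your workload.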

Python: Effort with Adaptive Thinking

response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},  # Deepest reasoning
    messages=[
        {"role": "user", "content": "Design a distributed caching system for a global e-commerce platform. Consider consistency, partition tolerance, and latency."}
    ]
)

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort, which can lead to unexpectedly long responses. Anthropic recommends explicitly setting effort to avoid surprises:

Medium (recommended default): Best balance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.

# Sonnet 4.6 with explicit medium effort
response = client.messages.create(
    model="claude-sonnet-4-6",  # Sonnet 4.6
    max_tokens=4096,
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists."}]
)
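
If an application serves several kinds of traffic, the recommendations above can be captured in a small lookup so that every request sets effort explicitly. This is a sketch under assumptions: the workload names are hypothetical labels invented for this example, and the mapping simply mirrors the medium and low guidance listed for Sonnet 4.6.

```python
# Hypothetical workload labels mapped to the Sonnet 4.6 guidance above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_workflow": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "latency_sensitive": "low",
}


def effort_for(workload: str, default: str = "medium") -> str:
    """Pick an effort level for a workload label, falling back to `default`."""
    return SONNET_46_EFFORT.get(workload, default)
```

Falling back to `medium` for unknown labels keeps the behavior aligned with the recommended default rather than the model's implicit `high`.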

Effort vs. budget_tokens: What’s the Difference?

If you’ve used budget_tokens before, you might wonder how effort compares.

| Aspect | budget_tokens | effort |
| --- | --- | --- |
| Scope | Thinking tokens only | All tokens (text, tools, thinking) |
| Precision | Hard token limit | Behavioral signal |
| Flexibility | Fixed cap | Adaptive to problem difficulty |
| Status | Deprecated on Opus 4.6 and Sonnet 4.6 | Current and recommended |

Effort is more flexible because it doesn’t impose a hard ceiling. Instead, it guides Claude’s behavior: at low effort, Claude will still think hard on genuinely difficult problems, but it won’t waste tokens on trivial ones.
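
For code that still sends a fixed thinking budget, the thinking side of the migration can be sketched as a small transformation on the request arguments. This is an illustration, not an official migration tool: `to_adaptive_thinking` is a hypothetical helper, it only swaps the `thinking` configuration described in this guide, and choosing an effort level for the migrated request is left to the caller.

```python
import copy
from typing import Optional, Tuple


def to_adaptive_thinking(request: dict) -> Tuple[dict, Optional[int]]:
    """Return a copy of `request` that uses adaptive thinking.

    Also returns the budget_tokens value that was removed, or None if the
    request did not use a fixed thinking budget.
    """
    migrated = copy.deepcopy(request)  # leave the caller's dict untouched
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return migrated, None  # nothing to migrate
    removed_budget = thinking.get("budget_tokens")
    migrated["thinking"] = {"type": "adaptive"}
    return migrated, removed_budget
```

Returning the removed budget makes it easy to log what the old ceiling was while you evaluate which effort level gives comparable results.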

Best Practices

Start with medium for most tasks. It provides a good balance and is the recommended default for Sonnet 4.6.
Use low for high-throughput subagents. If you have many parallel agents doing simple lookups or classifications, low effort saves tokens and reduces latency.
Reserve max for complex reasoning. Use it only when you need Claude’s absolute best—like solving novel math problems or debugging intricate code.
Combine with adaptive thinking. This pairing gives Claude the most natural control over token spend.
Test on representative samples. Effort is behavioral, so its impact varies by task. Run A/B tests with different levels to find your sweet spot.
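
The A/B testing suggestion above can be supported with a small aggregation step. The sketch below assumes you record, for each run, the effort level, the output token count (for example from `response.usage.output_tokens`), and a latency you measure yourself; `summarize_by_effort` is a hypothetical helper and no measurements are implied.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_effort(samples):
    """Average output tokens and latency per effort level.

    `samples` is an iterable of (effort, output_tokens, latency_seconds)
    tuples recorded by the caller.
    """
    grouped = defaultdict(list)
    for effort, output_tokens, latency in samples:
        grouped[effort].append((output_tokens, latency))
    return {
        effort: {
            "runs": len(rows),
            "avg_output_tokens": mean(tokens for tokens, _ in rows),
            "avg_latency_s": mean(seconds for _, seconds in rows),
        }
        for effort, rows in grouped.items()
    }
```

Pair these averages with a quality check on the same prompts, since a cheaper level only helps if the answers remain acceptable for your task.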

Key Takeaways

Effort is a behavioral signal that controls how eagerly Claude spends tokens across text, tool calls, and thinking—not a hard budget.
Five levels are available: low, medium, high, xhigh (Opus 4.7 only), and max.
Medium is the recommended default for Sonnet 4.6 to avoid unexpected latency and cost.
Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best balance of depth and efficiency.
Effort replaces budget_tokens, which is now deprecated on Opus 4.6 and Sonnet 4.6.