Guide | Beginner | Pricing | 2026-05-21

Mastering Claude's Effort Parameter: Control Thinking Depth and Token Spend

Learn how to use Claude's effort parameter to balance response thoroughness, speed, and cost. Includes code examples, effort levels, and best practices for Opus and Sonnet models.

Quick Answer

This guide explains Claude's effort parameter, which lets you control how eagerly Claude spends tokens on responses. You'll learn the five effort levels (low, medium, high, xhigh, max), how to set them in API calls, and when to use each for the best balance of performance, speed, and cost.

Claude's effort parameter gives you control over how many tokens Claude spends on each response. Whether you're building a high-volume chat application, a complex agentic system, or a cost-sensitive tool, understanding effort is key to getting the best performance-to-cost ratio.

In this guide, you'll learn:

What the effort parameter is and how it works
The five effort levels and when to use each
How effort interacts with extended thinking
Practical code examples for Python and TypeScript
Best practices for different use cases

What Is the Effort Parameter?

The effort parameter lets you control how "eager" Claude is about spending tokens when responding to requests. By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise effort to max for the highest capability (or to xhigh on Opus 4.7 for long-horizon work), or lower it to medium or low for faster, cheaper responses.

Key advantages:

Works without extended thinking — effort affects all tokens, including text responses and tool calls
Controls tool call frequency — lower effort means fewer tool calls, saving tokens
Single-model flexibility — you can trade off between thoroughness and efficiency without switching models

Supported Models

The effort parameter is available on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Opus 4.5

For Opus 4.6 and Sonnet 4.6, effort replaces the deprecated budget_tokens parameter (thinking: {type: "enabled", budget_tokens: N}) as the way to control thinking depth.

Effort Levels Explained

Level	Description	Typical Use Case
`low`	Most efficient. Significant token savings with some capability reduction.	Simple tasks, high-volume chat, subagents
`medium`	Balanced approach with moderate token savings.	Agentic tasks needing speed/cost balance
`high` (default)	High capability. Equivalent to omitting the parameter.	Complex reasoning, coding, agentic tasks
`xhigh`	Extended capability for long-horizon work. Available on Opus 4.7.	Long-running agentic/coding tasks (>30 min)
`max`	Absolute maximum capability with no constraints.	Deepest reasoning, most thorough analysis
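The table can be encoded as a small guard that catches typos before a request goes out. This is a minimal sketch, assuming the level names above are the exact strings the API expects; the model ID string is hypothetical, and the only availability rule encoded is the table's note that xhigh is available on Opus 4.7.

```python
# Level names as listed in the effort table.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical model ID; per the table, xhigh is available on Opus 4.7.
XHIGH_MODELS = {"claude-opus-4-7"}


def validate_effort(level: str, model: str) -> str:
    """Return level unchanged if it is a known effort level usable with model."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level {level!r}; expected one of {EFFORT_LEVELS}")
    if level == "xhigh" and model not in XHIGH_MODELS:
        raise ValueError(f"xhigh is not listed as available for {model!r}")
    return level


print(validate_effort("medium", "claude-sonnet-4-6"))  # prints medium
```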

Important: Effort is a behavioral signal, not a strict token budget. At lower levels, Claude will still think on sufficiently difficult problems — but it will think less than it would at higher levels for the same problem.
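Because effort is only a signal, the hard ceiling on generated tokens is still max_tokens. A minimal sketch of checking for it, assuming the response has been turned into a dict whose fields mirror the Messages API JSON body (stop_reason, usage.output_tokens); the helper name is hypothetical.

```python
def describe_token_use(response: dict, max_tokens: int) -> str:
    """Summarize output-token use; stop_reason == "max_tokens" means the reply was cut off."""
    used = response["usage"]["output_tokens"]
    if response.get("stop_reason") == "max_tokens":
        return f"truncated at {used}/{max_tokens} output tokens"
    return f"completed with {used}/{max_tokens} output tokens"


# Example shaped like a Messages API response body.
sample = {"stop_reason": "end_turn", "usage": {"input_tokens": 12, "output_tokens": 310}}
print(describe_token_use(sample, 1024))  # completed with 310/1024 output tokens
```

If replies are truncated, raise max_tokens; a lower effort level makes Claude spend less, but it does not guarantee output stays under the limit.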

How Effort Works with Extended Thinking

When you combine effort with adaptive thinking (thinking: {type: "adaptive"}), Claude automatically adjusts its thinking depth based on the problem complexity. This is the recommended configuration for most use cases.

At high (default) and max effort, Claude will almost always think. At lower levels, it may skip thinking for simpler problems, saving tokens and reducing latency.
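The recommended pairing can be wrapped in a helper that builds the keyword arguments for a request. This is a sketch assuming effort travels in the request body as output_config.effort and that your SDK's messages.create accepts output_config and thinking; the helper name and default model ID are illustrative.

```python
def build_request(prompt: str, effort: str = "medium", model: str = "claude-sonnet-4-6",
                  max_tokens: int = 4096, adaptive: bool = True) -> dict:
    """Keyword arguments for messages.create with an explicit effort level."""
    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }
    if adaptive:
        # Adaptive thinking lets Claude skip thinking on simple prompts at lower effort.
        params["thinking"] = {"type": "adaptive"}
    return params


params = build_request("Summarize this changelog.", effort="low")
print(params["output_config"], params["thinking"])  # {'effort': 'low'} {'type': 'adaptive'}
```

With the SDK installed, the dict is passed straight through as client.messages.create(**params).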

Code Examples

Python (using the Anthropic SDK)

import anthropic
# Assumes a recent SDK where messages.create accepts output_config (the effort setting);
# if yours rejects it, pass extra_body={"output_config": {"effort": ...}} instead.
# Model IDs are illustrative: check the models overview for current IDs.
client = anthropic.Anthropic()
# Low effort: fast and cheap, for simple tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    output_config={"effort": "low"},
)
# Medium effort: balanced for agentic tasks
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a coding assistant.",
    messages=[{"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}],
    output_config={"effort": "medium"},
)
# High effort (default): complex reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Explain the implications of quantum computing on cryptography."}],
    # No output_config: omitting effort is equivalent to effort="high"
)
# Max effort: deepest reasoning
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16384,
    messages=[{"role": "user", "content": "Prove the Riemann Hypothesis."}],
    output_config={"effort": "max"},
)

TypeScript (using the Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';
// Assumes an SDK version whose types include output_config (the effort setting) and
// adaptive thinking; model IDs are illustrative. Top-level await needs an ES module.
const client = new Anthropic();
// Low effort
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  output_config: { effort: 'low' },
});
// Medium effort with adaptive thinking
const response2 = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,
  thinking: { type: 'adaptive' },
  messages: [{ role: 'user', content: 'Debug this code: ...' }],
  output_config: { effort: 'medium' },
});
// Max effort for complex analysis
const response3 = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 16384,
  thinking: { type: 'adaptive' },
  messages: [{ role: 'user', content: 'Analyze this legal contract...' }],
  output_config: { effort: 'max' },
});

Recommended Effort Levels for Sonnet 4.6

Sonnet 4.6 defaults to high effort. To avoid unexpected latency and cost, explicitly set effort when using this model:

Medium effort (recommended default): Best balance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where speed matters more than depth.
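One way to make sure no Sonnet 4.6 call silently runs at high is to fill in an explicit level wherever the caller left it out. A minimal sketch, assuming requests are plain dicts with effort under output_config.effort; the function name and the claude-sonnet-4-6 model prefix are illustrative.

```python
import copy


def with_default_effort(params: dict, default: str = "medium") -> dict:
    """Copy params, adding an explicit effort to Sonnet 4.6 requests that lack one."""
    out = copy.deepcopy(params)
    if out.get("model", "").startswith("claude-sonnet-4-6"):
        # setdefault keeps any effort the caller already chose.
        out.setdefault("output_config", {}).setdefault("effort", default)
    return out


print(with_default_effort({"model": "claude-sonnet-4-6", "max_tokens": 1024}))
```

For latency-sensitive chat paths, call it with default="low" instead.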

Best Practices

1. Start with medium, then adjust

For new applications, begin with medium effort. Monitor response quality and token usage, then adjust up or down based on your specific needs.
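The start-at-medium loop can be reduced to one rule applied after each round of monitoring. This is a sketch of one possible rule, not a prescribed procedure: step up a level when quality falls short, step down when quality is acceptable but token use is over budget, otherwise stay put. How you judge quality and set the budget is up to you, and the function name is hypothetical. xhigh is left out of the default ladder because the table lists it for Opus 4.7.

```python
# Cheapest to most capable; insert "xhigh" before "max" when targeting Opus 4.7.
LADDER = ("low", "medium", "high", "max")


def next_effort(current: str, quality_ok: bool, over_budget: bool, ladder=LADDER) -> str:
    """One tuning step. Quality takes priority over cost when both are flagged."""
    i = ladder.index(current)
    if not quality_ok:
        return ladder[min(i + 1, len(ladder) - 1)]
    if over_budget:
        return ladder[max(i - 1, 0)]
    return current


print(next_effort("medium", quality_ok=False, over_budget=False))  # high
print(next_effort("medium", quality_ok=True, over_budget=True))    # low
```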

2. Use adaptive thinking with effort

Combine effort with thinking: {type: "adaptive"} for the best experience. This lets Claude decide when to think deeply and when to respond quickly, saving tokens on simple queries.

3. Match effort to task complexity

Simple Q&A, classification, extraction: low
Multi-step agents, code generation: medium
Complex reasoning, analysis: high
Research-grade problems, deep analysis: max
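The mapping above fits in a lookup table. A sketch with hypothetical category keys; unknown categories fall back to medium, following the start-with-medium advice.

```python
TASK_EFFORT = {
    "qa": "low",
    "classification": "low",
    "extraction": "low",
    "multi_step_agent": "medium",
    "code_generation": "medium",
    "complex_reasoning": "high",
    "analysis": "high",
    "research": "max",
    "deep_analysis": "max",
}


def effort_for(task: str) -> str:
    """Effort level for a task category; unlisted categories start at medium."""
    return TASK_EFFORT.get(task, "medium")


print(effort_for("extraction"), effort_for("translation"))  # low medium
```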

4. Consider cost implications

Lower effort levels can significantly reduce token spend, especially on tool calls. For high-volume applications, even a 20% reduction in tokens per call can lead to substantial savings.
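To see what a 20% reduction means at volume, multiply it out. The call volume, tokens per call, and price below are hypothetical placeholders, not Anthropic pricing; substitute your own figures, and note that the sketch applies a single per-token rate.

```python
def daily_savings(calls_per_day: int, tokens_per_call: int,
                  reduction_pct: int, usd_per_million_tokens: float) -> tuple[int, float]:
    """Tokens saved per day and their cost at a single per-million-token rate."""
    saved_tokens = calls_per_day * tokens_per_call * reduction_pct // 100
    return saved_tokens, saved_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 500,000 calls/day, 2,000 generated tokens each,
# 20% fewer tokens at lower effort, $10 per million tokens.
tokens, dollars = daily_savings(500_000, 2_000, 20, 10.0)
print(f"{tokens:,} tokens/day, ${dollars:,.2f}/day")  # 200,000,000 tokens/day, $2,000.00/day
```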

5. Test with representative workloads

Effort affects behavior differently depending on the problem. Always test with your actual use case to find the optimal level.
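A small harness makes that comparison repeatable: run the same prompts at each level and record tokens and pass rate. In this sketch, run is a caller-supplied function (hypothetical here) that sends one request at the given effort and returns the output-token count, for example from usage.output_tokens, plus whether your own quality check passed. The stub below stands in for the API so the example runs offline.

```python
from statistics import mean
from typing import Callable, Iterable


def compare_levels(prompts: list[str], levels: Iterable[str],
                   run: Callable[[str, str], tuple[int, bool]]) -> dict[str, tuple[float, float]]:
    """Map each effort level to (average output tokens, pass rate) over prompts."""
    results = {}
    for level in levels:
        outcomes = [run(prompt, level) for prompt in prompts]
        avg_tokens = mean(tokens for tokens, _ in outcomes)
        pass_rate = sum(passed for _, passed in outcomes) / len(outcomes)
        results[level] = (avg_tokens, pass_rate)
    return results


def fake_run(prompt: str, effort: str) -> tuple[int, bool]:
    # Stub: low effort is cheaper but only passes the easy prompt.
    tokens = {"low": 100, "medium": 300}[effort]
    return tokens, effort == "medium" or prompt == "easy"


print(compare_levels(["easy", "hard"], ["low", "medium"], fake_run))
```

From the results, pick the cheapest level whose pass rate meets your bar.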

Common Pitfalls

Assuming low effort means no thinking: Claude will still think on difficult problems, just less deeply.
Forgetting to set effort on Sonnet 4.6: Defaults to high, which may be more expensive than needed.
Using effort without adaptive thinking: Effort works without thinking, but pairing it with adaptive thinking is the recommended configuration for most use cases.
Expecting strict token budgets: Effort is a behavioral signal, not a hard limit.

Key Takeaways

Effort controls token spend across all response types, including text, tool calls, and extended thinking — without requiring thinking to be enabled.
Five levels (low, medium, high, xhigh, max) let you trade off between speed/cost and capability, all with a single model.
Combine effort with adaptive thinking for the best balance of performance and efficiency.
Explicitly set effort on Sonnet 4.6 to avoid unexpected latency and cost from the default high setting.
Start with medium effort for most applications, then adjust based on observed quality and token usage.