# Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Task Budgets
This guide explains how to configure Claude’s Extended Thinking modes—Adaptive Thinking, Effort, and Task Budgets—to balance reasoning depth, response speed, and API costs. You’ll get practical code examples and best practices for each setting.
## Introduction
Claude’s Extended Thinking capabilities allow you to control how deeply the model reasons before generating a response. Whether you need a quick answer, a deeply reasoned analysis, or a budget-conscious solution, understanding these settings is essential for building efficient, cost-effective applications.
In this guide, you’ll learn:
- What Extended Thinking is and when to use it
- How Adaptive Thinking dynamically adjusts reasoning depth
- How to set Effort levels for predictable reasoning
- How Task Budgets (beta) cap thinking time and cost
- Practical code examples in Python and TypeScript
## What Is Extended Thinking?
Extended Thinking is a Claude API feature that lets the model spend more computational resources on reasoning before producing a final answer. This is especially useful for complex tasks like:
- Multi-step math problems
- Code generation and debugging
- Legal or medical analysis
- Long-form content creation
## Adaptive Thinking: Dynamic Reasoning Depth
Adaptive Thinking is the most flexible mode. Claude automatically decides how much thinking time to allocate based on the complexity of the input. You don’t set a fixed budget—Claude adjusts on the fly.
### When to Use Adaptive Thinking
- You want the best balance of speed and accuracy
- Your use case has variable complexity (e.g., a chatbot that handles both simple FAQs and complex troubleshooting)
- You don’t want to manually tune effort levels
### Python Example

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x^2 + 5x - 2 = 0"}
    ]
)

# The response contains a thinking block followed by a text block
for block in response.content:
    if block.type == "text":
        print(block.text)
```
### TypeScript Example

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 1024
  },
  messages: [
    { role: 'user', content: 'Solve this equation step by step: 3x^2 + 5x - 2 = 0' }
  ]
});

// The response contains a thinking block followed by a text block
for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  }
}
```
Note: When `thinking` is enabled, you must set `budget_tokens` to a value less than `max_tokens`. The thinking budget defines how many tokens Claude can use for internal reasoning.
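Because an out-of-range budget fails only at request time, it can help to validate the pair before calling the API. A minimal sketch; the helper name and error message are our own, not part of the SDK:

```python
def validate_thinking_budget(max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError when a thinking budget would be rejected by the API."""
    # budget_tokens must leave room for the final answer within max_tokens
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )

validate_thinking_budget(max_tokens=2048, budget_tokens=1024)  # passes silently
```

Running this check before each request turns a round-trip API error into an immediate local one.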
## Effort: Predictable Reasoning Control

Effort lets you explicitly set how much reasoning Claude should apply. Options include `low`, `medium`, and `high`. This is ideal when you know the complexity of your task in advance.
| Effort Level | Use Case |
|---|---|
| low | Simple Q&A, quick lookups |
| medium | Balanced reasoning for most tasks |
| high | Complex analysis, multi-step reasoning |
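The table above can be encoded as a small dispatch helper so application code picks an effort level consistently. The task categories and mapping here are illustrative assumptions, not part of the SDK:

```python
# Illustrative mapping from task category to effort level (our own, not the SDK's)
EFFORT_BY_TASK = {
    "faq": "low",           # simple Q&A, quick lookups
    "summarize": "medium",  # balanced reasoning for most tasks
    "analysis": "high",     # complex, multi-step reasoning
}

def effort_for(task_type: str) -> str:
    """Return an effort level for a task category, defaulting to medium."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```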
### Python Example

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "effort": "high"
    },
    messages=[
        {"role": "user", "content": "Explain the implications of quantum computing on modern cryptography."}
    ]
)
```
### TypeScript Example

```typescript
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  thinking: {
    type: 'enabled',
    budget_tokens: 2048,
    effort: 'high'
  },
  messages: [
    { role: 'user', content: 'Explain the implications of quantum computing on modern cryptography.' }
  ]
});
```
## Task Budgets (Beta): Cost and Time Caps
Task Budgets are a beta feature that let you set a maximum thinking time or token budget. If Claude exceeds the budget, it returns a partial response or a fallback. This is useful for:
- Keeping API costs predictable
- Ensuring low-latency responses
- Handling user-facing applications where response time matters
### How It Works

- Set `budget_tokens` to the maximum tokens Claude can use for thinking
- If the budget is reached, Claude stops thinking and returns its best answer so far
- You can also set a `time_budget` in milliseconds (if supported by your model)
### Python Example

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 512,  # Strict cap on thinking tokens
        "effort": "medium"
    },
    messages=[
        {"role": "user", "content": "Write a detailed essay on the history of the Roman Empire."}
    ]
)
```
### TypeScript Example

```typescript
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 512,
    effort: 'medium'
  },
  messages: [
    { role: 'user', content: 'Write a detailed essay on the history of the Roman Empire.' }
  ]
});
```
## Fast Mode (Beta: Research Preview)
Fast Mode is a research preview feature that trades some reasoning depth for significantly faster responses. It’s ideal for real-time applications where speed is critical, such as:
- Live chat support
- Code autocompletion
- Interactive tutoring
To enable Fast Mode, set `thinking.type` to `"fast"` instead of `"enabled"`:

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "fast",
        "budget_tokens": 256
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
## Best Practices

- **Start with Adaptive Thinking** – It’s the easiest way to get good results without manual tuning.
- **Use Effort for known complexity** – If you know a task is hard, set effort to `high`. If it’s simple, use `low` to save tokens.
- **Set Task Budgets for cost control** – Always set a `budget_tokens` value that aligns with your cost tolerance.
- **Monitor token usage** – Use the `usage` field in the API response to track thinking tokens vs. output tokens.
- **Test with Fast Mode** – If latency is a problem, try Fast Mode and evaluate whether the quality drop is acceptable.
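The token-monitoring advice above can be sketched with a small formatter for a response’s `usage` field. The field names follow the Messages API’s usage object; the helper and the stand-in values are our own:

```python
from types import SimpleNamespace

def summarize_usage(usage) -> str:
    """Format token counts from a Messages API response's `usage` field."""
    # Thinking tokens are billed as output tokens in the API's usage accounting
    return f"input={usage.input_tokens}, output={usage.output_tokens}"

# Stand-in for response.usage, for illustration only
fake_usage = SimpleNamespace(input_tokens=42, output_tokens=880)
```

Logging this string per request makes it easy to spot tasks whose thinking budgets are inflating output-token spend.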
## Common Pitfalls

- **Forgetting to set `budget_tokens`** – Without it, thinking mode may consume excessive tokens.
- **Setting `budget_tokens` higher than `max_tokens`** – This will cause an API error. Always ensure `budget_tokens < max_tokens`.
- **Using high effort for simple tasks** – This wastes tokens and increases latency without improving quality.
- **Not handling partial responses** – When using Task Budgets, always check if the response is complete.
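For the last pitfall, one way to detect a truncated answer is to inspect the response’s `stop_reason`. The field and its `"end_turn"`/`"max_tokens"` values come from the Messages API; the helper and the stand-in responses are a sketch of our own:

```python
from types import SimpleNamespace

def is_complete(response) -> bool:
    """True when the model finished naturally rather than hitting max_tokens."""
    return response.stop_reason == "end_turn"

# Stand-ins for real API responses, for illustration only
finished = SimpleNamespace(stop_reason="end_turn")
truncated = SimpleNamespace(stop_reason="max_tokens")
```

When `is_complete` returns `False`, an application might retry with a larger budget or surface the partial answer with a warning.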
## Key Takeaways

- Extended Thinking gives you fine-grained control over Claude’s reasoning depth, speed, and cost.
- Adaptive Thinking is the easiest starting point—Claude automatically adjusts reasoning effort.
- Effort levels (`low`, `medium`, `high`) let you explicitly control reasoning for predictable tasks.
- Task Budgets (beta) cap thinking tokens or time, ensuring predictable costs and latency.
- Fast Mode (research preview) prioritizes speed over depth for real-time applications.
- Always set `budget_tokens` and monitor usage to avoid unexpected API costs.