Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking and Task Budgets
Learn how to use Claude’s Extended Thinking, Adaptive Thinking, and Task Budgets to improve reasoning depth, control costs, and handle complex tasks efficiently.
This guide explains how to enable and optimize Claude’s Extended Thinking, Adaptive Thinking, and Task Budgets to balance reasoning depth, speed, and cost in your API applications.
Introduction
Claude’s reasoning capabilities have evolved significantly. With the introduction of Extended Thinking, Adaptive Thinking, and Task Budgets, developers now have fine-grained control over how Claude processes complex problems. Whether you’re building a research assistant, a code analyzer, or a multi-step reasoning agent, understanding these features is critical to getting the best performance and cost-efficiency.
This guide walks you through each feature, explains when to use them, and provides practical code examples to integrate them into your Claude API workflows.
What Is Extended Thinking?
Extended Thinking allows Claude to allocate more tokens to its internal reasoning process before generating a final response. This is especially useful for tasks that require deep logical chains, mathematical proofs, or multi-step analysis.

By default, Claude uses a limited thinking budget. With Extended Thinking, you can set a higher budget, enabling the model to "think longer" and produce more accurate, nuanced answers.
When to Use Extended Thinking
- Complex math or logic problems
- Legal or contract analysis
- Multi-step code generation or debugging
- Research synthesis with many sources
How to Enable Extended Thinking
In the API, you enable Extended Thinking by setting the thinking parameter in your request. Here’s an example in Python:
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,  # Tokens allocated for thinking
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}
    ],
)

# With thinking enabled, the response contains a thinking block before the
# text block, so print the text block rather than content[0].
for block in response.content:
    if block.type == "text":
        print(block.text)
Note: The budget_tokens value must be less than max_tokens. The thinking budget is consumed from the total token limit.
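If you assemble requests programmatically, a small pre-flight check can catch an invalid combination before it reaches the API. This is a minimal sketch, not an official helper; the 1,024-token floor used here reflects the documented minimum thinking budget at the time of writing, so verify it against the current API reference.

def validate_thinking_config(max_tokens: int, budget_tokens: int) -> None:
    """Sanity-check a thinking configuration before sending a request.

    Assumes a 1,024-token minimum thinking budget; confirm the exact
    floor against the current API documentation.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens should be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than "
            f"max_tokens ({max_tokens})"
        )

# The request above passes: 2048 thinking tokens inside a 4096-token limit.
validate_thinking_config(max_tokens=4096, budget_tokens=2048)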
Adaptive Thinking: Let Claude Decide
Adaptive Thinking (available as a research preview) lets Claude dynamically decide how much thinking budget to use based on the complexity of the task. Instead of you setting a fixed budget, Claude estimates the required reasoning depth and allocates tokens accordingly.

This is ideal for:
- Mixed workloads where some queries are simple and others are complex
- Reducing costs on easy questions while maintaining depth on hard ones
- Applications where you don’t want to manually tune budgets per request
Enabling Adaptive Thinking
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,  # must exceed budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True,  # Enable adaptive thinking
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
When adaptive is set to True, Claude may use fewer tokens than the budget for simple queries, reducing both cost and latency.
Task Budgets: Fine-Tuning Effort
Task Budgets (beta) allow you to set a maximum thinking effort for a specific task. This is different from token budgets: it controls how much computational effort Claude applies, which can affect both reasoning depth and response time.

Use Task Budgets when:
- You need consistent response times (e.g., for real-time applications)
- You want to limit costs on non-critical tasks
- You’re batching requests and need predictable throughput
Setting a Task Budget
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "task_budget": 0.5,  # 0.0 to 1.0, where 1.0 is maximum effort
    },
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ],
)
A task_budget of 0.5 means Claude will use roughly half the maximum effort, trading some depth for speed.
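One way to apply this in practice is to map request priorities to effort levels in one place instead of hand-tuning each call. The helper below is purely illustrative: the priority names and threshold values are my own, and task_budget is the beta parameter described above.

EFFORT_BY_PRIORITY = {
    "realtime": 0.3,   # favour latency
    "standard": 0.5,   # balanced
    "critical": 0.9,   # favour reasoning depth
}

def thinking_config(priority: str, budget_tokens: int = 2048) -> dict:
    """Build a thinking payload for the given priority (illustrative values)."""
    return {
        "type": "enabled",
        "budget_tokens": budget_tokens,
        "task_budget": EFFORT_BY_PRIORITY[priority],
    }

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking=thinking_config("realtime"),
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ],
)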
Fast Mode: Speed Over Depth
Fast Mode (research preview) is designed for scenarios where latency is critical. It reduces thinking time and returns responses more quickly, at the cost of some reasoning quality.

Enable Fast Mode when:
- Building chatbots that need sub-second responses
- Handling simple Q&A or retrieval tasks
- Prototyping and iterating quickly
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024,  # minimum thinking budget
        "fast_mode": True,
    },
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ],
)
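To confirm Fast Mode is actually buying you latency on your workload, a rough before/after timing is usually enough. The sketch below is indicative only: wall-clock timings are noisy, and fast_mode is the research-preview flag described above rather than a generally available parameter.

import time

# Rough latency comparison with fast_mode off and on.
# Single runs are noisy; average several requests for a real benchmark.
for fast in (False, True):
    start = time.perf_counter()
    client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        thinking={"type": "enabled", "budget_tokens": 1024, "fast_mode": fast},
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(f"fast_mode={fast}: {time.perf_counter() - start:.2f}s")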
Combining Features for Optimal Results
You can combine Adaptive Thinking, Task Budgets, and Fast Mode to create a custom reasoning profile. For example:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True,
        "task_budget": 0.8,
        "fast_mode": False,
    },
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
)
This configuration gives Claude a high budget, adaptive allocation, and near-maximum effort, but disables fast mode for thorough reasoning.
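If you serve several kinds of traffic from one codebase, it can help to name these combinations as profiles and pick one per request rather than tuning fields at every call site. The profiles below are a sketch with made-up names and values; adaptive, task_budget, and fast_mode are the preview/beta options covered earlier in this guide.

THINKING_PROFILES = {
    "deep_analysis": {
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True,
        "task_budget": 0.8,
        "fast_mode": False,
    },
    "realtime_chat": {
        "type": "enabled",
        "budget_tokens": 1024,
        "fast_mode": True,
    },
    "cost_sensitive_batch": {
        "type": "enabled",
        "budget_tokens": 1024,
        "task_budget": 0.3,
    },
}

def ask(profile: str, prompt: str, max_tokens: int = 8192):
    """Send a request using a named reasoning profile (illustrative)."""
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=max_tokens,
        thinking=THINKING_PROFILES[profile],
        messages=[{"role": "user", "content": prompt}],
    )

reply = ask("deep_analysis", "Explain quantum entanglement in simple terms.")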
Best Practices
- Start with a moderate budget – For most tasks, 1024–2048 thinking tokens is sufficient. Increase only if you see shallow responses.
- Use Adaptive Thinking for variable workloads – It saves tokens on simple queries and allocates more for complex ones.
- Monitor token usage – Use the API response's usage field to track thinking tokens and adjust budgets (see the sketch after this list).
- Test with Task Budgets – If response time is critical, set a task budget of 0.3–0.5 to balance speed and quality.
- Avoid Fast Mode for reasoning-heavy tasks – It’s best for simple, repetitive queries.
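The monitoring point above is straightforward to wire up. The sketch below assumes thinking tokens are counted inside usage.output_tokens rather than reported as a separate field; if your SDK version exposes a dedicated thinking-token count, prefer that.

import statistics

# Track output-token consumption across a small batch and compare it to
# the configured thinking budget to decide whether the budget is too
# high or too low. Assumes thinking tokens are included in output_tokens.
BUDGET = 2048
prompts = [
    "Summarize this paragraph in one sentence.",
    "Solve this equation step by step: 3x + 7 = 22",
]

output_counts = []
for prompt in prompts:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        thinking={"type": "enabled", "budget_tokens": BUDGET},
        messages=[{"role": "user", "content": prompt}],
    )
    output_counts.append(response.usage.output_tokens)

print("mean output tokens:", statistics.mean(output_counts))
print("configured thinking budget:", BUDGET)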
Key Takeaways
- Extended Thinking gives Claude more tokens to reason, improving accuracy on complex tasks.
- Adaptive Thinking automatically adjusts the thinking budget based on task difficulty, saving costs.
- Task Budgets let you control computational effort, balancing speed and depth.
- Fast Mode prioritizes low latency over reasoning quality.
- Combine these features strategically to optimize for your specific use case—whether that’s deep analysis, real-time chat, or cost-sensitive batch processing.