Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Fast Mode
Learn how to use Claude’s Extended Thinking, Adaptive Thinking, Task Budgets, and Fast Mode to control reasoning depth, speed, and cost in your API applications.
This guide explains how to configure Claude’s Extended Thinking features—Adaptive Thinking, Task Budgets, and Fast Mode—to balance reasoning depth, response speed, and token cost in your API calls.
Introduction
Claude’s Extended Thinking capabilities represent a significant leap forward in how you can control the model’s reasoning process. Whether you need deep, chain-of-thought analysis for complex problems or lightning-fast responses for simple queries, understanding these features is essential for building efficient, cost-effective applications.
In this guide, you’ll learn how to use Adaptive Thinking, Task Budgets, and Fast Mode to fine-tune Claude’s behavior. We’ll cover practical API examples, best practices, and real-world scenarios to help you get the most out of these tools.
What Is Extended Thinking?
Extended Thinking allows Claude to reason step-by-step before generating a final response. Instead of producing an answer immediately, the model can “think” internally, breaking down complex tasks into smaller parts. This is especially useful for:
- Mathematical problem solving
- Code generation and debugging
- Multi-step reasoning tasks
- Complex decision-making
Adaptive Thinking: Let Claude Decide
Adaptive Thinking is a mode where Claude automatically determines how much thinking effort to apply based on the complexity of the input. This is the easiest way to get started because you don’t need to specify a budget manually.
How It Works
When you enable Adaptive Thinking, Claude analyzes the prompt and allocates thinking tokens accordingly. Simple questions get minimal thinking; complex ones get more. This saves tokens and reduces latency for straightforward tasks while still allowing deep reasoning when needed.
API Example (Python)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,  # leave headroom above the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 1024,
        "adaptive": True  # Enable adaptive thinking
    },
    messages=[
        {"role": "user", "content": "What is the square root of 144?"}
    ]
)

print(response.content[0].text)
Note: When adaptive is set to True, budget_tokens acts as a maximum ceiling. Claude will use fewer tokens if the task doesn’t require the full budget.
When to Use Adaptive Thinking
- General-purpose chatbots – You don’t know the complexity of user queries in advance.
- Cost-sensitive applications – You want to avoid overspending on simple requests.
- Prototyping – Quickly test without fine-tuning budgets.
Task Budgets (Beta): Precise Control
Task Budgets let you specify exactly how many thinking tokens Claude should use for a given task. This is more rigid than Adaptive Thinking but gives you deterministic control over reasoning depth.
API Example (Python)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2560,  # thinking budget plus room for the final answer
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,  # Fixed budget
        "adaptive": False
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}
    ]
)
Best Practices for Task Budgets
- Start small, then increase – Begin with a modest budget (e.g., 512 tokens) and monitor response quality.
- Match budget to task complexity – Simple Q&A: 256–512 tokens. Multi-step reasoning: 1024–2048 tokens. Code generation: 2048+ tokens.
- Combine with max_tokens – Always set max_tokens higher than your thinking budget to leave room for the final response.
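The third practice can be captured in a small helper that derives max_tokens from the thinking budget. This is a sketch, not part of the SDK: build_request_params is a hypothetical name, and the 512-token default headroom for the final response is an assumption you should tune per task.

```python
def build_request_params(budget_tokens: int, response_headroom: int = 512) -> dict:
    """Build request kwargs that leave room for the final answer.

    response_headroom is the number of tokens reserved for the reply on top
    of the thinking budget (an assumed default, not an official figure).
    """
    return {
        "max_tokens": budget_tokens + response_headroom,
        "thinking": {
            "type": "enabled",
            "budget_tokens": budget_tokens,
            "adaptive": False,
        },
    }

params = build_request_params(1024)
# Unpack into the call: client.messages.create(model=..., messages=..., **params)
```

Because max_tokens is always computed as budget plus headroom, the two values can never drift out of sync when you tune the budget later.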
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 1024,
    adaptive: false,
  },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ],
});

console.log(response.content[0].text);
Fast Mode (Beta: Research Preview)
Fast Mode is designed for scenarios where speed is more important than deep reasoning. When enabled, Claude minimizes thinking time, producing responses faster—but with potentially less thorough analysis.
How to Enable Fast Mode
Fast Mode is activated by setting a special parameter in the API call. Note that this is a research preview, so behavior may change.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 512,
        "fast_mode": True  # Enable fast mode
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
When to Use Fast Mode
- Real-time applications – Chatbots requiring sub-second responses.
- Simple, factual queries – No need for chain-of-thought.
- High-throughput systems – Reduce latency per request.
Caution: Fast Mode may reduce accuracy on complex or ambiguous tasks. Always test with your specific use case.
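One way to manage that accuracy risk is a fallback pattern: answer with Fast Mode first, and escalate to a full thinking budget only when the quick answer looks inadequate. The sketch below is hypothetical: call_fn is injected so the pattern can be shown without a live API key, and treating a very short reply as "needs more thinking" is only an illustrative heuristic, not a reliable quality signal.

```python
def call_with_fallback(call_fn, prompt: str, min_chars: int = 20) -> str:
    """Try Fast Mode first; retry with full thinking if the answer looks thin.

    call_fn(prompt, thinking_config) -> str wraps the actual API call.
    """
    fast = {"type": "enabled", "budget_tokens": 256, "fast_mode": True}
    full = {"type": "enabled", "budget_tokens": 2048, "fast_mode": False}

    answer = call_fn(prompt, fast)
    if len(answer.strip()) < min_chars:
        # Quick answer looks too thin; escalate to deep reasoning.
        answer = call_fn(prompt, full)
    return answer
```

In production you would replace the length check with a signal that fits your domain, such as a confidence rubric or a validation step on the answer's format.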
Combining Features: A Practical Workflow
You can mix Adaptive Thinking, Task Budgets, and Fast Mode to create a tiered system:
- Default path – Adaptive Thinking with a moderate budget (1024 tokens).
- Simple queries – Fast Mode enabled, low budget (256 tokens).
- Complex tasks – Fixed Task Budget (2048 tokens), no Fast Mode.
Example: Tiered Router
def get_thinking_config(complexity: str):
    if complexity == "simple":
        return {
            "type": "enabled",
            "budget_tokens": 256,
            "adaptive": False,
            "fast_mode": True
        }
    elif complexity == "medium":
        return {
            "type": "enabled",
            "budget_tokens": 1024,
            "adaptive": True,
            "fast_mode": False
        }
    else:  # complex
        return {
            "type": "enabled",
            "budget_tokens": 2048,
            "adaptive": False,
            "fast_mode": False
        }
Usage
config = get_thinking_config("medium")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=config["budget_tokens"] + 512,
    thinking=config,
    messages=[{"role": "user", "content": user_input}]
)
Monitoring and Debugging
When using Extended Thinking, you can inspect the thinking content in the API response. This is useful for debugging or auditing.
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Response:", block.text)
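For auditing many responses at once, the same loop can be folded into a summary helper. summarize_blocks is a sketch, not an SDK utility; it only assumes each content block exposes type plus a thinking or text attribute, exactly as in the loop above, so it is demonstrated here on a stand-in built with SimpleNamespace.

```python
from types import SimpleNamespace

def summarize_blocks(content) -> dict:
    """Count characters of thinking vs. final text in a response's content."""
    summary = {"thinking_chars": 0, "text_chars": 0}
    for block in content:
        if block.type == "thinking":
            summary["thinking_chars"] += len(block.thinking)
        elif block.type == "text":
            summary["text_chars"] += len(block.text)
    return summary

# A stand-in shaped like response.content, for demonstration:
fake = [
    SimpleNamespace(type="thinking", thinking="12 * 12 = 144, so sqrt is 12"),
    SimpleNamespace(type="text", text="The square root of 144 is 12."),
]
print(summarize_blocks(fake))
```

Logging this ratio per request is a simple way to spot tasks whose thinking budget is consistently under- or over-sized.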
Key Takeaways
- Adaptive Thinking automatically adjusts reasoning depth based on input complexity, saving tokens and time on simple tasks.
- Task Budgets give you precise control over thinking tokens, ideal for deterministic workflows.
- Fast Mode prioritizes speed over depth, suitable for real-time or simple queries.
- Combine these features in a tiered system to balance cost, speed, and accuracy.
- Always test with your specific use case—especially Fast Mode, which may reduce accuracy on complex tasks.