Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Task Budgets
Learn how to use Claude’s extended thinking features—adaptive thinking, effort control, task budgets, and fast mode—to optimize reasoning depth, cost, and speed in your API applications.
Claude’s extended thinking capabilities represent a significant leap forward in how you can control the model’s reasoning process. Whether you need deep, chain-of-thought analysis for complex problems or lightning-fast responses for simple queries, understanding these features is essential for getting the most out of the Claude API.
In this guide, you’ll learn how to use adaptive thinking, effort control, task budgets, and fast mode to fine-tune Claude’s reasoning behavior. We’ll cover practical API examples, best practices, and real-world use cases.
What Is Extended Thinking?
Extended thinking allows Claude to allocate additional computational resources—specifically, more tokens and reasoning steps—to produce deeper, more accurate responses. Instead of generating an answer in a single pass, Claude can “think” through a problem step by step, considering alternatives, checking its own logic, and refining its output.
This is especially valuable for:
- Complex mathematical or logical reasoning
- Multi-step planning and analysis
- Code generation with intricate requirements
- Research and document synthesis
Adaptive Thinking: Let Claude Decide the Depth
Adaptive thinking is the simplest way to enable extended reasoning. You tell Claude to think as much as it needs, and it automatically determines the appropriate depth based on the complexity of the prompt.
How to Enable Adaptive Thinking
In the API, you enable extended thinking by setting the thinking parameter in your request. Here’s a Python example:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096  # Maximum tokens Claude can use for thinking
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x^2 + 5x - 2 = 0"}
    ]
)

# The response includes both thinking and final content
print(response.content[0].thinking)  # Claude's reasoning steps
print(response.content[1].text)      # Final answer
Note: When thinking is enabled, the response contains two content blocks: the first is the thinking (visible in the API but not to end users), and the second is the final text output.
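In practice it is safer to iterate over the content blocks and branch on their type rather than rely on fixed positions. Here's a minimal sketch, assuming the block types are reported as "thinking" and "text", matching the two-block structure described above:

def split_response(response):
    """Separate Claude's reasoning from the final answer."""
    thinking_parts, text_parts = [], []
    for block in response.content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
    return "\n".join(thinking_parts), "\n".join(text_parts)

reasoning, answer = split_response(response)
print(answer)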
When to Use Adaptive Thinking
Use adaptive thinking when you want maximum quality without manual tuning. It’s ideal for:
- Open-ended questions where complexity is unknown
- Applications where accuracy matters more than speed
- Prototyping and experimentation
Effort: Fine-Grained Control Over Reasoning Depth
Effort gives you explicit control over how much Claude thinks. You can set it to low, medium, or high to balance between speed and depth.
Effort Levels Explained
| Effort Level | Behavior | Use Case |
|---|---|---|
| low | Minimal reasoning; fast responses | Simple Q&A, classification, extraction |
| medium | Balanced reasoning | General-purpose tasks, moderate complexity |
| high | Deep, thorough reasoning | Complex analysis, code generation, research |
API Example with Effort
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  thinking: {
    type: 'enabled',
    budget_tokens: 2048,
    effort: 'high'  // Explicitly set effort level
  },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists efficiently.' }
  ]
});

console.log(response.content[0].thinking);
console.log(response.content[1].text);
Best Practices for Effort
- Use low effort for high-throughput applications like chatbots or simple data extraction.
- Use medium effort as your default for most tasks.
- Use high effort only when you need the highest accuracy and can tolerate longer response times (a small helper that applies these rules follows this list).
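If you route many different task types through one code path, it can help to centralize these rules. The following is a small, hypothetical helper; the task categories are made up for illustration, and the effort field inside the thinking object follows the examples above:

# Hypothetical mapping from a rough task category to an effort level.
EFFORT_BY_TASK = {
    "extraction": "low",   # simple Q&A, classification, extraction
    "general": "medium",   # sensible default for most tasks
    "analysis": "high",    # complex analysis, code generation, research
}

def thinking_config_for(task_category, budget_tokens=2048):
    """Build a thinking config using the effort field shown in the examples above."""
    return {
        "type": "enabled",
        "budget_tokens": budget_tokens,
        "effort": EFFORT_BY_TASK.get(task_category, "medium"),
    }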
Task Budgets (Beta): Allocate Thinking Tokens Precisely
Task budgets allow you to set a specific token budget for thinking, separate from the output tokens. This gives you fine-grained control over cost and reasoning depth.
How Task Budgets Work
The budget_tokens parameter in the thinking object defines the maximum number of tokens Claude can use for its internal reasoning. The max_tokens parameter still controls the final output length.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,  # Final output limit
    thinking={
        "type": "enabled",
        "budget_tokens": 8192  # Thinking limit (can be larger than max_tokens)
    },
    messages=[
        {"role": "user", "content": "Explain the proof of Fermat's Last Theorem in simple terms."}
    ]
)
Key Considerations
- Budget tokens can exceed max_tokens: Claude may think more than it writes.
- Cost is based on total tokens used: both thinking and output tokens count toward your usage (see the usage-logging sketch after this list).
- Minimum budget: you must set at least 1024 budget tokens when enabling thinking.
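Because thinking tokens are billed alongside regular output tokens, it is worth logging per-request usage while you tune budgets. A minimal sketch, assuming the response's usage object reports input_tokens and output_tokens, with illustrative per-million-token prices (substitute your model's actual rates):

# Illustrative prices in USD per million tokens -- not official figures.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

def estimate_cost(response):
    """Rough cost estimate; thinking tokens are counted with output tokens."""
    usage = response.usage
    input_cost = usage.input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = usage.output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost

print(f"Approximate cost: ${estimate_cost(response):.4f}")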
Fast Mode (Beta: Research Preview): Speed Without Thinking
Fast mode is the opposite of extended thinking. It disables the thinking process entirely, forcing Claude to respond as quickly as possible. This is useful for real-time applications where latency is critical.
Enabling Fast Mode
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "disabled"  # Explicitly disable thinking
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
Note: Fast mode is currently in research preview. It may not be available on all models or regions.
When to Use Fast Mode
- Simple factual queries
- High-traffic chatbots
- Real-time translation or transcription
- Any application where sub-second response time is required
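Before committing to fast mode, it is worth measuring the actual latency difference on your own prompts. The sketch below times the same request with thinking disabled and then enabled, reusing the client from the earlier examples; the token limits and budget are arbitrary illustrative values:

import time

def timed_call(thinking_config, prompt):
    """Time a single request for comparing fast mode against extended thinking."""
    start = time.perf_counter()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        thinking=thinking_config,
        messages=[{"role": "user", "content": prompt}],
    )
    return response, time.perf_counter() - start

_, fast_latency = timed_call({"type": "disabled"}, "What is the capital of France?")
_, thinking_latency = timed_call({"type": "enabled", "budget_tokens": 1024}, "What is the capital of France?")
print(f"fast mode: {fast_latency:.2f}s, extended thinking: {thinking_latency:.2f}s")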
Combining Features: A Practical Workflow
Here’s a complete example that demonstrates how to use all these features together in a single application:
import anthropic

client = anthropic.Anthropic()

def get_claude_response(prompt, complexity="medium"):
    """
    Get a response from Claude with dynamic thinking configuration.
    """
    # Configure thinking based on complexity
    if complexity == "low":
        thinking_config = {"type": "disabled"}  # Fast mode
        max_tokens = 1024
    elif complexity == "medium":
        thinking_config = {
            "type": "enabled",
            "budget_tokens": 2048,
            "effort": "medium"
        }
        max_tokens = 2048
    else:  # high
        thinking_config = {
            "type": "enabled",
            "budget_tokens": 8192,
            "effort": "high"
        }
        max_tokens = 4096

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=max_tokens,
        thinking=thinking_config,
        messages=[{"role": "user", "content": prompt}]
    )
    return response

# Example usage
simple_response = get_claude_response("What is 2+2?", "low")
complex_response = get_claude_response(
    "Analyze the economic impact of quantum computing on cryptography.",
    "high"
)
Troubleshooting Common Issues
Thinking Content Not Visible
If you don’t see thinking content in the response, ensure:
- You’re using a model that supports extended thinking (Claude 3.7 Sonnet, Claude 4 models, or later)
- The thinking parameter is properly set to {"type": "enabled"}
- You’re accessing response.content[0].thinking (not .text)
Budget Exceeded
If Claude stops thinking before reaching a conclusion, increase budget_tokens. A good rule of thumb is to set it to 50-100% of your max_tokens value.
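You can also handle truncation programmatically by checking the response's stop_reason and retrying with a larger budget. A rough sketch, assuming stop_reason is reported as "max_tokens" when the limit is hit:

def ask_with_budget_retry(prompt, budget_tokens=2048, max_retries=2):
    """Retry with a doubled thinking budget if the response was cut off."""
    for _ in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=budget_tokens * 2,  # keep headroom for the final answer
            thinking={"type": "enabled", "budget_tokens": budget_tokens},
            messages=[{"role": "user", "content": prompt}],
        )
        if response.stop_reason != "max_tokens":
            return response
        budget_tokens *= 2  # truncated -- try again with a bigger budget
    return response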
High Latency
If responses are too slow:
- Reduce effort to low or medium
- Lower budget_tokens
- Consider using fast mode (thinking: {"type": "disabled"})
Key Takeaways
- Adaptive thinking lets Claude automatically determine reasoning depth, making it ideal for general-purpose use.
- Effort control (low, medium, high) gives you explicit control over the speed-accuracy trade-off.
- Task budgets allow precise allocation of thinking tokens, separate from output tokens.
- Fast mode disables thinking entirely for maximum speed in simple, high-throughput scenarios.
- Always test different configurations with your specific use case to find the optimal balance of cost, speed, and quality.