Mastering Adaptive Thinking in Claude: A Practical Guide to Smarter AI Reasoning
Learn how to use Claude's adaptive thinking mode to dynamically allocate reasoning effort, improve agentic workflows, and optimize API costs with practical code examples.
This guide explains how to use Claude's adaptive thinking mode to let the model dynamically decide when and how much to reason, replacing manual token budgets for better performance in agentic and bimodal tasks.
Introduction
Claude's extended thinking capability has been a game-changer for complex reasoning tasks, but manually setting a budget_tokens value often leads to either wasted tokens on simple queries or insufficient reasoning on hard problems. Enter adaptive thinking — a smarter, dynamic approach that lets Claude decide when and how much to think based on the complexity of each request.
This guide covers everything you need to know to implement adaptive thinking in your Claude API projects, from basic setup to advanced optimization with the effort parameter.
What Is Adaptive Thinking?
Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and the Claude Mythos Preview. Instead of requiring you to set a fixed thinking token budget, it allows Claude to evaluate each request independently and determine:
- Whether extended thinking is needed at all
- How much thinking to allocate
Key Benefits
- Better performance on mixed-complexity tasks compared to fixed budgets
- Automatic interleaved thinking — Claude can think between tool calls, improving multi-step reasoning
- Simpler API calls — no need to guess a
budget_tokensvalue - Cost efficiency — lower effort levels can skip thinking for simple problems
Supported Models
| Model | Adaptive Thinking Support | Notes |
|---|---|---|
Claude Mythos Preview (claude-mythos-preview) | ✅ Default mode | Thinking cannot be disabled |
Claude Opus 4.7 (claude-opus-4-7) | ✅ Only supported mode | Manual enabled mode rejected with 400 error |
Claude Opus 4.6 (claude-opus-4-6) | ✅ Supported | Manual enabled deprecated |
Claude Sonnet 4.6 (claude-sonnet-4-6) | ✅ Supported | Manual enabled deprecated |
| Older models (Sonnet 4.5, Opus 4.5, etc.) | ❌ Not supported | Must use thinking.type: "enabled" with budget_tokens |
Warning: On Opus 4.6 and Sonnet 4.6,thinking.type: "enabled"withbudget_tokensis deprecated and will be removed in a future model release. Migrate to adaptive thinking as soon as possible.
How Adaptive Thinking Works
When you set thinking.type to "adaptive", Claude's internal router evaluates each incoming request and decides:
- Is this a thinking-worthy problem? Simple factual questions may skip thinking entirely.
- How deep should the reasoning go? Complex math, code generation, or multi-step planning gets more thinking tokens.
high), Claude almost always thinks. At lower effort levels, it may skip thinking for simpler problems, saving tokens and latency.
Additionally, adaptive thinking automatically enables interleaved thinking — the model can pause between tool calls to reason about results, plan next steps, or adjust its strategy. This is critical for agentic workflows where the model needs to iterate.
Getting Started: Basic Implementation
Prerequisites
- An Anthropic API key
- The Anthropic Python SDK (
pip install anthropic) or TypeScript SDK (npm install @anthropic-ai/sdk) - Access to a supported model
Python Example
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={"type": "adaptive"},
messages=[
{
"role": "user",
"content": "Explain why the sum of two even numbers is always even."
}
]
)
for block in response.content:
if block.type == "thinking":
print(f"\nThinking: {block.thinking}")
elif block.type == "text":
print(f"\nResponse: {block.text}")
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
model: 'claude-opus-4-7',
max_tokens: 16000,
thinking: { type: 'adaptive' },
messages: [
{
role: 'user',
content: 'Explain why the sum of two even numbers is always even.'
}
]
});
for (const block of response.content) {
if (block.type === 'thinking') {
console.log(\nThinking: ${block.thinking});
} else if (block.type === 'text') {
console.log(\nResponse: ${block.text});
}
}
Controlling Thinking Depth with the effort Parameter
Adaptive thinking can be combined with an optional effort parameter that acts as a soft guide for how much thinking Claude should do. This is especially useful when you want to balance reasoning quality against latency or cost.
Effort Levels
| Effort Level | Behavior | Use Case |
|---|---|---|
low | Minimal thinking; may skip for simple queries | High-throughput, low-latency applications |
medium | Balanced reasoning | General-purpose use |
high | Maximum reasoning (default) | Complex problem-solving, agentic tasks |
Python Example with Effort
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={
"type": "adaptive",
"effort": "medium" # or "low", "high"
},
messages=[
{
"role": "user",
"content": "Design a scalable microservices architecture for an e-commerce platform."
}
]
)
TypeScript Example with Effort
const response = await client.messages.create({
model: 'claude-opus-4-7',
max_tokens: 16000,
thinking: {
type: 'adaptive',
effort: 'medium'
},
messages: [
{
role: 'user',
content: 'Design a scalable microservices architecture for an e-commerce platform.'
}
]
});
Note: The effort parameter is a beta feature. Its behavior may change in future releases. Always test with your specific workload.
Adaptive Thinking in Agentic Workflows
One of the most powerful applications of adaptive thinking is in agentic workflows where Claude uses tools to accomplish multi-step tasks. With interleaved thinking enabled automatically, Claude can:
- Receive a complex request (e.g., "Research the latest AI papers and summarize them")
- Think about the approach before making the first tool call
- Execute a search tool and receive results
- Think about the results to decide next steps
- Call additional tools or synthesize the final answer
Example: Research Agent with Adaptive Thinking
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=32000,
thinking={"type": "adaptive"},
tools=[
{
"name": "web_search",
"description": "Search the web for information",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
}
],
messages=[
{
"role": "user",
"content": "Find the top 3 AI breakthroughs announced in 2025 and explain their significance."
}
]
)
The response may contain interleaved thinking and tool_use blocks
for block in response.content:
if block.type == "thinking":
print(f"[Thinking] {block.thinking[:200]}...")
elif block.type == "tool_use":
print(f"[Tool Call] {block.name}({block.input})")
elif block.type == "text":
print(f"[Final] {block.text}")
Migrating from Fixed Budget to Adaptive Thinking
If you're currently using thinking.type: "enabled" with budget_tokens, here's how to migrate:
Before (Deprecated)
thinking={
"type": "enabled",
"budget_tokens": 8000
}
After (Recommended)
thinking={
"type": "adaptive"
}
Or with effort control:
thinking={
"type": "adaptive",
"effort": "high"
}
Migration Checklist
- ✅ Update your model to a supported version (Opus 4.7, Opus 4.6, Sonnet 4.6, or Mythos Preview)
- ✅ Replace
thinking.type: "enabled"withthinking.type: "adaptive" - ✅ Remove the
budget_tokensparameter - ✅ Optionally add the
effortparameter for finer control - ✅ Test with your workloads to ensure performance meets expectations
Best Practices
1. Start with Default Effort
Begin with effort: "high" (or omit it entirely) to get the full benefit of adaptive thinking. Only reduce effort if you have specific latency or cost constraints.
2. Use Adaptive Thinking for Agentic Tasks
If your application involves multi-step reasoning with tools, adaptive thinking is almost always superior to fixed budgets because of interleaved thinking.
3. Monitor Thinking Tokens
Even though you don't set a budget, you can still inspect the thinking blocks in the response to understand how much reasoning Claude performed. Use this to validate that the effort level matches your needs.
4. Combine with Prompt Caching
Adaptive thinking works well with prompt caching. Cache your system prompts and tool definitions to reduce latency and cost, while letting Claude dynamically allocate thinking tokens.
5. Test Bimodal Workloads
If your application handles both simple and complex queries, adaptive thinking shines. Test with a representative sample to see how often Claude skips thinking for easy questions.
Limitations and Considerations
- Predictable latency: If you need guaranteed response times, adaptive thinking may not be ideal because thinking depth varies per request. Consider using fixed
budget_tokenson Opus 4.6/Sonnet 4.6 if latency predictability is critical. - Cost control: While adaptive thinking can save tokens on simple queries, it may use more tokens on complex ones than a conservative fixed budget. Monitor your usage.
- Beta features: The
effortparameter is in beta. Its behavior may change, and it may not be available in all regions or for all models.
Conclusion
Adaptive thinking represents a significant evolution in how Claude handles reasoning. By letting the model decide when and how much to think, you can achieve better performance on mixed workloads, simplify your code, and unlock more natural agentic behaviors through interleaved thinking.
Whether you're building a simple Q&A bot or a complex multi-agent system, adaptive thinking is now the recommended path forward. Migrate your existing integrations today to take full advantage of Claude's latest reasoning capabilities.
Key Takeaways
- Adaptive thinking lets Claude dynamically decide when and how much to reason, replacing manual
budget_tokensfor better performance on bimodal and agentic workloads. - Supported on Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview — older models still require fixed budgets.
- Interleaved thinking is automatically enabled, allowing Claude to reason between tool calls in agentic workflows.
- The optional
effortparameter (low,medium,high) provides soft control over thinking depth without a fixed token budget. - Migrate now if you're using deprecated
thinking.type: "enabled"— it will be removed in future model releases.