Adaptive Thinking in Claude API: A Complete Guide to Dynamic Extended Thinking
Learn how to use adaptive thinking with Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. Includes code examples, effort parameter usage, and migration tips.
This guide explains how to use adaptive thinking in the Claude API — a dynamic mode where Claude decides when and how much to think based on request complexity. You'll learn how to enable it, use the effort parameter, and migrate from fixed budget_tokens.
What Is Adaptive Thinking?
Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Mythos Preview. Instead of manually setting a fixed budget_tokens value, adaptive thinking lets Claude dynamically determine when and how much to use extended thinking based on the complexity of each request.
This feature is especially powerful for bimodal tasks (mixing simple and complex queries) and long-horizon agentic workflows (where Claude needs to reason across multiple tool calls).
Note: Adaptive thinking is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
Supported Models
| Model | Adaptive Thinking Support | Notes |
|---|---|---|
| Claude Mythos Preview (claude-mythos-preview) | Default mode | Auto-applies when thinking is unset. thinking: {type: "disabled"} is not supported. |
| Claude Opus 4.7 (claude-opus-4-7) | Only supported mode | Manual thinking: {type: "enabled"} is rejected with a 400 error. |
| Claude Opus 4.6 (claude-opus-4-6) | Supported | budget_tokens is deprecated but still functional. |
| Claude Sonnet 4.6 (claude-sonnet-4-6) | Supported | budget_tokens is deprecated but still functional. |
Important: Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking. They require thinking: {type: "enabled"} with budget_tokens.
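The support table above can be encoded as a small lookup so that request-building code picks a thinking configuration each model accepts. This is a minimal sketch, not part of the SDK: `thinking_config_for` and its default budget are hypothetical, and any model ID not listed in the table is assumed to be an older model.

```python
# Model IDs taken from the support table above.
ADAPTIVE_MODELS = {
    "claude-mythos-preview",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
}


def thinking_config_for(model: str, budget_tokens: int = 8000) -> dict:
    """Return a thinking config the given model accepts (hypothetical helper).

    Assumption: any model not in ADAPTIVE_MODELS is an older model that
    needs manual thinking with an explicit budget_tokens value.
    """
    if model in ADAPTIVE_MODELS:
        return {"type": "adaptive"}
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

The returned dict can be passed as the `thinking` argument when building a request, which keeps the model-specific rule in one place.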
How Adaptive Thinking Works
In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and decides:
- Whether to use extended thinking at all
- How much thinking to allocate
At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems, saving tokens and reducing latency.
Adaptive thinking also automatically enables interleaved thinking — meaning Claude can think between tool calls. This makes it especially effective for agentic workflows where the model needs to reason after receiving tool results.
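Because thinking is optional in adaptive mode, it can help to tally the content blocks that came back and see what the model decided for a given request. The following is a minimal sketch, assuming each block exposes a `type` attribute as in the SDK examples in this guide. `summarize_blocks` is a hypothetical helper name, and `tool_use` is assumed to be the type of tool-call blocks.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_blocks(content) -> dict:
    """Count response content blocks by type and flag whether thinking occurred."""
    counts = Counter(block.type for block in content)
    return {
        "counts": dict(counts),
        "used_thinking": counts["thinking"] > 0,
    }


# Stand-in blocks for illustration; a real call would pass response.content.
example = [
    SimpleNamespace(type="thinking"),
    SimpleNamespace(type="tool_use"),
]
print(summarize_blocks(example))
```

Logging this summary per request gives a rough picture of how often thinking is skipped at a given effort level.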
How to Use Adaptive Thinking
Basic Usage
Set thinking.type to "adaptive" in your API request:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 16000,
  thinking: { type: 'adaptive' },
  messages: [
    {
      role: 'user',
      content: 'Explain why the sum of two even numbers is always even.'
    }
  ]
});

for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(`\nThinking: ${block.thinking}`);
  } else if (block.type === 'text') {
    console.log(`\nResponse: ${block.text}`);
  }
}
Using the Effort Parameter
You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance — it influences but doesn't strictly control the thinking budget.
Available Effort Levels
| Effort Level | Behavior |
|---|---|
| low | Minimal thinking; Claude may skip thinking for simple requests. Best for high-throughput, low-latency applications. |
| medium | Balanced approach; Claude thinks when beneficial but conserves tokens for straightforward tasks. |
| high (default) | Claude almost always thinks; suitable for complex reasoning, math, coding, and agentic workflows. |
Example with Effort
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    # effort is set under output_config, alongside the thinking config
    output_config={"effort": "medium"},  # or "low", "high"
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to merge two sorted lists."
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")
Task Budgets (Beta)
For more granular control, you can use task budgets to set token limits for specific thinking phases. This is useful when you want to cap thinking costs while still benefiting from adaptive allocation.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "task_budgets": {
            "max_thinking_tokens": 8000,
            "max_response_tokens": 8000
        }
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ]
)
Note: Task budgets are in beta and may change in future API versions.
Fast Mode (Research Preview)
For applications where speed is critical, you can enable fast mode alongside adaptive thinking. Fast mode reduces thinking time at the cost of some reasoning quality.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "fast_mode": True  # research preview
    },
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ]
)
Migrating from Fixed budget_tokens
If you're currently using thinking: {type: "enabled", budget_tokens: N} on Opus 4.6 or Sonnet 4.6, here's how to migrate:
Before (deprecated)
thinking={
    "type": "enabled",
    "budget_tokens": 16000
}
After (recommended)
thinking={"type": "adaptive"},
output_config={"effort": "high"}  # approximates a generous budget
For workloads where you previously used a low budget_tokens (e.g., 2000), try effort: "low" or effort: "medium".
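One way to start a migration is to map each existing budget to an initial effort level and tune from there. The thresholds in this sketch are assumptions chosen only to match the two data points above (2000 tokens as a low budget, 16000 as a generous one). `effort_for_budget` is a hypothetical helper, not an API feature.

```python
def effort_for_budget(budget_tokens: int) -> str:
    """Suggest a starting effort level for a previously fixed budget_tokens value.

    The cutoffs are illustrative assumptions, not documented thresholds.
    """
    if budget_tokens <= 2000:
        return "low"
    if budget_tokens < 16000:
        return "medium"
    return "high"
```

Since effort is soft guidance rather than a hard cap, the result is only a starting point to validate against your own quality and latency measurements.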
Best Practices
1. Start with High Effort
For most applications, start with effort: "high" and monitor performance. If you find Claude is over-thinking simple requests, gradually reduce the effort level.
2. Use Adaptive Thinking for Agentic Workflows
Adaptive thinking's interleaved thinking capability makes it ideal for multi-step agentic tasks. Claude can think after each tool call, improving decision quality.
3. Monitor Token Usage
Even though adaptive thinking is dynamic, you should still monitor token consumption. Use max_tokens to set an upper bound on total response length.
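Monitoring can be as simple as recording the usage fields of each response. The sketch below assumes the response object exposes `usage.input_tokens`, `usage.output_tokens`, and `stop_reason`, and that thinking tokens are counted within `output_tokens`. `usage_report` is a hypothetical helper name.

```python
from types import SimpleNamespace


def usage_report(response) -> dict:
    """Summarize token usage and whether the response hit the max_tokens cap."""
    return {
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "truncated": response.stop_reason == "max_tokens",
    }


# Stand-in response for illustration; a real call would pass the object
# returned by client.messages.create(...).
fake = SimpleNamespace(
    usage=SimpleNamespace(input_tokens=120, output_tokens=16000),
    stop_reason="max_tokens",
)
print(usage_report(fake))
```

A rising share of truncated responses suggests either raising max_tokens or lowering the effort level.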
4. Combine with Structured Outputs
For applications requiring structured responses (e.g., JSON), combine adaptive thinking with structured outputs:
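response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice."}
    ],
    # Structured outputs: pass a JSON schema under output_config.format.
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["name", "date", "amount"],
                "additionalProperties": False
            }
        }
    }
)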
5. Test with Both Modes
If you're migrating from fixed budget_tokens, run A/B tests comparing the old mode with adaptive thinking to ensure quality remains consistent or improves.
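A comparison like this can be kept independent of the API by passing in the two request functions and a scoring function. The following is a rough sketch under those assumptions: `compare_modes`, `run_fixed`, `run_adaptive`, and `score` are all hypothetical names, and the scoring metric is left to the caller.

```python
from statistics import mean
from typing import Callable, Sequence


def compare_modes(
    prompts: Sequence[str],
    run_fixed: Callable[[str], str],
    run_adaptive: Callable[[str], str],
    score: Callable[[str, str], float],
) -> dict:
    """Run both modes over the same prompts and return the mean score of each."""
    fixed = [score(p, run_fixed(p)) for p in prompts]
    adaptive = [score(p, run_adaptive(p)) for p in prompts]
    return {"fixed": mean(fixed), "adaptive": mean(adaptive)}
```

In practice `run_fixed` and `run_adaptive` would wrap the two request styles from the migration section, and `score` would be whatever quality check the application already uses.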
Common Pitfalls
- Using adaptive thinking on unsupported models: Older models (Sonnet 4.5, Opus 4.5) will ignore the adaptive type and fall back to no thinking.
- Setting budget_tokens with adaptive thinking: This combination is ignored; use effort instead.
- Expecting fixed latency: Adaptive thinking means variable thinking time. If you need predictable latency, consider using fixed budget_tokens on Opus 4.6 or Sonnet 4.6 (though deprecated).
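The first two pitfalls can be caught before a request is sent. This is a minimal sketch of such a check: `check_thinking_config` is a hypothetical helper, and the set of supported model IDs is copied from the support table in this guide.

```python
# Model IDs listed as supporting adaptive thinking in this guide.
ADAPTIVE_MODELS = {
    "claude-mythos-preview",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
}


def check_thinking_config(model: str, thinking: dict) -> list:
    """Return warnings for adaptive-thinking pitfalls (hypothetical helper)."""
    warnings = []
    if thinking.get("type") == "adaptive":
        if model not in ADAPTIVE_MODELS:
            warnings.append(f"{model} does not support adaptive thinking")
        if "budget_tokens" in thinking:
            warnings.append("budget_tokens has no effect with adaptive thinking")
    return warnings
```

An empty list means neither pitfall applies. The latency pitfall cannot be checked statically and needs measurement instead.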
Key Takeaways
- Adaptive thinking is the recommended mode for Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. It dynamically allocates thinking based on request complexity.
- Use the effort parameter (low, medium, high) to guide thinking depth without setting a fixed token budget.
- Interleaved thinking is automatic in adaptive mode, making it ideal for agentic workflows with multiple tool calls.
- Migrate from budget_tokens by replacing thinking: {type: "enabled", budget_tokens: N} with thinking: {type: "adaptive"} and output_config: {effort: "high"}.
- Start with effort: "high" and adjust downward based on your latency and cost requirements.