Mastering Extended Thinking in Claude: A Complete Guide to Adaptive and Manual Reasoning
Learn how to use Claude's extended thinking for complex reasoning tasks. Covers adaptive thinking, manual mode, effort parameters, and code examples for all supported models.
This guide explains how to enable and configure Claude's extended thinking feature—both adaptive and manual modes—to get step-by-step reasoning from the model. You'll learn which mode works with each Claude model, how to set effort levels, and how to handle thinking blocks in your API responses.
Introduction
Claude's extended thinking feature lets the model reason through a problem before it answers. When enabled, Claude produces "thinking" content blocks that expose its step-by-step reasoning ahead of the final answer. That visibility helps when debugging complex logic, auditing model decisions, and building trust in AI-generated outputs.
This guide covers everything you need to know to use extended thinking effectively: the two modes (adaptive and manual), which models support each mode, how to configure effort levels, and practical code examples for the Messages API.
How Extended Thinking Works
When extended thinking is turned on, Claude's API response includes one or more thinking content blocks followed by text content blocks. Each thinking block contains a thinking field (the reasoning text) and a signature field (used for verification). Here's the default response structure:
```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
```
Claude incorporates insights from its reasoning before crafting the final text response, so the thinking blocks are not just a log—they actively inform the answer.
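As a minimal sketch of how an application might separate the two block types, the helper below walks a response that has already been parsed into a plain dictionary with the structure shown above. The function name `split_blocks` and the placeholder signature value are illustrative, not part of any SDK; the official SDKs return typed objects rather than dictionaries, so attribute access would replace the `.get()` calls there.

```python
from typing import Any


def split_blocks(response: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (thinking_texts, answer_texts) from a parsed response dict."""
    thinking: list[str] = []
    answers: list[str] = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answers.append(block.get("text", ""))
        # Any other block type is ignored in this sketch.
    return thinking, answers


# Sample mirroring the response structure above (signature is a placeholder).
sample = {
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me analyze this step by step...",
            "signature": "placeholder-signature",
        },
        {"type": "text", "text": "Based on my analysis..."},
    ]
}

thinking, answers = split_blocks(sample)
print("thinking blocks:", len(thinking))
print("text blocks:", len(answers))
```

Keeping the two lists separate makes it straightforward to log the reasoning for audits while showing only the answer text to end users.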
Adaptive Thinking vs. Manual Extended Thinking
Claude offers two modes for extended thinking:
Adaptive Thinking (Recommended)
Adaptive thinking lets Claude decide how much thinking to do based on the complexity of the task. You specify an effort level (low, medium, high) rather than a fixed token budget. This is the modern, flexible approach and is required for Claude Opus 4.7.
When to use: Most use cases, especially when you want Claude to allocate reasoning effort dynamically.
Manual Extended Thinking
Manual mode lets you set a fixed budget_tokens value—the maximum number of tokens Claude can use for thinking. This gives you precise control but requires you to guess how much thinking a task needs.
Important: Manual extended thinking is deprecated on Claude Opus 4.6 and Claude Sonnet 4.6, and not supported on Claude Opus 4.7 (returns a 400 error). Use adaptive thinking for all new projects.
Model Support Matrix
| Model | Adaptive Thinking | Manual Extended Thinking | Notes |
|---|---|---|---|
| Claude Opus 4.7 | ✅ Required | ❌ Not supported (400 error) | Use thinking: {type: "adaptive"} with effort |
| Claude Opus 4.6 | ✅ Recommended | ✅ Deprecated but functional | Migrate to adaptive thinking |
| Claude Sonnet 4.6 | ✅ Recommended | ✅ Deprecated but functional | Interleaved mode deprecated |
| Claude Mythos Preview | ✅ Default | ✅ Accepted | thinking: {type: "disabled"} not supported; display defaults to "omitted" |
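One way to keep the matrix from drifting out of your code is to encode it once and build the `thinking` field from it. The sketch below is an assumption-laden illustration: the short keys such as `"opus-4.7"` are hypothetical labels rather than real model IDs, the function name `thinking_config` is invented here, and the returned dictionaries simply mirror the request shapes used in this guide, so check them against the current API reference before relying on them.

```python
from typing import Optional

# Manual-mode status per model, transcribed from the table above.
# Keys are hypothetical short labels, not the IDs the API expects.
MANUAL_SUPPORT = {
    "opus-4.7": "unsupported",
    "opus-4.6": "deprecated",
    "sonnet-4.6": "deprecated",
    "mythos-preview": "accepted",
}


def thinking_config(
    model_key: str,
    effort: str = "medium",
    budget_tokens: Optional[int] = None,
) -> dict:
    """Build a `thinking` field: adaptive by default, manual only if allowed."""
    if model_key not in MANUAL_SUPPORT:
        raise KeyError(f"unknown model key: {model_key}")
    if budget_tokens is None:
        # No fixed budget requested, so use adaptive thinking.
        return {"type": "adaptive", "effort": effort}
    if MANUAL_SUPPORT[model_key] == "unsupported":
        # Fail locally instead of waiting for the API's 400 error.
        raise ValueError(f"{model_key} does not accept manual thinking")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config("opus-4.7", effort="high"))
print(thinking_config("sonnet-4.6", budget_tokens=10000))
```

Raising locally for the unsupported combination surfaces the mistake before a request is sent.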
How to Use Extended Thinking (Code Examples)
Adaptive Thinking with Effort Parameter
This approach is required for Claude Opus 4.7 and recommended for the other models in the support matrix.
Python
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "high"  # Options: "low", "medium", "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers congruent to 3 mod 4."
        }
    ]
)

# Thinking blocks come first, followed by the final text blocks.
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
```
TypeScript
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 16000,
  thinking: {
    type: 'adaptive',
    effort: 'high'
  },
  messages: [
    {
      role: 'user',
      content: 'Prove that there are infinitely many prime numbers congruent to 3 mod 4.'
    }
  ]
});

// Thinking blocks come first, followed by the final text blocks.
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(`Thinking: ${block.thinking}`);
  } else if (block.type === 'text') {
    console.log(`Answer: ${block.text}`);
  }
}
```
Manual Extended Thinking (Legacy)
Use this only for older models that still support it.
Python
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Maximum tokens for thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Design a distributed key-value store with strong consistency."
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
```
Understanding the Effort Parameter
The effort parameter in adaptive thinking controls how much reasoning Claude applies:
- `low`: Minimal reasoning. Best for simple tasks where you want fast responses.
- `medium`: Balanced reasoning. A good default for most tasks.
- `high`: Maximum reasoning. Use for complex problems like mathematical proofs, multi-step planning, or deep analysis.
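Because the three levels form an ordered ladder, an application can step up or down it programmatically, for example raising effort when an answer fails a validation check. The helper below is a hypothetical sketch of that idea; `EFFORT_LEVELS` and `adjust_effort` are names invented for illustration, and the policy for when to step is left to the caller.

```python
# The three effort levels listed above, ordered from least to most reasoning.
EFFORT_LEVELS = ("low", "medium", "high")


def adjust_effort(current: str, step: int) -> str:
    """Move `step` levels up (positive) or down (negative), clamped to the ends."""
    index = EFFORT_LEVELS.index(current)  # raises ValueError on an unknown level
    new_index = max(0, min(len(EFFORT_LEVELS) - 1, index + step))
    return EFFORT_LEVELS[new_index]


print(adjust_effort("medium", 1))
print(adjust_effort("high", 1))
```

Clamping at both ends means repeated escalation settles at `high` instead of raising an error.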
Task Budgets (Beta)
For advanced control, you can combine adaptive thinking with a task budget: a cap on the tokens Claude can spend on thinking, while max_tokens continues to cap the entire response (thinking + text). This is useful when you need to cap total latency or cost.
```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "budget_tokens": 20000  # Max tokens for thinking only
    },
    messages=[...]
)
```
Note: budget_tokens in adaptive mode sets a cap on thinking tokens, while max_tokens caps the total response (thinking + text).
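The relationship in the note reduces to simple arithmetic: whatever the thinking cap does not use is what remains for the final answer. The helper below is a small sketch of that check, with `answer_headroom` as a hypothetical name; it treats a thinking cap at or above `max_tokens` as a configuration error, which is an assumption made here rather than a documented API rule.

```python
def answer_headroom(max_tokens: int, budget_tokens: int) -> int:
    """Tokens left for the final text if thinking uses its full cap."""
    if budget_tokens >= max_tokens:
        # Assumption: a cap that leaves no room for an answer is a mistake.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return max_tokens - budget_tokens


# Values from the example above: 32000 total, 20000 for thinking.
print(answer_headroom(32000, 20000))  # prints 12000
```

With the example's values, at least 12,000 tokens remain for the visible answer even when thinking runs to its cap.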
Fast Mode (Research Preview)
Fast mode is an experimental feature that reduces thinking time for faster responses. It's available as a research preview and may change. To enable it:
```python
thinking={
    "type": "adaptive",
    "effort": "medium",
    "fast_mode": True
}
```
Use fast mode when you need quicker answers and can accept slightly reduced reasoning depth.
Best Practices
- Use adaptive thinking for new projects. Manual mode is deprecated and will be removed in future model releases.
- Start with `effort: "medium"` and adjust up or down based on task complexity.
- Set `max_tokens` generously when using extended thinking. The thinking process consumes tokens, so you need headroom for both reasoning and the final answer.
- Handle thinking blocks in streaming if you need real-time output. Extended thinking works with streaming: thinking blocks stream first, then text blocks.
- Audit thinking content for debugging. The thinking blocks reveal Claude's reasoning, making it easier to spot errors or biases.
- Preserve the signature field. The signature is what allows a thinking block to be verified as unaltered, so keep each thinking block and its signature unmodified if you store it or send it back in a later request.
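To illustrate the streaming point in the list above, the sketch below accumulates thinking and answer text from a sequence of stream events. It assumes events have been converted to plain dictionaries and that thinking arrives as `content_block_delta` events carrying a `thinking_delta`, with answer text arriving as `text_delta`; the SDKs expose typed event objects instead, so treat the dictionary shape, the function name `accumulate_stream`, and the simulated events as illustrative only.

```python
from typing import Any, Iterable


def accumulate_stream(events: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Concatenate thinking and text deltas from an iterable of stream events."""
    out = {"thinking": "", "text": ""}
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # start/stop and other event types carry no text here
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            out["thinking"] += delta.get("thinking", "")
        elif delta.get("type") == "text_delta":
            out["text"] += delta.get("text", "")
    return out


# Simulated events: the thinking block streams first, then the text block.
simulated = [
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Check small cases. "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Then generalize."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "The claim holds."}},
]

result = accumulate_stream(simulated)
print(result["thinking"])
print(result["text"])
```

In a real integration the same branching would run inside the loop that reads events from the stream, which lets a UI render reasoning progressively before the answer begins.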
Common Pitfalls
- Setting `budget_tokens` too low in manual mode can cause Claude to stop thinking prematurely, leading to incomplete reasoning.
- Forgetting to set `max_tokens` high enough when using extended thinking. If `max_tokens` is too low, Claude may run out of space before finishing its answer.
- Using manual mode on Claude Opus 4.7 will result in a 400 error. Always use adaptive thinking for that model.
- Ignoring the `display` parameter for Claude Mythos Preview. By default, thinking content is omitted; pass `display: "summarized"` to receive summaries.
Key Takeaways
- Extended thinking gives Claude enhanced reasoning capabilities and transparent step-by-step reasoning in API responses.
- Adaptive thinking (`type: "adaptive"`) with the `effort` parameter is the recommended approach for all current Claude models, and required for Claude Opus 4.7.
- Manual extended thinking (`type: "enabled"` with `budget_tokens`) is deprecated on Claude Opus 4.6 and Claude Sonnet 4.6, and not supported on Claude Opus 4.7.
- Use the effort parameter (`low`, `medium`, `high`) to control reasoning depth without worrying about exact token budgets.
- Always set `max_tokens` high enough to accommodate both thinking and the final response, and handle thinking blocks appropriately in your application code.