Mastering Extended Thinking in Claude: A Practical Guide to Adaptive & Manual Reasoning
Learn how to enable and optimize Claude's extended thinking for complex reasoning tasks. Covers adaptive thinking, manual mode, budget tokens, and best practices.
This guide explains how to use extended thinking in the Claude API to improve reasoning on complex tasks. You'll learn about adaptive thinking (recommended for Opus 4.7), manual mode, budget tokens, and how to handle thinking content blocks in responses.
Introduction
Extended thinking is one of Claude's most powerful features for tackling complex, multi-step problems. When enabled, Claude generates internal reasoning steps before producing its final answer, giving you both transparency into its thought process and significantly improved accuracy on tasks like math, code analysis, logic puzzles, and research synthesis.
This guide covers everything you need to know to implement extended thinking effectively: from the basics of enabling it in the API to advanced configuration with adaptive thinking and budget tokens.
How Extended Thinking Works
When extended thinking is turned on, Claude creates special thinking content blocks in the API response. These blocks contain its step-by-step reasoning, followed by the final text content block with the answer.
Here's what a typical response looks like:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
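The signature field lets the API verify that a thinking block was generated by Claude. Keep it intact and pass the block back unmodified whenever you return thinking blocks to the API, for example when continuing a conversation that uses tools.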
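As a minimal sketch, the response shape above can be split into reasoning and answer with a small helper. It assumes the response has already been converted to plain dictionaries shaped like the JSON example; `split_blocks` is a hypothetical name for illustration, not part of the SDK.

```python
from typing import Any


def split_blocks(content: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Separate thinking text from answer text in a `content` list.

    Assumes each block is a dict shaped like the JSON example above.
    Blocks of any other type are skipped.
    """
    thinking: list[str] = []
    answer: list[str] = []
    for block in content:
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer.append(block.get("text", ""))
    return thinking, answer


# Sample data mirroring the example response; the signature is a placeholder.
sample = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
thinking, answer = split_blocks(sample)
print(thinking)
print(answer)
```

Returning lists rather than single strings keeps the helper usable if a response ever contains more than one block of either type.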
Enabling Extended Thinking
For Claude Opus 4.7 (Recommended: Adaptive Thinking)
Claude Opus 4.7 requires adaptive thinking — you cannot use the old manual budget_tokens approach. Adaptive thinking automatically allocates tokens based on task complexity.
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) from 0 to pi"}
    ]
)
# Access thinking blocks
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
TypeScript example:
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 8192,
  thinking: { type: 'adaptive' },
  output_config: { effort: 'high' },
  messages: [
    { role: 'user', content: 'Solve this complex math problem: integrate x^2 * sin(x) from 0 to pi' }
  ]
});
// Access thinking blocks
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);
  }
}
For Claude Opus 4.6 and Claude Sonnet 4.6
Adaptive thinking is recommended for these models too, but manual mode is still functional (though deprecated).
Adaptive mode (recommended):
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on cryptography"}
    ]
)
Manual mode (deprecated but functional):
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on cryptography"}
    ]
)
Understanding the effort Parameter
With adaptive thinking, you control reasoning depth using the effort parameter. This replaces the old budget_tokens approach.
| Effort Level | Use Case |
|---|---|
| "low" | Simple tasks, quick responses |
| "medium" | Balanced reasoning for most tasks |
| "high" | Complex problems requiring deep analysis |
Budget Tokens (Manual Mode - Legacy)
If you're using a model that still supports manual mode (Opus 4.6, Sonnet 4.6, Mythos Preview), the budget_tokens parameter sets the maximum tokens Claude can use for reasoning.
- budget_tokens must be less than max_tokens (at least 1 token less)
- A minimum of 1024 tokens is required
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Write a detailed analysis of the Fermi Paradox"}
    ]
)
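The two manual-mode constraints listed above can be checked on the client before a request is sent. The following is a sketch under those stated limits only; `check_budget` and `MIN_BUDGET_TOKENS` are hypothetical names, and the API may enforce further rules not covered here.

```python
MIN_BUDGET_TOKENS = 1024  # minimum stated in this guide


def check_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if the manual-mode constraints above are violated."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )


check_budget(4096, 8192)  # valid combination, returns without raising
```

Failing fast locally gives a clearer message than waiting for the API to reject the request.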
Streaming with Extended Thinking
When streaming, thinking blocks appear before text blocks. You must handle them separately:
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    stream=True
)
for event in stream:
    if event.type == "content_block_start":
        # Label each block once instead of repeating the label on every delta
        print(f"\n[{event.content_block.type}]")
    elif event.type == "content_block_delta":
        if event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="", flush=True)
        elif event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
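If you need the full reasoning and the full answer as separate strings rather than printing deltas as they arrive, the same event handling can feed two buffers. This sketch uses stand-in event objects built with `SimpleNamespace` so it runs offline; `collect_stream` and `_delta` are hypothetical helpers, and real events come from the SDK stream.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and text deltas into two separate strings."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # ignore start/stop and other event types
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)


def _delta(kind, **fields):
    """Build a stand-in delta event for offline use (not an SDK type)."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=kind, **fields),
    )


fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta("thinking_delta", thinking="Qubits can be "),
    _delta("thinking_delta", thinking="in superposition."),
    _delta("text_delta", text="Quantum computers use qubits."),
    SimpleNamespace(type="content_block_stop"),
]
thinking, answer = collect_stream(fake_events)
print(thinking)
print(answer)
```

Collecting parts in lists and joining once at the end avoids repeated string concatenation on long streams.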
Best Practices
1. Use Adaptive Thinking for New Models
Always prefer thinking: {type: "adaptive"} for Claude Opus 4.7 and newer models. It's simpler, more efficient, and automatically adjusts reasoning depth.
2. Set Appropriate max_tokens
Extended thinking consumes tokens from your max_tokens budget. If you set max_tokens too low, Claude may run out of space for the final answer. A good rule of thumb: set max_tokens to at least 2x your expected thinking budget.
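The rule of thumb above is simple arithmetic, sketched here as a helper. `suggest_max_tokens` is a hypothetical name, and the optional answer estimate is an added assumption that goes beyond the 2x rule: it ensures a long expected answer is not squeezed out.

```python
def suggest_max_tokens(expected_thinking: int, expected_answer: int = 0) -> int:
    """Return a max_tokens value following the 2x rule of thumb above.

    Takes the larger of 2x the thinking estimate and thinking + answer,
    so the answer estimate only matters when it exceeds the thinking estimate.
    """
    return max(2 * expected_thinking, expected_thinking + expected_answer)


print(suggest_max_tokens(4096))        # 8192
print(suggest_max_tokens(4096, 6000))  # 10096
```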
3. Handle Thinking Blocks in Your Application
If you're displaying responses to users, you may want to:
- Show thinking blocks in a collapsible section
- Stream them in real-time for transparency
- Omit them entirely if you only need the final answer
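One way to cover the collapsible and omit options is a small renderer that wraps reasoning in an HTML details element. This is an illustrative sketch assuming content blocks are plain dictionaries; `render_html` is a hypothetical helper and the markup is just one possible presentation.

```python
from html import escape


def render_html(content: list[dict], show_thinking: bool = True) -> str:
    """Render content blocks as HTML, with thinking in a collapsible section.

    Set show_thinking=False to omit reasoning and keep only the answer.
    """
    parts = []
    for block in content:
        if block.get("type") == "thinking" and show_thinking:
            parts.append(
                "<details><summary>Reasoning</summary>"
                f"<pre>{escape(block.get('thinking', ''))}</pre></details>"
            )
        elif block.get("type") == "text":
            parts.append(f"<p>{escape(block.get('text', ''))}</p>")
    return "\n".join(parts)


blocks = [
    {"type": "thinking", "thinking": "a < b, so pick a"},
    {"type": "text", "text": "Done"},
]
print(render_html(blocks))
print(render_html(blocks, show_thinking=False))
```

Escaping the model output before embedding it keeps characters such as `<` from being interpreted as markup.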
4. Combine with Structured Outputs
Extended thinking works well with structured outputs. The thinking blocks contain reasoning, while the text block can contain JSON or other structured data.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Extract key entities from this text and return as JSON"}
    ]
)
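On the receiving side, the structured data has to be read from the text block while the thinking blocks are skipped. The sketch below assumes the content has been converted to dictionaries and that the text block holds bare JSON; `extract_json` is a hypothetical helper and the sample entities are made up for illustration.

```python
import json


def extract_json(content: list[dict]):
    """Parse the first text block as JSON, ignoring thinking blocks.

    Returns None if there is no text block or it is not valid JSON.
    """
    for block in content:
        if block.get("type") == "text":
            try:
                return json.loads(block["text"])
            except (KeyError, json.JSONDecodeError):
                return None
    return None


# Illustrative data only; a real response comes from the API.
content = [
    {"type": "thinking", "thinking": "The text mentions Ada Lovelace in London..."},
    {"type": "text", "text": '{"people": ["Ada Lovelace"], "places": ["London"]}'},
]
print(extract_json(content))
```

This sketch does not strip markdown code fences or surrounding prose, so a prompt that asks for JSON only is assumed.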
5. Monitor Token Usage
Extended thinking can significantly increase token consumption. Monitor your usage and adjust the effort parameter or budget_tokens accordingly.
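A lightweight way to monitor usage is to compare reported output tokens against `max_tokens` after each call. This sketch assumes the response's usage data is available as a dictionary with an `output_tokens` field and that thinking tokens are counted in that figure; `output_budget_used`, `near_limit`, and the 0.9 threshold are hypothetical choices for illustration.

```python
def output_budget_used(usage: dict, max_tokens: int) -> float:
    """Fraction of max_tokens consumed by output (thinking plus answer)."""
    return usage["output_tokens"] / max_tokens


def near_limit(usage: dict, max_tokens: int, threshold: float = 0.9) -> bool:
    """True when output came close to max_tokens, a hint to revisit settings."""
    return output_budget_used(usage, max_tokens) >= threshold


# Illustrative numbers, not real measurements.
usage = {"input_tokens": 120, "output_tokens": 7800}
print(round(output_budget_used(usage, 8192), 2))
print(near_limit(usage, 8192))
```

When `near_limit` is frequently true, either raising `max_tokens` or lowering effort is worth considering, depending on whether answers are being cut short or cost is the concern.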
Model-Specific Notes
| Model | Recommended Mode | Notes |
|---|---|---|
| Claude Opus 4.7 | Adaptive only | Manual mode returns 400 error |
| Claude Opus 4.6 | Adaptive (manual deprecated) | Manual still works but will be removed |
| Claude Sonnet 4.6 | Adaptive (manual deprecated) | Interleaved mode deprecated |
| Claude Mythos Preview | Adaptive (default) | Cannot disable thinking; display defaults to "omitted" |
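The table can be encoded as a small lookup so that an application never sends manual mode to a model that rejects it. This is a sketch based only on the rows above; `thinking_param` and `MANUAL_ALLOWED` are hypothetical names, and the model IDs are the ones used in this guide's examples. Claude Mythos Preview is left out because its model ID is not given here.

```python
from typing import Optional

# Models for which this guide says manual mode still works (deprecated).
MANUAL_ALLOWED = {"claude-opus-4-6", "claude-sonnet-4-6"}


def thinking_param(model: str, budget_tokens: Optional[int] = None) -> dict:
    """Build the `thinking` value for a request.

    Defaults to adaptive thinking; manual mode is returned only when a
    budget is given and the model is listed as still accepting it.
    """
    if budget_tokens is None:
        return {"type": "adaptive"}
    if model not in MANUAL_ALLOWED:
        raise ValueError(f"{model} is not listed as supporting budget_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_param("claude-opus-4-7"))
print(thinking_param("claude-sonnet-4-6", budget_tokens=4096))
```

Defaulting to adaptive means new models work without changes, and manual mode has to be requested explicitly.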
Troubleshooting
Error: "budget_tokens not supported on this model"
You're using Claude Opus 4.7 with manual mode. Switch to adaptive thinking:
# Wrong (returns 400 error on Opus 4.7)
thinking={"type": "enabled", "budget_tokens": 4096}
# Correct
thinking={"type": "adaptive"},
output_config={"effort": "high"}
Error: "budget_tokens must be less than max_tokens"
Reduce your budget_tokens or increase max_tokens. The budget must be at least 1 token less than max_tokens.
If extended thinking still does not behave as expected, ensure you're using a model that supports it and that the thinking parameter is set correctly in your request.
Key Takeaways
- Extended thinking enhances Claude's reasoning by providing step-by-step internal reasoning before the final answer, improving accuracy on complex tasks.
- Use adaptive thinking for Claude Opus 4.7 with the effort parameter (low, medium, high); manual budget_tokens mode is no longer supported on this model.
- Handle thinking blocks separately in your application: they appear as type: "thinking" content blocks before the final type: "text" answer.
- Monitor token consumption carefully, as extended thinking can significantly increase usage; adjust effort or budget_tokens accordingly.
- Streaming works with extended thinking: handle thinking_delta and text_delta events separately for real-time display.