Mastering Extended Thinking in Claude: A Practical Guide to Adaptive and Manual Reasoning
Learn how to use Claude's extended thinking feature for complex reasoning tasks. Covers adaptive thinking, effort parameters, manual mode, and code examples for the API.
This guide explains how to enable and configure Claude's extended thinking for step-by-step reasoning. You'll learn the difference between adaptive thinking (required on Opus 4.7, recommended on Opus 4.6 and Sonnet 4.6) and manual mode, how to set effort levels, and how to handle thinking blocks in your API responses.
Introduction
Claude's extended thinking feature lets the model reason before it answers. When enabled, Claude generates internal "thinking" content blocks before producing its final answer. This step-by-step reasoning helps the model handle complex problems, such as mathematical proofs, multi-step logic, or intricate code analysis, with greater accuracy and transparency.
Whether you're building a research assistant, a tutoring app, or a debugging tool, understanding how to configure and consume extended thinking is essential. This guide covers everything from basic setup to advanced configuration, including the new adaptive thinking mode and the effort parameter.
How Extended Thinking Works
When extended thinking is enabled, Claude's API response includes one or more thinking content blocks followed by text content blocks. Each thinking block contains the model's internal reasoning and a cryptographic signature for verification.
Here's a simplified example of what the response looks like:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
Thinking blocks are part of the API response, not something your end users see automatically. They are meant for developers to inspect, and your application decides whether to display them, summarize them, or omit them entirely.
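As a minimal sketch of consuming this response shape, the helper below separates the reasoning from the answer. It works on plain dictionaries shaped like the JSON above rather than on SDK objects, and `split_blocks` is a hypothetical name, not part of any SDK.

```python
from typing import Dict, List, Tuple


def split_blocks(content: List[Dict[str, str]]) -> Tuple[List[str], str]:
    """Separate thinking text from answer text in a response's content list."""
    thinking: List[str] = []
    answer_parts: List[str] = []
    for block in content:
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
        # Other block types are ignored in this sketch.
    return thinking, "".join(answer_parts)


sample = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
reasoning, answer = split_blocks(sample)
print(answer)
```

Keeping the two streams apart at the point of parsing makes it easy to log the reasoning for debugging while sending only the answer to the user interface.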
Adaptive Thinking vs. Manual Extended Thinking
Claude now supports two modes of extended thinking:
Adaptive Thinking (Recommended)
Adaptive thinking (thinking: {type: "adaptive"}) is the modern approach. Instead of setting a fixed token budget for thinking, you specify an effort level. Claude dynamically allocates thinking tokens based on the complexity of the task.
This mode is required for Claude Opus 4.7 and recommended for Claude Opus 4.6 and Claude Sonnet 4.6. Manual mode is deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future release.
Manual Extended Thinking (Legacy)
Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) lets you set a fixed number of tokens for the thinking process. This is still supported on most current models except Claude Opus 4.7, where it returns a 400 error. Use adaptive thinking instead.
Supported Models
| Model | Adaptive Thinking | Manual Thinking | Notes |
|---|---|---|---|
| Claude Opus 4.7 | ✅ Required | ❌ Returns 400 | Use effort parameter |
| Claude Opus 4.6 | ✅ Recommended | ✅ Deprecated | Manual still works but will be removed |
| Claude Sonnet 4.6 | ✅ Recommended | ✅ Deprecated | Interleaved mode deprecated |
| Claude Mythos Preview | ✅ Default | ✅ Accepted | disabled not supported; display defaults to omitted |
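One way to encode the table in application code is a small helper that picks the `thinking` value for a request. This is only a sketch: `build_thinking_config` and `ADAPTIVE_ONLY` are hypothetical names, the model ID is assumed from the examples in this guide, and only the models listed in the table are considered.

```python
from typing import Any, Dict, Optional

# Models that reject manual thinking, transcribed from the table above.
# The ID string is an assumption based on this guide's examples.
ADAPTIVE_ONLY = {"claude-opus-4-7"}


def build_thinking_config(model: str, budget_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Return a `thinking` value: adaptive by default, manual only on request."""
    if budget_tokens is None:
        return {"type": "adaptive"}
    if model in ADAPTIVE_ONLY:
        raise ValueError(f"{model} rejects manual thinking; use adaptive instead")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(build_thinking_config("claude-opus-4-7"))
print(build_thinking_config("claude-sonnet-4-6", budget_tokens=10000))
```

Failing early in your own code gives a clearer message than waiting for the API's 400 response.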
How to Use Extended Thinking in the API
Basic Setup with Adaptive Thinking (Python)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    # Effort is set in output_config, not inside the thinking object.
    output_config={"effort": "high"},  # Options include "low", "medium", "high"
    messages=[
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
)

# Process the response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
Basic Setup with Manual Thinking (Python)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {
            "role": "user",
            "content": "Explain the P vs NP problem in simple terms."
        }
    ]
)
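In manual mode the thinking budget and `max_tokens` interact, so it can help to check them before sending a request. The sketch below assumes two constraints as I understand them from the API documentation: `budget_tokens` has a minimum of 1,024 and must be smaller than `max_tokens`. `check_manual_budget` is a hypothetical helper; verify both limits for your model.

```python
MIN_BUDGET_TOKENS = 1024  # Assumed documented minimum for manual thinking.


def check_manual_budget(max_tokens: int, budget_tokens: int) -> int:
    """Return the answer headroom if thinking uses its whole budget, or raise."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return max_tokens - budget_tokens


# With the values from the example above, 6000 tokens remain for the answer.
print(check_manual_budget(max_tokens=16000, budget_tokens=10000))  # 6000
```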
Using Adaptive Thinking with TypeScript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 16000,
  thinking: { type: 'adaptive' },
  // Effort is set in output_config, not inside the thinking object.
  output_config: { effort: 'high' },
  messages: [
    {
      role: 'user',
      content: 'Design a distributed caching system with eventual consistency.'
    }
  ]
});

// Print each block; thinking blocks come before text blocks.
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);
  }
}
The Effort Parameter
The effort parameter controls how much thinking Claude invests in a task. The three standard values are:
- low: Minimal thinking, faster responses, suitable for simple tasks.
- medium: Balanced thinking, good for most use cases.
- high: The deepest reasoning of the three, best for complex problems like mathematical proofs, code generation, or multi-step analysis.
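To choose a level empirically, you can run the same prompt at each effort setting and compare latency and output. The harness below is a sketch: `compare_efforts` is a hypothetical helper, and `send` stands for whatever function wraps your API call. A fake sender is used here so the example runs offline.

```python
import time
from typing import Callable, Dict, List

EFFORT_LEVELS = ("low", "medium", "high")


def compare_efforts(prompt: str, send: Callable[[str, str], str]) -> List[Dict[str, object]]:
    """Call send(prompt, effort) once per effort level and record timing and answer."""
    results = []
    for effort in EFFORT_LEVELS:
        started = time.perf_counter()
        answer = send(prompt, effort)
        elapsed = time.perf_counter() - started
        results.append({"effort": effort, "seconds": elapsed, "answer": answer})
    return results


def fake_send(prompt: str, effort: str) -> str:
    """Stand-in for a real API call so the sketch runs without network access."""
    return f"[{effort}] answer to: {prompt}"


for row in compare_efforts("Is 97 prime?", fake_send):
    print(row["effort"], f"{row['seconds']:.4f}s", row["answer"])
```

In real use, replace `fake_send` with a function that calls the Messages API with the given effort and returns the text of the final answer.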
Task Budgets (Beta)
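For advanced control, you can set a task budget alongside adaptive thinking. The budget tells Claude roughly how many tokens it has for the whole task, covering both thinking and the final response, so it can pace its work. It's useful when you need more predictable costs or response sizes. max_tokens remains the hard limit on a single response.
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        # Token budget for thinking + response across the task
        "task_budget": {"type": "tokens", "total": 20000},
    },
    # Beta header name at the time of writing; check the current API reference.
    betas=["task-budgets-2026-03-13"],
    messages=[...]
)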
Fast Mode (Beta Research Preview)
Fast mode is an experimental feature that speeds up output generation without switching to a smaller model. It's suited to latency-sensitive applications where you still want reasoning.
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    speed="fast",  # Research-preview parameter
    # Beta header name at the time of writing; check the current API reference.
    betas=["fast-mode-2026-02-01"],
    messages=[...]
)
Handling Thinking Blocks in Your Application
When you receive a response with thinking blocks, you have several options:
- Display the thinking: Useful for debugging or educational apps where you want users to see the reasoning.
- Summarize the thinking: Use Claude to generate a concise summary of the thinking block before showing it to the user.
- Omit the thinking: Only show the final text block to the user.
To control what the API itself returns, set the display field in the thinking configuration:
response = client.messages.create(
    model="claude-mythos-preview",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,
        "display": "summarized"  # Options: "omitted" (default on this model), "summarized"
    },
    messages=[...]
)
Best Practices
- Use adaptive thinking for new projects: It is required on Claude Opus 4.7 and is the recommended mode on Opus 4.6 and Sonnet 4.6.
- Set effort based on task complexity: Use low for simple Q&A, high for complex reasoning.
- Monitor token usage: Thinking tokens count toward your max_tokens limit. Set max_tokens high enough to accommodate both thinking and the final response.
- Handle streaming gracefully: When streaming, thinking blocks arrive before text blocks. Ensure your UI can handle this ordering.
- Test with different effort levels: Run your prompts with low, medium, and high effort to find the sweet spot between quality and speed.
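The streaming point can be sketched as follows. The function rebuilds ordered content blocks from streaming events represented as plain dictionaries. The event and delta type names (`content_block_start`, `content_block_delta`, `thinking_delta`, `text_delta`) follow the Messages streaming format as I understand it, while `accumulate_stream` and the sample events are illustrative only.

```python
from typing import Dict, Iterable, List


def accumulate_stream(events: Iterable[dict]) -> List[Dict[str, str]]:
    """Rebuild ordered content blocks from streaming events given as dicts."""
    blocks: Dict[int, Dict[str, str]] = {}
    for event in events:
        if event["type"] == "content_block_start":
            kind = event["content_block"]["type"]
            blocks[event["index"]] = {"type": kind, "body": ""}
        elif event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                blocks[event["index"]]["body"] += delta["thinking"]
            elif delta["type"] == "text_delta":
                blocks[event["index"]]["body"] += delta["text"]
            # Signature deltas and other delta types are skipped in this sketch.
    return [blocks[i] for i in sorted(blocks)]


events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 1. "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 2."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Done."}},
    {"type": "content_block_stop", "index": 1},
]
for block in accumulate_stream(events):
    print(block["type"], "->", block["body"])
```

Keying blocks by index lets a UI render a collapsible reasoning panel first and then fill in the answer as text deltas arrive.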
Key Takeaways
- Adaptive thinking with the effort parameter is required on Claude Opus 4.7 and is the recommended way to use extended thinking on Opus 4.6 and Sonnet 4.6.
- Manual extended thinking (budget_tokens) is deprecated on Opus 4.6 and Sonnet 4.6, and not supported on Opus 4.7.
- The effort parameter accepts low, medium, or high values, giving you control over reasoning depth.
- Thinking blocks appear before text blocks in the API response; you can display, summarize, or omit them.
- Use task budgets and fast mode for advanced control over token usage and latency.