Mastering Extended Thinking in Claude: A Complete Guide to Adaptive Reasoning
This guide explains how to enable and optimize Claude's extended thinking feature for complex reasoning tasks. You'll learn about adaptive thinking, effort parameters, budget tokens, and how to implement them in your API calls.
Introduction
Claude's extended thinking feature lets the model work through a problem step by step and show that reasoning before it delivers a final answer. It is most useful for complex reasoning tasks, such as a research assistant, a code analysis tool, or a multi-step decision-making system.
This guide covers everything you need to know: from the basics of enabling extended thinking to advanced configuration with adaptive thinking and effort parameters.
How Extended Thinking Works
When extended thinking is enabled, Claude creates special thinking content blocks in its response. These blocks contain the model's internal reasoning—its step-by-step analysis before crafting the final answer. The API response includes these thinking blocks followed by the text content blocks.
Here's what a typical response looks like:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
Each thinking block carries a signature field. It is an opaque value used to verify the block when it is sent back to the API in a later turn, so pass thinking blocks back unmodified.
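A minimal sketch of separating the two block types, assuming the response content has already been converted to plain dictionaries shaped like the JSON above. The helper name `split_blocks` is hypothetical and is not part of any SDK.

```python
def split_blocks(content):
    """Separate reasoning text from answer text in a list of content blocks.

    Assumes each block is a dict with a "type" key, as in the JSON above.
    Blocks of any other type are ignored.
    """
    thinking_parts, answer_parts = [], []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
    return "\n".join(thinking_parts), "\n".join(answer_parts)


# Sample data copied from the response shown above (signature shortened).
sample = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
reasoning, final_answer = split_blocks(sample)
print(final_answer)
```

Keeping the reasoning and the answer in separate strings makes it easy to log one and display the other.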
Adaptive Thinking: The Modern Approach
Adaptive thinking is the current way to manage reasoning, and on Claude Opus 4.7 and later it is the only supported one. Instead of manually setting a fixed token budget, you use the effort parameter to tell Claude how much reasoning effort to apply.
Effort Parameter
The effort parameter accepts three values:
- "low": Minimal reasoning, suitable for simple tasks
- "medium": Balanced reasoning, good for most complex tasks
- "high": Maximum reasoning, ideal for very complex problems
Task Budgets (Beta)
For finer control, you can combine adaptive thinking with a task_budget parameter. This lets you specify a maximum number of tokens Claude should use for thinking, while still allowing adaptive allocation within that budget.
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "task_budget": 16000,  # Optional: max thinking tokens
    },
    messages=[
        {"role": "user", "content": "Analyze the implications of quantum computing on cryptography"}
    ],
)
Fast Mode (Research Preview)
Fast mode is an experimental feature that reduces thinking time for simpler queries. It's useful when you need quick responses but still want some reasoning capability. Enable it by setting fast_mode: true in the thinking configuration.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "medium",
        "fast_mode": True,
    },
    messages=[
        {"role": "user", "content": "What's 15% of 200?"}
    ],
)
Manual Extended Thinking (Legacy)
For older models (Claude Opus 4.6, Claude Sonnet 4.6, and earlier), you can still use manual extended thinking with a fixed token budget. However, this approach is deprecated on those models and is not supported on Claude Opus 4.7 and later.
# Legacy approach - not recommended for new projects
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,  # Fixed thinking budget
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem..."}
    ],
)
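A fixed budget is easy to misconfigure, so it can help to check the numbers before sending a request. The sketch below is illustrative only: the helper name `manual_thinking_config` is hypothetical, and it assumes a minimum budget of 1,024 tokens and that `budget_tokens` must be smaller than `max_tokens`. Confirm both limits against the current API reference.

```python
MIN_BUDGET_TOKENS = 1024  # Assumed minimum; verify against the API reference.


def manual_thinking_config(budget_tokens, max_tokens):
    """Build a legacy manual thinking config after basic sanity checks."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        # Thinking tokens are drawn from max_tokens, so leave room for the answer.
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be smaller than max_tokens ({max_tokens})"
        )
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Same numbers as the legacy example above.
legacy_config = manual_thinking_config(budget_tokens=8000, max_tokens=16000)
print(legacy_config)
```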
Model-Specific Behavior
Different Claude models handle extended thinking differently:
| Model | Adaptive Thinking | Manual Thinking | Notes |
|---|---|---|---|
| Claude Opus 4.7+ | ✅ Recommended | ❌ Not supported | Use effort parameter |
| Claude Mythos Preview | ✅ Default | ✅ Accepted | thinking: {"type": "disabled"} not supported |
| Claude Opus 4.6 | ✅ Recommended | ✅ Deprecated | Adaptive preferred |
| Claude Sonnet 4.6 | ✅ Recommended | ✅ Deprecated | Interleaved mode available |
On models that do not accept thinking: {"type": "disabled"}, such as Claude Mythos Preview, set display: "omitted" instead of type: "disabled". Use display: "summarized" to receive summaries of the thinking process.
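The table above can be turned into a small pre-flight check. This is a sketch under stated assumptions: the dictionary keys are the table's row labels rather than real model IDs, the function name `check_thinking_config` is hypothetical, and only the rules stated in the table are encoded.

```python
# Manual thinking status per row of the table above.
MANUAL_STATUS = {
    "Claude Opus 4.7+": "unsupported",
    "Claude Mythos Preview": "accepted",
    "Claude Opus 4.6": "deprecated",
    "Claude Sonnet 4.6": "deprecated",
}

# Rows where the table says type "disabled" is not supported.
DISABLE_UNSUPPORTED = {"Claude Mythos Preview"}


def check_thinking_config(model_family, config):
    """Return a list of issues for this config; an empty list means none found."""
    issues = []
    kind = config.get("type")
    if kind == "enabled":
        status = MANUAL_STATUS[model_family]
        if status == "unsupported":
            issues.append("manual thinking is not supported; use type 'adaptive'")
        elif status == "deprecated":
            issues.append("manual thinking is deprecated; prefer type 'adaptive'")
    elif kind == "disabled" and model_family in DISABLE_UNSUPPORTED:
        issues.append("type 'disabled' is not supported on this model")
    return issues


print(check_thinking_config("Claude Opus 4.7+", {"type": "enabled", "budget_tokens": 8000}))
```

Returning a list rather than raising lets a caller treat deprecation notices as warnings and unsupported combinations as errors.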
Practical API Examples
Basic Adaptive Thinking
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "medium"},
    messages=[
        {"role": "user", "content": "Explain the theory of relativity in simple terms"}
    ],
)
# Access thinking content
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 32000,
  thinking: { type: 'adaptive', effort: 'high' },
  messages: [
    { role: 'user', content: 'Debug this code and explain the fix...' }
  ]
});
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);
  }
}
Using with Streaming
Extended thinking works seamlessly with streaming responses. The thinking blocks are streamed before the text blocks:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    messages=[
        {"role": "user", "content": "Write a detailed analysis of..."}
    ],
) as stream:
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="")
        elif event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="")
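If you want the full reasoning and answer as strings rather than printing deltas as they arrive, the same event checks can feed an accumulator. The sketch below runs offline: it uses `SimpleNamespace` objects as stand-ins for stream events, mirroring only the `type` and `delta` attributes used in the loop above. The names `collect_stream` and `_delta` are hypothetical.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and text deltas from an iterable of stream events."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # Skip message_start, content_block_start, and similar events.
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)


def _delta(kind, **fields):
    """Build a stand-in content_block_delta event for offline testing."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=kind, **fields),
    )


fake_events = [
    SimpleNamespace(type="message_start"),
    _delta("thinking_delta", thinking="Step 1. "),
    _delta("thinking_delta", thinking="Step 2."),
    _delta("text_delta", text="Final "),
    _delta("text_delta", text="answer."),
]
thinking_text, answer_text = collect_stream(fake_events)
print(answer_text)
```

In a real call, the `stream` object from the example above would be passed in place of `fake_events`.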
Best Practices
- Choose the right effort level: Start with "medium" for most tasks. Use "high" only for very complex problems that require deep reasoning. Use "low" for simple queries where speed matters.
- Set appropriate max_tokens: Extended thinking consumes tokens from your max_tokens budget. Ensure you allocate enough tokens for both thinking and the final response. A good rule of thumb: set max_tokens to at least 2x your expected thinking budget.
- Use task budgets for cost control: If you're concerned about token usage, set a task_budget to cap the thinking tokens while still allowing adaptive allocation.
- Handle thinking content appropriately: The thinking blocks contain the model's internal reasoning. You can display them to users for transparency, or omit them for a cleaner UI.
- Combine with other features: Extended thinking works well with tools, structured outputs, and citations. For example, you can have Claude think through a problem before calling a tool or generating a structured response.
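The max_tokens rule of thumb from the list above is simple arithmetic. A minimal sketch, assuming you have a rough estimate of both the thinking budget and the answer length; the function name `recommended_max_tokens` is hypothetical:

```python
def recommended_max_tokens(thinking_budget, answer_tokens=0):
    """Suggest a max_tokens value for a given thinking budget.

    Applies the 2x rule of thumb, but never returns less than the
    thinking budget plus the expected answer length.
    """
    if thinking_budget < 0 or answer_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(2 * thinking_budget, thinking_budget + answer_tokens)


# A 16,000-token thinking budget suggests max_tokens of 32,000,
# matching the values used in the task budget example earlier.
print(recommended_max_tokens(16000))
```

When the expected answer is longer than the thinking budget, the second term takes over: `recommended_max_tokens(8000, 12000)` returns 20000 rather than 16000.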
Troubleshooting
- 400 Error on Opus 4.7+: If you get a 400 error, you're likely using manual thinking (type: "enabled") on a model that only supports adaptive thinking. Switch to type: "adaptive" with the effort parameter.
- Thinking content missing: If you're not seeing thinking blocks in the response, check that you've enabled extended thinking in your API call. Also, some models (like Mythos Preview) may omit thinking content by default.
- Token limits: If Claude stops thinking prematurely, increase your max_tokens or task_budget values.
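For the 400 error case, the fix is a mechanical change to the thinking configuration. The sketch below follows this guide's request shape, with effort placed inside the thinking object; check that placement against the current API reference before relying on it. The helper name `migrate_to_adaptive` is hypothetical, and the old budget_tokens value is dropped rather than translated.

```python
VALID_EFFORTS = ("low", "medium", "high")


def migrate_to_adaptive(thinking, effort="medium"):
    """Convert a manual thinking config to the adaptive form used in this guide.

    Configs that are not manual (type "enabled") are returned as a copy.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    if thinking.get("type") != "enabled":
        return dict(thinking)
    # budget_tokens has no direct adaptive equivalent here, so it is dropped.
    return {"type": "adaptive", "effort": effort}


legacy = {"type": "enabled", "budget_tokens": 8000}
print(migrate_to_adaptive(legacy, effort="high"))
```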
Key Takeaways
- Extended thinking enhances Claude's reasoning by allowing step-by-step analysis before generating the final answer, with transparent thinking blocks in the response.
- Adaptive thinking is the modern approach for Claude Opus 4.7+ models. Use the effort parameter (low, medium, high) instead of manual token budgets.
- Manual thinking is deprecated on newer models and will be removed. Migrate your code to adaptive thinking for future compatibility.
- Combine with streaming and tools for powerful applications. Extended thinking works seamlessly with streaming responses and tool use.
- Control costs with task budgets and appropriate effort levels to balance reasoning depth with token consumption.