Mastering Extended Thinking in Claude: A Practical Guide to Enhanced Reasoning
Learn how to use Claude's extended thinking feature for complex tasks. Covers adaptive thinking, manual mode, effort parameters, and code examples for API integration.
This guide explains how to enable and configure Claude's extended thinking for step-by-step reasoning. You'll learn about adaptive thinking (recommended for Opus 4.7+), manual mode, effort parameters, and how to handle thinking blocks in API responses.
Introduction
Claude's extended thinking feature unlocks a new level of reasoning capability. When enabled, Claude generates internal "thinking blocks" — step-by-step reasoning that it uses to craft more accurate, well-reasoned final answers. This is especially valuable for complex tasks like mathematical proofs, multi-step analysis, code debugging, and strategic planning.
In this guide, you'll learn:
- What extended thinking is and how it works
- The difference between adaptive thinking and manual mode
- How to configure extended thinking for different Claude models
- Practical code examples for the Messages API
- Best practices for using thinking blocks in your applications
How Extended Thinking Works
When extended thinking is enabled, Claude's API response includes special thinking content blocks before the final text content block. These thinking blocks contain Claude's internal reasoning — the "scratchpad" it uses to work through a problem.
Here's what a typical response looks like:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
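The signature field is used to verify the thinking block. Preserve it unmodified whenever you store thinking blocks or pass them back to the API, including when you assemble them from a streamed response.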
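As a minimal sketch of working with this structure, the helper below separates thinking text from answer text. It operates on the response as a plain dictionary (for example, the JSON above after parsing); the name `split_blocks` is hypothetical and not part of the SDK.

```python
from typing import Any


def split_blocks(response: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (thinking_texts, answer_texts) from a response dictionary."""
    thinking: list[str] = []
    answers: list[str] = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answers.append(block.get("text", ""))
    return thinking, answers


# Sample shaped like the response above; the signature is a placeholder.
sample = {
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me analyze this step by step...",
            "signature": "...",
        },
        {"type": "text", "text": "Based on my analysis..."},
    ]
}

thinking, answers = split_blocks(sample)
print("Thinking:", thinking)
print("Answer:", answers)
```

Unknown block types are skipped rather than raising, so the helper keeps working if other block types appear in the list.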
Adaptive Thinking vs. Manual Extended Thinking
Adaptive Thinking (Recommended for Opus 4.7+)
For Claude Opus 4.7 and later models, Anthropic has introduced adaptive thinking. Instead of manually setting a token budget, you use the effort parameter to control how much Claude should "think" about a problem. This has three advantages:
- No need to guess a token budget
- Claude dynamically allocates thinking tokens based on task complexity
- Simpler API calls
Manual Extended Thinking (Deprecated; Unsupported on Opus 4.7+)
Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is the older approach. It is still supported on Claude Opus 4.6 and Claude Sonnet 4.6, but it is deprecated and will be removed in a future release. Do not use it on Opus 4.7 or later, where it returns a 400 error.
Model Compatibility
| Model | Recommended Mode | Manual Mode Support |
|---|---|---|
| Claude Opus 4.7+ | Adaptive thinking (type: "adaptive") | ❌ Returns 400 error |
| Claude Opus 4.6 | Adaptive thinking (recommended) | ✅ Deprecated but functional |
| Claude Sonnet 4.6 | Adaptive thinking (recommended) | ✅ Deprecated but functional (interleaved mode) |
| Claude Mythos Preview | Adaptive thinking (default) | ✅ type: "enabled" accepted; type: "disabled" not supported |
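One way to avoid the 400 error is to check a request against the table before sending it. The sketch below is an illustration only: `MANUAL_MODE_SUPPORTED` and `validate_thinking` are hypothetical names, the model IDs are the ones used in this guide's examples, and `claude-opus-4-6` is assumed by analogy. Models missing from the lookup are passed through unchecked.

```python
# Manual-mode support, transcribed from the compatibility table.
# "claude-opus-4-6" is an assumed ID; adjust to the IDs you actually use.
MANUAL_MODE_SUPPORTED = {
    "claude-opus-4-7": False,
    "claude-opus-4-6": True,
    "claude-sonnet-4-6": True,
}


def validate_thinking(model: str, thinking: dict) -> None:
    """Raise ValueError if the table says this model rejects manual mode."""
    is_manual = thinking.get("type") == "enabled"
    # Unknown models default to "supported" so the check never blocks them.
    if is_manual and not MANUAL_MODE_SUPPORTED.get(model, True):
        raise ValueError(
            f"{model} does not accept manual thinking; use type 'adaptive'"
        )


validate_thinking("claude-sonnet-4-6", {"type": "enabled", "budget_tokens": 10000})
validate_thinking("claude-opus-4-7", {"type": "adaptive", "effort": "high"})
print("both configurations passed the local check")
```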
How to Use Extended Thinking in the API
Using Adaptive Thinking (Opus 4.7+)
With adaptive thinking, you specify an effort level. The effort parameter controls how much reasoning Claude applies. Higher effort means more thinking tokens, which can improve accuracy on very complex tasks but increases latency and cost.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=20000,
    thinking={
        "type": "adaptive",
        "effort": "high"  # Options: "low", "medium", "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers of the form 4k+3."
        }
    ]
)

# Print the thinking content, then the final answer
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
TypeScript Example:
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 20000,
  thinking: {
    type: 'adaptive',
    effort: 'high'
  },
  messages: [
    {
      role: 'user',
      content: 'Prove that there are infinitely many prime numbers of the form 4k+3.'
    }
  ]
});

for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);
  }
}
Using Manual Extended Thinking (Opus 4.6 / Sonnet 4.6)
If you're using an older model, you can still use manual mode with a budget_tokens parameter:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Claude can use up to 10,000 tokens for thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
Important: The budget_tokens must be less than max_tokens. A good rule of thumb is to set budget_tokens to about 60-80% of max_tokens.
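The rule of thumb can be turned into a small calculation. This is a sketch under the guide's own guidance (60-80% of max_tokens); the function name `thinking_budget` and the 70% default are illustrative choices, and integer arithmetic is used so the result is exact.

```python
def thinking_budget(max_tokens: int, percent: int = 70) -> int:
    """Return a budget_tokens value that is `percent` of max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    budget = max_tokens * percent // 100
    # percent < 100 guarantees the budget stays below max_tokens.
    assert budget < max_tokens
    return budget


max_tokens = 16000
thinking = {"type": "enabled", "budget_tokens": thinking_budget(max_tokens)}
print(thinking)
```

For comparison, the example above uses 10,000 of 16,000 tokens, which is 62.5% and sits inside the suggested range.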
Streaming with Extended Thinking
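When streaming, thinking content arrives before text content, and the signature for each thinking block arrives as its own delta before the block closes. The following example prints thinking and text deltas as they arrive: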
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=20000,
    thinking={
        "type": "adaptive",
        "effort": "medium"
    },
    messages=[
        {
            "role": "user",
            "content": "Explain the Riemann Hypothesis in simple terms."
        }
    ]
) as stream:
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="")
        elif event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="")
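To keep the signature alongside the streamed text, the deltas can be accumulated rather than only printed. The sketch below runs offline on simulated events written as plain dictionaries; `collect_stream` is a hypothetical helper, and the `signature_delta` event type comes from Anthropic's streaming documentation rather than from this guide, so verify it against the SDK version you use.

```python
from typing import Any, Iterable


def collect_stream(events: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Accumulate thinking text, signature, and answer text from deltas."""
    out = {"thinking": "", "signature": "", "text": ""}
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # ignore start/stop and other event types
        delta = event.get("delta", {})
        kind = delta.get("type")
        if kind == "thinking_delta":
            out["thinking"] += delta.get("thinking", "")
        elif kind == "signature_delta":
            out["signature"] += delta.get("signature", "")
        elif kind == "text_delta":
            out["text"] += delta.get("text", "")
    return out


# Simulated events; the signature value is a placeholder, not real output.
events = [
    {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Step 1. "}},
    {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Step 2."}},
    {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "sig-placeholder"}},
    {"type": "content_block_stop"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Answer."}},
]

result = collect_stream(events)
print(result)
```

With the real SDK, the same branching would read attributes (`event.delta.type`) instead of dictionary keys.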
Best Practices
1. Choose the Right Effort Level
- Low effort: Use for simple tasks where you want faster responses and lower cost. Good for straightforward Q&A.
- Medium effort: A balanced choice for most complex tasks. Recommended as a starting point.
- High effort: Use for very complex reasoning tasks like mathematical proofs, multi-step analysis, or strategic planning. Expect higher latency and cost.
2. Set Appropriate max_tokens
Extended thinking consumes tokens from your max_tokens budget. If you set max_tokens too low, Claude may run out of tokens before completing its reasoning. For complex tasks, start with max_tokens of 16000-32000.
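A truncated response can be detected from the response's stop_reason field, which the Messages API sets to "max_tokens" when the limit is hit. The sketch below suggests a larger limit for a retry; `next_max_tokens` is a hypothetical helper, and the doubling step and 32000 ceiling are assumptions drawn from the range suggested above.

```python
from typing import Optional


def next_max_tokens(
    response: dict, current: int, ceiling: int = 32000
) -> Optional[int]:
    """Return a larger max_tokens for a retry, or None if no retry applies."""
    if response.get("stop_reason") != "max_tokens":
        return None  # the response finished normally
    if current >= ceiling:
        return None  # already at the ceiling; retrying would not help
    return min(current * 2, ceiling)


truncated = {"stop_reason": "max_tokens", "content": []}
complete = {"stop_reason": "end_turn", "content": []}

print(next_max_tokens(truncated, 16000))
print(next_max_tokens(complete, 16000))
```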
3. Handle Thinking Blocks in Your Application
If you're displaying Claude's response to users, you have options:
- Show thinking blocks: Great for educational or debugging contexts
- Hide thinking blocks: Show only the final text response for a cleaner UX
- Summarize thinking: On Claude Mythos Preview, you can pass display: "summarized" to receive summaries instead of raw thinking
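The show and hide options can share one rendering function with a flag. This is a minimal sketch over content blocks as plain dictionaries; `render` and the "[thinking]" label are illustrative choices, not SDK features.

```python
def render(blocks: list[dict], show_thinking: bool = False) -> str:
    """Join content blocks into display text, optionally including thinking."""
    parts: list[str] = []
    for block in blocks:
        kind = block.get("type")
        if kind == "thinking" and show_thinking:
            parts.append("[thinking]\n" + block.get("thinking", ""))
        elif kind == "text":
            parts.append(block.get("text", ""))
    return "\n\n".join(parts)


blocks = [
    {"type": "thinking", "thinking": "Compare both options first."},
    {"type": "text", "text": "Option A is cheaper."},
]

print(render(blocks))                      # end-user view
print(render(blocks, show_thinking=True))  # debugging view
```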
4. Use with Structured Outputs
Extended thinking works well with structured outputs (JSON mode). Claude can reason about the structure before generating the final JSON:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "medium"
    },
    messages=[
        {
            "role": "user",
            "content": "Extract all dates, names, and amounts from this invoice and return as JSON."
        }
    ]
)
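When parsing the result, only the text blocks should be treated as JSON; thinking blocks are free-form reasoning. The sketch below assumes the text blocks contain nothing but the JSON document and lets json.JSONDecodeError propagate otherwise. `extract_json` is a hypothetical helper, and the sample invoice values are invented for illustration.

```python
import json
from typing import Any


def extract_json(response: dict[str, Any]) -> Any:
    """Parse the concatenated text blocks of a response as JSON."""
    text = "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    return json.loads(text)


# Invented sample data, shaped like a response with thinking enabled.
sample = {
    "content": [
        {"type": "thinking", "thinking": "The invoice lists one line item..."},
        {"type": "text", "text": '{"names": ["Acme Ltd"], "amounts": [120.5]}'},
    ]
}

data = extract_json(sample)
print(data["names"], data["amounts"])
```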
Common Pitfalls
- Using manual mode on Opus 4.7+: This returns a 400 error. Always use adaptive thinking for new models.
- Setting budget_tokens too high: If budget_tokens exceeds max_tokens, the API will reject the request.
- Ignoring the signature: When streaming, always capture the signature for verification purposes.
- Not accounting for token usage: Extended thinking consumes tokens from your rate limits and billing. Monitor your usage carefully.
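A simple way to monitor usage is to total the usage figures across responses. The sketch below assumes each response dictionary carries a usage object with input_tokens and output_tokens, as the Messages API returns; `total_usage` is a hypothetical helper and the sample numbers are invented.

```python
def total_usage(responses: list[dict]) -> dict[str, int]:
    """Sum input and output token counts across response dictionaries."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for response in responses:
        usage = response.get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Invented sample figures for two calls.
history = [
    {"usage": {"input_tokens": 120, "output_tokens": 4300}},
    {"usage": {"input_tokens": 95, "output_tokens": 2100}},
]

totals = total_usage(history)
print(totals)
```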
Key Takeaways
- Adaptive thinking is the future: For Claude Opus 4.7 and later, use thinking: {type: "adaptive", effort: "low"|"medium"|"high"} instead of manual budget tokens.
- Manual mode is deprecated: It still works on Opus 4.6 and Sonnet 4.6 but will be removed. Migrate to adaptive thinking as soon as possible.
- Extended thinking improves complex reasoning: Use it for math, code analysis, multi-step logic, and strategic planning tasks.
- Handle thinking blocks appropriately: You can display, hide, or summarize Claude's internal reasoning depending on your use case.
- Monitor token usage: Extended thinking consumes tokens from your max_tokens budget and affects rate limits and costs.