Mastering Extended Thinking in Claude: A Guide to Adaptive Thinking and Effort Control
Learn how to enable and optimize Claude's extended thinking for complex reasoning tasks. Covers adaptive thinking, effort parameter, budget tokens, and practical API examples.
You'll learn how to configure Claude's extended thinking mode using the API, including the new adaptive thinking with effort control, manual budget tokens, and how to handle thinking content blocks in responses.
Introduction
Claude’s extended thinking feature unlocks deeper reasoning for complex tasks—think mathematical proofs, multi-step logic puzzles, or intricate code analysis. Instead of just giving you an answer, Claude shows its step-by-step thought process before delivering the final response. This transparency helps you verify reasoning, debug prompts, and build trust in the output.
With the latest Claude models (Opus 4.7, Sonnet 4.6, and Opus 4.6), Anthropic has introduced adaptive thinking, which lets Claude decide how many thinking tokens a request needs rather than working from a fixed budget. This guide covers everything you need to know, from enabling extended thinking in the API to choosing between adaptive and manual modes and handling the response format.
How Extended Thinking Works
When extended thinking is enabled, Claude generates special thinking content blocks alongside the usual text blocks. The thinking blocks contain internal reasoning, and the final text block contains the answer. This two-part structure gives you insight into how Claude arrived at its conclusion.
Here’s what a typical response looks like:
```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
```
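The `signature` field lets the API verify that the thinking block was generated by Claude. If you send thinking blocks back in a later request (for example, while continuing a tool-use loop), pass them back unmodified, signature included. When streaming, the signature arrives in a `signature_delta` event before the thinking block closes.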
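To work with these blocks outside the SDK (for example, when you log raw JSON responses), a small helper can separate the reasoning from the answer. The sketch below is a hypothetical `split_content` function that operates on plain dicts shaped like the example above; the signature value is shortened for illustration.

```python
import json

# A response body shaped like the example above (signature shortened).
RAW_RESPONSE = """
{
  "content": [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "abc123"},
    {"type": "text", "text": "Based on my analysis..."}
  ]
}
"""


def split_content(blocks):
    """Return (list of thinking strings, joined answer text) from content-block dicts.

    Block types other than "thinking" and "text" (e.g. "redacted_thinking")
    are skipped.
    """
    thinking = [b["thinking"] for b in blocks if b.get("type") == "thinking"]
    # A response can contain more than one text block, so join them in order.
    answer = "".join(b["text"] for b in blocks if b.get("type") == "text")
    return thinking, answer


body = json.loads(RAW_RESPONSE)
thinking, answer = split_content(body["content"])
print("Thinking:", thinking)
print("Answer:", answer)
```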
Adaptive Thinking vs. Manual Extended Thinking
Anthropic now recommends adaptive thinking for all current models. Here’s the breakdown:
| Model | Recommended Mode | Manual Mode Status |
|---|---|---|
| Claude Opus 4.7 | adaptive with effort | Not supported (returns 400 error) |
| Claude Opus 4.6 | adaptive | Deprecated but functional |
| Claude Sonnet 4.6 | adaptive | Deprecated but functional (interleaved) |
| Claude Mythos Preview | adaptive (default) | Accepted; `type: "disabled"` is not supported |
Adaptive Thinking (Recommended)
Adaptive thinking lets Claude decide how many tokens to spend on reasoning based on the task complexity. You control the effort level, which sets the maximum thinking budget relative to the model’s capability.
API Example (Python):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high"  # Options: "low", "medium", "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers congruent to 3 mod 4."
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
```
API Example (TypeScript):
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Top-level await requires an ES module context.
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 32000,
  thinking: {
    type: 'adaptive',
    effort: 'high'
  },
  messages: [
    {
      role: 'user',
      content: 'Prove that there are infinitely many prime numbers congruent to 3 mod 4.'
    }
  ]
});

for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(`Thinking: ${block.thinking}`);
  } else if (block.type === 'text') {
    console.log(`Answer: ${block.text}`);
  }
}
```
Manual Extended Thinking (Legacy)
Manual mode lets you set a fixed budget_tokens—the maximum number of tokens Claude can use for thinking. This is still supported on Opus 4.6 and Sonnet 4.6 but is deprecated.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Must be less than max_tokens
    },
    messages=[
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
)
```
Important: `budget_tokens` must be at least 1,024 and less than `max_tokens`. The difference (`max_tokens - budget_tokens`) is what remains for the final text response if Claude uses its entire thinking budget.
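If you still rely on manual mode, validating the budget locally turns an API validation error into an immediate, readable failure. The hypothetical `manual_thinking_config` helper below enforces the 1,024-token minimum that Anthropic documents for `budget_tokens` and the less-than-`max_tokens` rule; it assumes ordinary (non-interleaved) thinking, where these limits apply.

```python
MIN_BUDGET_TOKENS = 1024  # documented minimum for budget_tokens


def manual_thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Build a legacy `thinking` dict, enforcing the manual-mode budget rules.

    Hypothetical helper: it does not model interleaved thinking,
    where different limits may apply.
    """
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def text_headroom(max_tokens: int, budget_tokens: int) -> int:
    """Tokens left for the final answer if Claude spends its whole thinking budget."""
    return max_tokens - budget_tokens


config = manual_thinking_config(max_tokens=16000, budget_tokens=10000)
print(config)                       # {'type': 'enabled', 'budget_tokens': 10000}
print(text_headroom(16000, 10000))  # 6000
```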
Effort Parameter (Adaptive Mode)
The effort parameter in adaptive thinking controls how much reasoning Claude applies. It accepts three values:
- `low`: Minimal thinking and faster responses; suitable for simple tasks.
- `medium`: Balanced reasoning; good for most use cases.
- `high`: Maximum reasoning depth; best for complex problems (e.g., mathematical proofs, multi-step analysis).
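If your application handles several kinds of requests, you might centralize the choice of effort level in one place. The sketch below is a hypothetical helper: the task categories and their mapping to effort levels are illustrative assumptions, and the returned dict simply mirrors the `thinking` shape used in this guide's examples.

```python
from typing import Dict, Literal

Effort = Literal["low", "medium", "high"]

# Illustrative mapping based on the guidance above; tune it for your workload.
EFFORT_BY_TASK: Dict[str, Effort] = {
    "lookup": "low",         # simple factual or formatting tasks
    "summarize": "medium",   # everyday tasks
    "code_review": "high",   # multi-step analysis
    "proof": "high",         # mathematical proofs
}


def adaptive_thinking(task_kind: str) -> dict:
    """Return a `thinking` dict in the shape used by this guide's examples.

    Unknown task kinds fall back to "medium".
    """
    effort = EFFORT_BY_TASK.get(task_kind, "medium")
    return {"type": "adaptive", "effort": effort}


print(adaptive_thinking("proof"))     # {'type': 'adaptive', 'effort': 'high'}
print(adaptive_thinking("chitchat"))  # {'type': 'adaptive', 'effort': 'medium'}
```

Falling back to `medium` keeps unclassified requests on the balanced setting rather than paying for maximum depth by default.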
Task Budgets (Beta)
For advanced users, the task budget feature lets you set one token budget that covers both thinking and the final response. This is useful when you want to cap costs while still using adaptive thinking.
```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "task_budget_tokens": 50000  # Total budget for thinking + text
    },
    messages=[...]
)
```
Fast Mode (Beta Research Preview)
Fast mode reduces thinking time for quicker responses, at the cost of some reasoning depth. It’s ideal for interactive applications where latency matters.
```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "fast_mode": True
    },
    messages=[...]
)
```
Handling Thinking Blocks in Streaming
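When streaming, thinking content arrives as `thinking_delta` events inside its own content block, and a `signature_delta` event delivers the block's signature before the block closes. Because `content_block_stop` events carry only the block's `index` (not the block itself), record each block's type when it starts:

```python
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True,
)

block_types = {}  # content block index -> block type ("thinking", "text", ...)

for event in stream:
    if event.type == "content_block_start":
        block_types[event.index] = event.content_block.type
        if event.content_block.type == "thinking":
            print("\n[Thinking started]")
        elif event.content_block.type == "text":
            print("\n[Response]")
    elif event.type == "content_block_delta":
        if event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="")
        elif event.delta.type == "text_delta":
            print(event.delta.text, end="")
        # signature_delta events carry the thinking block's signature; nothing to print.
    elif event.type == "content_block_stop":
        # Stop events only carry the index, so look up the type recorded at start.
        if block_types.get(event.index) == "thinking":
            print("\n[Thinking ended]")
```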
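Stream handling is easy to get subtly wrong, so it can help to test it offline. The sketch below uses `types.SimpleNamespace` objects as stand-ins for SDK event objects, mimicking only the fields the streaming example reads (`type`, `index`, `content_block`, `delta`); the event contents are made up. The hypothetical `collect_stream` function accumulates thinking and text deltas and ignores everything else, including `signature_delta` events.

```python
from types import SimpleNamespace as NS


def collect_stream(events):
    """Accumulate thinking and text deltas from streamed events into two strings."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # start/stop and message-level events carry no text
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)


# Simulated events in the order a thinking-enabled stream would deliver them.
fake_events = [
    NS(type="content_block_start", index=0, content_block=NS(type="thinking")),
    NS(type="content_block_delta", index=0, delta=NS(type="thinking_delta", thinking="Two particles ")),
    NS(type="content_block_delta", index=0, delta=NS(type="thinking_delta", thinking="share a state.")),
    NS(type="content_block_delta", index=0, delta=NS(type="signature_delta", signature="sig")),
    NS(type="content_block_stop", index=0),
    NS(type="content_block_start", index=1, content_block=NS(type="text")),
    NS(type="content_block_delta", index=1, delta=NS(type="text_delta", text="Entangled particles ")),
    NS(type="content_block_delta", index=1, delta=NS(type="text_delta", text="are correlated.")),
    NS(type="content_block_stop", index=1),
]

thinking, answer = collect_stream(fake_events)
print(thinking)  # Two particles share a state.
print(answer)    # Entangled particles are correlated.
```

In real code you could pass the `stream` object from the example above to `collect_stream`. The Python SDK also provides a `client.messages.stream(...)` helper that can assemble the final message for you, which may be simpler if you only need the completed response.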
Best Practices
- Start with adaptive thinking – It's the recommended mode for all current models and simplifies token management.
- Use `effort: "high"` for complex reasoning – For tasks like code review, math proofs, or multi-step analysis, high effort yields better results.
- Set `max_tokens` generously – Thinking can consume many tokens. A good rule of thumb: set `max_tokens` to at least 2x your expected thinking budget.
- Monitor token usage – Extended thinking increases token consumption. Use the `usage` field in the response to track costs.
- Handle thinking blocks in streaming – If you stream responses, make sure your client correctly processes `thinking_delta` events.
- Avoid disabling thinking on Mythos – The Mythos Preview model requires thinking; you cannot set `thinking: {type: "disabled"}`.
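The advice on monitoring usage and sizing `max_tokens` can be checked in code. The sketch below is a hypothetical `usage_report` helper that reads `usage.input_tokens`, `usage.output_tokens`, and `stop_reason` from a Messages API response; a `stop_reason` of `"max_tokens"` means generation stopped at the limit. A `SimpleNamespace` stands in for a real response object, and its numbers are invented.

```python
from types import SimpleNamespace as NS


def usage_report(response) -> dict:
    """Summarize token usage and flag responses cut off by max_tokens.

    Thinking tokens are billed as output tokens, so a large output_tokens
    value often reflects heavy reasoning rather than a long answer.
    """
    usage = response.usage
    return {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        # True means generation hit the limit; consider raising max_tokens.
        "truncated": response.stop_reason == "max_tokens",
    }


# Stand-in for a real Message object; the numbers are made up.
fake = NS(usage=NS(input_tokens=42, output_tokens=32000), stop_reason="max_tokens")
print(usage_report(fake))  # {'input_tokens': 42, 'output_tokens': 32000, 'truncated': True}
```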
Key Takeaways
- Adaptive thinking is the new standard for Claude Opus 4.7, Sonnet 4.6, and Opus 4.6. Use `thinking: {type: "adaptive", effort: "..."}` instead of manual budget tokens.
- The effort parameter (`low`, `medium`, `high`) controls reasoning depth; choose based on task complexity to balance quality and cost.
- Manual extended thinking (`type: "enabled"` with `budget_tokens`) is deprecated on Opus 4.6 and Sonnet 4.6, and not supported on Opus 4.7.
- Task budgets and fast mode are beta features that give you finer control over token usage and latency.
- Streaming requires special handling for thinking blocks; use `thinking_delta` events to display reasoning in real time.