Mastering Extended Thinking in Claude: Adaptive Thinking, Effort Control, and Best Practices
Learn how to use Claude's extended thinking feature for complex reasoning tasks. Covers adaptive thinking, effort parameter, manual mode, and code examples for API integration.
You'll learn how to enable and configure Claude's extended thinking for complex reasoning, including adaptive thinking mode, the effort parameter, manual budget_tokens, and how to handle thinking blocks in API responses.
Introduction
Claude's extended thinking feature unlocks enhanced reasoning capabilities for complex tasks. When enabled, Claude produces an internal chain-of-thought before delivering its final answer, and the API can return this reasoning as transparent content blocks. This guide covers everything you need to know to implement extended thinking effectively—from basic setup to advanced configuration with adaptive thinking and effort control.
How Extended Thinking Works
When extended thinking is turned on, Claude generates thinking content blocks that contain its step-by-step reasoning. These blocks appear in the API response before the final text content blocks. The model uses its own reasoning to refine and validate its final answer, leading to more accurate and well-reasoned outputs.
Here's the default response structure:
{
"content": [
{
"type": "thinking",
"thinking": "Let me analyze this step by step...",
"signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
},
{
"type": "text",
"text": "Based on my analysis..."
}
]
}
The signature field is used for verification and is required when streaming.
Supported Models and Modes
Extended thinking behavior varies by Claude model version. Here's a quick reference:
| Model | Recommended Mode | Manual Mode | Notes |
|---|---|---|---|
| Claude Opus 4.7 | adaptive with effort | ❌ Returns 400 error | Use adaptive thinking only |
| Claude Opus 4.6 | adaptive | enabled (deprecated) | Manual mode still works but will be removed |
| Claude Sonnet 4.6 | adaptive | enabled (deprecated) | Manual mode with interleaved still works |
| Claude Mythos Preview | adaptive (default) | enabled accepted | disabled not supported; display defaults to "omitted" |
Important: For Claude Opus 4.7, manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is no longer supported and returns a 400 error. You must use adaptive thinking with the effort parameter.
Adaptive Thinking (Recommended)
Adaptive thinking lets Claude dynamically decide how much reasoning to use based on the complexity of the task. This is the recommended approach for all current models.
Basic Adaptive Thinking
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={"type": "adaptive"},
messages=[
{
"role": "user",
"content": "Prove that the square root of 2 is irrational."
}
]
)
for block in response.content:
if block.type == "thinking":
print(f"Thinking: {block.thinking}")
elif block.type == "text":
print(f"Answer: {block.text}")
Using the Effort Parameter
The effort parameter gives you fine-grained control over how much reasoning Claude applies. It accepts values on a scale from 0.0 to 1.0, where higher values encourage deeper reasoning.
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={
"type": "adaptive",
"effort": 0.8 # High effort for complex tasks
},
messages=[
{
"role": "user",
"content": "Design a distributed caching system that handles cache invalidation across 100 nodes."
}
]
)
When to adjust effort:
- Low effort (0.1–0.3): Simple factual queries, quick lookups, straightforward translations
- Medium effort (0.4–0.6): Standard reasoning tasks, code generation, analysis
- High effort (0.7–1.0): Complex mathematical proofs, multi-step planning, deep research
Manual Extended Thinking (Legacy)
For models that still support it (Claude Opus 4.6, Sonnet 4.6, Mythos), you can manually set a token budget for thinking:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={
"type": "enabled",
"budget_tokens": 10000 # Claude will use up to 10k tokens for thinking
},
messages=[
{
"role": "user",
"content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
}
]
)
Note:budget_tokensmust be less thanmax_tokens. Claude will use at mostbudget_tokensfor thinking, and the remaining tokens for the final response.
Streaming with Extended Thinking
When streaming, thinking blocks are delivered as separate events. Here's how to handle them:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-opus-4-7",
max_tokens=16000,
thinking={"type": "adaptive", "effort": 0.7},
messages=[
{
"role": "user",
"content": "Explain quantum entanglement in simple terms."
}
]
) as stream:
for event in stream:
if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
print(event.delta.thinking, end="", flush=True)
elif event.type == "content_block_delta" and event.delta.type == "text_delta":
print(event.delta.text, end="", flush=True)
Task Budgets (Beta)
For advanced use cases, you can set a task budget that limits the total tokens Claude can use across both thinking and final response. This is useful for controlling costs while still allowing deep reasoning.
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=32000,
thinking={
"type": "adaptive",
"effort": 0.9,
"task_budget_tokens": 20000 # Total tokens for thinking + response
},
messages=[...]
)
Fast Mode (Research Preview)
Fast mode is an experimental feature that reduces thinking time for simpler tasks while maintaining reasoning quality. Enable it by setting fast_mode: true in the thinking configuration:
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={
"type": "adaptive",
"effort": 0.5,
"fast_mode": True # Research preview
},
messages=[...]
)
Caution: Fast mode is a research preview. It may produce lower quality reasoning on complex tasks. Test thoroughly before using in production.
Best Practices
1. Choose the Right Effort Level
Start with effort: 0.5 and adjust based on task complexity. For simple Q&A, lower effort saves tokens. For multi-step reasoning, higher effort yields better results.
2. Set Appropriate max_tokens
Always set max_tokens high enough to accommodate both thinking and the final response. A good rule of thumb: max_tokens = budget_tokens 1.5 (or max_tokens = effort 20000 for adaptive mode).
3. Handle Thinking Blocks in Your Application
When displaying responses to users, you may want to:
- Show thinking blocks in a collapsible UI element
- Use them for debugging or transparency
- Omit them for a clean user experience
4. Use Adaptive Thinking for New Projects
Manual extended thinking is deprecated on most models. Always prefer thinking: {"type": "adaptive"} for future-proof code.
5. Combine with Structured Outputs
Extended thinking works well with structured outputs (JSON mode). Claude can reason through complex data transformations before outputting structured results.
Common Pitfalls
- Exceeding budget_tokens: If
budget_tokens>=max_tokens, the API returns an error. Always leave headroom for the final response. - Using manual mode on Opus 4.7: This returns a 400 error. Switch to adaptive thinking.
- Ignoring signatures: When streaming, always validate signatures if you need to verify the integrity of thinking blocks.
Key Takeaways
- Adaptive thinking (
thinking: {"type": "adaptive"}) is the recommended approach for all current Claude models, especially Opus 4.7 where manual mode is no longer supported. - The effort parameter (0.0–1.0) lets you control reasoning depth—use higher values for complex tasks and lower values for simple queries.
- Manual extended thinking with
budget_tokensis deprecated on Opus 4.6 and Sonnet 4.6, and removed on Opus 4.7. Migrate to adaptive thinking. - Streaming works seamlessly with extended thinking—handle
thinking_deltaandtext_deltaevents separately. - Task budgets and fast mode are experimental features that give you additional control over token usage and response speed.