Guide · 2026-05-01

Mastering Extended Thinking in Claude: A Complete Guide to Adaptive and Manual Reasoning

Learn how to enable and optimize Claude's extended thinking feature for complex tasks. Covers adaptive thinking, manual mode, effort parameters, and practical API examples.

Quick Answer

This guide teaches you how to use Claude's extended thinking feature to enhance reasoning on complex tasks. You'll learn about adaptive thinking (recommended for Claude Opus 4.7+), manual mode, effort parameters, and how to implement them in your API calls with practical code examples.

Tags: extended thinking, Claude API, adaptive thinking, reasoning, Claude Opus

Introduction

Claude's extended thinking feature lets the model work through complex problems step by step before it delivers a final answer. Whether you're building a code analysis tool, a multi-step reasoning agent, or a deep research assistant, extended thinking gives Claude room to reason about your hardest tasks.

This guide covers everything you need to know: from the latest adaptive thinking mode (required for Claude Opus 4.7 and later) to the legacy manual mode, with practical API examples and best practices.

How Extended Thinking Works

When extended thinking is enabled, Claude generates internal reasoning in special thinking content blocks before producing its final text response. The API response includes both, giving you transparency into the model's thought process.

Here's what a typical response looks like:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

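The signature field is an opaque value that the API uses to verify that a thinking block was generated by Claude. Treat it as a token to pass back unchanged rather than something to parse. Thinking blocks appear before text blocks in the response.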
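As a minimal sketch of reading a response like the one above, the helper below separates thinking blocks from text blocks. It assumes `content` is a list of plain dicts shaped like that JSON (the SDK returns objects, where you would read attributes such as `block.type` instead). `split_blocks` is a hypothetical name for illustration, not part of the SDK.

```python
from typing import Any, Dict, List, Tuple


def split_blocks(content: List[Dict[str, Any]]) -> Tuple[str, str]:
    """Return (thinking, text) joined from a list of content-block dicts.

    Assumes each block is a dict with a "type" key, as in the JSON above.
    Block types other than "thinking" and "text" are ignored.
    """
    thinking_parts = []
    text_parts = []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Sample data copied from the response shape shown above.
example = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
thinking, text = split_blocks(example)
print(text)
```

Keeping the two streams separate makes it easy to log or hide the reasoning while showing only the final text to end users.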

Adaptive Thinking (Recommended for Claude Opus 4.7+)

Adaptive thinking is the newest and most flexible way to use extended thinking. Instead of manually setting a token budget, you let Claude dynamically decide how much thinking is needed based on the complexity of the task. This is the only supported method for Claude Opus 4.7 and later models.

Using the `effort` Parameter

With adaptive thinking, you control the reasoning depth using the effort parameter. The effort level determines how aggressively Claude allocates thinking tokens:

low: Minimal thinking, best for simple tasks where speed matters.
medium: Balanced reasoning, suitable for most complex tasks.
high: Maximum reasoning depth, ideal for the hardest problems.

Here's how to use it in Python:

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high"  # Options: "low", "medium", "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Design a distributed caching system that handles cache invalidation across 1000 nodes with eventual consistency."
        }
    ]
)
print(response.content)

And in TypeScript:

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 32000,
  thinking: {
    type: 'adaptive',
    effort: 'high'
  },
  messages: [
    {
      role: 'user',
      content: 'Design a distributed caching system that handles cache invalidation across 1000 nodes with eventual consistency.'
    }
  ]
});
console.log(response.content);
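If you build requests in several places, a small wrapper can catch an invalid effort value before the request is sent. The sketch below mirrors the request shape used in this guide's examples; `adaptive_request` and `VALID_EFFORT` are hypothetical names, and the default values are assumptions chosen to match the examples above.

```python
from typing import Any, Dict

# Effort levels listed in this guide.
VALID_EFFORT = ("low", "medium", "high")


def adaptive_request(model: str, prompt: str, effort: str = "medium",
                     max_tokens: int = 32000) -> Dict[str, Any]:
    """Build keyword arguments for an adaptive-thinking request.

    The returned dict follows the shape of the examples in this guide;
    it does not call the API itself.
    """
    if effort not in VALID_EFFORT:
        raise ValueError(f"effort must be one of {VALID_EFFORT}, got {effort!r}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive", "effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = adaptive_request("claude-opus-4-7", "Design a distributed caching system.", effort="high")
print(kwargs["thinking"])
```

The result can then be unpacked into the call shown earlier, as in `client.messages.create(**kwargs)`.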

Task Budgets (Beta)

For even finer control, you can combine adaptive thinking with a task budget — an optional parameter that sets a maximum token ceiling for thinking. This is useful when you want adaptive behavior but need to cap costs or latency.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "task_budget_tokens": 10000  # Optional: cap thinking at 10K tokens
    },
    messages=[...]
)

Fast Mode (Beta Research Preview)

Fast mode is an experimental feature that reduces thinking time for simpler tasks while maintaining quality. It's ideal for latency-sensitive applications where you still want some reasoning.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "medium",
        "fast_mode": True  # Beta: reduces thinking time
    },
    messages=[...]
)

Manual Extended Thinking (Legacy)

Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is the original approach where you explicitly set a token budget for thinking. While still functional on Claude Opus 4.6 and Claude Sonnet 4.6, it is deprecated and will be removed in future model releases.

Important: Manual extended thinking is not supported on Claude Opus 4.7 or later models. Attempting to use it will return a 400 error.

How to Use Manual Mode (Legacy)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # Allocate 8K tokens for thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Explain the P vs NP problem and its implications for cryptography."
        }
    ]
)

Key Rules for Manual Mode

budget_tokens must be less than max_tokens.
The thinking budget is a soft limit — Claude may use slightly more or less.
The remaining tokens (max_tokens - budget_tokens) are reserved for the final text response.
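These rules can be checked locally before a request is sent. The following is a minimal sketch, assuming only the constraints listed above; `manual_thinking_config` and `response_headroom` are hypothetical helper names, not SDK functions.

```python
def manual_thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a manual-mode thinking config after checking the rules above."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def response_headroom(max_tokens: int, budget_tokens: int) -> int:
    """Tokens left for the final text response (max_tokens - budget_tokens)."""
    return max_tokens - budget_tokens


# Values taken from the manual-mode example above.
config = manual_thinking_config(max_tokens=16000, budget_tokens=8000)
headroom = response_headroom(max_tokens=16000, budget_tokens=8000)
print(config, headroom)  # headroom is 8000
```

Because the budget is a soft limit, treat the headroom figure as an estimate rather than a guarantee.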

Model-Specific Behavior

Different Claude models handle extended thinking differently. Here's a quick reference:

Model	Adaptive Thinking	Manual Mode	Notes
Claude Opus 4.7+	✅ Required	❌ Returns 400 error	Use `effort` parameter
Claude Mythos Preview	✅ Default	✅ Accepted	`thinking: {"type": "disabled"}` not supported
Claude Opus 4.6	✅ Recommended	✅ Deprecated but functional	Migrate to adaptive
Claude Sonnet 4.6	✅ Recommended	✅ Deprecated but functional	Interleaved mode deprecated
Claude Sonnet 3.7	❌ Not available	✅ Supported	Manual mode only
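One way to apply this table in code is a small lookup that picks a thinking configuration per model family. This is a sketch only: the keys are the family names from the table rather than API model IDs, and `SUPPORT` and `choose_thinking` are hypothetical names. The config shapes follow this guide's examples.

```python
# Support flags transcribed from the table above.
SUPPORT = {
    "Claude Opus 4.7+": {"adaptive": True, "manual": False},
    "Claude Mythos Preview": {"adaptive": True, "manual": True},
    "Claude Opus 4.6": {"adaptive": True, "manual": True},
    "Claude Sonnet 4.6": {"adaptive": True, "manual": True},
    "Claude Sonnet 3.7": {"adaptive": False, "manual": True},
}


def choose_thinking(family: str, effort: str = "medium", budget_tokens: int = 8000) -> dict:
    """Prefer adaptive thinking where available, else fall back to manual mode."""
    if family not in SUPPORT:
        raise ValueError(f"unknown model family: {family!r}")
    if SUPPORT[family]["adaptive"]:
        return {"type": "adaptive", "effort": effort}
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(choose_thinking("Claude Opus 4.7+", effort="high"))
print(choose_thinking("Claude Sonnet 3.7", budget_tokens=4000))
```

Preferring adaptive mode wherever it exists keeps the fallback path limited to models that only accept a manual budget.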

Best Practices

1. Choose the Right Effort Level

Low effort: Use for straightforward tasks like summarization, simple Q&A, or data extraction.
Medium effort: Ideal for most complex tasks — code generation, analysis, planning.
High effort: Reserve for the hardest problems: mathematical proofs, system design, multi-step reasoning.

2. Set Appropriate `max_tokens`

Extended thinking consumes tokens from your max_tokens budget. With adaptive thinking, Claude will use what it needs, but you should set max_tokens high enough to accommodate both thinking and the final response. A good starting point is 2-3x your expected output size.
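The 2-3x starting point is simple arithmetic, sketched below. `max_tokens_range` is a hypothetical helper, and the multipliers are just the rule of thumb from this section, not an API requirement.

```python
from typing import Tuple


def max_tokens_range(expected_output_tokens: int) -> Tuple[int, int]:
    """Return (low, high) starting values for max_tokens: 2x and 3x the expected output."""
    if expected_output_tokens <= 0:
        raise ValueError("expected_output_tokens must be positive")
    return 2 * expected_output_tokens, 3 * expected_output_tokens


low, high = max_tokens_range(4000)
print(low, high)  # 8000 12000
```

Start near the low end for routine tasks and move toward the high end if responses are being cut off.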

3. Handle Thinking Blocks in Streaming

When streaming, thinking blocks appear as separate events. Make sure your client handles them correctly:

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    messages=[{"role": "user", "content": "Complex question..."}],
    stream=True
)
for event in stream:
    if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
        # Handle thinking tokens (e.g., display as hidden or summarized)
        pass
    elif event.type == "content_block_delta" and event.delta.type == "text_delta":
        # Handle final text output
        print(event.delta.text, end="")
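
The same branching can be wrapped in a function that accumulates both streams, which is easier to test without a network call. The sketch below assumes events expose `type` and `delta` attributes as in the loop above, with thinking text carried in `delta.thinking`. `collect_stream` and the stand-in events built with `SimpleNamespace` are hypothetical and exist only to exercise the logic.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Tuple


def collect_stream(events: Iterable[Any]) -> Tuple[str, str]:
    """Accumulate thinking and text deltas from a stream of events."""
    thinking_parts = []
    text_parts = []
    for event in events:
        if event.type != "content_block_delta":
            continue  # Skip start/stop and other non-delta events.
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)


def _delta(kind: str, **fields: str) -> SimpleNamespace:
    """Build a stand-in content_block_delta event for local testing."""
    return SimpleNamespace(type="content_block_delta",
                           delta=SimpleNamespace(type=kind, **fields))


fake_events = [
    SimpleNamespace(type="message_start"),
    _delta("thinking_delta", thinking="Step 1. "),
    _delta("thinking_delta", thinking="Step 2."),
    _delta("text_delta", text="Final "),
    _delta("text_delta", text="answer."),
    SimpleNamespace(type="message_stop"),
]
streamed_thinking, streamed_text = collect_stream(fake_events)
print(streamed_text)
```

With a real stream, you would pass the `stream` object from the example above in place of `fake_events`.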

4. Use Display Options (Claude Mythos Preview)

For Claude Mythos Preview, you can control how thinking is returned:

Default ("omitted"): Thinking content is not returned in the response.
"summarized": Returns a summary of the thinking process.

response = client.messages.create(
    model="claude-mythos-preview",
    max_tokens=32000,  # Required on every request; same value as the earlier examples
    thinking={
        "type": "adaptive",
        "effort": "high",
        "display": "summarized"  # Get thinking summaries
    },
    messages=[...]
)

Common Pitfalls to Avoid

Using manual mode on Opus 4.7+: This will fail with a 400 error. Always use adaptive thinking.
Setting budget_tokens too low: If the budget is too small, Claude may not have enough room to reason properly, leading to poorer quality answers.
Forgetting to handle streaming events: If you're streaming, make sure your code processes thinking_delta events, not just text_delta.
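Modifying or dropping the signature field: The signature is verified by the API, not by your client. If you send thinking blocks back in a later request (for example, in a tool-use loop), pass them back unchanged with the signature included.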

Conclusion

Extended thinking is one of Claude's most powerful features, enabling deep reasoning on complex tasks. With the introduction of adaptive thinking and the effort parameter, you now have fine-grained control over how much Claude thinks, without needing to manually set token budgets.

For new projects, always use adaptive thinking — it's simpler, more efficient, and future-proof. If you're maintaining legacy code on older models, plan your migration to adaptive thinking as soon as possible.

Key Takeaways

Adaptive thinking (thinking: {type: "adaptive"}) is the recommended approach on every model that supports it, and the only supported method for Claude Opus 4.7 and later.
Use the effort parameter (low, medium, high) to control reasoning depth — no need to manually set token budgets.
Manual extended thinking (type: "enabled" with budget_tokens) is deprecated on Claude Opus 4.6 and Sonnet 4.6, and not supported on Opus 4.7+.
Always set max_tokens high enough to accommodate both thinking and the final response — a good rule of thumb is 2-3x your expected output size.
When streaming, handle thinking_delta events separately from text_delta to properly process the thinking blocks.