Claude Guide
2026-05-05

Mastering Claude's Extended Thinking: A Complete Guide to Adaptive Reasoning

Learn how to use Claude's extended thinking capabilities for complex tasks. Covers adaptive thinking, effort parameters, manual mode, and practical API examples.

Quick Answer

This guide teaches you how to enable and optimize Claude's extended thinking feature for complex reasoning tasks. You'll learn about adaptive thinking with the effort parameter, manual mode for legacy models, and how to handle thinking blocks in API responses.

Claude API · Extended Thinking · Adaptive Thinking · Reasoning · AI Development

Claude's extended thinking capability is one of its most powerful features—it allows the model to engage in deep, step-by-step reasoning before delivering a final answer. Whether you're solving complex mathematical proofs, analyzing intricate codebases, or conducting multi-step research, extended thinking gives Claude the cognitive runway it needs to produce more accurate and thoughtful responses.

In this guide, you'll learn how to configure and use extended thinking effectively, understand the differences between adaptive and manual modes, and see practical code examples for the Claude API.

Understanding Extended Thinking

Extended thinking works by creating thinking content blocks in Claude's response. These blocks contain the model's internal reasoning process, followed by the final text response. This transparency allows you to see how Claude arrived at its conclusions—not just what it concluded.

Here's what a typical extended thinking response looks like:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

The thinking block includes a cryptographic signature that verifies the integrity of the thinking content—useful for auditing and trust verification.

Adaptive Thinking vs. Manual Extended Thinking

Claude offers two modes for extended thinking, and the right choice depends on your model version and use case.

Adaptive Thinking (Recommended for Claude Opus 4.7+)

Adaptive thinking is the modern approach, introduced with Claude Opus 4.7 and later models. Instead of setting a fixed token budget, you use the effort parameter to tell Claude how much reasoning effort to apply.

Key characteristics:
  • Uses thinking: {type: "adaptive"}
  • Requires the effort parameter (values: "low", "medium", "high")
  • Claude dynamically allocates thinking tokens based on task complexity
  • Simpler tasks use fewer tokens; complex tasks get more
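
In request-body terms, the adaptive configuration is a small fragment (the effort value shown is illustrative; field names match the API examples later in this guide):

```json
{
  "thinking": {
    "type": "adaptive",
    "effort": "medium"
  }
}
```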

Manual Extended Thinking (Legacy)

Manual extended thinking uses a fixed token budget:

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  }
}
Important: Manual mode is no longer supported on Claude Opus 4.7 and later models (returns a 400 error). It remains functional but deprecated on Claude Opus 4.6 and Claude Sonnet 4.6.
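
If your application targets a mix of model generations, one way to avoid the 400 error is to gate the configuration on the model name. This helper is a sketch: the prefix check is an assumption for illustration, not an official compatibility test.

```python
def thinking_config(model: str, effort: str = "medium", budget_tokens: int = 10000):
    """Pick a thinking configuration for the given model generation.

    Illustrative helper; the model-name prefix check is an assumption.
    """
    if model.startswith("claude-opus-4-7"):
        # Opus 4.7+ rejects manual mode, so always use adaptive thinking
        return {"type": "adaptive", "effort": effort}
    # Older models still accept the (deprecated) manual token budget
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

You would then pass `thinking=thinking_config(model)` to `client.messages.create(...)`.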

Model Compatibility Matrix

Model                 | Adaptive Thinking | Manual Mode          | Notes
Claude Opus 4.7+      | ✅ Required       | ❌ Returns 400 error | Use effort parameter
Claude Mythos Preview | ✅ Default        | ✅ Accepted          | disabled not supported; use display: "summarized" for summaries
Claude Opus 4.6       | ✅ Recommended    | ✅ Deprecated        | Will be removed in future
Claude Sonnet 4.6     | ✅ Recommended    | ✅ Deprecated        | Uses interleaved mode
Claude Sonnet 3.7     | ✅ Supported      | —                    | Legacy behavior

How to Use Extended Thinking in the API

Basic Setup with Adaptive Thinking

Here's how to enable adaptive thinking with the effort parameter:

Python Example:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers congruent to 3 mod 4."
        }
    ]
)

# Process the response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
        print(f"Signature: {block.signature}")
    elif block.type == "text":
        print(f"Final answer: {block.text}")

TypeScript Example:
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 32000,
  thinking: { type: 'adaptive', effort: 'high' },
  messages: [
    {
      role: 'user',
      content: 'Prove that there are infinitely many prime numbers congruent to 3 mod 4.'
    }
  ]
});

for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(`Thinking: ${block.thinking.substring(0, 200)}...`);
    console.log(`Signature: ${block.signature}`);
  } else if (block.type === 'text') {
    console.log(`Final answer: ${block.text}`);
  }
}

Using Manual Mode (Legacy Models)

For models that still support manual mode:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {
            "role": "user",
            "content": "Explain the P vs NP problem in detail."
        }
    ]
)

Choosing the Right Effort Level

The effort parameter in adaptive thinking gives you fine-grained control:

  • "low": Minimal reasoning overhead. Best for straightforward tasks where you want quick responses with basic verification.
  • "medium": Balanced reasoning. Good for most complex tasks like code review, data analysis, or multi-step logic.
  • "high": Maximum reasoning depth. Use for mathematical proofs, complex debugging, or tasks requiring thorough analysis.
Pro tip: Start with "medium" and escalate to "high" only when you need deeper reasoning. Higher effort consumes more tokens and increases latency.
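
One lightweight way to apply this advice in code is a task-type-to-effort lookup that defaults to "medium" (the task categories here are illustrative, not an official taxonomy):

```python
# Suggested starting points per task type; tune for your own workload
EFFORT_BY_TASK = {
    "formatting": "low",
    "code_review": "medium",
    "data_analysis": "medium",
    "math_proof": "high",
    "deep_debugging": "high",
}

def pick_effort(task_type: str) -> str:
    # Default to "medium", escalating to "high" only when the task demands it
    return EFFORT_BY_TASK.get(task_type, "medium")
```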

Handling Thinking Blocks in Responses

When processing responses, you'll need to handle the thinking blocks appropriately:

def process_claude_response(response):
    thinking_content = []
    final_text = []
    
    for block in response.content:
        if block.type == "thinking":
            thinking_content.append({
                "thinking": block.thinking,
                "signature": block.signature
            })
        elif block.type == "text":
            final_text.append(block.text)
    
    return {
        "thinking_blocks": thinking_content,
        "final_answer": "".join(final_text)
    }
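
You can sanity-check this extraction logic without an API call by feeding it mock blocks. `SimpleNamespace` stands in for the SDK's content-block objects, and the function body is repeated so the snippet runs standalone:

```python
from types import SimpleNamespace

def process_claude_response(response):
    # Same extraction logic as above, repeated so this snippet is self-contained
    thinking_content, final_text = [], []
    for block in response.content:
        if block.type == "thinking":
            thinking_content.append({
                "thinking": block.thinking,
                "signature": block.signature,
            })
        elif block.type == "text":
            final_text.append(block.text)
    return {"thinking_blocks": thinking_content, "final_answer": "".join(final_text)}

# Mock response mimicking the shape shown earlier in this guide
mock_response = SimpleNamespace(content=[
    SimpleNamespace(type="thinking",
                    thinking="Let me analyze this step by step...",
                    signature="sig-placeholder"),
    SimpleNamespace(type="text", text="Based on my analysis..."),
])

result = process_claude_response(mock_response)
```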

Best Practices

1. Set Appropriate max_tokens

Always set max_tokens higher than your thinking budget. A good rule of thumb:

max_tokens = thinking_budget + expected_output_tokens

For adaptive thinking, set max_tokens generously (e.g., 32000 for complex tasks).
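
For manual mode, the rule of thumb is simple arithmetic (the numbers below are illustrative):

```python
thinking_budget = 10000        # budget_tokens for manual mode
expected_output_tokens = 4000  # rough estimate of the final answer length

# Leave headroom for both the reasoning and the final response
max_tokens = thinking_budget + expected_output_tokens
```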

2. Use Streaming for Long Responses

Extended thinking can produce lengthy reasoning. Enable streaming to get partial results faster:

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    messages=[{"role": "user", "content": "Complex question..."}],
    stream=True
)

for event in stream:
    # Handle streaming events
    pass
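
A sketch of what the event handling might look like. The delta types used here (`content_block_delta`, `thinking_delta`, `text_delta`) follow Anthropic's streaming event shapes, but treat the exact field names as assumptions; mock events stand in for a live stream:

```python
from types import SimpleNamespace

def handle_event(event) -> str:
    """Return printable text for thinking/text deltas; ignore other events."""
    if getattr(event, "type", "") == "content_block_delta":
        if event.delta.type == "thinking_delta":
            return event.delta.thinking
        if event.delta.type == "text_delta":
            return event.delta.text
    return ""

# Mock events in place of a live stream
events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="thinking_delta", thinking="Reasoning... ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Answer.")),
]
transcript = "".join(handle_event(e) for e in events)
```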

3. Validate Signatures for Critical Applications

For applications requiring audit trails (e.g., financial analysis, legal reasoning), verify the thinking block signatures:

# Store signatures for later verification
signatures = [
    block.signature 
    for block in response.content 
    if block.type == "thinking"
]
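
If you persist these for an audit trail, one illustrative approach pairs each signature with a hash of the thinking text, so you can later detect tampering in your own store. The record schema here is an assumption, not an official format:

```python
import hashlib
import time

def audit_records(thinking_blocks):
    """Build storable audit entries for thinking blocks.

    Illustrative schema: the signature is stored opaquely; the SHA-256 of
    the thinking text lets you detect local tampering of the stored copy.
    """
    return [
        {
            "timestamp": time.time(),
            "signature": block["signature"],
            "thinking_sha256": hashlib.sha256(
                block["thinking"].encode("utf-8")
            ).hexdigest(),
        }
        for block in thinking_blocks
    ]

records = audit_records([{"thinking": "Step 1...", "signature": "sig-abc"}])
```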

4. Combine with Structured Outputs

Extended thinking pairs well with structured outputs for complex data extraction:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    messages=[{"role": "user", "content": "Analyze this financial report..."}],
    # Structured output configuration
)

Common Pitfalls to Avoid

  • Using manual mode on Opus 4.7+: This will return a 400 error. Always use adaptive thinking for new models.
  • Setting budget_tokens too low: Claude may cut off reasoning prematurely. For manual mode, use at least 50% of max_tokens.
  • Ignoring the signature: For production systems, always validate signatures to ensure thinking integrity.
  • Forgetting max_tokens: Extended thinking requires sufficient token headroom. Always set max_tokens higher than your thinking budget.
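
Several of these pitfalls can be caught with a pre-flight check before sending the request. The thresholds below follow the rules of thumb in this guide and are not enforced by the API itself:

```python
def preflight_check(max_tokens, budget_tokens=None):
    """Return a list of warnings for a thinking-enabled request.

    Illustrative checks based on the pitfalls above; budget_tokens is
    only relevant for (legacy) manual mode.
    """
    warnings = []
    if budget_tokens is not None:
        if budget_tokens >= max_tokens:
            warnings.append("budget_tokens must be smaller than max_tokens")
        elif budget_tokens < max_tokens * 0.5:
            warnings.append("budget_tokens below 50% of max_tokens may truncate reasoning")
    return warnings
```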

Real-World Use Cases

Extended thinking excels in scenarios requiring deep reasoning:

  • Mathematical proofs and theorem verification
  • Complex code debugging and optimization
  • Multi-step research synthesis
  • Legal document analysis
  • Scientific hypothesis generation
  • Strategic planning and decision trees

Key Takeaways

  • Adaptive thinking (type: "adaptive" with effort parameter) is the recommended approach for Claude Opus 4.7+ and newer models—manual mode is deprecated on these versions.
  • Choose effort levels wisely: Use "low" for simple tasks, "medium" for most complex work, and "high" only when maximum reasoning depth is required.
  • Always set max_tokens generously to give Claude enough room for both thinking and final output—a common source of errors is insufficient token allocation.
  • Handle thinking blocks explicitly in your code to extract reasoning content and signatures for auditing or transparency purposes.
  • Model compatibility matters: Check the compatibility matrix before implementing—older models may still use manual mode, while newer ones require adaptive thinking.