BeClaude
Guide | Beginner | Best Practices | 2026-05-20

Mastering Extended Thinking in Claude: A Practical Guide to Adaptive & Manual Reasoning

Learn how to enable and optimize Claude's extended thinking for complex reasoning tasks. Covers adaptive thinking, manual mode, budget tokens, and best practices.

Quick Answer

This guide explains how to use extended thinking in the Claude API to improve reasoning on complex tasks. You'll learn about adaptive thinking (recommended for Opus 4.7), manual mode, budget tokens, and how to handle thinking content blocks in responses.

Tags: extended thinking, adaptive thinking, Claude API, reasoning, budget tokens

Introduction

Extended thinking is one of Claude's most powerful features for tackling complex, multi-step problems. When enabled, Claude generates internal reasoning steps before producing its final answer, giving you both transparency into its thought process and significantly improved accuracy on tasks like math, code analysis, logic puzzles, and research synthesis.

This guide covers everything you need to know to implement extended thinking effectively: from the basics of enabling it in the API to advanced configuration with adaptive thinking and budget tokens.

How Extended Thinking Works

When extended thinking is turned on, Claude creates special thinking content blocks in the API response. These blocks contain its step-by-step reasoning, followed by the final text content block with the answer.

Here's what a typical response looks like:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

The signature field lets the API verify that a thinking block was generated by Claude. Keep it unmodified whenever you store thinking blocks or pass them back in a later request.
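
As a rough illustration of consuming the response shape above, the sketch below separates reasoning from the answer. It works on a plain dict rather than a live API call, and `split_blocks` is a hypothetical helper name, not part of the SDK.

```python
from typing import Any


def split_blocks(content: list[dict[str, Any]]) -> tuple[list[str], str]:
    """Separate reasoning from the answer in a response's content list.

    Hypothetical helper for illustration; not an SDK function.
    """
    thinking = [b["thinking"] for b in content if b.get("type") == "thinking"]
    answer = "".join(b["text"] for b in content if b.get("type") == "text")
    return thinking, answer


# Sample content mirroring the JSON shown above (signature shortened).
sample = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
reasoning, answer = split_blocks(sample)
print(reasoning)
print(answer)
```

Filtering on `type` rather than relying on block position keeps the helper working when a response contains no thinking block at all.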

Enabling Extended Thinking

For Claude Opus 4.7 (Recommended: Adaptive Thinking)

Claude Opus 4.7 requires adaptive thinking: the older manual budget_tokens approach is rejected with a 400 error. Adaptive thinking automatically allocates reasoning tokens based on task complexity.

Python example:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    # Adaptive thinking; effort is passed separately via output_config
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) from 0 to pi"}
    ],
)

# Access thinking blocks
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)

TypeScript example:
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 8192,
  // Adaptive thinking; effort is passed separately via output_config
  thinking: { type: 'adaptive' },
  output_config: { effort: 'high' },
  messages: [
    { role: 'user', content: 'Solve this complex math problem: integrate x^2 * sin(x) from 0 to pi' }
  ]
});

for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);
  }
}

For Claude Opus 4.6 and Claude Sonnet 4.6

Adaptive thinking is recommended for these models too, but manual mode is still functional (though deprecated).

Adaptive mode (recommended):
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},  # effort sits outside the thinking object
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on cryptography"}
    ]
)
Manual mode (deprecated but functional):
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Explain the implications of quantum entanglement on cryptography"}
    ]
)

Understanding the effort Parameter

With adaptive thinking, you control reasoning depth using the effort parameter. This replaces the old budget_tokens approach.

Effort Level | Use Case
"low"        | Simple tasks, quick responses
"medium"     | Balanced reasoning for most tasks
"high"       | Complex problems requiring deep analysis

Claude automatically determines how many tokens to spend based on the effort level and the complexity of the query.
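
One way to apply the effort levels in practice is to pick a level from a few coarse traits of the task. The sketch below is only an illustrative heuristic: `pick_effort` and its two flags are hypothetical, and the right mapping depends on your workload.

```python
EFFORT_LEVELS = ("low", "medium", "high")  # the levels described above


def pick_effort(multi_step: bool, latency_sensitive: bool) -> str:
    """Map coarse task traits to an effort level (illustrative heuristic only)."""
    if multi_step and not latency_sensitive:
        return "high"    # complex problem, time to think
    if latency_sensitive and not multi_step:
        return "low"     # simple task, quick response wanted
    return "medium"      # balanced default for everything else


for traits in [(False, True), (True, True), (True, False)]:
    print(traits, "->", pick_effort(*traits))
```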

Budget Tokens (Manual Mode - Legacy)

If you're using a model that still supports manual mode (Opus 4.6, Sonnet 4.6, Mythos Preview), the budget_tokens parameter sets the maximum tokens Claude can use for reasoning.

Important rules:
  • budget_tokens must be at least 1,024
  • budget_tokens must be less than max_tokens (at least 1 token less)

Example:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Write a detailed analysis of the Fermi Paradox"}
    ]
)
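
The rules above can be checked locally before a request is sent, which avoids a round trip that ends in a 400 error. This is a minimal sketch: `validate_budget` is a hypothetical helper, and the limits are the ones stated in this guide.

```python
MIN_BUDGET_TOKENS = 1024  # minimum stated in the rules above


def validate_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if a manual-mode budget breaks the rules above."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )


validate_budget(4096, 8192)  # valid combination, returns silently
```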

Streaming with Extended Thinking

When streaming, thinking blocks appear before text blocks. You must handle them separately:

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    # Adaptive thinking; effort is passed separately via output_config
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    stream=True,
)

for event in stream:
    if event.type == "content_block_start":
        # Label each block once instead of prefixing every delta
        print(f"\n[{event.content_block.type}]")
    elif event.type == "content_block_delta":
        if event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="", flush=True)
        elif event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
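
If you need the full reasoning and answer as strings rather than printed output, the deltas can be accumulated as they arrive. The sketch below runs against simulated events so it works offline. The event objects are stand-ins built with `SimpleNamespace`, and `collect_stream` is a hypothetical helper.

```python
from types import SimpleNamespace as NS


def collect_stream(events):
    """Accumulate thinking and text deltas into two separate strings."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # ignore start/stop and other event types
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
        # other delta types (for example signature deltas) are skipped here
    return "".join(thinking_parts), "".join(text_parts)


# Simulated events standing in for a real stream (shapes are assumptions).
fake_events = [
    NS(type="content_block_start"),
    NS(type="content_block_delta", delta=NS(type="thinking_delta", thinking="Qubits can ")),
    NS(type="content_block_delta", delta=NS(type="thinking_delta", thinking="be in superposition.")),
    NS(type="content_block_delta", delta=NS(type="signature_delta", signature="...")),
    NS(type="content_block_stop"),
    NS(type="content_block_delta", delta=NS(type="text_delta", text="A quantum computer ")),
    NS(type="content_block_delta", delta=NS(type="text_delta", text="uses qubits.")),
]
thinking, answer = collect_stream(fake_events)
print(thinking)
print(answer)
```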

Best Practices

1. Use Adaptive Thinking for New Models

Use thinking: {type: "adaptive"} for Claude Opus 4.7, where it is required, and prefer it on Opus 4.6 and Sonnet 4.6 as well. It's simpler, more efficient, and automatically adjusts reasoning depth.

2. Set Appropriate max_tokens

Extended thinking consumes tokens from your max_tokens budget. If you set max_tokens too low, Claude may run out of space for the final answer. A good rule of thumb: set max_tokens to at least 2x your expected thinking budget.
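
The arithmetic behind this rule of thumb is simple enough to sketch. Both function names below are hypothetical, and the 2x factor is just the heuristic from this section, not an API requirement.

```python
import math


def suggested_max_tokens(expected_thinking_tokens: int, factor: float = 2.0) -> int:
    """Apply the rule of thumb: max_tokens >= factor * expected thinking tokens."""
    return math.ceil(expected_thinking_tokens * factor)


def answer_headroom(max_tokens: int, thinking_tokens_used: int) -> int:
    """Tokens left for the final answer once thinking has taken its share."""
    return max(max_tokens - thinking_tokens_used, 0)


print(suggested_max_tokens(4096))   # 8192
print(answer_headroom(8192, 6000))  # 2192
```

In the second call, a response that spent 6,000 of its 8,192 tokens on reasoning has 2,192 left for the answer, which may be tight for a long write-up.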

3. Handle Thinking Blocks in Your Application

If you're displaying responses to users, you may want to:

  • Show thinking blocks in a collapsible section
  • Stream them in real-time for transparency
  • Omit them entirely if you only need the final answer
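
The collapsible and omit options above might look like this in a small web rendering helper. This is a sketch under assumptions: `render_response` is hypothetical, it takes content blocks as plain dicts, and it emits minimal HTML using the standard `<details>` element.

```python
import html


def render_response(content, show_thinking=True):
    """Render content blocks as HTML, with reasoning in a collapsible section."""
    parts = []
    for block in content:
        if block["type"] == "thinking":
            if show_thinking:
                parts.append(
                    "<details><summary>Reasoning</summary>"
                    + html.escape(block["thinking"])
                    + "</details>"
                )
        elif block["type"] == "text":
            parts.append("<p>" + html.escape(block["text"]) + "</p>")
    return "\n".join(parts)


sample = [
    {"type": "thinking", "thinking": "Check 2 < 3"},
    {"type": "text", "text": "Yes"},
]
print(render_response(sample))
print(render_response(sample, show_thinking=False))
```

Escaping the block text matters because reasoning often contains characters such as `<` that would otherwise be interpreted as markup.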

4. Combine with Structured Outputs

Extended thinking works well with structured outputs. The thinking blocks contain reasoning, while the text block can contain JSON or other structured data.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # effort sits outside the thinking object
    messages=[
        {"role": "user", "content": "Extract key entities from this text and return as JSON"}
    ]
)
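
On the receiving side, the JSON lives in the text block, so parsing should skip the thinking blocks. The sketch below assumes the model returned bare JSON with no surrounding prose. `parse_json_answer` is a hypothetical helper and the sample content is made up for illustration.

```python
import json
from typing import Any


def parse_json_answer(content: list[dict[str, Any]]) -> Any:
    """Parse JSON from the text blocks, ignoring thinking blocks.

    Assumes the text contains bare JSON and nothing else.
    """
    text = "".join(b["text"] for b in content if b.get("type") == "text")
    return json.loads(text)  # raises json.JSONDecodeError on malformed output


# Illustrative sample, not real model output.
sample = [
    {"type": "thinking", "thinking": "The text mentions a person and a city..."},
    {"type": "text", "text": '{"people": ["Ada Lovelace"], "places": ["London"]}'},
]
entities = parse_json_answer(sample)
print(entities["people"])
```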

5. Monitor Token Usage

Extended thinking can significantly increase token consumption. Monitor your usage and adjust the effort parameter or budget_tokens accordingly.
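
A lightweight way to monitor consumption is to accumulate the usage figures from each response and flag when a limit is crossed. The sketch below assumes each response's usage data exposes `input_tokens` and `output_tokens`, with thinking tokens counted as output. `UsageTracker` and its limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals across requests (hypothetical helper)."""

    output_token_limit: int
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage figures to the totals."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]

    @property
    def over_limit(self) -> bool:
        return self.output_tokens > self.output_token_limit


tracker = UsageTracker(output_token_limit=10_000)
tracker.record({"input_tokens": 120, "output_tokens": 6_000})
tracker.record({"input_tokens": 80, "output_tokens": 5_000})
print(tracker.output_tokens, tracker.over_limit)
```

When the tracker reports that the limit has been exceeded, lowering the effort level is the first adjustment to try.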

Model-Specific Notes

Model                 | Recommended Mode             | Notes
Claude Opus 4.7       | Adaptive only                | Manual mode returns 400 error
Claude Opus 4.6       | Adaptive (manual deprecated) | Manual still works but will be removed
Claude Sonnet 4.6     | Adaptive (manual deprecated) | Interleaved mode deprecated
Claude Mythos Preview | Adaptive (default)           | Cannot disable thinking; display defaults to "omitted"
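
The table can be encoded as a small lookup so that an unsupported combination fails locally instead of at the API. This is a sketch under assumptions: `thinking_config` is a hypothetical helper, it covers only the model IDs used in this guide's examples, and it returns just the value for the thinking field.

```python
from typing import Optional

ADAPTIVE_ONLY = {"claude-opus-4-7"}                           # manual mode returns a 400
MANUAL_DEPRECATED = {"claude-opus-4-6", "claude-sonnet-4-6"}  # manual still accepted


def thinking_config(model: str, budget_tokens: Optional[int] = None) -> dict:
    """Return a value for the `thinking` request field (hypothetical helper)."""
    if budget_tokens is None:
        return {"type": "adaptive"}  # recommended mode for every model listed
    if model in ADAPTIVE_ONLY:
        raise ValueError(f"{model} does not accept budget_tokens; use adaptive thinking")
    if model not in MANUAL_DEPRECATED:
        raise ValueError(f"manual mode support for {model} is unknown to this sketch")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config("claude-opus-4-7"))
print(thinking_config("claude-sonnet-4-6", budget_tokens=4096))
```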

Troubleshooting

Error: "budget_tokens not supported on this model"

You're using Claude Opus 4.7 with manual mode. Switch to adaptive thinking:

# Wrong (returns 400 error on Opus 4.7)
thinking={"type": "enabled", "budget_tokens": 4096}

# Correct (effort is passed separately via output_config)
thinking={"type": "adaptive"}
output_config={"effort": "high"}

Error: "budget_tokens must be less than max_tokens"

Reduce your budget_tokens or increase max_tokens. The budget must be at least 1 token less than max_tokens.

No thinking blocks in response

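Ensure you're using a model that supports extended thinking and that you've correctly set the thinking parameter in your request. Also check the Model-Specific Notes: on Claude Mythos Preview, display defaults to "omitted", so thinking content is not returned unless you change that setting.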

Key Takeaways

  • Extended thinking enhances Claude's reasoning by providing step-by-step internal reasoning before the final answer, improving accuracy on complex tasks.
  • Use adaptive thinking for Claude Opus 4.7 with the effort parameter (low, medium, high) — manual budget_tokens mode is no longer supported on this model.
  • Handle thinking blocks separately in your application — they appear as type: "thinking" content blocks before the final type: "text" answer.
  • Monitor token consumption carefully, as extended thinking can significantly increase usage — adjust effort or budget_tokens accordingly.
  • Streaming works with extended thinking — handle thinking_delta and text_delta events separately for real-time display.