Guide · Beginner · API · 2026-05-22

Mastering Extended Thinking in Claude: A Practical Guide to Adaptive and Manual Reasoning

Learn how to use Claude's extended thinking feature for complex reasoning tasks. Covers adaptive thinking, effort parameters, manual mode, and code examples for the API.

Quick Answer

This guide explains how to enable and configure Claude's extended thinking for step-by-step reasoning. You'll learn the difference between adaptive thinking (recommended for Opus 4.7) and manual mode, how to set effort levels, and how to handle thinking blocks in your API responses.

Tags: extended thinking, adaptive thinking, Claude API, reasoning, effort parameter

Introduction

Claude's extended thinking feature unlocks a new level of reasoning capability. When enabled, Claude generates internal "thinking" content blocks before producing its final answer. This step-by-step reasoning process allows the model to tackle complex problems—like mathematical proofs, multi-step logic, or intricate code analysis—with greater accuracy and transparency.

Whether you're building a research assistant, a tutoring app, or a debugging tool, understanding how to configure and consume extended thinking is essential. This guide covers everything from basic setup to advanced configuration, including the new adaptive thinking mode and the effort parameter.

How Extended Thinking Works

When extended thinking is enabled, Claude's API response includes one or more thinking content blocks followed by text content blocks. Each thinking block contains the model's internal reasoning and a cryptographic signature, which the API uses to verify the block if you send it back in a later request.

Here's a simplified example of what the response looks like:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Thinking blocks are returned to your application, not shown to your end users automatically. As the developer, you decide whether to display them, summarize them, or omit them entirely.
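
As a minimal sketch of that choice, the helper below separates reasoning from the final answer. It assumes the response content has already been converted to plain dictionaries shaped like the example above; `split_blocks` is a hypothetical helper name, not part of the SDK.

```python
from typing import Dict, List, Tuple


def split_blocks(content: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Separate thinking text from answer text in a response's content list."""
    thinking: List[str] = []
    answer: List[str] = []
    for block in content:
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer.append(block.get("text", ""))
        # Any other block type is ignored in this sketch.
    return thinking, answer


# Sample content mirroring the simplified response above (signature shortened).
sample = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]

thinking_parts, answer_parts = split_blocks(sample)
print("".join(answer_parts))
```

Keeping the two lists separate lets the calling code decide later whether the reasoning is shown, summarized, or dropped.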

Adaptive Thinking vs. Manual Extended Thinking

Claude now supports two modes of extended thinking:

Adaptive Thinking (Recommended)

Adaptive thinking (thinking: {type: "adaptive"}) is the modern approach. Instead of setting a fixed token budget for thinking, you specify an effort level. Claude dynamically allocates thinking tokens based on the complexity of the task.

This mode is required for Claude Opus 4.7 and recommended for Claude Opus 4.6 and Claude Sonnet 4.6. On Opus 4.6 and Sonnet 4.6, manual mode is deprecated and will be removed in a future release.

Manual Extended Thinking (Legacy)

Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) lets you set a fixed number of tokens for the thinking process. This is still supported on most current models except Claude Opus 4.7, where it returns a 400 error. Use adaptive thinking instead.

Supported Models

Model	Adaptive Thinking	Manual Thinking	Notes
Claude Opus 4.7	✅ Required	❌ Returns 400	Use `effort` parameter
Claude Opus 4.6	✅ Recommended	✅ Deprecated	Manual still works but will be removed
Claude Sonnet 4.6	✅ Recommended	✅ Deprecated	Interleaved mode deprecated
Claude Mythos Preview	✅ Default	✅ Accepted	`disabled` not supported; display defaults to `omitted`
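
The table can be turned into a small guard that fails early instead of waiting for a 400 from the API. This is only a sketch of the rules as stated in this guide: `thinking_config` and `ADAPTIVE_ONLY` are hypothetical names, and the set of adaptive-only models is an assumption you would keep in sync with the current model documentation.

```python
from typing import Any, Dict, Optional

# Models that reject manual thinking according to the table above (assumed list).
ADAPTIVE_ONLY = {"claude-opus-4-7"}


def thinking_config(model: str, budget_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Build a `thinking` value: adaptive by default, manual only if a budget is given."""
    if budget_tokens is None:
        return {"type": "adaptive"}
    if model in ADAPTIVE_ONLY:
        raise ValueError(f"{model} does not accept manual thinking; use adaptive instead")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config("claude-opus-4-7"))
print(thinking_config("claude-sonnet-4-6", budget_tokens=10000))
```

Defaulting to adaptive keeps new code on the recommended path, while the explicit budget argument makes any remaining use of the legacy mode easy to find.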

How to Use Extended Thinking in the API

Basic Setup with Adaptive Thinking (Python)

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    # Effort is set through output_config rather than inside the thinking object.
    output_config={"effort": "high"},  # Options: "low", "medium", "high"
    messages=[
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
)

# Process the response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Basic Setup with Manual Thinking (Python)

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {
            "role": "user",
            "content": "Explain the P vs NP problem in simple terms."
        }
    ]
)
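
In manual mode the thinking budget has to fit inside `max_tokens`, so it can help to check the pair before sending a request. The sketch below assumes the documented constraints for manual thinking (a minimum budget of 1,024 tokens and a budget strictly below `max_tokens`); `check_manual_budget` is a hypothetical helper name.

```python
MIN_BUDGET_TOKENS = 1024  # assumed minimum for manual thinking budgets


def check_manual_budget(max_tokens: int, budget_tokens: int) -> int:
    """Validate a manual thinking budget and return the tokens guaranteed for the answer."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    # If Claude uses the whole budget, this much room is still left for the final text.
    return max_tokens - budget_tokens


print(check_manual_budget(max_tokens=16000, budget_tokens=10000))  # 6000
```

With the values from the example above, at least 6,000 tokens stay available for the final answer even if the full thinking budget is used.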

Using Adaptive Thinking with TypeScript

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 16000,
  thinking: { type: 'adaptive' },
  // Effort is set through output_config rather than inside the thinking object.
  output_config: { effort: 'high' },
  messages: [
    {
      role: 'user',
      content: 'Design a distributed caching system with eventual consistency.'
    }
  ]
});
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Answer:', block.text);
  }
}

The Effort Parameter

The effort parameter controls how much thinking Claude invests in a task. It accepts three values:

low: Minimal thinking, faster responses, suitable for simple tasks.
medium: Balanced thinking, good for most use cases.
high: Maximum reasoning depth, best for complex problems like mathematical proofs, code generation, or multi-step analysis.

Higher effort levels consume more tokens and increase latency, but yield more thorough reasoning.
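
One way to see this trade-off for your own prompts is to time the same request at each level. The harness below is a sketch that takes any callable accepting an effort string, so it can wrap a real API call or, as here, a stand-in; `compare_efforts` and `fake_run` are hypothetical names, and answer length is used only as a rough proxy for output size.

```python
import time
from typing import Any, Callable, Dict, Iterable


def compare_efforts(
    run: Callable[[str], str],
    efforts: Iterable[str] = ("low", "medium", "high"),
) -> Dict[str, Dict[str, Any]]:
    """Call `run(effort)` once per level and record latency and answer length."""
    results: Dict[str, Dict[str, Any]] = {}
    for effort in efforts:
        start = time.perf_counter()
        answer = run(effort)
        elapsed = time.perf_counter() - start
        results[effort] = {"seconds": elapsed, "chars": len(answer)}
    return results


def fake_run(effort: str) -> str:
    """Stand-in for an API call: returns longer text at higher effort."""
    return "x" * {"low": 10, "medium": 20, "high": 40}[effort]


for level, stats in compare_efforts(fake_run).items():
    print(level, stats["chars"], f"{stats['seconds']:.6f}s")
```

In real use, `run` would send your prompt with the given effort and return the final text, and you would compare answer quality alongside the timings.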

Task Budgets (Beta)

For advanced control, you can set a task budget alongside adaptive thinking. This limits the total tokens Claude can use for both thinking and the final response. It's useful when you need predictable costs or response sizes.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "task_budget": 12000  # Total tokens for thinking + response
    },
    output_config={"effort": "high"},
    messages=[...]
)

Fast Mode (Beta Research Preview)

Fast mode is an experimental feature that reduces thinking time for simpler tasks while maintaining quality. It's ideal for latency-sensitive applications where you still want some reasoning.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "fast_mode": True
    },
    output_config={"effort": "medium"},
    messages=[...]
)

Handling Thinking Blocks in Your Application

When you receive a response with thinking blocks, you have several options:

Display the thinking: Useful for debugging or educational apps where you want users to see the reasoning.
Summarize the thinking: Use Claude to generate a concise summary of the thinking block before showing it to the user.
Omit the thinking: Only show the final text block to the user.
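
These three options can be sketched as a single rendering function. It assumes the content blocks are plain dictionaries as in the earlier response example; `render_for_user` is a hypothetical name, and the `summarize` callable stands in for whatever summarization step you choose (for example, a separate Claude request).

```python
from typing import Callable, Dict, List, Optional

MODES = {"display", "summarize", "omit"}


def render_for_user(
    content: List[Dict[str, str]],
    mode: str = "omit",
    summarize: Optional[Callable[[str], str]] = None,
) -> str:
    """Build the text shown to the end user from thinking and text blocks."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    thinking = "\n".join(b.get("thinking", "") for b in content if b.get("type") == "thinking")
    answer = "\n".join(b.get("text", "") for b in content if b.get("type") == "text")
    if mode == "omit" or not thinking:
        return answer
    if mode == "display":
        return f"Reasoning:\n{thinking}\n\nAnswer:\n{answer}"
    # mode == "summarize": a summarizer must be supplied by the caller.
    if summarize is None:
        raise ValueError("summarize mode needs a summarize callable")
    return f"Reasoning summary:\n{summarize(thinking)}\n\nAnswer:\n{answer}"


blocks = [
    {"type": "thinking", "thinking": "Let me analyze this step by step..."},
    {"type": "text", "text": "Based on my analysis..."},
]
print(render_for_user(blocks, mode="display"))
```

Defaulting to `omit` means reasoning is only shown when the calling code asks for it explicitly.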

For Claude Mythos Preview, the display behavior is configurable:

response = client.messages.create(
    model="claude-mythos-preview",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,
        "display": "summarized"  # Options: "omitted" (default), "summarized"
    },
    messages=[...]
)

Best Practices

Use adaptive thinking for new projects: It's the future-proof choice and works best with Claude Opus 4.7.
Set effort based on task complexity: Use low for simple Q&A, high for complex reasoning.
Monitor token usage: Thinking tokens count toward your max_tokens limit. Set max_tokens high enough to accommodate both thinking and the final response.
Handle streaming gracefully: When streaming, thinking blocks arrive before text blocks. Ensure your UI can handle this ordering.
Test with different effort levels: Run your prompts with low, medium, and high effort to find the sweet spot between quality and speed.
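
For the streaming point above, the sketch below accumulates thinking and answer text from a sequence of stream events. It assumes events are plain dictionaries using the `content_block_delta` event type with `thinking_delta` and `text_delta` payloads; `accumulate_stream` is a hypothetical helper, and a real client would typically iterate over SDK event objects instead.

```python
from typing import Any, Dict, Iterable, Tuple


def accumulate_stream(events: Iterable[Dict[str, Any]]) -> Tuple[str, str]:
    """Collect streamed thinking text and answer text into two strings."""
    thinking_parts = []
    text_parts = []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # start/stop and other event types carry no text here
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            thinking_parts.append(delta.get("thinking", ""))
        elif delta.get("type") == "text_delta":
            text_parts.append(delta.get("text", ""))
        # signature_delta and other delta types are skipped in this sketch
    return "".join(thinking_parts), "".join(text_parts)


# Hand-written events for illustration: thinking deltas arrive before text deltas.
events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 1. "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 2."}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1, "content_block": {"type": "text"}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Final answer."}},
    {"type": "content_block_stop", "index": 1},
]

thinking_text, answer_text = accumulate_stream(events)
print(answer_text)
```

A UI built on this pattern can show a "thinking" indicator while only thinking deltas have arrived and switch to the answer view on the first text delta.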

Key Takeaways

Adaptive thinking with the effort parameter is required on Claude Opus 4.7 and recommended on Claude Opus 4.6 and Claude Sonnet 4.6.
Manual extended thinking (budget_tokens) is deprecated on Opus 4.6 and Sonnet 4.6, and not supported on Opus 4.7.
The effort parameter accepts low, medium, or high values, giving you fine-grained control over reasoning depth.
Thinking blocks appear before text blocks in the API response; you can display, summarize, or omit them.
Use task budgets and fast mode for advanced control over token usage and latency.