Guide | Beginner | API | 2026-05-20

Mastering Extended Thinking in Claude: A Practical Guide to Adaptive & Manual Reasoning

Learn how to enable and optimize Claude's extended thinking for complex reasoning tasks. Covers adaptive thinking, effort levels, manual mode with token budgets, and code examples for the API.

Quick Answer

This guide explains how to use Claude's extended thinking feature to improve reasoning on complex tasks. You'll learn the difference between adaptive and manual thinking, how to set effort levels and token budgets, and see practical API code examples in Python.

extended thinking, adaptive thinking, Claude API, reasoning, budget tokens

Introduction

Claude's extended thinking feature lets the model "think step by step" before delivering a final answer. Whether you're building a research assistant, a code analysis tool, or a multi-step problem solver, extended thinking can improve the quality and accuracy of Claude's responses on tasks that need multi-step reasoning.

In this guide, we'll cover:

How extended thinking works under the hood
The difference between adaptive thinking (recommended) and manual extended thinking (deprecated on newer models)
How to configure effort levels and token budgets
Practical code examples in Python
Best practices for different use cases

How Extended Thinking Works

When extended thinking is enabled, Claude generates internal reasoning in the form of thinking content blocks before producing the final text content block. These thinking blocks contain the model's step-by-step analysis, which you can choose to display, summarize, or omit from the user-facing response.

A typical API response with extended thinking looks like this:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

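The signature field lets the API verify that a thinking block was generated by Claude. When you send thinking blocks back in a later request, for example during multi-turn tool use, include them unmodified together with their signature. When streaming, the signature arrives in its own event rather than with the thinking text.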
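To make the response shape above concrete, here is a minimal sketch that separates the reasoning from the final answer. It operates on plain dictionaries shaped like the JSON shown earlier rather than on SDK objects; `split_blocks` is a hypothetical helper name, and the signature value in the sample is a placeholder.

```python
from typing import Any


def split_blocks(content: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (thinking, answer) text joined from a response's content blocks."""
    thinking_parts: list[str] = []
    text_parts: list[str] = []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        # Any other block type is ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Sample content mirroring the response shown above (signature is a placeholder).
sample_content = [
    {
        "type": "thinking",
        "thinking": "Let me analyze this step by step...",
        "signature": "placeholder",
    },
    {"type": "text", "text": "Based on my analysis..."},
]

thinking, answer = split_blocks(sample_content)
print(answer)
```

Keeping the two strings separate makes it easy to log the reasoning while showing users only the answer.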

Adaptive Thinking vs. Manual Extended Thinking

Adaptive Thinking (Recommended)

Adaptive thinking (thinking: {type: "adaptive"}) is the modern, recommended approach. Instead of setting a fixed token budget, you specify an effort level that tells Claude how much reasoning to apply. The model dynamically allocates thinking tokens based on the complexity of the task.

Supported models: Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Mythos Preview

Key advantages:

No need to guess a token budget
Automatically adjusts reasoning depth per query
More efficient token usage
Required for Claude Opus 4.7 (manual mode returns a 400 error)

Manual Extended Thinking (Deprecated)

Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) lets you set a fixed maximum number of tokens for reasoning. This approach is deprecated on Claude Opus 4.6 and Claude Sonnet 4.6, and not supported at all on Claude Opus 4.7.

Still works on: Claude Opus 4.6 and Claude Sonnet 4.6 (both deprecated), Claude Mythos Preview

Configuring Effort and Budget Tokens

Adaptive Thinking with Effort Parameter

When using adaptive thinking, you set the effort parameter to control reasoning depth. The effort parameter accepts values like "low", "medium", or "high" (exact values may vary by model version).

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    # Effort is set in output_config rather than inside the thinking config
    output_config={"effort": "high"},  # Adjust based on task complexity
    messages=[
        {"role": "user", "content": "Analyze the ethical implications of autonomous vehicles in urban environments."}
    ]
)
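# The first content block may be a thinking block, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)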

Manual Budget Tokens (Legacy)

For models that still support manual mode, you set budget_tokens to the maximum number of tokens Claude can use for reasoning. The budget must be at least 1024 tokens.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # Max tokens for reasoning
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: ..."}
    ]
)

Important: The budget_tokens value must be less than max_tokens. A good rule of thumb is to set budget_tokens to about 50-75% of max_tokens.
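The two constraints and the rule of thumb above can be checked before a request is sent. The sketch below is illustrative only: `validate_budget` and `suggested_budget` are hypothetical helper names, and the 1024 minimum and the 50-75% range are taken from this guide rather than read from the API.

```python
MIN_BUDGET_TOKENS = 1024  # Minimum stated in this guide


def validate_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if the manual-mode budget breaks either rule above."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )


def suggested_budget(max_tokens: int, fraction: float = 0.5) -> int:
    """Apply the 50-75% rule of thumb, never going below the minimum."""
    if not 0.5 <= fraction <= 0.75:
        raise ValueError("fraction should be between 0.5 and 0.75")
    budget = max(MIN_BUDGET_TOKENS, int(max_tokens * fraction))
    validate_budget(budget, max_tokens)
    return budget


print(suggested_budget(4096))        # 2048
print(suggested_budget(8000, 0.75))  # 6000
```

Validating locally turns a rejected request into an immediate, descriptive error in your own code.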

Controlling Thinking Visibility

Extended thinking gives you three options for how thinking content is returned:

Display Mode	Description	Use Case
`"omitted"` (default)	Thinking blocks are not returned in the response	Production apps where you don't want to expose reasoning
`"summarized"`	Returns a summary of the thinking process	When you want lightweight transparency
`"full"`	Returns complete thinking blocks	Debugging, education, or when full reasoning is valuable

Example with display control:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    thinking={
        "type": "adaptive",
        "display": "summarized"  # Get a summary instead of full thinking
    },
    # Effort is set in output_config rather than inside the thinking config
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "Explain how quantum computing works."}
    ]
)

Streaming with Extended Thinking

When streaming responses, extended thinking content blocks are streamed as separate events. You need to handle thinking and text events differently.

import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    # Effort is set in output_config rather than inside the thinking config
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Write a detailed analysis of the economic impact of AI."}
    ]
) as stream:
    for event in stream:
        if event.type == "thinking":
            # Each event carries an incremental chunk of reasoning
            print(event.thinking, end="", flush=True)
        elif event.type == "text":
            # Each event carries an incremental chunk of the final answer
            print(event.text, end="", flush=True)
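If you need the complete reasoning and answer after the stream ends, one option is to accumulate the chunks as they arrive. The sketch below assumes events expose a `type` plus a `thinking` or `text` attribute, as in the loop above; `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(events: Iterable[Any]) -> tuple[str, str]:
    """Accumulate streamed chunks into (thinking, text) strings."""
    thinking_chunks: list[str] = []
    text_chunks: list[str] = []
    for event in events:
        if event.type == "thinking":
            thinking_chunks.append(event.thinking)
        elif event.type == "text":
            text_chunks.append(event.text)
        # Other event types (for example block start/stop) are skipped.
    return "".join(thinking_chunks), "".join(text_chunks)


# Stand-in events for illustration; a real stream would supply these.
fake_events = [
    SimpleNamespace(type="thinking", thinking="Consider labor markets. "),
    SimpleNamespace(type="thinking", thinking="Then productivity."),
    SimpleNamespace(type="content_block_stop"),
    SimpleNamespace(type="text", text="AI affects "),
    SimpleNamespace(type="text", text="several sectors."),
]

thinking, text = collect_stream(fake_events)
print(text)
```

In real use you would pass the open stream object in place of `fake_events`.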

Best Practices

1. Choose the Right Effort Level

Low effort: Simple tasks like summarization, translation, or straightforward Q&A
Medium effort: Multi-step reasoning, code generation, or analysis
High effort: Complex problem-solving, research, or tasks requiring deep reasoning
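The mapping above can be captured in a small lookup so that callers never pass an ad hoc effort string. This is only a sketch: the task labels are hypothetical keys chosen for illustration, and the effort values are the three named in this guide.

```python
# Task categories taken from the list above; the keys are hypothetical labels.
EFFORT_BY_TASK = {
    "summarization": "low",
    "translation": "low",
    "simple_qa": "low",
    "multi_step_reasoning": "medium",
    "code_generation": "medium",
    "analysis": "medium",
    "complex_problem_solving": "high",
    "research": "high",
}


def pick_effort(task: str, default: str = "medium") -> str:
    """Look up an effort level for a task label, falling back to a default."""
    return EFFORT_BY_TASK.get(task, default)


print(pick_effort("translation"))  # low
print(pick_effort("research"))     # high
print(pick_effort("unknown"))      # medium
```

Defaulting unknown tasks to "medium" is a design choice in this sketch; pick whichever default fits your latency and cost constraints.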

2. Set Appropriate Token Limits

For adaptive thinking, max_tokens should be large enough to accommodate both thinking and final response
For manual mode, ensure budget_tokens is at least 1024 and less than max_tokens
Consider the trade-off: more thinking tokens can improve reasoning, but they also increase latency and cost

3. Use Display Modes Wisely

In production, use "omitted" to keep responses clean
For debugging or educational tools, use "full" to inspect reasoning
For user-facing apps that need some transparency, use "summarized"

4. Handle Errors Gracefully

If you're using manual mode on Claude Opus 4.7, you'll get a 400 error. Always check model compatibility:

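# Models this guide lists as supporting adaptive thinking ("claude-opus-4-6" is an assumed ID)
ADAPTIVE_MODELS = {"claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"}

def get_thinking_params(model: str, effort: str = "medium") -> dict:
    """Return request keyword arguments for the thinking mode the model supports."""
    if model in ADAPTIVE_MODELS:
        return {
            "thinking": {"type": "adaptive"},
            "output_config": {"effort": effort}
        }
    # Fallback for older models that only support manual mode
    return {"thinking": {"type": "enabled", "budget_tokens": 2048}}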
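A complementary approach is to recover at request time when a model rejects manual mode. The sketch below is a hedged illustration rather than SDK behavior: `send_with_adaptive_fallback` is a hypothetical wrapper, `send` stands for whatever callable issues the request, and the error is assumed to expose an HTTP status as `status_code`. `FakeBadRequest` and `fake_send` exist only so the example runs offline.

```python
from typing import Any, Callable


def send_with_adaptive_fallback(send: Callable[..., Any], params: dict[str, Any]) -> Any:
    """Call send(**params); on an HTTP 400 in manual mode, retry once with adaptive thinking."""
    try:
        return send(**params)
    except Exception as exc:
        # Assumption: the raised error carries an HTTP status in `status_code`.
        if getattr(exc, "status_code", None) != 400:
            raise
        if params.get("thinking", {}).get("type") != "enabled":
            raise  # The 400 was not caused by manual mode; do not mask it.
        retry_params = {**params, "thinking": {"type": "adaptive"}}
        return send(**retry_params)


class FakeBadRequest(Exception):
    """Stand-in for an API error with a 400 status."""

    status_code = 400


def fake_send(**kwargs: Any) -> str:
    # Stand-in for a client call that rejects manual mode.
    if kwargs["thinking"]["type"] == "enabled":
        raise FakeBadRequest("manual thinking not supported")
    return kwargs["thinking"]["type"]


result = send_with_adaptive_fallback(
    fake_send,
    {"model": "claude-opus-4-7", "thinking": {"type": "enabled", "budget_tokens": 2048}},
)
print(result)  # adaptive
```

Because a 400 can have other causes, the wrapper retries only when the original request used manual mode and re-raises everything else.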

Real-World Use Cases

Research Assistant

Use high-effort adaptive thinking to analyze academic papers, synthesize information, and generate insights.

Code Review Tool

Enable extended thinking to catch subtle bugs, suggest optimizations, and explain complex code logic.

Decision Support System

For business intelligence, use medium-to-high effort to evaluate multiple factors and provide reasoned recommendations.

Key Takeaways

Adaptive thinking (thinking: {type: "adaptive"}) is the recommended approach for all modern Claude models, especially Claude Opus 4.7 where manual mode is not supported.
Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is deprecated on Claude Opus 4.6 and Claude Sonnet 4.6, and will be removed in future releases.
Use the effort parameter to control reasoning depth dynamically instead of guessing token budgets.
Control thinking visibility with the display parameter: "omitted", "summarized", or "full".
Always check model compatibility and handle errors gracefully when migrating from manual to adaptive thinking.