Guide | Beginner | API | 2026-05-22

Mastering Extended Thinking in Claude: A Guide to Adaptive and Manual Reasoning Modes

Learn how to use Claude's extended thinking feature for complex reasoning tasks. Covers adaptive thinking, manual mode, effort parameters, and practical API examples.

Quick Answer

This guide explains how to enable and configure Claude's extended thinking feature, including adaptive thinking (required on Claude Opus 4.7, recommended on Claude Opus 4.6 and Claude Sonnet 4.6) and manual mode (deprecated but still functional on those older models). You'll learn how to set effort levels, handle thinking content blocks, and optimize for complex tasks like math, code analysis, and multi-step reasoning.

Tags: extended thinking, adaptive thinking, Claude API, reasoning, token budget

Introduction

Claude's extended thinking feature lets the model reason step by step before delivering a final answer. This internal reasoning process produces thinking content blocks that give you varying degrees of transparency into Claude's logic. Whether you're building a complex code analysis tool, a math tutoring app, or a multi-step reasoning agent, extended thinking can improve the quality and accuracy of Claude's outputs on tasks that need several reasoning steps.

This guide covers everything you need to know: from the basics of enabling extended thinking to the latest adaptive thinking mode, effort parameters, and best practices for different Claude models.

How Extended Thinking Works

When extended thinking is enabled, Claude generates one or more thinking content blocks before its final text response. These blocks contain the model's internal reasoning—its chain-of-thought, intermediate calculations, and logical deductions. The API response looks like this:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

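The signature field is an opaque value that lets the API verify that a thinking block was generated by Claude. If you pass thinking blocks back in a later request (for example, during multi-turn tool use), include them unmodified, signature and all. In a response like the one above, the thinking blocks are followed by at least one text block containing the final answer.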
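
To make the response shape above concrete, here is a minimal sketch that separates the reasoning from the final answer. It works on plain dictionaries shaped like the JSON shown; `split_content` is a hypothetical helper name, not part of the SDK, and with the Python SDK you would read the same fields as attributes (`block.type`, `block.thinking`, `block.text`).

```python
from typing import Dict, List, Tuple


def split_content(content: List[Dict[str, str]]) -> Tuple[str, str]:
    """Return (reasoning, answer) from a list of content-block dicts.

    Hypothetical helper: assumes each block is a dict with a "type" key,
    as in the JSON response shown above.
    """
    reasoning_parts = []
    answer_parts = []
    for block in content:
        if block.get("type") == "thinking":
            reasoning_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
        # Any other block type is ignored in this sketch.
    return "\n".join(reasoning_parts), "".join(answer_parts)


sample = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
reasoning, answer = split_content(sample)
print(answer)
```

Keeping the two strings separate makes it easy to log the reasoning while showing only the answer to users.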

Adaptive Thinking (Required on Claude Opus 4.7)

Adaptive thinking is a more flexible approach to extended thinking. Instead of setting a fixed token budget, you let Claude decide how much thinking a task needs based on its complexity, and you steer that decision with the effort parameter. Adaptive thinking is required on Claude Opus 4.7 and recommended on Claude Opus 4.6 and Claude Sonnet 4.6 (see Model-Specific Behavior below).

Effort Levels

This guide covers three effort levels:

low: Minimal thinking. Best for simple tasks where you want fast responses.
medium: Balanced thinking. Good for most everyday complex tasks.
high: Maximum thinking. Ideal for the hardest problems—math proofs, multi-step reasoning, code generation with many constraints.

Using Adaptive Thinking

Here's how to enable adaptive thinking in the Messages API:

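import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    # Let Claude decide how much to think
    thinking={"type": "adaptive"},
    # Effort is set in output_config, not inside the thinking object
    output_config={"effort": "high"},
    messages=[
        {
            "role": "user",
            "content": "Prove that the square root of 2 is irrational."
        }
    ]
)
# response.content is a list of thinking and text blocks
print(response.content)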

Important: On Claude Opus 4.7, using the old manual mode (thinking: {type: "enabled", budget_tokens: N}) will return a 400 error. You must use adaptive thinking.

Manual Extended Thinking (Legacy)

For models like Claude Opus 4.6 and Claude Sonnet 4.6, you can still use manual extended thinking with a fixed token budget. However, this mode is deprecated and will be removed in a future release. Anthropic strongly recommends migrating to adaptive thinking.

Setting a Token Budget

The budget_tokens parameter specifies how many tokens Claude can use for thinking. Thinking tokens count toward max_tokens, which caps the total response length (thinking plus final answer), so budget_tokens must be smaller than max_tokens.

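import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,           # total cap: thinking + final answer
    thinking={
        "type": "enabled",     # manual mode (deprecated)
        "budget_tokens": 4096  # must be less than max_tokens
    },
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to solve a Sudoku puzzle using backtracking."
        }
    ]
)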

Budget Tokens vs. Max Tokens

budget_tokens: The thinking budget. Claude will stop thinking once it reaches this limit.
max_tokens: The total response limit (thinking + final text). Must be greater than budget_tokens.

A good rule of thumb: set max_tokens to at least 1.5x your budget_tokens to leave room for the final answer.
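
The rule of thumb can be turned into a small helper that derives max_tokens from the thinking budget. This is a sketch only: `manual_thinking_params` and its `headroom` argument are hypothetical names, and the 1.5x factor is simply the suggestion above.

```python
import math


def manual_thinking_params(budget_tokens: int, headroom: float = 1.5) -> dict:
    """Build max_tokens and thinking settings for manual mode.

    Hypothetical helper: applies the 1.5x rule of thumb so that
    max_tokens always exceeds budget_tokens.
    """
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    if headroom <= 1.0:
        raise ValueError("headroom must be greater than 1.0")
    # Round up so a fractional result never shrinks the answer's share.
    max_tokens = math.ceil(budget_tokens * headroom)
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = manual_thinking_params(4096)
print(params["max_tokens"])  # 6144
```

The returned dictionary can be unpacked into a request, for example `client.messages.create(model=..., messages=..., **params)`.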

Model-Specific Behavior

Different Claude models handle extended thinking differently. Here's a quick reference:

Model	Adaptive Thinking	Manual Mode	Notes
Claude Opus 4.7	✅ Required	❌ Returns 400 error	Use `effort` parameter
Claude Opus 4.6	✅ Recommended	✅ Deprecated but functional	Migrate to adaptive
Claude Sonnet 4.6	✅ Recommended	✅ Deprecated (interleaved mode)	Migrate to adaptive
Claude Mythos Preview	✅ Default	✅ Accepted	`thinking: disabled` not supported; use `display: "summarized"` for summaries
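
If your application targets several models, the table can be encoded as a lookup so the thinking mode is checked before a request is sent. The sketch below transcribes the table by display name; `THINKING_SUPPORT` and `check_mode` are hypothetical names, and the data is only as current as the table itself.

```python
# Thinking-mode support per model, transcribed from the table above.
THINKING_SUPPORT = {
    "Claude Opus 4.7": {"adaptive": True, "manual": False},
    "Claude Opus 4.6": {"adaptive": True, "manual": True},
    "Claude Sonnet 4.6": {"adaptive": True, "manual": True},
    "Claude Mythos Preview": {"adaptive": True, "manual": True},
}


def check_mode(model_name: str, mode: str) -> bool:
    """Return True if the table lists `mode` ("adaptive" or "manual") as accepted."""
    if mode not in ("adaptive", "manual"):
        raise ValueError(f"unknown mode: {mode!r}")
    if model_name not in THINKING_SUPPORT:
        raise KeyError(f"model not in table: {model_name!r}")
    return THINKING_SUPPORT[model_name][mode]


print(check_mode("Claude Opus 4.7", "manual"))    # False
print(check_mode("Claude Sonnet 4.6", "manual"))  # True
```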

Claude Mythos Preview Special Behavior

Claude Mythos Preview has unique behavior:

Adaptive thinking is the default.
You cannot disable thinking (thinking: {type: "disabled"} is not supported).
By default, thinking content is omitted from the response (display: "omitted").
To receive thinking summaries, pass display: "summarized" in the thinking configuration.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-mythos-preview",
    max_tokens=4096,
    thinking={
        "type": "adaptive",
        "display": "summarized"  # return thinking summaries instead of omitting them
    },
    messages=[
        {"role": "user", "content": "Explain quantum entanglement."}
    ]
)

Streaming with Extended Thinking

Extended thinking works with streaming. When streaming, you'll receive thinking content block deltas followed by text deltas. Here's an example using Python:

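with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Solve this math problem: 1234 * 5678"}
    ]
) as stream:
    for event in stream:
        # Print each label once per block rather than once per delta
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                print("\nThinking: ", end="")
            elif event.content_block.type == "text":
                print("\nAnswer: ", end="")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)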
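
When you need the full reasoning and answer after the stream ends, rather than printing as you go, the deltas can be collected into separate buffers. The sketch below replays simulated events as plain dictionaries that mimic the content_block_delta shapes used above; `collect_stream` is a hypothetical helper, and with the SDK you would read attributes (`event.delta.thinking`) instead of dictionary keys.

```python
from typing import Dict, Iterable, Tuple


def collect_stream(events: Iterable[Dict]) -> Tuple[str, str]:
    """Accumulate thinking and text deltas into two strings (hypothetical helper)."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # start/stop events carry no delta text
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            thinking_parts.append(delta.get("thinking", ""))
        elif delta.get("type") == "text_delta":
            text_parts.append(delta.get("text", ""))
    return "".join(thinking_parts), "".join(text_parts)


# Simulated events standing in for a real stream.
simulated_events = [
    {"type": "content_block_start"},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "1234 * 5678 = 1234 * 5000 "}},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "+ 1234 * 678"}},
    {"type": "content_block_stop"},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "7,006,652"}},
]
thinking, text = collect_stream(simulated_events)
print(text)  # 7,006,652
```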

Best Practices

1. Choose the Right Effort Level

Low effort: Use for simple Q&A, straightforward code generation, or when speed matters.
Medium effort: Good default for most tasks—complex instructions, moderate reasoning.
High effort: Reserve for the hardest problems: mathematical proofs, multi-step planning, code with many constraints.
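
One way to apply these guidelines in code is a small lookup from task category to effort level. The categories and the `pick_effort` name are illustrative assumptions, not part of the API; only the three level names come from this guide.

```python
# Illustrative task categories mapped to the effort levels described above.
EFFORT_BY_TASK = {
    "simple_qa": "low",
    "simple_codegen": "low",
    "complex_instructions": "medium",
    "moderate_reasoning": "medium",
    "math_proof": "high",
    "multi_step_planning": "high",
    "constrained_codegen": "high",
}


def pick_effort(task_kind: str, default: str = "medium") -> str:
    """Return an effort level for a task category, falling back to `default`."""
    return EFFORT_BY_TASK.get(task_kind, default)


print(pick_effort("math_proof"))     # high
print(pick_effort("unknown_task"))   # medium
```

Falling back to medium mirrors the guidance that it is a good default for most tasks.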

2. Combine with Structured Outputs

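Extended thinking pairs well with structured outputs (JSON mode). Claude can reason internally first and then emit the JSON response, which keeps the reasoning out of the payload your code parses:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[
        # Ask for JSON explicitly so the final text block is machine-readable
        {"role": "user", "content": "Extract all dates and amounts from this invoice text and return them as JSON..."}
    ]
)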

3. Monitor Token Usage

Extended thinking consumes tokens: thinking tokens count toward your output token usage and are billed accordingly. Monitor your costs, especially with high effort or a large budget_tokens value.
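
A lightweight way to keep an eye on consumption is to total the usage counts from each response. The sketch below assumes each response reports input_tokens and output_tokens (passed here as plain integers) and that thinking tokens are included in the output count; the `UsageTracker` class and its limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of token usage across requests (hypothetical helper)."""
    output_limit: int
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one response's usage counts to the running totals."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def over_limit(self) -> bool:
        """True once cumulative output tokens exceed the configured limit."""
        return self.output_tokens > self.output_limit


tracker = UsageTracker(output_limit=10_000)
tracker.record(input_tokens=120, output_tokens=3_500)
tracker.record(input_tokens=140, output_tokens=7_200)
print(tracker.output_tokens, tracker.over_limit)  # 10700 True
```

With the SDK, you would call `tracker.record(response.usage.input_tokens, response.usage.output_tokens)` after each request.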

4. Handle Thinking Content in Your Application

If you're displaying Claude's response to end users, you may want to:

Show thinking content as an expandable section (e.g., "Show reasoning").
Use the thinking content for debugging or quality assurance.
Omit thinking content entirely and only show the final text.
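
As an illustration of the expandable-section option, the sketch below renders a response as HTML with the reasoning tucked inside a collapsible element. The `render_response` function and the choice of HTML are assumptions for this example; any UI toolkit with a disclosure widget would work the same way.

```python
import html


def render_response(answer: str, thinking: str = "", show_reasoning: bool = True) -> str:
    """Render the answer, optionally preceded by a collapsible reasoning section."""
    parts = []
    if thinking and show_reasoning:
        # Escape model output before embedding it in markup.
        parts.append(
            "<details><summary>Show reasoning</summary>"
            f"<pre>{html.escape(thinking)}</pre></details>"
        )
    parts.append(f"<p>{html.escape(answer)}</p>")
    return "\n".join(parts)


print(render_response("2 < 3", thinking="compare a & b"))
```

Passing `show_reasoning=False` covers the third option, where only the final text is shown.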

5. Migrate from Manual to Adaptive

If you're using manual mode on Claude Opus 4.6 or Sonnet 4.6, start migrating to adaptive thinking now. The transition is straightforward:

Before (manual):

thinking={"type": "enabled", "budget_tokens": 4096}

After (adaptive):

thinking={"type": "adaptive"},
output_config={"effort": "high"}

Common Pitfalls

Setting budget_tokens too low: If the budget is too small, Claude may cut off its reasoning prematurely, leading to lower quality answers.
Forgetting max_tokens: Always set max_tokens higher than budget_tokens.
Using manual mode on Opus 4.7: This will return a 400 error. Use adaptive thinking.
Ignoring the signature: The signature lets the API verify the integrity of the thinking content. When streaming, capture it along with the thinking text, and always include it unmodified if you send thinking blocks back in a later request.
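
Some of these pitfalls can be caught before a request is sent. The following sketch checks a dictionary of request parameters for the manual-mode mistakes listed above; `lint_request` is a hypothetical helper, and it only knows the rules stated in this guide.

```python
from typing import Dict, List


def lint_request(params: Dict) -> List[str]:
    """Return warnings for the manual-mode pitfalls listed above (hypothetical helper)."""
    warnings = []
    thinking = params.get("thinking") or {}
    if thinking.get("type") == "enabled":
        budget = thinking.get("budget_tokens")
        max_tokens = params.get("max_tokens")
        if budget is None:
            warnings.append("manual mode needs budget_tokens")
        elif max_tokens is None or max_tokens <= budget:
            warnings.append("max_tokens must be greater than budget_tokens")
        if params.get("model", "").startswith("claude-opus-4-7"):
            warnings.append(
                "manual mode returns a 400 error on Claude Opus 4.7; use adaptive thinking"
            )
    return warnings


bad = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}
for warning in lint_request(bad):
    print(warning)
```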

Conclusion

Extended thinking is a powerful feature that elevates Claude from a simple text generator to a true reasoning engine. By understanding the differences between adaptive and manual modes, choosing the right effort level, and following best practices, you can build applications that tackle the most complex problems with clarity and accuracy.

Whether you're building a math tutor, a code reviewer, or a research assistant, extended thinking gives you the transparency and control you need to deliver high-quality results.

Key Takeaways

Adaptive thinking is the future: Use thinking: {type: "adaptive"} together with output_config: {effort: "low" | "medium" | "high"} on Claude Opus 4.7 and newer models. Manual mode is deprecated.
Effort levels matter: Choose low for speed, medium for balance, and high for the hardest reasoning tasks.
Model compatibility: Claude Opus 4.7 requires adaptive thinking; older models support both but manual mode is deprecated.
Token budget: Always set max_tokens higher than budget_tokens in manual mode. In adaptive mode, Claude manages the thinking budget automatically.
Streaming works: Extended thinking is fully compatible with streaming—handle thinking_delta and text_delta events separately.