BeClaude
Guide · Beginner · Best Practices · 2026-05-22

Mastering Extended Thinking in Claude: Adaptive Thinking, Effort Control, and Best Practices

Learn how to use Claude's extended thinking feature for complex reasoning tasks. Covers adaptive thinking, effort parameter, manual mode, and code examples for API integration.

Quick Answer

You'll learn how to enable and configure Claude's extended thinking for complex reasoning, including adaptive thinking mode, the effort parameter, manual budget_tokens, and how to handle thinking blocks in API responses.

Tags: extended thinking, adaptive thinking, Claude API, reasoning, effort parameter

Introduction

Claude's extended thinking feature unlocks enhanced reasoning capabilities for complex tasks. When enabled, Claude produces an internal chain-of-thought before delivering its final answer, and the API can return this reasoning as transparent content blocks. This guide covers everything you need to know to implement extended thinking effectively—from basic setup to advanced configuration with adaptive thinking and effort control.

How Extended Thinking Works

When extended thinking is turned on, Claude generates thinking content blocks that contain its step-by-step reasoning. These blocks appear in the API response before the final text content blocks. The model uses its own reasoning to refine and validate its final answer, leading to more accurate and well-reasoned outputs.

Here's the default response structure:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

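The signature field is an opaque value that the API uses to verify that a thinking block was generated by Claude. If you send thinking blocks back in a later request (for example, during tool use), return them unmodified, signature included. When streaming, the signature arrives in a signature_delta event just before the thinking block closes.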
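As a minimal sketch of consuming this structure, the helper below separates the reasoning from the answer in a response that has already been parsed into a plain dict. The function name split_blocks and the sample payload are illustrative, not part of the SDK.

```python
from typing import Any, Dict, List, Tuple


def split_blocks(response: Dict[str, Any]) -> Tuple[List[str], str]:
    """Return (thinking_texts, answer_text) from a Messages API response dict."""
    thinking: List[str] = []
    answer_parts: List[str] = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
        # Other block types (for example tool_use) are ignored in this sketch.
    return thinking, "".join(answer_parts)


sample = {
    "content": [
        {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "abc"},
        {"type": "text", "text": "Based on my analysis..."},
    ]
}
thoughts, answer = split_blocks(sample)
print(thoughts)
print(answer)
```

Keeping the two streams separate at this point makes it easy to log the reasoning while showing only the answer to end users.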

Supported Models and Modes

Extended thinking behavior varies by Claude model version. Here's a quick reference:

Model | Recommended Mode | Manual Mode | Notes
Claude Opus 4.7 | adaptive with effort | ❌ Returns 400 error | Use adaptive thinking only
Claude Opus 4.6 | adaptive | enabled (deprecated) | Manual mode still works but will be removed
Claude Sonnet 4.6 | adaptive | enabled (deprecated) | Manual mode with interleaved still works
Claude Mythos Preview | adaptive (default) | enabled accepted | disabled not supported; display defaults to "omitted"

Important: For Claude Opus 4.7, manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is no longer supported and returns a 400 error. You must use adaptive thinking with the effort parameter.
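One way to encode the table above in application code is a small lookup that picks a thinking configuration per model. This is only a sketch: the helper name thinking_config_for is hypothetical, and the model ID "claude-opus-4-6" is an assumption patterned on the IDs used elsewhere in this guide.

```python
from typing import Optional

# Models where manual mode (type "enabled") is rejected, per the table above.
ADAPTIVE_ONLY = {"claude-opus-4-7"}
# Models where manual mode still works but is deprecated.
# "claude-opus-4-6" is an assumed ID, not taken from this guide's examples.
MANUAL_DEPRECATED = {"claude-opus-4-6", "claude-sonnet-4-6"}


def thinking_config_for(model: str, budget_tokens: Optional[int] = None) -> dict:
    """Pick a thinking config, falling back to adaptive when manual mode is unsafe."""
    if budget_tokens is None or model in ADAPTIVE_ONLY:
        return {"type": "adaptive"}
    if model in MANUAL_DEPRECATED:
        return {"type": "enabled", "budget_tokens": budget_tokens}
    # Unknown model: prefer the recommended mode.
    return {"type": "adaptive"}


print(thinking_config_for("claude-opus-4-7", budget_tokens=10000))
print(thinking_config_for("claude-sonnet-4-6", budget_tokens=10000))
```

Defaulting unknown models to adaptive keeps the helper safe as manual mode is phased out.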

Adaptive Thinking (Recommended)

Adaptive thinking lets Claude dynamically decide how much reasoning to use based on the complexity of the task. This is the recommended approach for all current models.

Basic Adaptive Thinking

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "Prove that the square root of 2 is irrational."
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Using the Effort Parameter

The effort parameter controls how much reasoning Claude applies. It is set in output_config, alongside the thinking configuration rather than inside it, and takes a named level such as "low", "medium", or "high" instead of a number. Higher levels encourage deeper reasoning, and some models accept additional levels above "high".

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # High effort for complex tasks
    messages=[
        {
            "role": "user",
            "content": "Design a distributed caching system that handles cache invalidation across 100 nodes."
        }
    ]
)

When to adjust effort:
  • Low effort ("low"): Simple factual queries, quick lookups, straightforward translations
  • Medium effort ("medium"): Standard reasoning tasks, code generation, analysis
  • High effort ("high"): Complex mathematical proofs, multi-step planning, deep research
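The guidance above could be captured as a simple lookup from task category to effort tier. The category names below are hypothetical labels chosen for illustration; only the three tiers come from the list above.

```python
# Hypothetical task categories mapped to the three effort tiers described above.
EFFORT_BY_TASK = {
    "factual_lookup": "low",
    "translation": "low",
    "code_generation": "medium",
    "analysis": "medium",
    "math_proof": "high",
    "multi_step_planning": "high",
    "deep_research": "high",
}


def pick_effort(task_kind: str, default: str = "medium") -> str:
    """Return the effort tier for a task category, or a default when unknown."""
    return EFFORT_BY_TASK.get(task_kind, default)


print(pick_effort("math_proof"))
print(pick_effort("unlisted_task"))
```

Centralizing the choice in one table makes it easy to tune tiers later based on measured quality and token cost.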

Manual Extended Thinking (Legacy)

For models that still support it (Claude Opus 4.6, Sonnet 4.6, Mythos), you can manually set a token budget for thinking:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Claude will use up to 10k tokens for thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
)
Note: budget_tokens must be at least 1,024 and less than max_tokens, because thinking tokens count toward max_tokens. Claude will use at most budget_tokens for thinking, and the remaining tokens for the final response.
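A pre-flight check can catch a bad budget before the request is sent. The sketch below assumes manual mode; the helper name check_manual_budget is hypothetical, and the 1,024-token floor is the documented minimum for budget_tokens.

```python
MIN_BUDGET_TOKENS = 1024  # documented minimum thinking budget in manual mode


def check_manual_budget(max_tokens: int, budget_tokens: int) -> int:
    """Validate a manual-mode budget and return the tokens left for the answer."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )
    return max_tokens - budget_tokens


# With the values from the example above, 6000 tokens remain for the answer.
remaining = check_manual_budget(max_tokens=16000, budget_tokens=10000)
print(remaining)
```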

Streaming with Extended Thinking

When streaming, thinking blocks are delivered as separate events. Here's how to handle them:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # effort is set outside the thinking config
    messages=[
        {
            "role": "user",
            "content": "Explain quantum entanglement in simple terms."
        }
    ]
) as stream:
    for event in stream:
        # Thinking and answer text arrive as different delta types
        if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="", flush=True)
        elif event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
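If you need the full reasoning and answer as strings rather than printing them as they arrive, the same event handling can feed an accumulator. The sketch below runs offline against stand-in event objects built with SimpleNamespace; collect_stream and the fake events are illustrative, and a real stream would supply the SDK's own event objects with the same type and delta fields.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and answer text from an iterable of stream events."""
    thinking, text = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # start/stop events carry no text in this sketch
        if event.delta.type == "thinking_delta":
            thinking.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text.append(event.delta.text)
        # Other delta types (for example signature_delta) are skipped here.
    return "".join(thinking), "".join(text)


def _delta(kind, **fields):
    """Build a stand-in content_block_delta event for local testing."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=kind, **fields),
    )


fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta("thinking_delta", thinking="Two particles "),
    _delta("thinking_delta", thinking="share one state."),
    _delta("signature_delta", signature="abc"),
    _delta("text_delta", text="Entangled particles "),
    _delta("text_delta", text="stay correlated."),
    SimpleNamespace(type="message_stop"),
]
reasoning, answer = collect_stream(fake_events)
print(reasoning)
print(answer)
```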

Task Budgets (Beta)

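For advanced use cases, you can set a task budget that gives Claude a token target for the whole task, covering both thinking and the final response. This is useful for controlling costs while still allowing deep reasoning. Because the feature is in beta, the request shape may change; the field names and beta flag below reflect the beta as understood at the time of writing and should be confirmed against the current API reference.

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    betas=["task-budgets-2026-03-13"],  # Beta flag; confirm the current name
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 20000}  # Target for thinking + response
    },
    messages=[...]
)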

Fast Mode (Research Preview)

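Fast mode is a research-preview feature that speeds up output generation on supported models, at premium pricing. It is requested with a top-level speed setting and a beta flag rather than inside the thinking configuration. The flag name below reflects the preview as understood at the time of writing and should be confirmed against the current API reference:

response = client.beta.messages.create(
    model="claude-opus-4-7",  # Use a model that supports fast mode
    max_tokens=16000,
    betas=["fast-mode-2026-02-01"],  # Research preview flag; confirm the current name
    speed="fast",
    thinking={"type": "adaptive"},
    messages=[...]
)
Caution: Fast mode is a research preview. Model support, pricing, and the request shape may change. Test thoroughly before using in production.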

Best Practices

1. Choose the Right Effort Level

Start with a medium effort level and adjust based on task complexity. For simple Q&A, lower effort saves tokens. For multi-step reasoning, higher effort yields better results.

2. Set Appropriate max_tokens

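Always set max_tokens high enough to accommodate both thinking and the final response. A good rule of thumb for manual mode: max_tokens = budget_tokens × 1.5. In adaptive mode there is no fixed budget, so leave generous headroom and raise max_tokens if responses are cut off with stop_reason "max_tokens".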
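The manual-mode rule of thumb is easy to apply in code. This sketch treats the 1.5 multiplier as a tunable default rather than an API requirement; the helper name suggested_max_tokens is hypothetical.

```python
import math


def suggested_max_tokens(budget_tokens: int, headroom: float = 1.5) -> int:
    """Apply the rule of thumb max_tokens = budget_tokens * headroom, rounded up."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    if headroom <= 1.0:
        raise ValueError("headroom must exceed 1.0 so the answer has room")
    return math.ceil(budget_tokens * headroom)


print(suggested_max_tokens(10000))
```

For long-form answers, a larger headroom value leaves more room after thinking completes.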

3. Handle Thinking Blocks in Your Application

When displaying responses to users, you may want to:

  • Show thinking blocks in a collapsible UI element
  • Use them for debugging or transparency
  • Omit them for a clean user experience
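As an illustration of the first and last options, the sketch below renders content blocks to HTML, placing reasoning in a collapsible details element or leaving it out entirely. The function name render_response and the mode strings are hypothetical; blocks are assumed to be plain dicts shaped like the API response shown earlier.

```python
import html


def render_response(blocks, mode="collapsible"):
    """Render content blocks as HTML; mode is 'collapsible' or 'omit'."""
    parts = []
    for block in blocks:
        if block["type"] == "text":
            parts.append(f"<p>{html.escape(block['text'])}</p>")
        elif block["type"] == "thinking" and mode == "collapsible":
            body = html.escape(block["thinking"])
            parts.append(
                f"<details><summary>Show reasoning</summary><pre>{body}</pre></details>"
            )
        # In 'omit' mode, thinking blocks are skipped for a clean view.
    return "\n".join(parts)


blocks = [
    {"type": "thinking", "thinking": "Check that a < b first."},
    {"type": "text", "text": "Done."},
]
print(render_response(blocks))
print(render_response(blocks, mode="omit"))
```

Escaping both the reasoning and the answer matters because model output may contain characters that would otherwise be interpreted as markup.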

4. Use Adaptive Thinking for New Projects

Manual extended thinking is deprecated on most models. Always prefer thinking: {"type": "adaptive"} for future-proof code.

5. Combine with Structured Outputs

Extended thinking works well with structured outputs (JSON mode). Claude can reason through complex data transformations before outputting structured results.
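When thinking is enabled, the structured result lives in the text blocks, so a parser should skip thinking blocks. A minimal sketch, assuming the text blocks together contain exactly one JSON document and nothing else (extract_json is a hypothetical helper):

```python
import json


def extract_json(blocks):
    """Parse JSON from the text blocks of a response, ignoring thinking blocks."""
    text = "".join(b["text"] for b in blocks if b["type"] == "text")
    return json.loads(text)


blocks = [
    {"type": "thinking", "thinking": "Map each row to an object with id and total."},
    {"type": "text", "text": '{"orders": [{"id": 1, "total": 9.5}]}'},
]
result = extract_json(blocks)
print(result["orders"][0]["total"])
```

If the model may wrap the JSON in prose, json.loads raises json.JSONDecodeError, which callers can catch and handle with a retry.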

Common Pitfalls

  • Setting budget_tokens too high: If budget_tokens >= max_tokens, the API returns an error. Always leave headroom for the final response.
  • Using manual mode on Opus 4.7: This returns a 400 error. Switch to adaptive thinking.
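  • Dropping or altering signatures: The signature is opaque and is verified by the API, not by your code. When you send thinking blocks back in a later request, pass them unmodified with their signature; when streaming, capture the signature from the signature_delta event.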
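For the signature pitfall, a safe pattern in multi-turn or tool-use loops is to append the assistant's content blocks to the conversation exactly as received. The sketch below works on plain dicts; append_assistant_turn is a hypothetical helper, not an SDK function.

```python
import copy


def append_assistant_turn(messages, content_blocks):
    """Return a new message list with the assistant's blocks appended verbatim."""
    # Deep-copy so later edits to the history cannot alter the original blocks.
    turn = {"role": "assistant", "content": copy.deepcopy(content_blocks)}
    return messages + [turn]


history = [{"role": "user", "content": "What is 17 * 23?"}]
received = [
    {"type": "thinking", "thinking": "17 * 23 = 391.", "signature": "sig-abc"},
    {"type": "text", "text": "391"},
]
history = append_assistant_turn(history, received)
print(history[-1]["content"][0]["signature"])
```

Filtering out or rewriting the thinking block before sending the history back is what this pattern is meant to prevent.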

Key Takeaways

  • Adaptive thinking (thinking: {"type": "adaptive"}) is the recommended approach for all current Claude models, especially Opus 4.7 where manual mode is no longer supported.
  • The effort parameter lets you control reasoning depth: use higher levels for complex tasks and lower levels for simple queries.
  • Manual extended thinking with budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6, and removed on Opus 4.7. Migrate to adaptive thinking.
  • Streaming works seamlessly with extended thinking—handle thinking_delta and text_delta events separately.
  • Task budgets and fast mode are experimental features that give you additional control over token usage and response speed.