BeClaude
Guide · Beginner · API · 2026-05-16

Mastering Extended Thinking in Claude: A Guide to Adaptive Thinking and Effort Control

Learn how to enable and optimize Claude's extended thinking for complex reasoning tasks. Covers adaptive thinking, effort parameter, budget tokens, and practical API examples.

Quick Answer

You'll learn how to configure Claude's extended thinking mode using the API, including the new adaptive thinking with effort control, manual budget tokens, and how to handle thinking content blocks in responses.

extended thinking · adaptive thinking · Claude API · reasoning · effort parameter

Introduction

Claude’s extended thinking feature unlocks deeper reasoning for complex tasks such as mathematical proofs, multi-step logic puzzles, or intricate code analysis. Instead of returning only an answer, Claude works through the problem step by step and returns that reasoning alongside the final response. This transparency helps you verify the reasoning, debug prompts, and build trust in the output.

With the latest Claude models (Opus 4.7, Sonnet 4.6, and Opus 4.6), Anthropic has introduced adaptive thinking, in which Claude decides how many thinking tokens to spend based on the task instead of working to a fixed budget you set. This guide covers enabling extended thinking in the API, choosing between adaptive and manual modes, and handling thinking blocks in the response.

How Extended Thinking Works

When extended thinking is enabled, Claude generates special thinking content blocks alongside the usual text blocks. The thinking blocks contain internal reasoning, and the final text block contains the answer. This two-part structure gives you insight into how Claude arrived at its conclusion.

Here’s what a typical response looks like:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

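The signature field lets the API verify that the thinking block was generated by Claude. You don't need to parse it, but you must pass it back unchanged whenever you include thinking blocks in a later request (for example, when returning tool results during tool use). When streaming, the signature arrives in a signature_delta event.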
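To make the two-part structure concrete, here is a minimal sketch that separates the reasoning from the answer. It assumes the content list is available as plain dicts shaped like the JSON above (the SDKs return typed objects with the same fields), and the helper name `split_content` is hypothetical.

```python
from typing import Any


def split_content(content: list[dict[str, Any]]) -> tuple[list[str], str]:
    """Separate thinking text from the final answer in a response's content list.

    Assumes dict blocks shaped like the example response: thinking blocks carry
    a "thinking" field and text blocks carry a "text" field. Any other block
    types (for example, tool_use) are skipped.
    """
    thinking_parts: list[str] = []
    answer_parts: list[str] = []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block["thinking"])
        elif block.get("type") == "text":
            answer_parts.append(block["text"])
    return thinking_parts, "".join(answer_parts)


example = {
    "content": [
        # "WaUj..." is a truncated placeholder, not a real signature.
        {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "WaUj..."},
        {"type": "text", "text": "Based on my analysis..."},
    ]
}

thoughts, answer = split_content(example["content"])
print(thoughts)
print(answer)
```

The extracted strings are for display only. If you send the conversation back to the API, pass the original blocks, signature included, rather than these strings.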

Adaptive Thinking vs. Manual Extended Thinking

Anthropic now recommends adaptive thinking for all current models. Here’s the breakdown:

| Model | Recommended mode | Manual mode status |
|---|---|---|
| Claude Opus 4.7 | adaptive with effort | Not supported (returns a 400 error) |
| Claude Opus 4.6 | adaptive | Deprecated but functional |
| Claude Sonnet 4.6 | adaptive | Deprecated but functional (interleaved) |
| Claude Mythos Preview | adaptive (default) | Accepted, but thinking cannot be disabled |
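If your code targets several models, you could encode the table as a client-side pre-flight check so that unsupported configurations fail before a request is sent. This is a sketch under assumptions: the model IDs follow the pattern used in this guide's examples, `claude-mythos-preview` is a hypothetical placeholder (the guide gives no ID for it), and `check_thinking_config` is a hypothetical helper. The API itself remains the source of truth.

```python
import warnings

# Manual-mode status per model, transcribed from the table above.
# "claude-mythos-preview" is a hypothetical placeholder model ID.
MANUAL_MODE_STATUS = {
    "claude-opus-4-7": "unsupported",
    "claude-opus-4-6": "deprecated",
    "claude-sonnet-4-6": "deprecated",
    "claude-mythos-preview": "accepted",
}

# Models that, per the table, reject thinking={"type": "disabled"}.
THINKING_REQUIRED = {"claude-mythos-preview"}


def check_thinking_config(model: str, thinking: dict) -> None:
    """Raise for configurations the table marks unsupported; warn on deprecated ones."""
    mode = thinking.get("type")
    if mode == "enabled":
        status = MANUAL_MODE_STATUS.get(model)
        if status == "unsupported":
            raise ValueError(f"{model} does not accept manual thinking; use type 'adaptive'")
        if status == "deprecated":
            warnings.warn(f"Manual thinking is deprecated on {model}", DeprecationWarning)
    elif mode == "disabled" and model in THINKING_REQUIRED:
        raise ValueError(f"{model} does not allow thinking to be disabled")


# Passes silently: adaptive thinking is the recommended mode.
check_thinking_config("claude-opus-4-7", {"type": "adaptive", "effort": "high"})
```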

Adaptive Thinking (Recommended)

Adaptive thinking lets Claude decide how many tokens to spend on reasoning based on how complex the task is. You set the effort level, which determines how much reasoning Claude is willing to apply, from quick answers at low effort to its deepest reasoning at high effort.

API Example (Python):
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high"  # Options: "low", "medium", "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Prove that there are infinitely many prime numbers congruent to 3 mod 4."
        }
    ]
)

# Thinking blocks come first, followed by the text block with the answer.
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")

API Example (TypeScript):
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 32000,
  thinking: { type: 'adaptive', effort: 'high' },
  messages: [
    {
      role: 'user',
      content: 'Prove that there are infinitely many prime numbers congruent to 3 mod 4.'
    }
  ]
});

// Template literals need backticks for ${...} interpolation.
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(`Thinking: ${block.thinking}`);
  } else if (block.type === 'text') {
    console.log(`Answer: ${block.text}`);
  }
}

Manual Extended Thinking (Legacy)

Manual mode lets you set a fixed budget_tokens—the maximum number of tokens Claude can use for thinking. This is still supported on Opus 4.6 and Sonnet 4.6 but is deprecated.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Must be less than max_tokens
    },
    messages=[
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
)
Important: budget_tokens must be less than max_tokens. Thinking tokens count toward max_tokens, so the difference (max_tokens - budget_tokens) is the room left for the final text if Claude uses its entire thinking budget.
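A small helper can enforce these constraints before a request goes out. This sketch assumes the 1,024-token minimum for budget_tokens given in Anthropic's extended thinking documentation; the function name `manual_thinking_params` is hypothetical.

```python
MIN_BUDGET_TOKENS = 1024  # minimum budget_tokens per Anthropic's extended thinking docs


def manual_thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Build keyword arguments for a manual (legacy) thinking request, validating limits."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = manual_thinking_params(max_tokens=16000, budget_tokens=10000)

# Room left for the final text if Claude spends its full thinking budget.
text_room = params["max_tokens"] - params["thinking"]["budget_tokens"]
print(text_room)
```

With a real client you would spread the result into the call, for example client.messages.create(model="claude-sonnet-4-6", messages=..., **params).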

Effort Parameter (Adaptive Mode)

The effort parameter in adaptive thinking controls how much reasoning Claude applies. It accepts three values:

  • low: Minimal thinking, faster responses, suitable for simple tasks.
  • medium: Balanced reasoning, good for most use cases.
  • high: Maximum reasoning depth, best for complex problems (e.g., mathematical proofs, multi-step analysis).
Lower effort levels use fewer tokens and respond faster, so match the level to the task instead of defaulting to high everywhere.
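One way to apply this guidance consistently is a lookup from task type to effort level. The categories below are hypothetical illustrations of "simple", "most use cases", and "complex" tasks, and the helper `adaptive_thinking` simply produces the thinking shape used in this guide's examples.

```python
from enum import Enum


class Effort(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task categories, grouped per the effort descriptions above.
EFFORT_BY_TASK = {
    "classification": Effort.LOW,
    "lookup": Effort.LOW,
    "summarization": Effort.MEDIUM,
    "drafting": Effort.MEDIUM,
    "math_proof": Effort.HIGH,
    "code_review": Effort.HIGH,
    "multi_step_analysis": Effort.HIGH,
}


def adaptive_thinking(task: str) -> dict:
    """Return an adaptive thinking config; unknown tasks fall back to medium."""
    effort = EFFORT_BY_TASK.get(task, Effort.MEDIUM)
    return {"type": "adaptive", "effort": effort.value}


print(adaptive_thinking("math_proof"))
```

Falling back to medium keeps unclassified tasks on the balanced setting rather than silently paying for high effort.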

Task Budgets (Beta)

For advanced users, the task budget feature lets you set a total token budget for the entire thinking + response. This is useful when you want to cap costs while still using adaptive thinking.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "task_budget_tokens": 50000  # Total budget for thinking + text
    },
    messages=[...]
)

Fast Mode (Beta Research Preview)

Fast mode reduces thinking time for quicker responses, at the cost of some reasoning depth. It’s ideal for interactive applications where latency matters.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": "high",
        "fast_mode": True
    },
    messages=[...]
)

Handling Thinking Blocks in Streaming

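When streaming, each thinking block arrives as its own content block: a content_block_start event, a series of thinking_delta events, a signature_delta event carrying the signature, and a content_block_stop event. Text blocks follow the same pattern with text_delta events.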

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

# content_block_stop events carry only the block's index, not the block,
# so remember each block's type when it starts.
block_types = {}

for event in stream:
    if event.type == "content_block_start":
        block_types[event.index] = event.content_block.type
        if event.content_block.type == "thinking":
            print("\n[Thinking started]")
        elif event.content_block.type == "text":
            print("\n[Response]")
    elif event.type == "content_block_delta":
        if event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="")
        elif event.delta.type == "text_delta":
            print(event.delta.text, end="")
        # signature_delta events carry the thinking block's signature; nothing to print.
    elif event.type == "content_block_stop":
        if block_types.get(event.index) == "thinking":
            print("\n[Thinking ended]")
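If you need the complete blocks after streaming (for example, to send them back with their signatures), you have to reassemble the deltas by index. The sketch below does that over plain dicts shaped like the API's streaming events, so it runs without a network call; `accumulate_stream` and the sample events are hypothetical.

```python
from typing import Any, Iterable


def accumulate_stream(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rebuild complete content blocks from raw streaming events.

    Blocks are keyed by "index" because content_block_stop carries only an index.
    """
    blocks: dict[int, dict[str, Any]] = {}
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            blocks[event["index"]] = dict(event["content_block"])
        elif kind == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "signature_delta":
                block["signature"] = block.get("signature", "") + delta["signature"]
            elif delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
        # content_block_stop and message-level events need no action here.
    return [blocks[i] for i in sorted(blocks)]


# Hypothetical event sequence; "sig..." is a placeholder signature.
sample_events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 1. "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 2."}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "sig..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Entangled particles..."}},
    {"type": "content_block_stop", "index": 1},
]

print(accumulate_stream(sample_events))
```

In practice, the Python SDK's client.messages.stream(...) helper can do this bookkeeping for you via get_final_message(); the sketch shows what that assembly involves.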

Best Practices

  • Start with adaptive thinking – It’s the recommended mode for all current models and simplifies token management.
  • Use effort: "high" for complex reasoning – For tasks like code review, math proofs, or multi-step analysis, high effort yields better results.
  • Set max_tokens generously – Thinking can consume many tokens. A good rule of thumb: set max_tokens to at least 2x your expected thinking budget.
  • Monitor token usage – Extended thinking increases token consumption, and thinking tokens are billed as output tokens. Use the usage field in the response (usage.output_tokens includes thinking) to track costs.
  • Handle thinking blocks in streaming – If you stream responses, ensure your client correctly processes thinking_delta events.
  • Avoid disabling thinking on Mythos – The Mythos Preview model requires thinking; you cannot set thinking: {type: "disabled"}.
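For monitoring across many requests, a small client-side tracker can accumulate usage and refuse requests that could exceed a spending cap. This is a hypothetical sketch, not the task budget feature: `UsageTracker` is an illustrative name, and it assumes each response exposes usage.input_tokens and usage.output_tokens, with thinking counted inside output_tokens.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Accumulates token usage across requests and enforces a soft output-token cap."""

    output_token_cap: int
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage) -> None:
        # usage is any object with input_tokens and output_tokens attributes.
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    @property
    def remaining(self) -> int:
        return max(self.output_token_cap - self.output_tokens, 0)

    def can_afford(self, max_tokens: int) -> bool:
        # A request may emit up to max_tokens of output, thinking included.
        return max_tokens <= self.remaining


tracker = UsageTracker(output_token_cap=100_000)
# Stand-in for response.usage from a real call (hypothetical numbers).
tracker.record(SimpleNamespace(input_tokens=1_200, output_tokens=18_000))
print(tracker.remaining)
```

With a real client, call tracker.record(response.usage) after each request and check tracker.can_afford(max_tokens) before sending the next one.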

Key Takeaways

  • Adaptive thinking is the new standard for Claude Opus 4.7, Sonnet 4.6, and Opus 4.6—use thinking: {type: "adaptive", effort: "..."} instead of manual budget tokens.
  • Effort parameter (low, medium, high) controls reasoning depth; choose based on task complexity to balance quality and cost.
  • Manual extended thinking (type: "enabled" with budget_tokens) is deprecated on Opus 4.6 and Sonnet 4.6, and not supported on Opus 4.7.
  • Task budgets and fast mode are beta features that give you finer control over token usage and latency.
  • Streaming requires special handling for thinking blocks—use thinking_delta events to display reasoning in real time.