BeClaude
GuideBeginnerAgents2026-05-22

Adaptive Thinking in Claude API: A Complete Guide to Dynamic Extended Thinking

Learn how to use adaptive thinking with Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. Includes code examples, effort parameter usage, and migration tips.

Quick Answer

This guide explains how to use adaptive thinking in the Claude API — a dynamic mode where Claude decides when and how much to think based on request complexity. You'll learn how to enable it, use the effort parameter, and migrate from fixed budget_tokens.

adaptive thinkingextended thinkingClaude APIeffort parameteragentic workflows

What Is Adaptive Thinking?

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Mythos Preview. Instead of manually setting a fixed budget_tokens value, adaptive thinking lets Claude dynamically determine when and how much to use extended thinking based on the complexity of each request.

This feature is especially powerful for bimodal tasks (mixing simple and complex queries) and long-horizon agentic workflows (where Claude needs to reason across multiple tool calls).

Note: Adaptive thinking is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.

Supported Models

ModelAdaptive Thinking SupportNotes
Claude Mythos Preview (claude-mythos-preview)Default modeAuto-applies when thinking is unset. thinking: {type: "disabled"} is not supported.
Claude Opus 4.7 (claude-opus-4-7)Only supported modeManual thinking: {type: "enabled"} is rejected with a 400 error.
Claude Opus 4.6 (claude-opus-4-6)Supportedbudget_tokens is deprecated but still functional.
Claude Sonnet 4.6 (claude-sonnet-4-6)Supportedbudget_tokens is deprecated but still functional.
Important: Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking. They require thinking: {type: "enabled"} with budget_tokens.

How Adaptive Thinking Works

In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and decides:

  • Whether to use extended thinking at all
  • How much thinking to allocate
At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems, saving tokens and reducing latency.

Adaptive thinking also automatically enables interleaved thinking — meaning Claude can think between tool calls. This makes it especially effective for agentic workflows where the model needs to reason after receiving tool results.

How to Use Adaptive Thinking

Basic Usage

Set thinking.type to "adaptive" in your API request:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create( model="claude-opus-4-7", max_tokens=16000, thinking={"type": "adaptive"}, messages=[ { "role": "user", "content": "Explain why the sum of two even numbers is always even." } ] )

for block in response.content: if block.type == "thinking": print(f"\nThinking: {block.thinking}") elif block.type == "text": print(f"\nResponse: {block.text}")

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({ model: 'claude-opus-4-7', max_tokens: 16000, thinking: { type: 'adaptive' }, messages: [ { role: 'user', content: 'Explain why the sum of two even numbers is always even.' } ] });

for (const block of response.content) { if (block.type === 'thinking') { console.log(\nThinking: ${block.thinking}); } else if (block.type === 'text') { console.log(\nResponse: ${block.text}); } }

Using the Effort Parameter

You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance — it influences but doesn't strictly control the thinking budget.

Available Effort Levels

Effort LevelBehavior
lowMinimal thinking; Claude may skip thinking for simple requests. Best for high-throughput, low-latency applications.
mediumBalanced approach; Claude thinks when beneficial but conserves tokens for straightforward tasks.
high (default)Claude almost always thinks; suitable for complex reasoning, math, coding, and agentic workflows.

Example with Effort

import anthropic

client = anthropic.Anthropic()

response = client.messages.create( model="claude-opus-4-7", max_tokens=16000, thinking={ "type": "adaptive", "effort": "medium" # or "low", "high" }, messages=[ { "role": "user", "content": "Write a Python function to merge two sorted lists." } ] )

for block in response.content: if block.type == "thinking": print(f"\nThinking: {block.thinking}") elif block.type == "text": print(f"\nResponse: {block.text}")

Task Budgets (Beta)

For more granular control, you can use task budgets to set token limits for specific thinking phases. This is useful when you want to cap thinking costs while still benefiting from adaptive allocation.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "task_budgets": {
            "max_thinking_tokens": 8000,
            "max_response_tokens": 8000
        }
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ]
)
Note: Task budgets are in beta and may change in future API versions.

Fast Mode (Research Preview)

For applications where speed is critical, you can enable fast mode alongside adaptive thinking. Fast mode reduces thinking time at the cost of some reasoning quality.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "fast_mode": True  # research preview
    },
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ]
)

Migrating from Fixed budget_tokens

If you're currently using thinking: {type: "enabled", budget_tokens: N} on Opus 4.6 or Sonnet 4.6, here's how to migrate:

Before (deprecated)

thinking={
    "type": "enabled",
    "budget_tokens": 16000
}

After (recommended)

thinking={
    "type": "adaptive",
    "effort": "high"  # approximates a generous budget
}

For workloads where you previously used a low budget_tokens (e.g., 2000), try effort: "low" or effort: "medium".

Best Practices

1. Start with High Effort

For most applications, start with effort: "high" and monitor performance. If you find Claude is over-thinking simple requests, gradually reduce the effort level.

2. Use Adaptive Thinking for Agentic Workflows

Adaptive thinking's interleaved thinking capability makes it ideal for multi-step agentic tasks. Claude can think after each tool call, improving decision quality.

3. Monitor Token Usage

Even though adaptive thinking is dynamic, you should still monitor token consumption. Use max_tokens to set an upper bound on total response length.

4. Combine with Structured Outputs

For applications requiring structured responses (e.g., JSON), combine adaptive thinking with structured outputs:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "Extract the name, date, and amount from this invoice."
        }
    ],
    response_format={"type": "json_object"}
)

5. Test with Both Modes

If you're migrating from fixed budget_tokens, run A/B tests comparing the old mode with adaptive thinking to ensure quality remains consistent or improves.

Common Pitfalls

  • Using adaptive thinking on unsupported models: Older models (Sonnet 4.5, Opus 4.5) will ignore the adaptive type and fall back to no thinking.
  • Setting budget_tokens with adaptive thinking: This combination is ignored; use effort instead.
  • Expecting fixed latency: Adaptive thinking means variable thinking time. If you need predictable latency, consider using fixed budget_tokens on Opus 4.6 or Sonnet 4.6 (though deprecated).

Key Takeaways

  • Adaptive thinking is the recommended mode for Claude Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. It dynamically allocates thinking based on request complexity.
  • Use the effort parameter (low, medium, high) to guide thinking depth without setting a fixed token budget.
  • Interleaved thinking is automatic in adaptive mode, making it ideal for agentic workflows with multiple tool calls.
  • Migrate from budget_tokens by replacing thinking: {type: "enabled", budget_tokens: N} with thinking: {type: "adaptive", effort: "high"}.
  • Start with effort: "high" and adjust downward based on your latency and cost requirements.