Guide2026-04-22

Adaptive Thinking in Claude API: Smarter Extended Thinking Without Manual Budgets

Learn how to use adaptive thinking with Claude Opus 4.7, Opus 4.6, and Sonnet 4.6. Dynamic extended thinking, effort control, and code examples included.

Quick Answer

Adaptive thinking lets Claude dynamically decide when and how much to use extended thinking, replacing manual budget_tokens. It's the only mode on Opus 4.7 and default on Mythos Preview. Use thinking.type: 'adaptive' with an optional effort parameter.

adaptive thinkingextended thinkingClaude APIClaude Opus 4.7agentic workflows

Introduction

Extended thinking has been one of Claude's most powerful capabilities, allowing the model to reason step-by-step before generating a final answer. However, manually setting a budget_tokens value often led to either wasted tokens (overthinking simple tasks) or insufficient reasoning (underthinking complex ones).

Adaptive thinking solves this problem. Instead of you guessing how much thinking a task requires, Claude evaluates each request's complexity and dynamically decides whether—and how much—to think. This results in better performance, lower costs, and simpler code.

This guide covers everything you need to know: supported models, how adaptive thinking works, the effort parameter, code examples, and migration tips.

Supported Models

Adaptive thinking is available on:

Claude Mythos Preview (claude-mythos-preview) — adaptive thinking is the default; thinking: {type: "disabled"} is not supported.
Claude Opus 4.7 (claude-opus-4-7) — adaptive thinking is the only supported thinking mode. Manual thinking: {type: "enabled", budget_tokens: N} is rejected with a 400 error.
Claude Opus 4.6 (claude-opus-4-6) — supports adaptive thinking; manual budget_tokens is deprecated.
Claude Sonnet 4.6 (claude-sonnet-4-6) — supports adaptive thinking; manual budget_tokens is deprecated.

Important: Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking and still require thinking: {type: "enabled", budget_tokens: N}.

How Adaptive Thinking Works

In adaptive mode, thinking is optional for Claude. The model evaluates each request and decides:

Whether to use extended thinking at all
How much thinking to apply

At the default effort level (high), Claude almost always thinks. At lower effort levels, it may skip thinking for simple problems—saving tokens and reducing latency.

Adaptive thinking also automatically enables interleaved thinking: Claude can think between tool calls, making it especially effective for agentic workflows where the model needs to reason after receiving tool results.

Using Adaptive Thinking in the API

Basic Usage

Set thinking.type to "adaptive" in your API request:

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
)
for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 16000,
  thinking: { type: 'adaptive' },
  messages: [
    {
      role: 'user',
      content: 'Explain why the sum of two even numbers is always even.'
    }
  ]
});
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(\nThinking: ${block.thinking});
  } else if (block.type === 'text') {
    console.log(\nResponse: ${block.text});
  }
}

Adaptive Thinking with the Effort Parameter

You can guide how much thinking Claude does using the effort parameter. The effort level accepts values from 0.0 to 1.0 (inclusive), with 0.5 as the default:

Low effort (e.g., 0.1): Minimal thinking; Claude may skip thinking for simple tasks. Best for high-volume, low-complexity requests.
Medium effort (e.g., 0.5): Balanced thinking. The default.
High effort (e.g., 0.9 or 1.0): Maximum thinking. Best for complex reasoning, math, coding, and agentic workflows.

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={
        "type": "adaptive",
        "effort": 0.9  # High effort for complex reasoning
    },
    messages=[
        {
            "role": "user",
            "content": "Design a distributed caching system that handles cache invalidation across 100 nodes."
        }
    ]
)
for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 32000,
  thinking: {
    type: 'adaptive',
    effort: 0.9
  },
  messages: [
    {
      role: 'user',
      content: 'Design a distributed caching system that handles cache invalidation across 100 nodes.'
    }
  ]
});
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(\nThinking: ${block.thinking});
  } else if (block.type === 'text') {
    console.log(\nResponse: ${block.text});
  }
}

Adaptive Thinking for Agentic Workflows

One of the biggest advantages of adaptive thinking is interleaved thinking—Claude can think between tool calls. This is a game-changer for agentic workflows where the model needs to:

Reason about the user's request
Call a tool
Think about the tool's output
Call another tool or produce a final answer

With manual budget_tokens, thinking was a single block at the start. With adaptive thinking, Claude can distribute thinking across the entire conversation.

Example: Multi-step Tool Use

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": 0.8},
    tools=[
        {
            "name": "search_database",
            "description": "Search a database for records",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "calculate",
            "description": "Perform a calculation",
            "input_schema": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Find all orders from last month and calculate the total revenue."
        }
    ]
)
Claude will think, call search_database, think about results, call calculate, think, then respond
for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")
    elif block.type == "tool_use":
        print(f"\nTool call: {block.name}({block.input})")

Migration Guide: From Manual to Adaptive Thinking

If you're currently using thinking: {type: "enabled", budget_tokens: N}, here's how to migrate:

Step 1: Update Your Thinking Configuration

Before (deprecated):

thinking={
    "type": "enabled",
    "budget_tokens": 16000
}

After (recommended):

thinking={
    "type": "adaptive",
    "effort": 0.7  # Adjust based on your workload
}

Step 2: Remove Beta Headers

Adaptive thinking does not require any beta header. If you were using anthropic-beta: thinking-2025-01-02, you can remove it.

Step 3: Test with Your Workloads

For simple tasks (Q&A, summarization), try low effort (0.3–0.5)
For complex reasoning (math, code generation), try high effort (0.8–1.0)
For agentic workflows, use high effort (0.8–1.0) to maximize interleaved thinking

Step 4: Monitor and Adjust

Track token usage and response quality. Adaptive thinking often reduces token consumption for bimodal workloads (mixing simple and complex requests).

When to Use Adaptive Thinking vs. Manual Budget

Use Case	Recommended Mode
Bimodal workloads (mix of simple & complex)	Adaptive thinking
Long-horizon agentic workflows	Adaptive thinking
Predictable latency requirements	Manual budget (Opus 4.6/Sonnet 4.6 only)
Precise cost control	Manual budget (Opus 4.6/Sonnet 4.6 only)
Opus 4.7 or Mythos Preview	Adaptive thinking (only option)

Warning: Manual budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future model release. Plan to migrate to adaptive thinking.

Best Practices

Start with default effort (0.5) and adjust based on your workload's complexity.
Use high effort (0.8–1.0) for agentic workflows to leverage interleaved thinking.
Use lower effort (0.2–0.4) for high-volume, simple tasks to save tokens and reduce latency.
Set max_tokens generously (e.g., 16000–32000) to allow Claude room to think and respond.
Handle thinking blocks in your response processing to display or log Claude's reasoning.

Key Takeaways

Adaptive thinking replaces manual budget_tokens — Claude dynamically decides when and how much to think, optimizing for both performance and cost.
It's the only mode on Opus 4.7 and the default on Mythos Preview. Manual thinking is rejected on Opus 4.7.
Use the effort parameter (0.0–1.0) to guide thinking depth. Default is 0.5.
Interleaved thinking enables Claude to reason between tool calls, making it ideal for agentic workflows.
Migrate now if you're using manual budget_tokens on Opus 4.6 or Sonnet 4.6 — it's deprecated and will be removed in a future release.