GuideBeginnerPricing2026-05-20

Mastering Adaptive Thinking in Claude: A Practical Guide to Smarter, Dynamic Reasoning

Learn how to use Claude's adaptive thinking mode to dynamically allocate reasoning tokens, improve agentic workflows, and optimize API costs with the effort parameter.

Quick Answer

This guide explains how to use Claude's adaptive thinking mode to let the model dynamically decide when and how much to think, replacing manual budget_tokens. You'll learn how to implement it in the API, control thinking depth with the effort parameter, and optimize performance for agentic tasks.

adaptive thinkingextended thinkingClaude APIagentic workflowseffort parameter

Introduction

Claude's extended thinking capability has been a game-changer for complex reasoning tasks, but manually setting a thinking token budget (budget_tokens) often leads to either wasted tokens on simple queries or insufficient reasoning on hard problems. Enter adaptive thinking — a new mode that lets Claude dynamically determine when and how much to think, based on the complexity of each request.

This guide will walk you through everything you need to know about adaptive thinking: what it is, how it works, which models support it, and how to use it effectively in your API calls. By the end, you'll be able to replace rigid thinking budgets with a smarter, more efficient approach.

What Is Adaptive Thinking?

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Instead of you manually setting a budget_tokens value, Claude evaluates each incoming request and decides:

Whether to use extended thinking at all
How much thinking to apply

This makes adaptive thinking ideal for bimodal workloads (where some queries are trivial and others are complex) and long-horizon agentic workflows (where the model needs to reason between multiple tool calls).

Note: On Claude Opus 4.7, adaptive thinking is the only supported thinking mode. The old thinking: {type: "enabled", budget_tokens: N} syntax is rejected with a 400 error.

Supported Models

Model	Adaptive Thinking Support	Notes
Claude Mythos Preview (`claude-mythos-preview`)	Default (auto-applies)	`thinking: {type: "disabled"}` not supported
Claude Opus 4.7 (`claude-opus-4-7`)	Only supported mode	Manual `enabled` rejected
Claude Opus 4.6 (`claude-opus-4-6`)	Supported	Manual `enabled` deprecated
Claude Sonnet 4.6 (`claude-sonnet-4-6`)	Supported	Manual `enabled` deprecated
Older models (Sonnet 4.5, Opus 4.5, etc.)	Not supported	Must use `thinking: {type: "enabled"}`

If you're using Opus 4.6 or Sonnet 4.6, you should migrate from budget_tokens to adaptive thinking as soon as possible — the old syntax will be removed in a future model release.

How Adaptive Thinking Works

In adaptive mode, thinking becomes optional for the model. Here's what happens under the hood:

Request arrives — Claude receives your prompt
Complexity assessment — The model evaluates how much reasoning is required
Dynamic allocation — Claude decides whether to think and, if so, how many tokens to use
Interleaved thinking — For agentic workflows, Claude can think between tool calls, not just before the final response

This interleaved thinking is a key advantage for building agents. Previously, with a fixed budget, Claude might exhaust its thinking tokens before completing a multi-step tool-using task. Adaptive thinking allocates tokens across the entire conversation flow.

How to Use Adaptive Thinking in the API

Using adaptive thinking is straightforward. Set thinking.type to "adaptive" in your API request. Here's a complete Python example:

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
)
for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")

And here's the equivalent in TypeScript:

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 16000,
  thinking: { type: 'adaptive' },
  messages: [
    {
      role: 'user',
      content: 'Explain why the sum of two even numbers is always even.'
    }
  ]
});
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log(\nThinking: ${block.thinking});
  } else if (block.type === 'text') {
    console.log(\nResponse: ${block.text});
  }
}

No beta headers are required — adaptive thinking is generally available for supported models.

Controlling Thinking Depth with the Effort Parameter

Adaptive thinking comes with an optional effort parameter that acts as a soft guide for how much thinking Claude should do. The effort parameter accepts one of three values:

"low" — Minimal thinking; Claude may skip thinking for simple problems
"medium" — Balanced thinking
"high" — Maximum thinking (default); Claude almost always thinks

Here's how to use it:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "medium"  # or "low", "high"
    },
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to merge two sorted lists."
        }
    ]
)

When to Use Each Effort Level

Effort	Best For	Trade-off
`low`	High-volume, simple queries (e.g., classification, extraction)	May miss nuance on borderline cases
`medium`	General-purpose tasks with moderate complexity	Good balance of speed and depth
`high`	Complex reasoning, math, code generation, agentic workflows	Higher token usage, slower response

Adaptive Thinking for Agentic Workflows

One of the biggest benefits of adaptive thinking is its support for interleaved thinking in agentic workflows. When you combine adaptive thinking with tools, Claude can think between tool calls — not just before the final answer.

Consider a multi-step task like:

Search a database for customer records
Analyze the records for patterns
Generate a summary report

With a fixed thinking budget, Claude might run out of tokens after step 1. With adaptive thinking, it dynamically allocates reasoning tokens across all steps, thinking just enough before each tool call.

Here's a simplified example:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive", "effort": "high"},
    tools=[
        {
            "name": "search_database",
            "description": "Search customer records",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "generate_report",
            "description": "Generate a summary report",
            "input_schema": {
                "type": "object",
                "properties": {
                    "data": {"type": "string"}
                },
                "required": ["data"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Find all customers who churned last quarter and create a report."
        }
    ]
)

Migrating from Manual Budget Tokens

If you're currently using thinking: {type: "enabled", budget_tokens: 16000}, here's how to migrate:

Identify your model — Check if it supports adaptive thinking (Opus 4.6+, Sonnet 4.6+, or Mythos Preview)
Replace the thinking block — Change {"type": "enabled", "budget_tokens": N} to {"type": "adaptive"}
Optionally add effort — If you previously used a high budget, start with "effort": "high"
Test and compare — Run your existing test suite to verify quality hasn't degraded

Before (deprecated):

thinking={"type": "enabled", "budget_tokens": 16000}

After (recommended):

thinking={"type": "adaptive", "effort": "high"}

Best Practices

Start with default effort — Use "effort": "high" (or omit it) for most workloads, then dial down if you need faster responses
Monitor token usage — Adaptive thinking can sometimes use more tokens than a fixed budget on complex queries, but it saves on simple ones
Combine with prompt caching — For multi-turn conversations, use prompt caching to reduce latency and costs
Test with your data — Run A/B tests comparing adaptive vs. fixed budget on your specific use case
Use for agentic tasks — Adaptive thinking shines in multi-step tool-using workflows; prefer it over fixed budgets for agents

Common Pitfalls

Assuming zero thinking — Even with "effort": "low", Claude may still think for complex queries
Ignoring model compatibility — Older models (Sonnet 4.5, Opus 4.5) don't support adaptive thinking; check the model name
Setting max_tokens too low — Adaptive thinking can use significant tokens; ensure max_tokens is high enough to accommodate both thinking and response

Key Takeaways

Adaptive thinking lets Claude dynamically decide when and how much to think, replacing rigid budget_tokens for smarter resource allocation.
It's the only supported mode on Claude Opus 4.7 and the recommended mode on Opus 4.6 and Sonnet 4.6; manual enabled is deprecated.
The effort parameter (low, medium, high) gives you soft control over thinking depth without hard token limits.
Interleaved thinking makes adaptive mode ideal for agentic workflows where Claude reasons between tool calls.
Migrate now if you're using budget_tokens on supported models — the old syntax will be removed in future releases.