Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Fast Mode
Learn how to use Claude’s Extended Thinking, Adaptive Thinking, Task Budgets, and Fast Mode to control reasoning depth, speed, and cost in your API applications.
This guide explains how to configure Claude’s Extended Thinking features—Adaptive Thinking, Task Budgets, and Fast Mode—to balance reasoning depth, response speed, and token cost in your API calls.
Introduction
Claude’s Extended Thinking capabilities represent a significant leap forward in how you can control the model’s reasoning process. Whether you need deep, chain-of-thought analysis for complex problems or lightning-fast responses for simple queries, understanding these features is essential for building efficient, cost-effective applications.
In this guide, you’ll learn how to use Adaptive Thinking, Task Budgets, and Fast Mode to fine-tune Claude’s behavior. We’ll cover practical API examples, best practices, and real-world scenarios to help you get the most out of these tools.
What Is Extended Thinking?
Extended Thinking allows Claude to reason step-by-step before generating a final response. Instead of producing an answer immediately, the model can “think” internally, breaking down complex tasks into smaller parts. This is especially useful for:
- Mathematical problem solving
- Code generation and debugging
- Multi-step reasoning tasks
- Complex decision-making
Adaptive Thinking: Let Claude Decide
Adaptive Thinking is a mode where Claude automatically determines how much thinking effort to apply based on the complexity of the input. This is the easiest way to get started because you don’t need to specify a budget manually.
How It Works
When you enable Adaptive Thinking, Claude analyzes the prompt and allocates thinking tokens accordingly. Simple questions get minimal thinking; complex ones get more. This saves tokens and reduces latency for straightforward tasks while still allowing deep reasoning when needed.
API Example (Python)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,  # leave headroom above the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 1024,
        "adaptive": True  # Enable adaptive thinking
    },
    messages=[
        {"role": "user", "content": "What is the square root of 144?"}
    ]
)

print(response.content[0].text)
Note: When adaptive is set to True, budget_tokens acts as a maximum ceiling. Claude will use fewer tokens if the task doesn’t require the full budget.
When to Use Adaptive Thinking
- General-purpose chatbots – You don’t know the complexity of user queries in advance.
- Cost-sensitive applications – You want to avoid overspending on simple requests.
- Prototyping – Quickly test without fine-tuning budgets.
Task Budgets (Beta): Precise Control
Task Budgets let you specify exactly how many thinking tokens Claude should use for a given task. This is more rigid than Adaptive Thinking but gives you deterministic control over reasoning depth.
API Example (Python)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2560,  # thinking budget plus room for the final answer
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,  # Fixed budget
        "adaptive": False
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}
    ]
)
Best Practices for Task Budgets
- Start small, then increase – Begin with a modest budget (e.g., 512 tokens) and monitor response quality.
- Match budget to task complexity – Simple Q&A: 256–512 tokens. Multi-step reasoning: 1024–2048 tokens. Code generation: 2048+ tokens.
- Combine with max_tokens – Always set max_tokens higher than your thinking budget to leave room for the final response.
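The third practice can be captured in a small helper that derives max_tokens from the thinking budget. This is a sketch, not part of the SDK: build_request_params is a hypothetical name, and the 512-token default headroom for the final response is an assumption you should tune per task.

```python
def build_request_params(budget_tokens: int, response_headroom: int = 512) -> dict:
    """Build request kwargs that leave room for the final answer.

    response_headroom is the number of tokens reserved for the reply on top
    of the thinking budget (an assumed default, not an official figure).
    """
    return {
        "max_tokens": budget_tokens + response_headroom,
        "thinking": {
            "type": "enabled",
            "budget_tokens": budget_tokens,
            "adaptive": False,
        },
    }

params = build_request_params(1024)
# Unpack into the call: client.messages.create(model=..., messages=..., **params)
```

Because max_tokens is always computed as budget plus headroom, the two values can never drift out of sync when you tune the budget later.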
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 1024,
    adaptive: false,
  },
  messages: [
    { role: 'user', content: 'Write a Python function to merge two sorted lists.' }
  ],
});

console.log(response.content[0].text);
Fast Mode (Beta: Research Preview)
Fast Mode is designed for scenarios where speed is more important than deep reasoning. When enabled, Claude minimizes thinking time, producing responses faster—but with potentially less thorough analysis.
How to Enable Fast Mode
Fast Mode is activated by setting a special parameter in the API call. Note that this is a research preview, so behavior may change.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 512,
        "fast_mode": True  # Enable fast mode
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
When to Use Fast Mode
- Real-time applications – Chatbots requiring sub-second responses.
- Simple, factual queries – No need for chain-of-thought.
- High-throughput systems – Reduce latency per request.
Caution: Fast Mode may reduce accuracy on complex or ambiguous tasks. Always test with your specific use case.
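One way to manage that accuracy risk is a fallback pattern: answer with Fast Mode first, and escalate to a full thinking budget only when the quick answer looks inadequate. The sketch below is hypothetical: call_fn is injected so the pattern can be shown without a live API key, and treating a very short reply as "needs more thinking" is only an illustrative heuristic, not a reliable quality signal.

```python
def call_with_fallback(call_fn, prompt: str, min_chars: int = 20) -> str:
    """Try Fast Mode first; retry with full thinking if the answer looks thin.

    call_fn(prompt, thinking_config) -> str wraps the actual API call.
    """
    fast = {"type": "enabled", "budget_tokens": 256, "fast_mode": True}
    full = {"type": "enabled", "budget_tokens": 2048, "fast_mode": False}

    answer = call_fn(prompt, fast)
    if len(answer.strip()) < min_chars:
        # Quick answer looks too thin; escalate to deep reasoning.
        answer = call_fn(prompt, full)
    return answer
```

In production you would replace the length check with a signal that fits your domain, such as a confidence rubric or a validation step on the answer's format.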
Combining Features: A Practical Workflow
You can mix Adaptive Thinking, Task Budgets, and Fast Mode to create a tiered system:
- Default path – Adaptive Thinking with a moderate budget (1024 tokens).
- Simple queries – Fast Mode enabled, low budget (256 tokens).
- Complex tasks – Fixed Task Budget (2048 tokens), no Fast Mode.
Example: Tiered Router
def get_thinking_config(complexity: str):
    if complexity == "simple":
        return {
            "type": "enabled",
            "budget_tokens": 256,
            "adaptive": False,
            "fast_mode": True
        }
    elif complexity == "medium":
        return {
            "type": "enabled",
            "budget_tokens": 1024,
            "adaptive": True,
            "fast_mode": False
        }
    else:  # complex
        return {
            "type": "enabled",
            "budget_tokens": 2048,
            "adaptive": False,
            "fast_mode": False
        }
Usage
config = get_thinking_config("medium")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=config["budget_tokens"] + 512,
    thinking=config,
    messages=[{"role": "user", "content": user_input}]
)
Monitoring and Debugging
When using Extended Thinking, you can inspect the thinking content in the API response. This is useful for debugging or auditing.
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Response:", block.text)
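For auditing many responses at once, the same loop can be folded into a summary helper. summarize_blocks is a sketch, not an SDK utility; it only assumes each content block exposes type plus a thinking or text attribute, exactly as in the loop above, so it is demonstrated here on a stand-in built with SimpleNamespace.

```python
from types import SimpleNamespace

def summarize_blocks(content) -> dict:
    """Count characters of thinking vs. final text in a response's content."""
    summary = {"thinking_chars": 0, "text_chars": 0}
    for block in content:
        if block.type == "thinking":
            summary["thinking_chars"] += len(block.thinking)
        elif block.type == "text":
            summary["text_chars"] += len(block.text)
    return summary

# A stand-in shaped like response.content, for demonstration:
fake = [
    SimpleNamespace(type="thinking", thinking="12 * 12 = 144, so sqrt is 12"),
    SimpleNamespace(type="text", text="The square root of 144 is 12."),
]
print(summarize_blocks(fake))
```

Logging this ratio per request is a simple way to spot tasks whose thinking budget is consistently under- or over-sized.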
Key Takeaways
- Adaptive Thinking automatically adjusts reasoning depth based on input complexity, saving tokens and time on simple tasks.
- Task Budgets give you precise control over thinking tokens, ideal for deterministic workflows.
- Fast Mode prioritizes speed over depth, suitable for real-time or simple queries.
- Combine these features in a tiered system to balance cost, speed, and accuracy.
- Always test with your specific use case—especially Fast Mode, which may reduce accuracy on complex tasks.