Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking and Task Budgets
Learn how to use Claude’s Extended Thinking, Adaptive Thinking, and Task Budgets to improve reasoning depth, control costs, and handle complex tasks efficiently.
This guide explains how to enable and optimize Claude’s Extended Thinking, Adaptive Thinking, and Task Budgets to balance reasoning depth, speed, and cost in your API applications.
Introduction
Claude’s reasoning capabilities have evolved significantly. With the introduction of Extended Thinking, Adaptive Thinking, and Task Budgets, developers now have fine-grained control over how Claude processes complex problems. Whether you’re building a research assistant, a code analyzer, or a multi-step reasoning agent, understanding these features is critical to getting the best performance and cost-efficiency.
This guide walks you through each feature, explains when to use them, and provides practical code examples to integrate them into your Claude API workflows.
What Is Extended Thinking?
Extended Thinking allows Claude to allocate more tokens to its internal reasoning process before generating a final response. This is especially useful for tasks that require deep logical chains, mathematical proofs, or multi-step analysis.

By default, Claude uses a limited thinking budget. With Extended Thinking, you can set a higher budget, enabling the model to "think longer" and produce more accurate, nuanced answers.
When to Use Extended Thinking
- Complex math or logic problems
- Legal or contract analysis
- Multi-step code generation or debugging
- Research synthesis with many sources
How to Enable Extended Thinking
In the API, you enable Extended Thinking by setting the thinking parameter in your request. Here’s an example in Python:
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,  # Tokens allocated for thinking
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}
    ],
)

# With thinking enabled, the response contains a thinking block before the
# text block, so print the text block rather than content[0].
for block in response.content:
    if block.type == "text":
        print(block.text)
Note: The budget_tokens value must be less than max_tokens. The thinking budget is consumed from the total token limit.
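If you assemble requests programmatically, a small pre-flight check can catch an invalid combination before it reaches the API. This is a minimal sketch, not an official helper; the 1,024-token floor used here reflects the documented minimum thinking budget at the time of writing, so verify it against the current API reference.

def validate_thinking_config(max_tokens: int, budget_tokens: int) -> None:
    """Sanity-check a thinking configuration before sending a request.

    Assumes a 1,024-token minimum thinking budget; confirm the exact
    floor against the current API documentation.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens should be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than "
            f"max_tokens ({max_tokens})"
        )

# The request above passes: 2048 thinking tokens inside a 4096-token limit.
validate_thinking_config(max_tokens=4096, budget_tokens=2048)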
Adaptive Thinking: Let Claude Decide
Adaptive Thinking (available as a research preview) lets Claude dynamically decide how much thinking budget to use based on the complexity of the task. Instead of you setting a fixed budget, Claude estimates the required reasoning depth and allocates tokens accordingly.

This is ideal for:
- Mixed workloads where some queries are simple and others are complex
- Reducing costs on easy questions while maintaining depth on hard ones
- Applications where you don’t want to manually tune budgets per request
Enabling Adaptive Thinking
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,  # must exceed budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True,  # Enable adaptive thinking
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
When adaptive is set to True, Claude may use fewer tokens than the budget for simple queries, reducing both cost and latency.
Task Budgets: Fine-Tuning Effort
Task Budgets (beta) allow you to set a maximum thinking effort for a specific task. This is different from token budgets: it controls how much computational effort Claude applies, which can affect both reasoning depth and response time.

Use Task Budgets when:
- You need consistent response times (e.g., for real-time applications)
- You want to limit costs on non-critical tasks
- You’re batching requests and need predictable throughput
Setting a Task Budget
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "task_budget": 0.5,  # 0.0 to 1.0, where 1.0 is maximum effort
    },
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ],
)
A task_budget of 0.5 means Claude will use roughly half the maximum effort, trading some depth for speed.
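One way to apply this in practice is to map request priorities to effort levels in one place instead of hand-tuning each call. The helper below is purely illustrative: the priority names and threshold values are my own, and task_budget is the beta parameter described above.

EFFORT_BY_PRIORITY = {
    "realtime": 0.3,   # favour latency
    "standard": 0.5,   # balanced
    "critical": 0.9,   # favour reasoning depth
}

def thinking_config(priority: str, budget_tokens: int = 2048) -> dict:
    """Build a thinking payload for the given priority (illustrative values)."""
    return {
        "type": "enabled",
        "budget_tokens": budget_tokens,
        "task_budget": EFFORT_BY_PRIORITY[priority],
    }

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking=thinking_config("realtime"),
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ],
)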
Fast Mode: Speed Over Depth
Fast Mode (research preview) is designed for scenarios where latency is critical. It reduces thinking time and returns responses more quickly, at the cost of some reasoning quality.

Enable Fast Mode when:
- Building chatbots that need sub-second responses
- Handling simple Q&A or retrieval tasks
- Prototyping and iterating quickly
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024,  # minimum thinking budget
        "fast_mode": True,
    },
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ],
)
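To confirm Fast Mode is actually buying you latency on your workload, a rough before/after timing is usually enough. The sketch below is indicative only: wall-clock timings are noisy, and fast_mode is the research-preview flag described above rather than a generally available parameter.

import time

# Rough latency comparison with fast_mode off and on.
# Single runs are noisy; average several requests for a real benchmark.
for fast in (False, True):
    start = time.perf_counter()
    client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        thinking={"type": "enabled", "budget_tokens": 1024, "fast_mode": fast},
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(f"fast_mode={fast}: {time.perf_counter() - start:.2f}s")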
Combining Features for Optimal Results
You can combine Adaptive Thinking, Task Budgets, and Fast Mode to create a custom reasoning profile. For example:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True,
        "task_budget": 0.8,
        "fast_mode": False,
    },
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
)
This configuration gives Claude a high budget, adaptive allocation, and near-maximum effort, but disables fast mode for thorough reasoning.
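If you serve several kinds of traffic from one codebase, it can help to name these combinations as profiles and pick one per request rather than tuning fields at every call site. The profiles below are a sketch with made-up names and values; adaptive, task_budget, and fast_mode are the preview/beta options covered earlier in this guide.

THINKING_PROFILES = {
    "deep_analysis": {
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True,
        "task_budget": 0.8,
        "fast_mode": False,
    },
    "realtime_chat": {
        "type": "enabled",
        "budget_tokens": 1024,
        "fast_mode": True,
    },
    "cost_sensitive_batch": {
        "type": "enabled",
        "budget_tokens": 1024,
        "task_budget": 0.3,
    },
}

def ask(profile: str, prompt: str, max_tokens: int = 8192):
    """Send a request using a named reasoning profile (illustrative)."""
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=max_tokens,
        thinking=THINKING_PROFILES[profile],
        messages=[{"role": "user", "content": prompt}],
    )

reply = ask("deep_analysis", "Explain quantum entanglement in simple terms.")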
Best Practices
- Start with a moderate budget – For most tasks, 1024–2048 thinking tokens is sufficient. Increase only if you see shallow responses.
- Use Adaptive Thinking for variable workloads – It saves tokens on simple queries and allocates more for complex ones.
- Monitor token usage – Use the API response's usage field to track thinking tokens and adjust budgets (see the sketch after this list).
- Test with Task Budgets – If response time is critical, set a task budget of 0.3–0.5 to balance speed and quality.
- Avoid Fast Mode for reasoning-heavy tasks – It’s best for simple, repetitive queries.
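The monitoring point above is straightforward to wire up. The sketch below assumes thinking tokens are counted inside usage.output_tokens rather than reported as a separate field; if your SDK version exposes a dedicated thinking-token count, prefer that.

import statistics

# Track output-token consumption across a small batch and compare it to
# the configured thinking budget to decide whether the budget is too
# high or too low. Assumes thinking tokens are included in output_tokens.
BUDGET = 2048
prompts = [
    "Summarize this paragraph in one sentence.",
    "Solve this equation step by step: 3x + 7 = 22",
]

output_counts = []
for prompt in prompts:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        thinking={"type": "enabled", "budget_tokens": BUDGET},
        messages=[{"role": "user", "content": prompt}],
    )
    output_counts.append(response.usage.output_tokens)

print("mean output tokens:", statistics.mean(output_counts))
print("configured thinking budget:", BUDGET)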
Key Takeaways
- Extended Thinking gives Claude more tokens to reason, improving accuracy on complex tasks.
- Adaptive Thinking automatically adjusts the thinking budget based on task difficulty, saving costs.
- Task Budgets let you control computational effort, balancing speed and depth.
- Fast Mode prioritizes low latency over reasoning quality.
- Combine these features strategically to optimize for your specific use case—whether that’s deep analysis, real-time chat, or cost-sensitive batch processing.