# Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Task Budgets
This guide explains how to configure Claude’s Extended Thinking modes—Adaptive Thinking, Effort, and Task Budgets—to balance reasoning depth, response speed, and API costs. You’ll get practical code examples and best practices for each setting.
## Introduction
Claude’s Extended Thinking capabilities allow you to control how deeply the model reasons before generating a response. Whether you need a quick answer, a deeply reasoned analysis, or a budget-conscious solution, understanding these settings is essential for building efficient, cost-effective applications.
In this guide, you’ll learn:
- What Extended Thinking is and when to use it
- How Adaptive Thinking dynamically adjusts reasoning depth
- How to set Effort levels for predictable reasoning
- How Task Budgets (beta) cap thinking time and cost
- Practical code examples in Python and TypeScript
## What Is Extended Thinking?
Extended Thinking is a Claude API feature that lets the model spend more computational resources on reasoning before producing a final answer. This is especially useful for complex tasks like:
- Multi-step math problems
- Code generation and debugging
- Legal or medical analysis
- Long-form content creation
## Adaptive Thinking: Dynamic Reasoning Depth
Adaptive Thinking is the most flexible mode. Claude automatically decides how much thinking time to allocate based on the complexity of the input. You don’t set a fixed budget—Claude adjusts on the fly.
### When to Use Adaptive Thinking
- You want the best balance of speed and accuracy
- Your use case has variable complexity (e.g., a chatbot that handles both simple FAQs and complex troubleshooting)
- You don’t want to manually tune effort levels
### Python Example

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x^2 + 5x - 2 = 0"}
    ]
)

# The response contains a thinking block followed by a text block
for block in response.content:
    if block.type == "text":
        print(block.text)
```
### TypeScript Example

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 1024
  },
  messages: [
    { role: 'user', content: 'Solve this equation step by step: 3x^2 + 5x - 2 = 0' }
  ]
});

// The response contains a thinking block followed by a text block
for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  }
}
```
Note: When `thinking` is enabled, you must set `budget_tokens` to a value less than `max_tokens`. The thinking budget defines how many tokens Claude can use for internal reasoning.
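Because an out-of-range budget fails only at request time, it can help to validate the pair before calling the API. A minimal sketch; the helper name and error message are our own, not part of the SDK:

```python
def validate_thinking_budget(max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError when a thinking budget would be rejected by the API."""
    # budget_tokens must leave room for the final answer within max_tokens
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )

validate_thinking_budget(max_tokens=2048, budget_tokens=1024)  # passes silently
```

Running this check before each request turns a round-trip API error into an immediate local one.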
## Effort: Predictable Reasoning Control

Effort lets you explicitly set how much reasoning Claude should apply. Options include `low`, `medium`, and `high`. This is ideal when you know the complexity of your task in advance.
| Effort Level | Use Case |
|---|---|
| low | Simple Q&A, quick lookups |
| medium | Balanced reasoning for most tasks |
| high | Complex analysis, multi-step reasoning |
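The table above can be encoded as a small dispatch helper so application code picks an effort level consistently. The task categories and mapping here are illustrative assumptions, not part of the SDK:

```python
# Illustrative mapping from task category to effort level (our own, not the SDK's)
EFFORT_BY_TASK = {
    "faq": "low",           # simple Q&A, quick lookups
    "summarize": "medium",  # balanced reasoning for most tasks
    "analysis": "high",     # complex, multi-step reasoning
}

def effort_for(task_type: str) -> str:
    """Return an effort level for a task category, defaulting to medium."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```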
### Python Example

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "effort": "high"
    },
    messages=[
        {"role": "user", "content": "Explain the implications of quantum computing on modern cryptography."}
    ]
)
```
### TypeScript Example

```typescript
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  thinking: {
    type: 'enabled',
    budget_tokens: 2048,
    effort: 'high'
  },
  messages: [
    { role: 'user', content: 'Explain the implications of quantum computing on modern cryptography.' }
  ]
});
```
## Task Budgets (Beta): Cost and Time Caps
Task Budgets are a beta feature that let you set a maximum thinking time or token budget. If Claude exceeds the budget, it returns a partial response or a fallback. This is useful for:
- Keeping API costs predictable
- Ensuring low-latency responses
- Handling user-facing applications where response time matters
### How It Works

- Set `budget_tokens` to the maximum tokens Claude can use for thinking
- If the budget is reached, Claude stops thinking and returns its best answer so far
- You can also set a `time_budget` in milliseconds (if supported by your model)
### Python Example

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 512,  # Strict cap on thinking tokens
        "effort": "medium"
    },
    messages=[
        {"role": "user", "content": "Write a detailed essay on the history of the Roman Empire."}
    ]
)
```
### TypeScript Example

```typescript
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 512,
    effort: 'medium'
  },
  messages: [
    { role: 'user', content: 'Write a detailed essay on the history of the Roman Empire.' }
  ]
});
```
## Fast Mode (Beta: Research Preview)
Fast Mode is a research preview feature that trades some reasoning depth for significantly faster responses. It’s ideal for real-time applications where speed is critical, such as:
- Live chat support
- Code autocompletion
- Interactive tutoring
To enable Fast Mode, set `thinking.type` to `"fast"` instead of `"enabled"`:

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "fast",
        "budget_tokens": 256
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
## Best Practices

- **Start with Adaptive Thinking** – It’s the easiest way to get good results without manual tuning.
- **Use Effort for known complexity** – If you know a task is hard, set effort to `high`. If it’s simple, use `low` to save tokens.
- **Set Task Budgets for cost control** – Always set a `budget_tokens` value that aligns with your cost tolerance.
- **Monitor token usage** – Use the `usage` field in the API response to track thinking tokens vs. output tokens.
- **Test with Fast Mode** – If latency is a problem, try Fast Mode and evaluate whether the quality drop is acceptable.
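The token-monitoring advice above can be sketched with a small formatter for a response’s `usage` field. The field names follow the Messages API’s usage object; the helper and the stand-in values are our own:

```python
from types import SimpleNamespace

def summarize_usage(usage) -> str:
    """Format token counts from a Messages API response's `usage` field."""
    # Thinking tokens are billed as output tokens in the API's usage accounting
    return f"input={usage.input_tokens}, output={usage.output_tokens}"

# Stand-in for response.usage, for illustration only
fake_usage = SimpleNamespace(input_tokens=42, output_tokens=880)
```

Logging this string per request makes it easy to spot tasks whose thinking budgets are inflating output-token spend.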
## Common Pitfalls

- **Forgetting to set `budget_tokens`** – Without it, thinking mode may consume excessive tokens.
- **Setting `budget_tokens` higher than `max_tokens`** – This will cause an API error. Always ensure `budget_tokens < max_tokens`.
- **Using high effort for simple tasks** – This wastes tokens and increases latency without improving quality.
- **Not handling partial responses** – When using Task Budgets, always check if the response is complete.
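For the last pitfall, one way to detect a truncated answer is to inspect the response’s `stop_reason`. The field and its `"end_turn"`/`"max_tokens"` values come from the Messages API; the helper and the stand-in responses are a sketch of our own:

```python
from types import SimpleNamespace

def is_complete(response) -> bool:
    """True when the model finished naturally rather than hitting max_tokens."""
    return response.stop_reason == "end_turn"

# Stand-ins for real API responses, for illustration only
finished = SimpleNamespace(stop_reason="end_turn")
truncated = SimpleNamespace(stop_reason="max_tokens")
```

When `is_complete` returns `False`, an application might retry with a larger budget or surface the partial answer with a warning.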
## Key Takeaways

- Extended Thinking gives you fine-grained control over Claude’s reasoning depth, speed, and cost.
- Adaptive Thinking is the easiest starting point—Claude automatically adjusts reasoning effort.
- Effort levels (`low`, `medium`, `high`) let you explicitly control reasoning for predictable tasks.
- Task Budgets (beta) cap thinking tokens or time, ensuring predictable costs and latency.
- Fast Mode (research preview) prioritizes speed over depth for real-time applications.
- Always set `budget_tokens` and monitor usage to avoid unexpected API costs.