BeClaude Guide · 2026-04-29

Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Task Budgets

Learn how to use Claude’s Extended Thinking, Adaptive Thinking, Effort, and Task Budgets to control reasoning depth, speed, and cost in your API applications.

Quick Answer

This guide explains how to configure Claude’s Extended Thinking modes—Adaptive Thinking, Effort, and Task Budgets—to balance reasoning depth, response speed, and API costs. You’ll get practical code examples and best practices for each setting.

Extended Thinking · Adaptive Thinking · Task Budgets · Claude API · Prompt Engineering

Introduction

Claude’s Extended Thinking capabilities allow you to control how deeply the model reasons before generating a response. Whether you need a quick answer, a deeply reasoned analysis, or a budget-conscious solution, understanding these settings is essential for building efficient, cost-effective applications.

In this guide, you’ll learn:

  • What Extended Thinking is and when to use it
  • How Adaptive Thinking dynamically adjusts reasoning depth
  • How to set Effort levels for predictable reasoning
  • How Task Budgets (beta) cap thinking time and cost
  • Practical code examples in Python and TypeScript

Let’s dive in.

What Is Extended Thinking?

Extended Thinking is a Claude API feature that lets the model spend more computational resources on reasoning before producing a final answer. This is especially useful for complex tasks like:

  • Multi-step math problems
  • Code generation and debugging
  • Legal or medical analysis
  • Long-form content creation

By default, Claude uses a standard reasoning path. With Extended Thinking, you can instruct the model to “think longer” or “think deeper,” improving accuracy at the cost of higher latency and token usage.

Adaptive Thinking: Dynamic Reasoning Depth

Adaptive Thinking is the most flexible mode. Claude automatically decides how much thinking to apply based on the complexity of the input. You set an upper token ceiling, and Claude adjusts how much of it to actually use on the fly.

When to Use Adaptive Thinking

  • You want the best balance of speed and accuracy
  • Your use case has variable complexity (e.g., a chatbot that handles both simple FAQs and complex troubleshooting)
  • You don’t want to manually tune effort levels

Python Example

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x^2 + 5x - 2 = 0"}
    ]
)

# With thinking enabled, the response contains thinking blocks before the
# final text block, so filter by block type instead of indexing content[0].
for block in response.content:
    if block.type == "text":
        print(block.text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 1024
  },
  messages: [
    { role: 'user', content: 'Solve this equation step by step: 3x^2 + 5x - 2 = 0' }
  ]
});

// Thinking blocks precede the text block, so filter by block type.
for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  }
}

Note: When thinking is enabled, budget_tokens must be strictly less than max_tokens, and the minimum budget is 1,024 tokens. The thinking budget defines how many tokens Claude can use for internal reasoning.

Effort: Predictable Reasoning Control

Effort lets you explicitly set how much reasoning Claude should apply. Options include low, medium, and high. This is ideal when you know the complexity of your task in advance.

Effort Level    Use Case
low             Simple Q&A, quick lookups
medium          Balanced reasoning for most tasks
high            Complex analysis, multi-step reasoning

Python Example

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "effort": "high"
    },
    messages=[
        {"role": "user", "content": "Explain the implications of quantum computing on modern cryptography."}
    ]
)

TypeScript Example

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  thinking: {
    type: 'enabled',
    budget_tokens: 2048,
    effort: 'high'
  },
  messages: [
    { role: 'user', content: 'Explain the implications of quantum computing on modern cryptography.' }
  ]
});
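If you route requests by known task type, choosing an effort level can be as simple as a lookup. A minimal sketch in Python (the task categories and helper name here are illustrative, not part of the API):

```python
# Illustrative mapping from task category to an effort level.
# The categories are this guide's own examples, not API values.
EFFORT_BY_TASK = {
    "faq": "low",
    "summarization": "medium",
    "code_review": "high",
    "multi_step_math": "high",
}

def pick_effort(task_type: str) -> str:
    """Return the effort level for a task type, defaulting to 'medium'."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```

You would then pass `pick_effort(task_type)` as the `effort` value in the `thinking` parameter, keeping the routing logic in one place.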

Task Budgets (Beta): Cost and Time Caps

Task Budgets are a beta feature that lets you cap thinking time or thinking tokens. If Claude hits the budget, it stops thinking and returns a partial response or a fallback. This is useful for:

  • Keeping API costs predictable
  • Ensuring low-latency responses
  • Handling user-facing applications where response time matters

How It Works

  • Set budget_tokens to the maximum tokens Claude can use for thinking
  • If the budget is reached, Claude stops thinking and returns its best answer so far
  • You can also set a time_budget in milliseconds (if supported by your model)

Python Example

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 512,  # Strict cap on thinking tokens
        "effort": "medium"
    },
    messages=[
        {"role": "user", "content": "Write a detailed essay on the history of the Roman Empire."}
    ]
)

TypeScript Example

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 2048,
  thinking: {
    type: 'enabled',
    budget_tokens: 512,
    effort: 'medium'
  },
  messages: [
    { role: 'user', content: 'Write a detailed essay on the history of the Roman Empire.' }
  ]
});

Fast Mode (Beta: Research Preview)

Fast Mode is a research preview feature that trades some reasoning depth for significantly faster responses. It’s ideal for real-time applications where speed is critical, such as:

  • Live chat support
  • Code autocompletion
  • Interactive tutoring

To enable Fast Mode, set thinking.type to "fast" instead of "enabled".

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "fast",
        "budget_tokens": 256
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Best Practices

  • Start with Adaptive Thinking – It’s the easiest way to get good results without manual tuning.
  • Use Effort for known complexity – If you know a task is hard, set effort to high. If it’s simple, use low to save tokens.
  • Set Task Budgets for cost control – Always set a budget_tokens value that aligns with your cost tolerance.
  • Monitor token usage – Use the usage field in the API response to track thinking tokens vs. output tokens.
  • Test with Fast Mode – If latency is a problem, try Fast Mode and evaluate if the quality drop is acceptable.
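To make the "monitor token usage" practice concrete, here is a minimal cost-estimation sketch built on the `usage` field. The helper name and the per-million-token prices are placeholders for illustration, not official rates; thinking tokens are billed as output tokens, so `output_tokens` already includes them:

```python
def thinking_cost_report(usage, input_price_per_mtok=3.0, output_price_per_mtok=15.0):
    """Estimate request cost in dollars from the API response's usage field.

    `usage` mirrors response.usage: thinking tokens are billed as output
    tokens, so usage["output_tokens"] already includes them. The default
    prices are illustrative placeholders, not official rates.
    """
    cost = (usage["input_tokens"] * input_price_per_mtok
            + usage["output_tokens"] * output_price_per_mtok) / 1_000_000
    return round(cost, 6)
```

Feeding each response's `usage` into a logger alongside the thinking settings you used makes it easy to see which effort levels and budgets actually pay for themselves.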

Common Pitfalls

  • Forgetting to set budget_tokens – Without it, thinking mode may consume excessive tokens.
  • Setting budget_tokens too high – budget_tokens must be strictly less than max_tokens, or the API returns an error.
  • Using high effort for simple tasks – This wastes tokens and increases latency without improving quality.
  • Not handling partial responses – When using Task Budgets, always check if the response is complete.
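The partial-response check above can be sketched as a small guard on the response's `stop_reason` field: `"end_turn"` indicates a natural finish, while `"max_tokens"` means the token limit cut the answer short.

```python
def is_complete(response) -> bool:
    """Return True when Claude finished naturally rather than hitting a cap.

    stop_reason is "end_turn" for a natural finish and "max_tokens" when the
    token limit truncated the answer; treat the latter as a partial response.
    """
    return response.stop_reason == "end_turn"
```

In a user-facing application, a `False` result is the signal to retry with a larger budget or to surface the answer with a truncation notice.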

Key Takeaways

  • Extended Thinking gives you fine-grained control over Claude’s reasoning depth, speed, and cost.
  • Adaptive Thinking is the easiest starting point—Claude automatically adjusts reasoning effort.
  • Effort levels (low, medium, high) let you explicitly control reasoning for predictable tasks.
  • Task Budgets (beta) cap thinking tokens or time, ensuring predictable costs and latency.
  • Fast Mode (research preview) prioritizes speed over depth for real-time applications.
  • Always set budget_tokens and monitor usage to avoid unexpected API costs.
By mastering these settings, you can build Claude-powered applications that are both intelligent and cost-efficient.