Claude Guide · 2026-04-25

Mastering Claude’s Extended Thinking: Adaptive Thinking, Effort Budgets, and Fast Mode

A practical guide to Claude's extended thinking features—adaptive thinking, effort budgets, and fast mode—with code examples and best practices for advanced AI reasoning.

Quick Answer

Learn how to configure Claude's extended thinking capabilities, including adaptive thinking, effort budgets, and fast mode, to control reasoning depth, token usage, and response speed in your applications.

Tags: Extended Thinking · Adaptive Thinking · Effort Budgets · Fast Mode · Claude API


Claude’s extended thinking capabilities represent a significant leap forward in AI reasoning. Whether you’re building a complex code analysis tool, a multi-step research assistant, or a creative writing partner, understanding how to control Claude’s “thinking” process can dramatically improve output quality, cost efficiency, and user experience.

In this guide, we’ll dive into three powerful features: Adaptive Thinking, Effort Budgets, and Fast Mode. You’ll learn what each does, when to use them, and how to implement them with practical code examples.

---

What Is Extended Thinking?

Extended thinking allows Claude to “think” before responding—allocating additional computational resources to reason through complex problems. Instead of generating a response token by token in a single pass, Claude can explore multiple reasoning paths, reconsider its approach, and produce more accurate, nuanced outputs.

This is especially valuable for:

  • Mathematical problem solving
  • Multi-step logical reasoning
  • Code generation and debugging
  • Complex analysis and research tasks
  • Creative writing requiring coherence over long passages
---

Adaptive Thinking: Let Claude Decide the Depth

Adaptive thinking is the most flexible mode. Instead of you manually setting a fixed thinking budget, Claude dynamically determines how much reasoning effort a task requires. It will use minimal thinking for simple queries and scale up for complex ones.

When to Use Adaptive Thinking

  • General-purpose applications where query complexity varies
  • Chatbots handling both simple FAQs and complex troubleshooting
  • Cost-sensitive deployments where you want to avoid over-spending on easy tasks

How to Enable Adaptive Thinking

In the API, you enable extended thinking and set type: "enabled" without specifying a budget. Claude will automatically decide the effort.

#### Python Example

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4096,
    thinking={
        "type": "enabled"  # Adaptive mode: no budget specified
    },
    messages=[
        {"role": "user", "content": "Explain the Riemann Hypothesis in simple terms."}
    ]
)

print(response.content)

#### TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 4096,
  thinking: { type: 'enabled' },  // Adaptive mode: no budget specified
  messages: [
    { role: 'user', content: 'Explain the Riemann Hypothesis in simple terms.' }
  ]
});

console.log(response.content);

Note: When using adaptive thinking, the max_tokens parameter still applies to the total response (thinking + visible output). Ensure you allocate enough tokens for both.
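Because thinking and visible output share one `max_tokens` pool, it can help to sanity-check how much room remains for the visible response. Here is a minimal sketch; the `thinking_tokens_used` value would come from the response's usage accounting, and both the helper name and that field name are illustrative, not part of the SDK.

```python
def visible_output_allowance(max_tokens: int, thinking_tokens_used: int) -> int:
    """Tokens left for the visible response after thinking.

    `thinking_tokens_used` would be read from the API response's usage
    accounting (exact field names vary by SDK version).
    """
    remaining = max_tokens - thinking_tokens_used
    if remaining <= 0:
        raise ValueError("max_tokens exhausted by thinking; raise max_tokens")
    return remaining

print(visible_output_allowance(4096, 1500))  # 2596
```

If this check fails often for a given workload, that is a signal to raise `max_tokens` rather than trim the prompt.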

---

Effort Budgets (Beta): Fine-Tune Thinking Resources

Effort budgets give you precise control over how many tokens Claude can use for thinking. You set a budget_tokens value, and Claude will use up to that amount for internal reasoning before generating its visible response.

When to Use Effort Budgets

  • High-stakes tasks requiring maximum accuracy (e.g., medical diagnosis, legal analysis)
  • Cost predictability—you know exactly how much thinking will cost
  • Performance tuning—experimenting with different budget levels to find the sweet spot

How to Set an Effort Budget

Set type: "enabled" and provide a budget_tokens value. The budget must be less than max_tokens.

#### Python Example

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096  # Use up to 4096 tokens for thinking
    },
    messages=[
        {"role": "user", "content": "Write a Python function to solve the traveling salesman problem using dynamic programming."}
    ]
)

#### TypeScript Example

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 8192,
  thinking: {
    type: 'enabled',
    budget_tokens: 4096
  },
  messages: [
    { role: 'user', content: 'Write a Python function to solve the traveling salesman problem using dynamic programming.' }
  ]
});

Best Practices for Effort Budgets

  • Start small: Begin with 1024–2048 tokens and increase until output quality plateaus.
  • Monitor token usage: Use the API response’s usage field to see actual thinking tokens consumed.
  • Balance with max_tokens: Ensure max_tokens is at least 2x your budget to leave room for the visible response.
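The constraints above (budget below `max_tokens`, and `max_tokens` at least 2x the budget) are easy to enforce once in a small helper rather than at every call site. This is a sketch; the helper name is ours, and only the `{"type": "enabled", "budget_tokens": ...}` shape comes from the examples in this guide.

```python
def make_thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Build a thinking config, enforcing the budget < max_tokens rule
    and the rule of thumb that max_tokens be at least 2x the budget."""
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    if max_tokens < 2 * budget_tokens:
        raise ValueError(
            "leave room for the visible response: "
            "set max_tokens to at least 2x budget_tokens"
        )
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

The returned dict can be passed directly as the `thinking` argument to `client.messages.create`.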
---

Fast Mode (Beta: Research Preview): Speed Over Depth

Fast mode is designed for scenarios where response speed is critical, and deep reasoning isn’t required. It reduces thinking overhead, producing faster responses—ideal for real-time applications.

When to Use Fast Mode

  • Real-time chat where users expect sub-second responses
  • Simple classification tasks (e.g., sentiment analysis, intent detection)
  • High-throughput systems where latency is a key metric

How to Enable Fast Mode

Set type: "enabled" and include fast_mode: true in the thinking configuration.

#### Python Example

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "fast_mode": True  # Enable fast mode
    },
    messages=[
        {"role": "user", "content": "Classify this review as positive, negative, or neutral: 'The product was okay, but shipping took forever.'"}
    ]
)

#### TypeScript Example

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  thinking: {
    type: 'enabled',
    fast_mode: true
  },
  messages: [
    { role: 'user', content: "Classify this review as positive, negative, or neutral: 'The product was okay, but shipping took forever.'" }
  ]
});

Important: Fast mode is a research preview. It may produce less accurate results for complex tasks. Always test thoroughly before deploying to production.

---

Combining Features: A Practical Workflow

You can combine adaptive thinking with fast mode, or effort budgets with fast mode, depending on your needs. Here’s a decision flowchart:

  • Need maximum accuracy? → Use effort budget with high budget_tokens.
  • Need speed? → Use adaptive thinking + fast mode.
  • Need cost control? → Use adaptive thinking without fast mode.
  • Need predictable costs? → Use effort budget with a fixed budget_tokens.
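The flowchart above can be encoded as a small dispatch function, so the mode decision lives in one place. The specific budget values here are illustrative placeholders, and the config shapes mirror this guide's examples rather than an authoritative API schema.

```python
def choose_thinking_config(need: str) -> dict:
    """Map a stated need to a thinking config (mirrors the flowchart).

    Budget values are illustrative; tune them for your workload.
    """
    if need == "accuracy":
        return {"type": "enabled", "budget_tokens": 8192}  # high budget
    if need == "speed":
        return {"type": "enabled", "fast_mode": True}      # adaptive + fast
    if need == "cost_control":
        return {"type": "enabled"}                         # adaptive only
    if need == "predictable_cost":
        return {"type": "enabled", "budget_tokens": 2048}  # fixed budget
    raise ValueError(f"unknown need: {need}")
```

A chatbot could call this per request type, e.g. `choose_thinking_config("speed")` for FAQ traffic and `choose_thinking_config("accuracy")` for escalations.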

Example: Adaptive + Fast Mode for a Customer Support Bot

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "fast_mode": True
    },
    messages=[
        {"role": "user", "content": "My order hasn't arrived yet. What should I do?"}
    ]
)

This configuration lets Claude decide the thinking depth while keeping responses snappy—perfect for a support chatbot.

---

Common Pitfalls and How to Avoid Them

| Pitfall | Solution |
| --- | --- |
| Setting budget_tokens too high for simple tasks | Use adaptive thinking instead |
| Forgetting to increase max_tokens when using effort budgets | Set max_tokens to at least budget_tokens + expected output length |
| Using fast mode for complex reasoning | Reserve fast mode for simple, latency-sensitive tasks |
| Not testing with real user queries | Run A/B tests comparing thinking modes on your actual data |
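For the A/B testing pitfall, a simple way to compare thinking modes is to log per-request usage and aggregate it per configuration. This sketch assumes you collect usage records as dicts; the `thinking_tokens` and `output_tokens` keys are hypothetical names for whatever your SDK's usage object reports.

```python
import statistics

def summarize_usage(samples: list[dict]) -> dict:
    """Aggregate per-request usage records (e.g. collected from each
    response's usage accounting during an A/B test) into mean token counts."""
    thinking = [s["thinking_tokens"] for s in samples]
    output = [s["output_tokens"] for s in samples]
    return {
        "mean_thinking": statistics.mean(thinking),
        "mean_output": statistics.mean(output),
    }
```

Run the same query set under each thinking mode, then compare the summaries (and output quality) side by side before committing to a configuration.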
---

Key Takeaways

  • Adaptive thinking lets Claude dynamically allocate reasoning resources—ideal for variable-complexity workloads and cost optimization.
  • Effort budgets give you precise control over thinking token usage, enabling predictable costs and maximum accuracy for critical tasks.
  • Fast mode prioritizes speed over depth, perfect for real-time applications where latency matters more than exhaustive reasoning.
  • Combine features strategically: Use adaptive + fast for speed, effort budgets for accuracy, and always test with real-world queries.
  • Monitor token usage via the API response to fine-tune your configuration and avoid unexpected costs.

By mastering these extended thinking features, you can build Claude-powered applications that are both intelligent and efficient—tailored exactly to your use case.