2026-05-01

Mastering Claude’s Extended Thinking: A Practical Guide to Adaptive Thinking, Effort, and Task Budgets

Learn how to use Claude’s extended thinking features—adaptive thinking, effort control, task budgets, and fast mode—to optimize reasoning depth, cost, and speed in your API applications.

Quick Answer

This guide explains how to configure Claude’s extended thinking capabilities—adaptive thinking, effort levels, task budgets, and fast mode—to control reasoning depth, token usage, and response speed in your API calls.

Tags: extended thinking, adaptive thinking, task budgets, Claude API, reasoning

Claude’s extended thinking capabilities represent a significant leap forward in how you can control the model’s reasoning process. Whether you need deep, chain-of-thought analysis for complex problems or lightning-fast responses for simple queries, understanding these features is essential for getting the most out of the Claude API.

In this guide, you’ll learn how to use adaptive thinking, effort control, task budgets, and fast mode to fine-tune Claude’s reasoning behavior. We’ll cover practical API examples, best practices, and real-world use cases.

What Is Extended Thinking?

Extended thinking allows Claude to allocate additional computational resources—specifically, more tokens and reasoning steps—to produce deeper, more accurate responses. Instead of generating an answer in a single pass, Claude can “think” through a problem step by step, considering alternatives, checking its own logic, and refining its output.

This is especially valuable for:

  • Complex mathematical or logical reasoning
  • Multi-step planning and analysis
  • Code generation with intricate requirements
  • Research and document synthesis

Adaptive Thinking: Let Claude Decide the Depth

Adaptive thinking is the simplest way to enable extended reasoning. You tell Claude to think as much as it needs, and it automatically determines the appropriate depth based on the complexity of the prompt.

How to Enable Adaptive Thinking

In the API, you enable extended thinking by setting the thinking parameter in your request. Here’s a Python example:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096  # Maximum tokens Claude can use for thinking
    },
    messages=[
        {"role": "user", "content": "Solve this equation step by step: 3x^2 + 5x - 2 = 0"}
    ]
)

# The response includes both thinking and final content
print(response.content[0].thinking)  # Claude's reasoning steps
print(response.content[1].text)      # Final answer
Note: When thinking is enabled, the response contains two content blocks: the first is the thinking (visible in the API but not to end users), and the second is the final text output.
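The two-block structure described in the note can be handled generically rather than by position. Here is a minimal sketch that separates the blocks by their type field; plain dicts stand in for the SDK's content-block objects, and the sample strings are illustrative:

```python
def split_blocks(content):
    """Return (thinking_texts, answer_texts) from a list of content blocks.

    Assumes each block carries a "type" field, mirroring the response
    shape described above.
    """
    thinking = [b["thinking"] for b in content if b["type"] == "thinking"]
    answers = [b["text"] for b in content if b["type"] == "text"]
    return thinking, answers

# Mock content shaped like an API response:
mock_content = [
    {"type": "thinking", "thinking": "Factor the quadratic..."},
    {"type": "text", "text": "x = 1/3 or x = -2"},
]
reasoning, answer = split_blocks(mock_content)
print(answer[0])  # x = 1/3 or x = -2
```

Filtering by type is more robust than indexing by position, since it keeps working if a response ever contains a different number or ordering of blocks.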

When to Use Adaptive Thinking

Use adaptive thinking when you want maximum quality without manual tuning. It’s ideal for:

  • Open-ended questions where complexity is unknown
  • Applications where accuracy matters more than speed
  • Prototyping and experimentation

Effort: Fine-Grained Control Over Reasoning Depth

Effort gives you explicit control over how much Claude thinks. You can set it to low, medium, or high to balance between speed and depth.

Effort Levels Explained

Effort Level | Behavior                          | Use Case
low          | Minimal reasoning; fast responses | Simple Q&A, classification, extraction
medium       | Balanced reasoning                | General-purpose tasks, moderate complexity
high         | Deep, thorough reasoning          | Complex analysis, code generation, research

API Example with Effort

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  thinking: {
    type: 'enabled',
    budget_tokens: 2048,
    effort: 'high'  // Explicitly set effort level
  },
  messages: [
    {
      role: 'user',
      content: 'Write a Python function to merge two sorted lists efficiently.'
    }
  ]
});

console.log(response.content[0].thinking);
console.log(response.content[1].text);

Best Practices for Effort

  • Use low effort for high-throughput applications like chatbots or simple data extraction.
  • Use medium effort as your default for most tasks.
  • Use high effort only when you need the highest accuracy and can tolerate longer response times.
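These best practices can be captured in a small lookup helper. A sketch; the task categories here are illustrative, not an official taxonomy:

```python
# Map task types to effort levels, following the guidance above.
EFFORT_BY_TASK = {
    "extraction": "low",       # high-throughput, simple
    "chatbot": "low",
    "general": "medium",       # sensible default
    "analysis": "high",        # accuracy over latency
    "code_generation": "high",
}

def pick_effort(task_type):
    """Fall back to the 'medium' default for unknown task types."""
    return EFFORT_BY_TASK.get(task_type, "medium")

print(pick_effort("extraction"))      # low
print(pick_effort("something_else"))  # medium
```

Centralizing the mapping makes it easy to adjust effort levels in one place as you measure real latency and accuracy in your application.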

Task Budgets (Beta): Allocate Thinking Tokens Precisely

Task budgets allow you to set a specific token budget for thinking, separate from the output tokens. This gives you fine-grained control over cost and reasoning depth.

How Task Budgets Work

The budget_tokens parameter in the thinking object defines the maximum number of tokens Claude can use for its internal reasoning. The max_tokens parameter still controls the final output length.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,  # Final output limit
    thinking={
        "type": "enabled",
        "budget_tokens": 8192  # Thinking limit (can be larger than max_tokens)
    },
    messages=[
        {"role": "user", "content": "Explain the proof of Fermat's Last Theorem in simple terms."}
    ]
)

Key Considerations

  • Budget tokens can exceed max_tokens: Claude may think more than it writes.
  • Cost is based on total tokens used: Both thinking and output tokens count toward your usage.
  • Minimum budget: You must set at least 1024 budget tokens when enabling thinking.
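The minimum-budget constraint is easy to trip over, so it's worth enforcing before the request is sent. A minimal sketch based on the considerations above:

```python
MIN_THINKING_BUDGET = 1024  # documented minimum when thinking is enabled

def thinking_config(budget_tokens):
    """Build a thinking config dict, enforcing the minimum budget.

    Note that budget_tokens may legitimately exceed max_tokens, since
    thinking and output are budgeted separately.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be at least {MIN_THINKING_BUDGET}, "
            f"got {budget_tokens}"
        )
    return {"type": "enabled", "budget_tokens": budget_tokens}

print(thinking_config(8192))  # {'type': 'enabled', 'budget_tokens': 8192}
```

Validating locally gives you a clear error message instead of a rejected API request.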

Fast Mode (Beta: Research Preview): Speed Without Thinking

Fast mode is the opposite of extended thinking. It disables the thinking process entirely, forcing Claude to respond as quickly as possible. This is useful for real-time applications where latency is critical.

Enabling Fast Mode

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    thinking={
        "type": "disabled"  # Explicitly disable thinking
    },
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
Note: Fast mode is currently in research preview. It may not be available on all models or regions.

When to Use Fast Mode

  • Simple factual queries
  • High-traffic chatbots
  • Real-time translation or transcription
  • Any application where sub-second response time is required

Combining Features: A Practical Workflow

Here’s a complete example that demonstrates how to use all these features together in a single application:

import anthropic

client = anthropic.Anthropic()

def get_claude_response(prompt, complexity="medium"):
    """
    Get a response from Claude with dynamic thinking configuration.
    """
    # Configure thinking based on complexity
    if complexity == "low":
        thinking_config = {"type": "disabled"}  # Fast mode
        max_tokens = 1024
    elif complexity == "medium":
        thinking_config = {
            "type": "enabled",
            "budget_tokens": 2048,
            "effort": "medium"
        }
        max_tokens = 2048
    else:  # high
        thinking_config = {
            "type": "enabled",
            "budget_tokens": 8192,
            "effort": "high"
        }
        max_tokens = 4096

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=max_tokens,
        thinking=thinking_config,
        messages=[{"role": "user", "content": prompt}]
    )
    return response

# Example usage
simple_response = get_claude_response("What is 2+2?", "low")
complex_response = get_claude_response(
    "Analyze the economic impact of quantum computing on cryptography.",
    "high"
)

Troubleshooting Common Issues

Thinking Content Not Visible

If you don’t see thinking content in the response, ensure:
  • You’re using a model that supports extended thinking (Claude 3.7 Sonnet, Claude 4 Sonnet, or later)
  • The thinking parameter is properly set to {"type": "enabled"}
  • You’re accessing response.content[0].thinking (not .text)

Budget Exceeded

If Claude stops thinking before reaching a conclusion, increase budget_tokens. A good rule of thumb is to set it to 50-100% of your max_tokens value.
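This rule of thumb can be turned into a small helper that also respects the 1024-token minimum from earlier. A sketch; tune the ratio for your workload:

```python
def suggested_budget(max_tokens, ratio=0.75):
    """Pick a thinking budget as a fraction of max_tokens.

    Follows the 50-100% rule of thumb, never dropping below the
    1024-token minimum required when thinking is enabled.
    """
    if not 0.5 <= ratio <= 1.0:
        raise ValueError("ratio should stay within the 50-100% rule of thumb")
    return max(1024, int(max_tokens * ratio))

print(suggested_budget(4096))  # 3072
print(suggested_budget(1000))  # 1024 (floor applies)
```

Starting at 75% and raising the ratio only when responses are cut short keeps thinking costs proportional to output size.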

High Latency

If responses are too slow:
  • Reduce effort to low or medium
  • Lower budget_tokens
  • Consider using fast mode (thinking: {"type": "disabled"})
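The escalation path above can be sketched as a function that steps a thinking configuration down one notch at a time. A hedged sketch operating on plain config dicts:

```python
def speed_up(config):
    """Return a faster variant of a thinking config dict.

    Steps effort high -> medium -> low, then falls back to fast mode
    (thinking disabled) as a last resort, per the options listed above.
    """
    if config.get("type") == "disabled":
        return config  # already as fast as possible
    effort = config.get("effort", "medium")
    if effort == "high":
        return {**config, "effort": "medium"}
    if effort == "medium":
        return {**config, "effort": "low"}
    return {"type": "disabled"}  # last resort: fast mode

cfg = {"type": "enabled", "budget_tokens": 2048, "effort": "high"}
print(speed_up(cfg)["effort"])  # medium
```

In practice you would also lower budget_tokens at each step; the function above only steps the effort level, to keep the example short.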

Key Takeaways

  • Adaptive thinking lets Claude automatically determine reasoning depth, making it ideal for general-purpose use.
  • Effort control (low, medium, high) gives you explicit control over the speed-accuracy trade-off.
  • Task budgets allow precise allocation of thinking tokens, separate from output tokens.
  • Fast mode disables thinking entirely for maximum speed in simple, high-throughput scenarios.
  • Always test different configurations with your specific use case to find the optimal balance of cost, speed, and quality.