
Mastering Claude AI: A Practical Guide to Extended Thinking, Tools, and Structured Outputs

Learn how to leverage Claude AI's advanced features like extended thinking, tool use, structured outputs, and prompt caching with practical code examples and best practices.

Quick Answer

This guide teaches you how to use Claude AI's most powerful features—extended thinking, tool integration, structured outputs, and prompt caching—with step-by-step code examples and actionable tips for real-world applications.

Tags: Claude AI · API · Extended Thinking · Tool Use · Structured Outputs

Introduction

Claude AI has rapidly evolved from a simple conversational model into a sophisticated platform capable of extended reasoning, autonomous tool use, and structured data generation. Whether you're building a customer support bot, a research assistant, or a code generation pipeline, understanding these advanced features is essential for unlocking Claude's full potential.

This guide walks you through the most impactful capabilities available in the Claude API ecosystem, with practical code examples and best practices drawn from the official Anthropic documentation. By the end, you'll be able to implement extended thinking, integrate tools, enforce structured outputs, and optimize performance with prompt caching.

Extended Thinking: Beyond Simple Q&A

Extended thinking allows Claude to reason through complex problems step-by-step before generating a final answer. This is particularly useful for math, logic, multi-step planning, and any task requiring deep analysis.

How It Works

When you enable extended thinking, Claude works through an explicit chain-of-thought before composing the visible response. The reasoning comes back in dedicated thinking blocks ahead of the final text (summarized for some models); you can display or discard it, and either way it informs the final output, improving accuracy and coherence.

Enabling Extended Thinking in the API

To activate extended thinking, set the thinking parameter in your API request:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # Tokens allocated for thinking
    },
    messages=[
        {"role": "user", "content": "Solve this equation step-by-step: 3x + 7 = 22"}
    ]
)

# With thinking enabled, the response begins with thinking blocks,
# so print only the final text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)

Adaptive Thinking (Beta)

For dynamic workloads, you can use adaptive thinking, which automatically adjusts the thinking budget based on task complexity:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096,
        "adaptive": True  # Automatically scales thinking budget
    },
    messages=[
        {"role": "user", "content": "Design a microservices architecture for an e-commerce platform"}
    ]
)

Best Practices

  • Set a realistic budget: Allocate 20-50% of your max_tokens to thinking for complex tasks (a budgeting sketch follows this list).
  • Use adaptive thinking for varied workloads to avoid over- or under-allocating tokens.
  • Combine with tools: Extended thinking works seamlessly with tool use, allowing Claude to reason about which tool to call and how to interpret results.
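
As a starting point for the first bullet, here is a tiny budgeting heuristic. This is a sketch, not official guidance: the 40% default and the clamping constants are assumptions to tune for your workload.

def thinking_budget(max_tokens: int, fraction: float = 0.4) -> int:
    # Reserve a fraction of max_tokens for reasoning, keeping a 1,024-token
    # floor for thinking and leaving headroom for the visible answer.
    # Assumes max_tokens is comfortably above that floor.
    return max(1024, min(int(max_tokens * fraction), max_tokens - 512))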

Tool Use: Giving Claude Superpowers

Tools extend Claude's capabilities beyond text generation. You can define custom functions, integrate APIs, or use built-in tools like web search and code execution.

Defining a Custom Tool

Here's how to define a weather lookup tool and let Claude decide when to call it:

import anthropic

client = anthropic.Anthropic()

tools = [ { "name": "get_weather", "description": "Get the current weather for a given city", "input_schema": { "type": "object", "properties": { "city": { "type": "string", "description": "The city name, e.g., San Francisco" } }, "required": ["city"] } } ]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo?"}
    ]
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(f"Calling tool: {tool_call.name}")
    print(f"Arguments: {tool_call.input}")

Handling Tool Calls

When Claude requests a tool, you must execute it and return the result:

def get_weather(city):
    # Simulate API call
    return {"temperature": 22, "condition": "sunny", "city": city}

if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    if tool_call.name == "get_weather":
        result = get_weather(tool_call.input["city"])
        # Send the tool result back to Claude to get the final answer
        final_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather like in Tokyo?"},
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": str(result)
                }]}
            ]
        )
        print(final_response.content[0].text)
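
This round trip handles a single tool call. In practice Claude may chain several calls, so production code typically loops until stop_reason is no longer "tool_use". Here is a minimal sketch of such a loop; run_agent and tool_handlers are illustrative names, not part of the SDK.

import json

def run_agent(client, tools, messages, tool_handlers, model="claude-sonnet-4-20250514"):
    # Keep calling the API until Claude stops requesting tools.
    # tool_handlers maps tool names to plain Python callables.
    while True:
        response = client.messages.create(
            model=model, max_tokens=1024, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return response  # Final answer; no more tool calls requested
        # Append the assistant turn, then one tool_result per tool_use block
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = tool_handlers[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(output),
                })
        messages.append({"role": "user", "content": results})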

Built-in Tools

Claude provides several pre-built tools for common tasks; a sketch showing how to enable one follows the list:

  • Web search tool: Retrieve real-time information from the internet.
  • Code execution tool: Run Python code in a sandboxed environment.
  • Computer use tool: Interact with virtual desktops for automation.
  • File operations: Read, write, and manipulate files.
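
Unlike custom tools, built-in tools are executed server-side: you pass a versioned tool type instead of a JSON schema, and Anthropic runs the tool for you. Here is a sketch enabling web search; the type string "web_search_20250305" and the max_uses option match the documentation at the time of writing, but check the docs for the current identifiers.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # Versioned server-tool identifier
        "name": "web_search",
        "max_uses": 3  # Optional cap on searches per request
    }],
    messages=[
        {"role": "user", "content": "What changed in the latest Python release?"}
    ]
)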

Structured Outputs: Enforcing Data Formats

When you need Claude to return data in a specific format (e.g., JSON, XML), structured outputs ensure compliance without post-processing.

Using JSON Mode

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, age, and email from this text: 'John Doe, 34, [email protected]'"}
    ],
    response_format={"type": "json_object"}
)

import json

data = json.loads(response.content[0].text)
print(data)  # {'name': 'John Doe', 'age': 34, 'email': '[email protected]'}

Strict Mode with Schema Validation

For production systems, define a JSON schema to enforce exact field types and constraints:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Create a product listing for a wireless mouse"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "in_stock": {"type": "boolean"}
                },
                "required": ["name", "price", "category", "in_stock"]
            }
        }
    }
)
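
Even with strict mode, validating on the client is a cheap defense in depth, for example if a request falls back to a code path without schema enforcement. Here is a sketch using the third-party jsonschema package (pip install jsonschema):

import json
from jsonschema import validate, ValidationError

product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
        "in_stock": {"type": "boolean"}
    },
    "required": ["name", "price", "category", "in_stock"]
}

data = json.loads(response.content[0].text)
try:
    validate(instance=data, schema=product_schema)
except ValidationError as err:
    print(f"Schema violation: {err.message}")  # Log and retry, or route to review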

Prompt Caching: Speed and Cost Optimization

Prompt caching reduces latency and costs by reusing processed prompts across multiple requests. This is ideal for system prompts, few-shot examples, or large context documents.

Enabling Caching

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with knowledge of our product catalog.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Tell me about your wireless mice"}
    ]
)

# Check cache usage in the response metadata
print(response.usage.cache_creation_input_tokens)  # Populated on the first request
print(response.usage.cache_read_input_tokens)      # Populated on subsequent requests

Best Practices

  • Cache system prompts that are identical across many conversations.
  • Cache large context blocks (e.g., product manuals, codebases) that are reused across requests; see the sketch after this list.
  • Monitor cache metrics in the response to verify effectiveness.
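
cache_control is not limited to the system prompt: you can attach it to large content blocks inside messages as well. Here is a sketch that caches a product manual ahead of a question; product_manual.txt stands in for your own document.

with open("product_manual.txt") as f:
    manual_text = f.read()  # Large, static reference document

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": manual_text,
                "cache_control": {"type": "ephemeral"}  # Cache up to this block
            },
            {"type": "text", "text": "Which models support Bluetooth 5.3?"}
        ]
    }]
)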

Putting It All Together: A Practical Example

Let's build a research assistant that uses extended thinking, web search, and structured outputs:

import anthropic
import json

client = anthropic.Anthropic()

tools = [ { "name": "web_search", "description": "Search the web for current information", "input_schema": { "type": "object", "properties": { "query": {"type": "string", "description": "Search query"} }, "required": ["query"] } } ]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    tools=tools,
    messages=[
        {"role": "user", "content": "Research the latest developments in quantum computing and provide a structured summary with key findings, companies involved, and timeline."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "research_summary",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "key_findings": {"type": "array", "items": {"type": "string"}},
                    "companies": {"type": "array", "items": {"type": "string"}},
                    "timeline": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["key_findings", "companies", "timeline"]
            }
        }
    }
)

# With thinking enabled, skip the thinking blocks and parse the final text block
text = next(b.text for b in response.content if b.type == "text")
print(json.loads(text))

Key Takeaways

  • Extended thinking dramatically improves Claude's performance on complex reasoning tasks—enable it with a thoughtful token budget for math, logic, and planning.
  • Tool use lets Claude interact with external systems; define clear schemas and always handle tool calls in your application loop.
  • Structured outputs eliminate parsing errors in production—use JSON schema with strict mode for reliable data extraction.
  • Prompt caching reduces costs by 50-90% for repeated system prompts or large context blocks—always cache static content.
  • Combine features for maximum impact: extended thinking + tools + structured outputs creates powerful autonomous agents.

By mastering these capabilities, you can build Claude-powered applications that are not only smarter but also more efficient and reliable. Start experimenting with the code examples above, and refer to the official Anthropic documentation for the latest updates and advanced configurations.