Guide | Beginner | Agents | 2026-05-22

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and file handling. Learn to build powerful AI applications with practical code examples.

Quick Answer

This guide covers Claude's five API areas—model capabilities, tools, tool infrastructure, context management, and file handling—with practical examples for building production-ready applications.

Claude API | Extended Thinking | Tool Use | Context Management | Prompt Caching

Introduction

Claude's API is designed to be both powerful and flexible, giving developers fine-grained control over how the model reasons, responds, and interacts with external systems. Whether you're building a simple chatbot or a complex agent that browses the web, executes code, and manages long-running conversations, understanding the API's surface is essential.

This guide walks you through the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and files and assets. You'll learn how each area works, when to use it, and see practical code examples you can adapt for your own projects.

1. Model Capabilities: Steering Claude's Reasoning

Claude's model capabilities let you control how the model reasons, formats responses, and handles input. The most impactful capabilities include extended thinking, structured outputs, and adaptive thinking.

Extended Thinking

Extended thinking allows Claude to "think" before responding, producing better results on complex tasks like math, coding, and analysis. You enable it with the thinking parameter:

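import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Solve this step by step: 23 * 47 + 156 / 12"}
    ]
)
# With thinking enabled, thinking blocks come before the answer,
# so select the text blocks instead of assuming content[0] is text.
for block in response.content:
    if block.type == "text":
        print(block.text)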
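
When thinking is enabled, the response's content list generally holds one or more thinking blocks ahead of the final text block, so it helps to separate them by type. The sketch below works on plain dictionaries shaped like those content blocks; `split_thinking_and_text` and the sample data are illustrative and not part of the SDK.

```python
from typing import Any


def split_thinking_and_text(content: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (thinking, answer) joined from a list of content-block dicts."""
    thinking = [b.get("thinking", "") for b in content if b.get("type") == "thinking"]
    answer = [b.get("text", "") for b in content if b.get("type") == "text"]
    return "\n".join(thinking), "\n".join(answer)


# Hand-written sample mimicking the shape of a response with thinking enabled.
sample_content = [
    {"type": "thinking", "thinking": "23 * 47 = 1081; 156 / 12 = 13; sum = 1094."},
    {"type": "text", "text": "The result is 1094."},
]

reasoning, answer = split_thinking_and_text(sample_content)
print(answer)
```

With a real response, `response.model_dump()["content"]` should yield dictionaries of this shape.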

Adaptive Thinking (Recommended for Opus 4.7)

Adaptive thinking lets Claude dynamically decide when and how much to think. Use the effort parameter to control depth:

response = client.messages.create(
    model="claude-opus-4-7",  # assumed ID for Opus 4.7; confirm it in the models list
    max_tokens=2048,
    thinking={"type": "adaptive"},  # Claude decides when and how much to think
    output_config={"effort": "high"},  # Options include low, medium, high
    messages=[
        {"role": "user", "content": "Analyze the pros and cons of quantum computing for cryptography."}
    ]
)

Structured Outputs

For applications that need consistent JSON responses, use structured outputs:

response = client.messages.create(
    model="claude-sonnet-4-5",  # use a model that supports structured outputs
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the company, amount, and due date from this invoice: 'Invoice #1234 - Acme Corp - $5,000 - Due 2025-06-01'"}
    ],
    # The JSON schema goes under output_config.format
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "amount": {"type": "number"},
                    "due_date": {"type": "string"}
                },
                "required": ["company", "amount", "due_date"],
                "additionalProperties": False
            }
        }
    }
)
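
Even with a schema in place, it is reasonable to parse and check the returned JSON before passing it downstream. The following is a minimal sketch, assuming the response's text block contains a JSON object with the three fields from the schema above; `parse_invoice` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration.

```python
import json

# Field names mirror the schema in the request; expected Python types per field.
REQUIRED_FIELDS = {"company": str, "amount": (int, float), "due_date": str}


def parse_invoice(raw_json: str) -> dict:
    """Parse the model's JSON text and check required fields and types."""
    data = json.loads(raw_json)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


# Sample text standing in for the text block of a structured-output response.
sample = '{"company": "Acme Corp", "amount": 5000, "due_date": "2025-06-01"}'
invoice = parse_invoice(sample)
print(invoice["company"], invoice["amount"], invoice["due_date"])
```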

2. Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. Claude can call functions, browse the web, execute code, and interact with your environment.

Defining Custom Tools

You define tools as JSON schemas. Claude decides when to call them:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g., San Francisco"
                }
            },
            "required": ["location"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo?"}
    ]
)
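# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    # Pick the tool_use block by type rather than by position
    tool_use = next(block for block in response.content if block.type == "tool_use")
    print(f"Tool called: {tool_use.name}")
    print(f"Arguments: {tool_use.input}")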
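
Once Claude requests a tool, your application runs it and sends the output back in a `tool_result` block that references the request's `id`. The sketch below shows one way to build that follow-up message from a `tool_use` block given as a dictionary; the `get_weather` implementation, `TOOL_REGISTRY`, and `build_tool_result` are hypothetical, and the weather data is canned.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> dict[str, Any]:
    """Hypothetical local implementation; returns canned data for the sketch."""
    return {"location": location, "temperature_c": 21, "conditions": "clear"}


# Maps tool names from the tool definitions to local callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def build_tool_result(tool_use: dict[str, Any]) -> dict[str, Any]:
    """Run the requested tool and wrap its output as a user message."""
    handler = TOOL_REGISTRY.get(tool_use["name"])
    if handler is None:
        block = {"type": "tool_result", "tool_use_id": tool_use["id"],
                 "content": f"Unknown tool: {tool_use['name']}", "is_error": True}
    else:
        output = handler(**tool_use["input"])
        block = {"type": "tool_result", "tool_use_id": tool_use["id"],
                 "content": json.dumps(output)}
    return {"role": "user", "content": [block]}


# Sample tool_use block shaped like the one Claude returns.
sample_tool_use = {"type": "tool_use", "id": "toolu_example",
                   "name": "get_weather", "input": {"location": "Tokyo"}}
follow_up = build_tool_result(sample_tool_use)
print(follow_up["content"][0]["content"])
```

To continue the conversation, append the assistant's content and then this message to `messages`, and call `client.messages.create` again with the same `tools`.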

Built-in Tools

Claude offers several built-in tools you can enable with minimal configuration:

Web Search Tool: Let Claude search the internet for current information.
Code Execution Tool: Run Python code in a sandboxed environment.
Text Editor Tool: View, create, and edit text files in your project.
Computer Use Tool: Let Claude control a virtual desktop.

# Enable web search
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[
        {"role": "user", "content": "What are the latest AI research papers from May 2025?"}
    ]
)

3. Tool Infrastructure: Orchestration at Scale

When you have many tools or complex workflows, tool infrastructure handles discovery, routing, and orchestration.

Tool Runner (SDK)

The SDK's tool runner (a beta feature) simplifies building agents that use multiple tools. It handles the loop of calling Claude, executing tools, and returning results:

from anthropic import Anthropic, beta_tool
client = Anthropic()
# Define your tools; the decorator builds each schema from the signature and docstring
@beta_tool
def get_stock_price(symbol: str) -> str:
    """Get the latest stock price for a ticker symbol such as AAPL."""
    # Your implementation
    return "150.25"
@beta_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email with the given recipient, subject, and body."""
    # Your implementation
    return "sent"
# Create a runner
runner = client.beta.messages.tool_runner(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[get_stock_price, send_email],
    messages=[{"role": "user", "content": "Get the stock price of AAPL and email it to me."}],
)
# Run the agent loop to completion and keep the final message
result = runner.until_done()
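
The loop that a tool runner automates can be sketched in a few lines: call the model, run any requested tools, append the results, and repeat until the model stops asking for tools. The version below is an illustration under stated assumptions, not the SDK's implementation; `run_tool_loop` is a hypothetical helper, and `fake_create` is a scripted stand-in for the API so the example runs offline.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_tool_loop(
    create: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    messages: list[Message],
    max_turns: int = 5,
) -> Message:
    """Call the model until it stops requesting tools or max_turns is reached."""
    for _ in range(max_turns):
        response = create(messages)
        if response["stop_reason"] != "tool_use":
            return response
        # Record the assistant turn, then answer every tool request in one user turn.
        messages.append({"role": "assistant", "content": response["content"]})
        results = [
            {"type": "tool_result", "tool_use_id": block["id"],
             "content": tools[block["name"]](**block["input"])}
            for block in response["content"] if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_create(messages: list[Message]) -> Message:
    """Scripted stand-in for the API: one tool request, then a final answer."""
    if len(messages) == 1:
        return {"stop_reason": "tool_use", "content": [
            {"type": "tool_use", "id": "toolu_1", "name": "get_stock_price",
             "input": {"symbol": "AAPL"}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": "AAPL is trading at 150.25."}]}


history: list[Message] = [{"role": "user", "content": "Get the stock price of AAPL."}]
final = run_tool_loop(fake_create, {"get_stock_price": lambda symbol: "150.25"}, history)
print(final["content"][0]["text"])
```

The `max_turns` cap is a design choice that keeps a misbehaving loop from running indefinitely.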

Strict Tool Use

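To guarantee that Claude calls a tool instead of answering in plain text, set tool_choice. {"type": "any"} requires a call to one of the supplied tools, while {"type": "tool", "name": "get_weather"} requires that specific tool:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "get_weather"},  # Claude must call get_weather
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo?"}
    ]
)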

4. Context Management: Keeping Conversations Efficient

Long-running sessions require careful context management to stay within token limits and control costs.

Context Windows

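Some Claude models support context windows of up to 1 million tokens, enough for entire codebases or lengthy documents; the limit depends on the model and platform. The context window covers input and output together, while the max_tokens parameter caps only the length of the response: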

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this 500-page document..."}
    ]
)
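
Before sending a large prompt, a rough budget check can catch requests that are unlikely to fit. The sketch below uses a crude characters-per-token heuristic, which is an assumption rather than Claude's actual tokenizer, and the 200,000-token default window is likewise an assumed value; the API's token counting endpoint gives exact figures when precision matters.

```python
def fits_in_context(
    prompt: str,
    max_tokens: int,
    context_window: int = 200_000,   # assumed default; varies by model
    chars_per_token: float = 4.0,    # rough heuristic for English text
) -> bool:
    """Estimate whether the prompt plus the reserved output fits the window."""
    estimated_input = int(len(prompt) / chars_per_token) + 1
    return estimated_input + max_tokens <= context_window


short_prompt = "Summarize this document. " * 100
print(fits_in_context(short_prompt, max_tokens=4096))
```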

Prompt Caching

Reduce costs and latency by caching frequently used context (like system prompts or document snippets):

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Explain Python decorators."}
    ]
)
# Check cache metrics. A prompt this short is below the minimum cacheable
# length, so expect non-zero values only once the cached prefix is longer.
print(f"Cache creation: {response.usage.cache_creation_input_tokens}")
print(f"Cache read: {response.usage.cache_read_input_tokens}")
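
To judge whether caching is paying off, the usage counters can be turned into a hit ratio and a relative input cost. This is a minimal sketch: `cache_summary` is a hypothetical helper, the sample usage values are invented, and the write and read multipliers (1.25x and 0.1x of the base input price) are assumptions to check against current pricing.

```python
def cache_summary(
    usage: dict,
    write_multiplier: float = 1.25,  # assumed price factor for cache writes
    read_multiplier: float = 0.1,    # assumed price factor for cache reads
) -> dict:
    """Summarize cache effectiveness from a usage dict."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = written + read + uncached
    billed = uncached + written * write_multiplier + read * read_multiplier
    return {
        "total_input_tokens": total,
        "hit_ratio": read / total if total else 0.0,
        # 1.0 means the same input cost as sending everything uncached.
        "relative_input_cost": billed / total if total else 0.0,
    }


# Invented usage values: the first call writes the cache, the second reads it.
first_call = {"input_tokens": 20, "cache_creation_input_tokens": 2000,
              "cache_read_input_tokens": 0}
second_call = {"input_tokens": 20, "cache_creation_input_tokens": 0,
               "cache_read_input_tokens": 2000}

for usage in (first_call, second_call):
    print(cache_summary(usage))
```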

Context Compaction

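For very long conversations, compact the context by replacing older turns with a summary. The request must include the conversation history, otherwise Claude has nothing to summarize:

# After many turns, compact the conversation.
# `history` is the list of message dicts exchanged so far.
summary = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=history + [
        {"role": "user", "content": "Please summarize our conversation so far."}
    ]
)
# Start a fresh history seeded with the summary
history = [
    {"role": "user", "content": f"Summary of our conversation so far: {summary.content[0].text}"}
]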
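
A common variant keeps the most recent turns verbatim and summarizes only the older ones, so that recent details are not lost. The sketch below takes the summarizer as a callable so it can be tried without an API call; `compact_history` and the stub summarizer are hypothetical, and in practice the callable would wrap a request to Claude.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    history: list[Message],
    summarize: Callable[[list[Message]], str],
    keep_recent: int = 4,
) -> list[Message]:
    """Replace all but the last keep_recent messages with a single summary."""
    if keep_recent <= 0 or len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary_message = {
        "role": "user",
        "content": f"Summary of earlier conversation: {summarize(older)}",
    }
    return [summary_message] + recent


# Ten alternating turns and a stub summarizer that only counts messages.
turns = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
compacted = compact_history(turns, lambda msgs: f"{len(msgs)} earlier messages")
print(len(compacted), compacted[0]["content"])
```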

5. Files and Assets: Working with Documents

Claude can process various file types, including PDFs, images, and text documents.

PDF Support

Upload PDFs for analysis, summarization, or extraction:

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings from this report."
                }
            ]
        }
    ]
)
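
When several PDFs are sent this way, the encoding step can be wrapped in a small helper that also rejects data that does not look like a PDF. This is a sketch under the assumption that a valid file begins with the `%PDF-` signature; `pdf_document_block` is a hypothetical name, and the sample bytes are a placeholder rather than a real document.

```python
import base64


def pdf_document_block(pdf_bytes: bytes) -> dict:
    """Build a base64 document content block from raw PDF bytes."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not start with the %PDF- signature")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("utf-8"),
        },
    }


# Placeholder bytes that only carry the signature, for demonstration.
sample_pdf = b"%PDF-1.7 placeholder"
sample_block = pdf_document_block(sample_pdf)
print(sample_block["source"]["media_type"])
```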

Image and Vision Support

Claude can analyze images for visual question answering:

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What does this chart show?"
                }
            ]
        }
    ]
)
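
The `media_type` has to match the actual image format, which is easy to get wrong when file names vary. The helper below is a sketch that derives it from the file extension; `image_block` and the extension table are illustrative, and the table assumes JPEG, PNG, GIF, and WebP are the accepted formats.

```python
import base64
from pathlib import Path

# Assumed set of accepted formats, keyed by lowercase file extension.
SUPPORTED_IMAGE_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring media_type from the name."""
    suffix = Path(filename).suffix.lower()
    media_type = SUPPORTED_IMAGE_TYPES.get(suffix)
    if media_type is None:
        raise ValueError(f"unsupported image extension: {suffix or '(none)'}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# Placeholder bytes; a real call would pass the file's contents.
chart_block = image_block("chart.PNG", b"\x89PNG placeholder")
print(chart_block["source"]["media_type"])
```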

Feature Availability & Lifecycle

Features on the Claude Platform follow a lifecycle:

Beta: Preview features for feedback. May change significantly. Not for production.
Generally Available (GA): Stable, fully supported, recommended for production.
Deprecated: Still functional, with a migration path provided.
Retired: No longer available.

Check the Claude API documentation for the current status of each feature on your platform (Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry).

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Start with capabilities and tools, then optimize with context management.
Extended thinking and adaptive thinking can improve performance on complex reasoning tasks such as math, coding, and analysis. Use the effort parameter for fine-grained control.
Tools extend Claude beyond text: custom functions, web search, code execution, and computer use. The Tool Runner SDK simplifies multi-tool agents.
Context management is critical for production: use prompt caching to reduce costs, compact long conversations, and use large context windows (up to 1M tokens on supported models) for large documents.
Feature lifecycle matters: always check if a feature is GA before using it in production. Beta features may have breaking changes.