Guide | Beginner | Agents | 2026-05-22

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore the full Claude API surface: model capabilities, tools, context management, and files. Learn how to build powerful AI applications with practical code examples.

Quick Answer

This guide walks you through the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and file handling. You'll learn how to use extended thinking, structured outputs, citations, tool calling, prompt caching, and batch processing with practical Python examples.

Tags: Claude API, context management, tool use, extended thinking, batch processing

Introduction

The Claude API offers a rich surface area for building intelligent, production-ready applications. Whether you're creating a chatbot, an agent that browses the web, or a system that processes millions of documents, understanding the five core areas of the API is essential.

This guide covers:

Model capabilities – reasoning, structured outputs, citations
Tools – letting Claude act on the web or in your environment
Tool infrastructure – discovery and orchestration at scale
Context management – keeping long-running sessions efficient
Files and assets – managing documents and data

By the end, you'll know which features to use and when, and you'll have practical code snippets to get started.

1. Model Capabilities: Steering Claude's Output

Claude's model capabilities let you control how it reasons and formats responses. These are the building blocks for any application.

Extended Thinking and Adaptive Thinking

For complex reasoning tasks, Claude can "think" before responding. With Extended Thinking, you set a fixed thinking budget. With Adaptive Thinking (recommended for Opus 4.7), Claude decides how much to think dynamically.

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)
# Access the thinking block
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
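The thinking budget is constrained: to the best of my knowledge, `budget_tokens` must be at least 1,024 and smaller than `max_tokens`. The sketch below checks both limits before you build a request. `build_thinking_config` and `MIN_THINKING_BUDGET` are hypothetical names for illustration, not part of the SDK, and the minimum should be verified for the model you use.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum; confirm against the current API docs


def build_thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` parameter dict, or raise if the budget is invalid."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # The thinking budget is spent out of max_tokens, so it must leave room for the answer
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

You would pass the result as `thinking=build_thinking_config(4096, 2048)` in the request shown earlier.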

Structured Outputs

Need JSON that matches a specific schema? Structured outputs constrain Claude's response to a JSON Schema you supply.

response = client.messages.create(
    model="claude-sonnet-4-5",  # structured outputs require a model that supports them
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "List three planets and their moons as JSON"}
    ],
    # Parameter shape follows the structured outputs docs; verify it against your SDK version
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "planets": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "moons": {"type": "array", "items": {"type": "string"}}
                            },
                            "required": ["name", "moons"],
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["planets"],
                "additionalProperties": False
            }
        }
    }
)
print(response.content[0].text)
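Even with a schema in place, it is worth parsing the returned text defensively before using it. A minimal sketch using only the standard library; `parse_planets` is a hypothetical helper, and the sample string stands in for `response.content[0].text`.

```python
import json


def parse_planets(text: str) -> dict[str, list[str]]:
    """Parse the JSON answer and return {planet name: [moon names]}."""
    data = json.loads(text)
    result = {}
    for planet in data["planets"]:
        name, moons = planet["name"], planet["moons"]
        if not isinstance(name, str) or not all(isinstance(m, str) for m in moons):
            raise ValueError(f"unexpected planet entry: {planet!r}")
        result[name] = list(moons)
    return result


# Stand-in for response.content[0].text so the sketch runs offline
sample = '{"planets": [{"name": "Mars", "moons": ["Phobos", "Deimos"]}]}'
print(parse_planets(sample))
```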

Citations

Ground Claude's responses in source documents. With Citations, Claude provides exact references to the source material.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are content blocks inside the user message
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Q3 revenue grew 15% year-over-year to $2.1B. Operating margin improved to 22%."
                    },
                    "title": "Q3 Earnings Report",
                    "context": "This is the company's quarterly earnings report.",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "Summarize the key findings from the attached report."}
            ]
        }
    ]
)
# Each text block may carry a `citations` list pointing back to the source passage
for block in response.content:
    if block.type == "text":
        print(block.text, block.citations)
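To show citations to a reader, you can turn them into numbered footnotes. The sketch below works on plain dicts (for example, blocks converted with `model_dump()`), and assumes each citation exposes `cited_text` and `document_title`; `render_with_footnotes` is a hypothetical helper name.

```python
def render_with_footnotes(blocks: list[dict]) -> str:
    """Join text blocks, adding [n] markers and a numbered source list."""
    parts, sources = [], []
    for block in blocks:
        if block.get("type") != "text":
            continue
        parts.append(block["text"])
        # `citations` may be missing or None on blocks without sources
        for citation in block.get("citations") or []:
            sources.append(f'{citation["document_title"]}: "{citation["cited_text"]}"')
            parts.append(f"[{len(sources)}]")
    footnotes = [f"[{i}] {s}" for i, s in enumerate(sources, start=1)]
    return "".join(parts) + ("\n" + "\n".join(footnotes) if footnotes else "")


# Hand-written sample blocks standing in for a real response
sample_blocks = [
    {
        "type": "text",
        "text": "Revenue grew 15%.",
        "citations": [
            {"cited_text": "Q3 revenue grew 15%", "document_title": "Q3 Earnings Report"}
        ],
    }
]
print(render_with_footnotes(sample_blocks))
```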

2. Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. Claude can call functions, browse the web, execute code, and more.

Defining Tools

You define tools as JSON schemas. Claude decides when to call them.

def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # In production, call a real weather API
    return f"The weather in {location} is sunny, 72°F."
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)
# Check if Claude wants to use a tool
for block in response.content:
    if block.type == "tool_use":
        print(f"Calling tool: {block.name}")
        print(f"Arguments: {block.input}")
        result = get_weather(block.input["location"])
        # Send result back to Claude...
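Sending the result back means appending two turns to the conversation: Claude's own response (including the `tool_use` block) and a user message containing a `tool_result` block with the matching `tool_use_id`. A minimal sketch on plain dicts; `tool_result_turns` is a hypothetical helper name.

```python
def tool_result_turns(assistant_content: list[dict], tool_use_id: str, result: str) -> list[dict]:
    """Return the two messages to append: Claude's tool request, then your result."""
    return [
        {"role": "assistant", "content": assistant_content},
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": result}
            ],
        },
    ]
```

You would extend your `messages` list with these turns and call `client.messages.create` again, with the same `tools`, to get Claude's final answer.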

Built-in Tools

Claude comes with several built-in tools:

Web search tool – search the internet
Web fetch tool – fetch content from URLs
Code execution tool – run Python code in a sandbox
Computer use tool – control a virtual desktop
Bash tool – run shell commands
Memory tool – store and retrieve information across sessions

Parallel Tool Use

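Claude can request several tool calls in a single response, which saves round trips. This is the default behavior, so no extra parameter is needed.

# weather_tool, stock_tool, and news_tool are tool definitions like the one shown earlier
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, stock_tool, news_tool],
    # To opt out: tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "What's the weather in London, the stock price of AAPL, and today's top news?"}
    ]
)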
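When a response contains several `tool_use` blocks, the results go back together in a single user message. The following sketch runs each requested tool from a registry and reports failures with `is_error` instead of raising. `run_tool_calls` and the registry layout are assumptions for illustration.

```python
def run_tool_calls(content: list[dict], registry: dict) -> dict:
    """Execute every tool_use block and return one user message with all results."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue
        result = {"type": "tool_result", "tool_use_id": block["id"]}
        try:
            result["content"] = str(registry[block["name"]](**block["input"]))
        except Exception as exc:  # tell Claude about the failure instead of crashing
            result["content"] = f"{type(exc).__name__}: {exc}"
            result["is_error"] = True
        results.append(result)
    return {"role": "user", "content": results}
```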

3. Tool Infrastructure: Discovery and Orchestration

When building complex agents, you need more than just tool definitions. The Claude API provides infrastructure for:

Tool Runner (SDK) – automatically handles tool call loops
Strict tool use – guarantee that tool inputs conform to your schema
Tool search – let Claude discover tools dynamically
Fine-grained tool streaming – stream tool input parameters as they are generated
Tool combinations – define workflows that chain tools together

Tool Runner Example

from anthropic import Anthropic

client = Anthropic()

# Define a simple tool
weather_tool = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}
# Use the Tool Runner (conceptual - actual implementation may vary)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool],
    tool_choice={"type": "auto"},
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)
# The SDK can automatically handle the tool call loop
# See the Tool Runner documentation for details
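To make clear what a tool call loop involves, here is a hand-rolled version. It is a sketch of the general pattern, not the SDK's Tool Runner. `run_agent_loop` is a hypothetical name, and `create` is any callable that accepts `messages=` and returns a dict with `stop_reason` and `content`; a scripted fake stands in for the API so the example runs offline.

```python
def run_agent_loop(create, tools_impl: dict, messages: list[dict], max_turns: int = 5) -> dict:
    """Call `create` until Claude stops asking for tools; return the final response."""
    for _ in range(max_turns):
        response = create(messages=messages)
        if response["stop_reason"] != "tool_use":
            return response
        # Echo Claude's tool request, then answer every tool_use block in one user turn
        messages.append({"role": "assistant", "content": response["content"]})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(tools_impl[block["name"]](**block["input"])),
            }
            for block in response["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


# Scripted fake responses: one tool request, then a final answer
_script = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"location": "Paris"}}]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "It is sunny in Paris."}]},
])
history = [{"role": "user", "content": "What's the weather in Paris?"}]
final = run_agent_loop(
    lambda messages: next(_script),
    {"get_weather": lambda location: "sunny"},
    history,
)
print(final["content"][0]["text"])
```

The `max_turns` cap is a design choice to keep a misbehaving loop from running indefinitely.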

4. Context Management: Keeping Sessions Efficient

Long conversations or large documents require careful context management. Claude provides several features:

Context Windows

Depending on the model, Claude supports context windows of up to 1 million tokens. This allows processing entire codebases, lengthy books, or hours of conversation in a single request.

Prompt Caching

Cache frequently used context (system prompts, documents) to reduce latency and cost.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant that answers questions about our company policy.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is our vacation policy?"}
    ]
)
# Check whether the cache was written or read.
# Prompts shorter than the model's minimum cacheable length are not cached, so both may be 0.
print(f"Cache created: {response.usage.cache_creation_input_tokens or 0}")
print(f"Cache read: {response.usage.cache_read_input_tokens or 0}")
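A quick way to judge whether caching is paying off is to compute the share of input tokens served from cache. This sketch assumes that `input_tokens` counts only uncached input, so total input is the sum of the three usage fields; `cache_hit_ratio` is a hypothetical helper that takes the usage as a plain dict.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of input tokens served from cache (0.0 when there was no input)."""
    read = usage.get("cache_read_input_tokens") or 0
    created = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + created + uncached
    return read / total if total else 0.0
```

With the SDK you could call it as `cache_hit_ratio(response.usage.model_dump())`.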

Context Compaction and Editing

For very long sessions, you can compact or edit the context to remove irrelevant information while preserving key facts.
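As a rough client-side illustration of the idea (not the API's own compaction or context editing features), the sketch below keeps the first message plus as many of the most recent messages as fit within a character budget. `trim_history` is a hypothetical helper, and character count is only a crude stand-in for tokens.

```python
def trim_history(messages: list[dict], max_chars: int) -> list[dict]:
    """Keep the first message plus the most recent ones that fit in max_chars."""
    if not messages:
        return []
    first, rest = messages[0], messages[1:]
    budget = max_chars - len(str(first["content"]))
    kept = []
    # Walk backwards from the newest message and stop at the first one that does not fit
    for message in reversed(rest):
        size = len(str(message["content"]))
        if size > budget:
            break
        kept.append(message)
        budget -= size
    return [first] + kept[::-1]
```

A real implementation would also need to avoid separating a `tool_use` block from its `tool_result`, which this sketch does not handle.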

5. Files and Assets: Working with Documents

Claude can process various file types:

PDF Support

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this PDF."
                }
            ]
        }
    ]
)
print(response.content[0].text)
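If you send PDFs from several places in your code, a small helper keeps the block construction in one spot. A sketch, assuming you already have the file's bytes in memory; `pdf_document_block` is a hypothetical name, and the header check is only a cheap sanity test, not full validation.

```python
import base64


def pdf_document_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in a base64 `document` content block."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
        },
    }
```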

Images and Vision

Claude can analyze images for visual understanding.

with open("diagram.png", "rb") as f:
    img_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": img_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram."
                }
            ]
        }
    ]
)
print(response.content[0].text)
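The `media_type` must match the actual image format. The sketch below derives it from the file extension and rejects anything outside JPEG, PNG, GIF, and WebP, which are the formats I understand the API to accept. `image_block` and `SUPPORTED_IMAGE_TYPES` are hypothetical names.

```python
import base64
from pathlib import Path

# Extension-to-media-type map; assumed to cover the formats the API accepts
SUPPORTED_IMAGE_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 `image` content block, inferring media_type from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {suffix or filename!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": SUPPORTED_IMAGE_TYPES[suffix],
            "data": base64.b64encode(data).decode("ascii"),
        },
    }
```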

6. Batch Processing: Cost-Effective Scale

For large volumes of requests, use batch processing. Batch API calls cost 50% less than standard API calls.

# Create a batch of messages (batches live under client.messages in the Python SDK)
batch_response = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to French: Hello, world!"}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to Spanish: Hello, world!"}]
            }
        }
    ]
)
print(f"Batch ID: {batch_response.id}")
print(f"Batch status: {batch_response.processing_status}")
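For real workloads you will usually generate the `requests` list rather than write it by hand. A minimal sketch that builds one request per prompt with a unique `custom_id`; `build_batch_requests` and the `req-NNN` naming scheme are illustrative assumptions.

```python
def build_batch_requests(prompts: list[str], model: str, max_tokens: int = 1024) -> list[dict]:
    """Build one batch request per prompt, with IDs req-001, req-002, ..."""
    return [
        {
            "custom_id": f"req-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]
```

Because batch results are not guaranteed to come back in submission order, the `custom_id` is what lets you match each result to its prompt.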

Feature Availability by Platform

Not all features are available everywhere. Here's a quick reference:

Feature	Claude API	AWS Bedrock	Vertex AI
Extended Thinking	GA	GA	GA
Structured Outputs	GA	GA	Beta
Citations	GA	GA	GA
Prompt Caching	GA	GA	GA
Batch Processing	GA	GA	GA
Computer Use	Beta	Beta	N/A
Web Search	GA	GA	GA

GA = Generally Available, Beta = In preview

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files. Start with model capabilities and tools, then optimize with context management and batch processing.
Use Extended Thinking for complex reasoning and Structured Outputs for reliable JSON responses. Citations ground responses in source documents.
Leverage built-in tools (web search, code execution, computer use) to build powerful agents. Use parallel tool calls for efficiency.
Prompt caching reduces latency and cost for repeated context. Batch processing cuts costs by 50% for large workloads.
Check feature availability per platform before building. Some features are in beta or not available on all cloud platforms.

Ready to build? Start with the Quickstart guide and explore the API reference for complete details.