Guide | Beginner | Best Practices | 2026-05-21

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and file handling. Practical code examples and best practices for building with Claude.

Quick Answer

This guide walks you through Claude's five API areas—model capabilities, tools, tool infrastructure, context management, and file handling—with practical code examples and best practices for building production-ready applications.

Claude API, Tools, Context Management, Model Capabilities, Best Practices

Introduction

Claude’s API is more than just a text-in, text-out interface. It’s a rich ecosystem designed to give you fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you’re building a simple chatbot, a complex agent, or an enterprise-grade application, understanding the five core areas of the API surface is essential.

This guide covers:

Model capabilities – controlling reasoning depth and output format
Tools – letting Claude act on the web or in your environment
Tool infrastructure – discovery and orchestration at scale
Context management – keeping long sessions efficient
Files and assets – managing documents and data

We’ll also touch on feature availability classifications (Beta, GA, Deprecated, Retired) and provide practical code examples so you can start building immediately.

1. Model Capabilities: Steering Claude’s Reasoning and Output

Claude offers several ways to control how it processes input and generates responses. The most important capabilities include:

Extended Thinking & Adaptive Thinking

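Claude can “think” before responding, which improves reasoning on complex tasks. With Adaptive Thinking (recommended for Opus 4.7), Claude decides dynamically how much to think. You can also set a fixed effort parameter to control depth. With extended thinking, as in the example below, you set an explicit token budget through budget_tokens, which must be smaller than max_tokens.

import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step: integrate x^2 * sin(x) dx"}
    ]
)
# The response holds thinking blocks before the answer, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)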
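
When thinking is enabled, the response contains thinking blocks as well as text blocks, and the thinking budget has to fit inside max_tokens. The sketch below illustrates both points offline. `thinking_params` and `split_blocks` are hypothetical helper names, content blocks are modeled as plain dicts rather than SDK objects, and the 1,024-token minimum budget is an assumption to check against the current API documentation.

```python
def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` argument after checking the budget fits the request."""
    if budget_tokens < 1024:  # assumed minimum budget; verify for your model
        raise ValueError("budget_tokens is below the assumed 1,024-token minimum")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def split_blocks(content: list[dict]) -> tuple[str, str]:
    """Separate the thinking text from the final answer in a list of content blocks."""
    thinking = "".join(b.get("thinking", "") for b in content if b["type"] == "thinking")
    answer = "".join(b.get("text", "") for b in content if b["type"] == "text")
    return thinking, answer
```

Validating the budget on the client side turns a rejected request into an immediate, readable error.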

Structured Outputs

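Use the stop_sequences request parameter to control where Claude stops generating; the stop_reason and stop_sequence fields of the response report why it stopped and which sequence, if any, was matched. For structured data, combine with tools or system prompts.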

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=500,
    system="You are a data extraction assistant. Always output JSON.",
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: ..."}
    ]
)
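
A system prompt alone does not guarantee valid JSON. One common way to get reliably structured data is to describe the target shape as a tool and force Claude to call it with tool_choice. The sketch below only builds the request arguments and reads the result. `record_invoice`, `invoice_request`, and `extract_tool_input` are hypothetical names, and response blocks are modeled as plain dicts.

```python
INVOICE_TOOL = {
    "name": "record_invoice",  # hypothetical tool name
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601 date"},
            "amount": {"type": "number"},
        },
        "required": ["name", "date", "amount"],
    },
}


def invoice_request(invoice_text: str) -> dict:
    """Build keyword arguments for client.messages.create(**kwargs)."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 500,
        "tools": [INVOICE_TOOL],
        # Force a call to this specific tool so the output follows its schema.
        "tool_choice": {"type": "tool", "name": INVOICE_TOOL["name"]},
        "messages": [
            {"role": "user", "content": f"Extract the invoice fields:\n{invoice_text}"}
        ],
    }


def extract_tool_input(content: list[dict], tool_name: str) -> dict:
    """Return the input of the first tool_use block for `tool_name`."""
    for block in content:
        if block["type"] == "tool_use" and block["name"] == tool_name:
            return block["input"]
    raise LookupError(f"no tool_use block for {tool_name!r}")
```

The extracted fields arrive as an already-parsed dict in the tool_use block, so no JSON parsing of free text is needed.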

Streaming & Batch Processing

Streaming: Receive tokens as they’re generated for real-time UX.
Batch Processing: Send large volumes of requests asynchronously at 50% lower cost.

# Streaming example
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
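
For batch processing, each request is paired with a custom_id so results can be matched back when the batch finishes. A minimal sketch of building that request list follows. `build_batch_requests` is a hypothetical helper, and the commented-out submission call reflects the Python SDK's Message Batches interface as commonly documented, so verify it against your SDK version.

```python
def build_batch_requests(prompts: dict[str, str], model: str, max_tokens: int = 1024) -> list[dict]:
    """Turn {custom_id: prompt} into a list of batch request entries."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


batch_requests = build_batch_requests(
    {"poem-1": "Write a short poem about AI.", "poem-2": "Write a haiku about tokens."},
    model="claude-sonnet-4-20250514",
)
# Assumed submission call (needs an API key, so it is left commented out):
# batch = client.messages.batches.create(requests=batch_requests)
```

Results come back asynchronously and not necessarily in order, which is why each entry carries its own custom_id.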

2. Tools: Letting Claude Take Action

Claude can use tools to interact with external systems. Tools are defined as JSON schemas and can be called in parallel or sequentially.

Defining a Tool

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

Handling Tool Calls

When Claude decides to use a tool, the response contains a tool_use content block. You must execute the tool and return the result.

if response.stop_reason == "tool_use":
    tool_results = []
    for content in response.content:
        if content.type == "tool_use":
            # Execute your own implementation of the requested tool
            # (get_weather is a function you define yourself)
            result = get_weather(content.input["location"])
            tool_results.append(
                {"type": "tool_result", "tool_use_id": content.id, "content": str(result)}
            )
    # Send every result back in a single user message
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results}
        ]
    )
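
In practice Claude may request tools over several turns, so this check usually lives inside a loop that continues until stop_reason is no longer tool_use. The following is a sketch under assumptions: `run_tool_loop` and `handlers` are hypothetical names, and `create` stands for any callable with the shape of client.messages.create.

```python
from typing import Any, Callable


def run_tool_loop(
    create: Callable[..., Any],
    messages: list[dict],
    handlers: dict[str, Callable[..., Any]],
    max_turns: int = 5,
    **request_kwargs: Any,
) -> Any:
    """Call the model, run the tools it requests, and repeat until it stops asking."""
    messages = list(messages)  # do not mutate the caller's history
    for _ in range(max_turns):
        response = create(messages=messages, **request_kwargs)
        if response.stop_reason != "tool_use":
            return response
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                output, is_error = str(handlers[block.name](**block.input)), False
            except Exception as exc:  # report failures back to the model
                output, is_error = f"{type(exc).__name__}: {exc}", True
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
                "is_error": is_error,
            })
        messages.append({"role": "assistant", "content": response.content})
        # All results for one assistant turn go back in a single user message.
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")
```

A call might look like run_tool_loop(client.messages.create, messages, {"get_weather": get_weather}, model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools). The max_turns cap guards against a model that keeps requesting tools indefinitely.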

Built-in Tools

Claude provides several server-side tools you can enable:

Web search tool – fetch live information
Code execution tool – run Python/JavaScript in a sandbox
Computer use tool – control a virtual desktop (beta)
Memory tool – persist information across conversations
Bash tool – execute shell commands
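
Server-side tools are enabled by adding a typed entry to the tools list instead of a JSON schema of your own. As a hedged illustration, the versioned type string `web_search_20250305` below was the web search identifier at one point and may have been superseded; `with_web_search` is a hypothetical helper.

```python
def with_web_search(tools: list[dict], max_uses: int = 3) -> list[dict]:
    """Return a new tools list that also enables the server-side web search tool."""
    web_search = {
        "type": "web_search_20250305",  # assumed version string; check current docs
        "name": "web_search",
        "max_uses": max_uses,  # cap on the number of searches per request
    }
    return [*tools, web_search]
```

The returned list can be passed as the tools argument alongside your own tool definitions.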

3. Tool Infrastructure: Discovery and Orchestration

For production systems with many tools, you need infrastructure to manage discovery, context, and scaling.

Tool Runner (SDK)

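The Anthropic SDK includes a tool runner that handles tool call execution and result injection automatically. In the Python SDK it is exposed as a beta helper (client.beta.messages.tool_runner together with the beta_tool decorator) at the time of writing, so check your SDK version for the exact interface.

from anthropic import Anthropic, beta_tool
client = Anthropic()
@beta_tool
def get_weather(location: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {location}"  # placeholder result; call a real service here
runner = client.beta.messages.tool_runner(
    model="claude-sonnet-4-20250514", max_tokens=1024, tools=[get_weather],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)
# The runner executes tool calls automatically and returns the final response
response = runner.until_done()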

Prompt Caching with Tools

Cache frequently used tool definitions to reduce latency and cost. Use the cache_control parameter on tool definitions.

tools = [
    {
        "name": "search_database",
        "description": "Search internal database",
        "input_schema": {...},
        "cache_control": {"type": "ephemeral"}
    }
]
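
A cache breakpoint covers everything up to and including the block it is placed on, so marking only the last tool definition is enough to cache the whole tool list. A small sketch, assuming a hypothetical helper name `cache_tools`:

```python
import copy


def cache_tools(tools: list[dict]) -> list[dict]:
    """Return a copy of `tools` with a cache breakpoint on the last definition."""
    cached = copy.deepcopy(tools)
    if cached:
        cached[-1]["cache_control"] = {"type": "ephemeral"}
    return cached
```

Keeping the tool order stable between requests matters here, because any change to the cached prefix produces a cache miss.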

Tool Combinations & Search

You can combine multiple tools in a single request. Claude will decide which tool to use based on the user’s intent. Use tool search to let Claude dynamically discover tools from a registry.

4. Context Management: Keeping Long Sessions Efficient

Claude supports up to 1M token context windows (on supported models). Managing this context efficiently is critical for cost and performance.

Context Windows

Standard: 200K tokens
Extended: Up to 1M tokens (available on select models)

Compaction

When a conversation grows too long, use context compaction to summarize or prune older messages while preserving key information.

# Manual compaction: ask Claude to summarize the history so far.
# `conversation` is your existing list of message dicts, ending with an assistant turn.
summary = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=500,
    messages=conversation + [
        {"role": "user", "content": "Summarize this conversation so far, keeping all important facts and decisions."}
    ]
)
# Then start a new conversation with the summary as the system prompt
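
Once a summary exists, the compacted conversation can be assembled by keeping only the most recent turns and moving the summary into the system prompt. The sketch below is one way to do that. `compact_history` is a hypothetical helper, and it deliberately ignores tool_use/tool_result pairing, which a production version would need to preserve.

```python
def compact_history(messages: list[dict], summary: str, keep_last: int = 4) -> tuple[str, list[dict]]:
    """Return (system_prompt, messages) with older turns replaced by `summary`."""
    recent = messages[-keep_last:] if keep_last > 0 else []
    # The Messages API expects the first message to come from the user,
    # so drop any leading assistant turns left over from slicing.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    system = f"Summary of the earlier conversation:\n{summary}"
    return system, recent
```

Keeping a few recent turns verbatim preserves exact wording for the part of the conversation the user is most likely to refer back to.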

Prompt Caching

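Cache system prompts, tool definitions, and large context blocks to reduce latency and cost. Cache hits can reduce input token costs by up to 90%. Very short prompts fall below the minimum cacheable length, so the one-line system prompt below only illustrates the syntax.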

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a helpful assistant.", "cache_control": {"type": "ephemeral"}}
    ],
    messages=[...]
)
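
To see where the savings come from, it helps to express a request's input cost in units of the base input-token price. The sketch assumes the multipliers commonly documented for the default short-lived cache (about 1.25x for cache writes and 0.1x for cache reads); treat both as assumptions and confirm current pricing. The three counts correspond to the usage fields input_tokens, cache_creation_input_tokens, and cache_read_input_tokens, and `relative_input_cost` is a hypothetical helper.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed price factor for writing to the cache
CACHE_READ_MULTIPLIER = 0.10   # assumed price factor for reading from the cache


def relative_input_cost(input_tokens: int, cache_write_tokens: int, cache_read_tokens: int) -> float:
    """Input cost in units of one base-priced input token."""
    return (
        input_tokens
        + cache_write_tokens * CACHE_WRITE_MULTIPLIER
        + cache_read_tokens * CACHE_READ_MULTIPLIER
    )


# A 10,000-token cached prefix plus 50 uncached tokens:
first_call = relative_input_cost(50, 10_000, 0)   # writes the cache
later_call = relative_input_cost(50, 0, 10_000)   # reads the cache
uncached = relative_input_cost(10_050, 0, 0)      # no caching at all
```

Under these assumptions the first call costs somewhat more than an uncached request, and each later cache hit costs roughly a tenth of it, so caching pays off once a prefix is reused.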

Token Counting

Always count tokens before sending large requests to avoid hitting limits.

token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(token_count.input_tokens)  # e.g., 11
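
The count is most useful when compared against the context window before sending, since the prompt and the requested output budget share the same window. A minimal sketch, assuming the 200K standard window mentioned above and hypothetical helper names:

```python
def fits_in_context(input_tokens: int, max_tokens: int, context_window: int = 200_000) -> bool:
    """True if the prompt plus the requested output budget fits in the window."""
    return input_tokens + max_tokens <= context_window


def tokens_over(input_tokens: int, max_tokens: int, context_window: int = 200_000) -> int:
    """How many tokens must be trimmed for the request to fit (0 if it already fits)."""
    return max(0, input_tokens + max_tokens - context_window)
```

A nonzero result from tokens_over is a natural trigger for compaction or for lowering max_tokens.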

5. Files and Assets: Managing Documents and Data

Claude can process files directly via the Files API or by embedding content in messages.

PDF Support

Claude can read and reason over PDF documents. Embed the PDF in the request as base64-encoded data, or reference a file uploaded through the Files API.

import base64
with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data}},
                {"type": "text", "text": "Summarize this document."}
            ]
        }
    ]
)
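
If the same PDF is used across many requests, referencing an uploaded file avoids resending the base64 payload each time. The block shape below (a document whose source has type file and a file_id) follows the Files API as documented while it was in beta, so treat it as an assumption. `file_document_block` and the example ID are hypothetical.

```python
def file_document_block(file_id: str) -> dict:
    """Build a document content block that points at a previously uploaded file."""
    if not file_id:
        raise ValueError("file_id must be a non-empty string")
    return {"type": "document", "source": {"type": "file", "file_id": file_id}}


content = [
    file_document_block("file_abc123"),  # placeholder ID, not a real file
    {"type": "text", "text": "Summarize this document."},
]
```

The resulting list takes the place of the content list in the base64 example, with the file ID coming from an earlier upload.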

Images and Vision

Claude can analyze images. Pass them as base64-encoded data.

with open("chart.png", "rb") as f:
    img_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": img_data}},
                {"type": "text", "text": "What does this chart show?"}
            ]
        }
    ]
)
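
When images come from arbitrary files, the media type should match the actual format instead of being hard-coded. The helper below is a sketch that infers it from the file extension. `image_block` is a hypothetical name, and the set of accepted media types (JPEG, PNG, GIF, WebP) is stated here as an assumption to confirm against the vision documentation.

```python
import base64
import mimetypes
from pathlib import Path

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}  # assumed set


def image_block(path: str) -> dict:
    """Read an image file and return a base64 image content block."""
    media_type, _ = mimetypes.guess_type(path)  # based on the file extension only
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported or unknown image type for {path!r}: {media_type}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

Because the type is guessed from the extension, a misnamed file will still be mislabeled; inspecting the file's leading bytes would be a stricter check.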

Feature Availability & Lifecycle

Not all features are available on every platform. Claude’s API uses these classifications:

Classification	Description
Beta	Preview features, may change or be discontinued. Not for production.
GA	Stable, fully supported, recommended for production.
Deprecated	Still functional but not recommended. Migration timeline provided.
Retired	No longer available.

Platforms include: Claude API (Anthropic first-party), Claude Platform on AWS, Bedrock (AWS-operated), Vertex AI (Google-operated), and Microsoft Foundry (Anthropic-operated on Azure).

Best Practices Summary

Start with model capabilities and tools – these are the building blocks.
Use streaming for real-time UX, batch processing for cost savings.
Cache prompts and tool definitions to reduce latency and cost.
Monitor token usage with the Count Tokens endpoint.
Handle tool calls explicitly – always check stop_reason and return results.
Use context compaction for long-running conversations.
Check feature availability on your target platform before building.

Key Takeaways

Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
Use Adaptive Thinking and Structured Outputs to control reasoning depth and response format.
Tools let Claude interact with external systems; use the SDK's tool runner for automatic orchestration.
Prompt caching and context compaction are essential for cost-effective long sessions.
Always check feature availability (Beta vs. GA) and platform support before building production applications.