Guide | Beginner | Pricing | 2026-05-22

Claude API Features Overview: A Practical Guide to Model Capabilities, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and files. Learn how to steer reasoning, use tools, and optimize cost with practical code examples.

Quick Answer

This guide walks you through Claude's five API areas: model capabilities (thinking, structured outputs), tools (web fetch, code execution), tool infrastructure (discovery and orchestration), context management (prompt caching, compaction), and files (PDF, images). You'll learn when to use each and see practical Python code examples.

Claude API, Model Capabilities, Tools, Context Management, Batch Processing

Introduction

Claude's API is more than a single endpoint: its features are organized into five core areas. Whether you're building a simple chatbot, a complex agent, or a document analysis pipeline, understanding these areas helps you choose the right tools for the job.

This guide covers the Claude API surface as documented in the official overview, with practical advice and code examples for each area. By the end, you'll know how to steer Claude's reasoning, let it take actions, manage long conversations, and handle files efficiently.

The Five API Areas

The Claude API surface is organized into:

Model capabilities — Control how Claude reasons and formats responses.
Tools — Let Claude take actions on the web or in your environment.
Tool infrastructure — Handle discovery and orchestration at scale.
Context management — Keep long-running sessions efficient.
Files and assets — Manage documents and data you provide to Claude.

If you're new, start with model capabilities and tools. Return to the other sections when you're ready to optimize cost, latency, or scale.

1. Model Capabilities

Model capabilities are the direct outputs and reasoning controls you have over Claude. These include:

Extended Thinking & Adaptive Thinking

Claude can "think" before responding, which improves reasoning on complex tasks. With adaptive thinking, Claude decides dynamically when and how much to think, which makes it a good fit for Opus 4.7; the effort parameter controls thinking depth. With extended thinking you set a fixed token budget instead, as in the example below.

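import anthropic

client = anthropic.Anthropic()

# Extended thinking with a fixed budget; budget_tokens must be smaller than max_tokens.
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)
# With thinking enabled, thinking blocks come before the answer, so content[0]
# is usually not a text block. Print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)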
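
A fixed thinking budget has bounds: the API expects budget_tokens to be at least 1,024 and smaller than max_tokens. The sketch below guards those two conditions before a request is sent. The helper name thinking_config and the constant are illustrative and not part of the SDK, and the minimum should be confirmed against the current docs.

```python
MIN_THINKING_BUDGET = 1024  # documented minimum at the time of writing; verify in the docs


def thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` argument for messages.create, or raise if the budget is invalid."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

You would pass the result as thinking=thinking_config(4096, 2048) in the call above.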

Structured Outputs

Claude can return structured data (JSON) directly, making it easy to integrate with applications.

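# messages.create() does not accept a response_format argument. One established way
# to get schema-shaped JSON is to force a tool call whose input_schema describes the
# output. The tool name is illustrative; see the docs for the dedicated feature.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "record_physicists",
        "description": "Record physicists and their key contributions.",
        "input_schema": {
            "type": "object",
            "properties": {"physicists": {"type": "array", "items": {"type": "object"}}},
            "required": ["physicists"],
        },
    }],
    tool_choice={"type": "tool", "name": "record_physicists"},
    messages=[{"role": "user", "content": "List three famous physicists and their key contributions"}],
)
# The forced call arrives as a tool_use block whose input is already a dict.
data = next(block.input for block in response.content if block.type == "tool_use")
print(data)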
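
When a reply carries JSON as plain text, the model may wrap it in a Markdown code fence, which makes json.loads fail. A minimal sketch of a tolerant parser follows; parse_json_reply is a hypothetical helper, not an SDK function, and it only handles a single fence around the whole reply.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_reply(text: str):
    """Parse a JSON reply, tolerating one surrounding Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag) ...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ... and everything from the closing fence onward.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)
```

Replies that mix prose with JSON still raise json.JSONDecodeError, which is usually the right signal to retry or tighten the prompt.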

Batch Processing

For large volumes of requests, use batch processing to save 50% on costs. Batches are processed asynchronously.

import anthropic
client = anthropic.Anthropic()
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "What is the capital of France?"}]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "What is the capital of Japan?"}]
            }
        }
    ]
)
print(f"Batch ID: {batch.id}")
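
Writing each request entry by hand does not scale past a few prompts. The sketch below generates the same structure from a list of prompts and works out the discounted cost from the 50% figure above. Both helper names are illustrative, and the cost function is plain arithmetic rather than a pricing API.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Turn a list of prompts into batch request entries with unique custom_ids."""
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


def batch_cost(standard_cost: float, discount: float = 0.5) -> float:
    """Cost of the same workload as a batch, given the discount stated above."""
    return standard_cost * (1 - discount)
```

The list can be passed as requests=build_batch_requests(prompts) to client.messages.batches.create. Keep the custom_id values, since they are how you match results back to prompts once the batch finishes.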

Citations

Claude can ground responses in source documents by providing detailed references. This is critical for legal, medical, or research applications where accuracy and provenance matter.
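
Citations are requested per document: each document block in the user message opts in. The sketch below builds such a block for plain text and places it ahead of the question. The helper names are hypothetical; the block shape follows the documented document content type, so check field names against the current API reference.

```python
def cited_document(text: str, title: str) -> dict:
    """Build a plain-text document block with citations turned on."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def user_message_with_sources(question: str, documents: list) -> dict:
    """Place the documents before the question in a single user message."""
    return {"role": "user", "content": [*documents, {"type": "text", "text": question}]}
```

Passing messages=[user_message_with_sources(...)] to client.messages.create should yield text blocks that carry a citations list pointing back into the source text.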

2. Tools

Tools let Claude interact with the outside world. The API supports several built-in tools:

Web fetch tool — Retrieve web pages
Code execution tool — Run Python code in a sandbox
Text editor tool — Edit files programmatically
Computer use tool — Control a virtual desktop
Bash tool — Execute shell commands

Example: Using the Web Fetch Tool

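# Assumption: web fetch is a beta server tool with a versioned type string;
# confirm the current type and beta flag in the official docs before use.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["web-fetch-2025-09-10"],
    tools=[{"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 3}],
    messages=[
        {"role": "user", "content": "Summarize the latest posts at https://www.anthropic.com/news"}
    ]
)
# Claude decides whether to call web_fetch. The tool runs server-side, so the
# response mixes tool blocks with text blocks; print only the text.
for block in response.content:
    if block.type == "text":
        print(block.text)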

Parallel Tool Use

Claude can call multiple tools in parallel, speeding up complex workflows.
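
On the client side, parallel tool use means one assistant turn can contain several tool_use blocks, and all of their results go back together in a single user message. The sketch below runs the calls concurrently. It assumes the content blocks are plain dicts (for SDK objects, adapt the field access), and run_tool_calls and handlers are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(content_blocks, handlers):
    """Run every tool_use block concurrently and return one user message with all results."""
    calls = [b for b in content_blocks if b["type"] == "tool_use"]

    def run(call):
        try:
            output = handlers[call["name"]](**call["input"])
            return {"type": "tool_result", "tool_use_id": call["id"], "content": str(output)}
        except Exception as exc:  # report the failure to Claude instead of crashing
            return {"type": "tool_result", "tool_use_id": call["id"],
                    "content": str(exc), "is_error": True}

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, calls))  # map preserves the order of the calls
    return {"role": "user", "content": results}
```

Threads suit I/O-bound tools such as HTTP lookups; CPU-bound tools would need a different executor.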

Strict Tool Use

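Strict tool use constrains the arguments Claude passes to a tool so that they conform to the tool's JSON schema, which removes malformed or invented parameters in tool-driven applications. Forcing Claude to call a particular tool is a separate setting, tool_choice.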
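
Which tool Claude calls, if any, is governed by the tool_choice request parameter. The sketch below builds its three common values; make_tool_choice is a hypothetical convenience function, while the dict shapes are the ones the Messages API accepts.

```python
from typing import Optional


def make_tool_choice(tool_name: Optional[str] = None, require_any: bool = False) -> dict:
    """Build a tool_choice value: a named tool, any tool, or Claude's own judgment."""
    if tool_name is not None:
        return {"type": "tool", "name": tool_name}  # Claude must call this tool
    if require_any:
        return {"type": "any"}  # Claude must call one of the provided tools
    return {"type": "auto"}  # Claude decides whether to call a tool
```

For example, tool_choice=make_tool_choice("get_weather") forces a call to a tool named get_weather, provided that tool is present in the tools list.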

3. Tool Infrastructure

When building at scale, you need tool discovery and orchestration. The API provides:

Tool Runner (SDK) — Automates tool execution
Server tools — Tools that Anthropic's platform executes for you, such as web fetch and code execution
Fine-grained tool streaming — Stream tool calls incrementally
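Programmatic tool calling — Let Claude call your tools from code it runs in the code execution sandbox, rather than through a separate model round trip per call

Example: Forcing a Specific Tool

# Forcing a specific tool is done with tool_choice. This is separate from
# programmatic tool calling, where Claude invokes tools from sandboxed code.
# "fetch_page" is an illustrative client-side tool that your own code would run.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "fetch_page",
        "description": "Fetch a web page and return its text.",
        "input_schema": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    }],
    tool_choice={"type": "tool", "name": "fetch_page"},
    messages=[
        {"role": "user", "content": "Summarize the Anthropic documentation at https://docs.anthropic.com"}
    ]
)
# The reply contains a tool_use block; run the tool yourself, then send back a tool_result.
tool_call = next(block for block in response.content if block.type == "tool_use")
print(tool_call.name, tool_call.input)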
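
The Tool Runner listed above automates a request loop that you would otherwise write by hand: send the conversation, execute any requested tools, append the results, and repeat until Claude stops asking for tools. A minimal sketch of that loop is shown here, not the SDK's implementation. The send argument is a stand-in for a function that calls the Messages API and returns a dict with stop_reason and content, and handlers maps tool names to your own functions.

```python
def run_agent_loop(send, messages, handlers, max_turns=5):
    """Call `send(messages)` until the reply stops asking for tools.

    `send` is a placeholder for a function that calls the Messages API and
    returns a dict with "stop_reason" and "content" (a list of block dicts).
    """
    for _ in range(max_turns):
        reply = send(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply["stop_reason"] != "tool_use":
            return reply  # final answer, no more tools requested
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(handlers[block["name"]](**block["input"])),
            }
            for block in reply["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")
```

The max_turns cap is a design choice that keeps a misbehaving tool from driving an unbounded number of paid requests.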

4. Context Management

Long conversations consume tokens. Claude offers several features to manage context efficiently:

Context windows — Up to 1M tokens for large documents
Compaction — Reduce token usage without losing key information
Prompt caching — Cache repeated system prompts or large documents to reduce latency and cost
Token counting — Estimate token usage before sending
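
To make the budgeting idea concrete, here is a client-side sketch that keeps only the most recent messages that fit a token budget. It is not the API's compaction feature, which summarizes rather than drops content. The 4-characters-per-token estimate is a rough assumption, and for real numbers you would call the token counting endpoint (client.messages.count_tokens in the Python SDK). Message content is assumed to be plain strings.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not the API's tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages, budget_tokens):
    """Keep the most recent messages whose estimated size fits within budget_tokens."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    # A conversation must start with a user turn, so drop leading assistant turns.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Dropping old turns loses information outright, so this suits chat-style sessions better than tasks that depend on early instructions.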

Example: Prompt Caching

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
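    # Prompts shorter than the model's minimum cacheable length are processed
    # without caching, so a real cached prefix would be far longer than this one.
    system=[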
        {
            "type": "text",
            "text": "You are a helpful assistant with expertise in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key"}
    ]
)
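
Whether a request actually hit the cache shows up in the response's usage data, which reports cache writes and cache reads separately from uncached input tokens. The sketch below summarizes those counters. It assumes the usage object has been converted to a dict with the fields input_tokens, cache_creation_input_tokens, and cache_read_input_tokens; cache_stats is a hypothetical helper.

```python
def cache_stats(usage: dict) -> dict:
    """Summarize how much of the prompt was served from cache."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + written + uncached
    return {
        "total_prompt_tokens": total,
        "cache_hit_ratio": read / total if total else 0.0,
        "cache_written": written > 0,
    }
```

A first request with a long enough prefix should report a cache write, and repeats within the cache lifetime should report reads. If both counters stay at zero, the prefix was probably too short to cache.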

5. Files and Assets

Claude can process various file types:

PDF support — Extract text and layout from PDFs
Images and vision — Analyze images (photos, diagrams, screenshots)
Files API — Upload and reference files in conversations

Example: Processing a PDF

import base64
with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this document in bullet points."
                }
            ]
        }
    ]
)
print(response.content[0].text)
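
Images follow the same pattern as the PDF above, with an image block in place of the document block. The sketch below builds one from raw bytes and infers the media type from the filename. image_block is an illustrative helper, and the set of supported types reflects the formats documented at the time of writing.

```python
import base64
import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring the media type from the filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

The returned dict goes into the user message's content list next to a text block, exactly where the document block sits in the PDF example.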

Feature Availability Across Platforms

Not all features are available on every platform. The API docs classify features as:

Beta — Preview, may change significantly
Generally Available (GA) — Stable, production-ready
Deprecated — Still functional, migration path provided
Retired — No longer available

Key platforms include:

Claude API (Anthropic first-party)
Claude Platform on AWS (Anthropic-operated)
Amazon Bedrock (AWS-operated)
Vertex AI (Google-operated)
Microsoft Foundry (Anthropic-operated on Azure)

Check the availability column in the official docs before building for a specific platform.

Best Practices

Start simple — Begin with model capabilities and tools before adding context management or complex infrastructure.
Use batch processing for high volume — Save 50% on costs by batching asynchronous requests.
Leverage prompt caching — Cache system prompts and large documents to reduce latency and token usage.
Monitor token usage — Use the token counting endpoint to estimate costs before sending requests.
Choose the right thinking mode — Use adaptive thinking for Opus 4.7, and fixed budgets for predictable reasoning depth.

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
Model capabilities include extended thinking, structured outputs, batch processing (50% cheaper), and citations.
Tools enable Claude to fetch web pages, execute code, edit files, and control a virtual desktop.
Context management features like prompt caching and compaction help keep long-running sessions efficient and cost-effective.
Feature availability varies by platform (Claude API, Bedrock, Vertex AI, etc.), so always check the GA/Beta status before building.