Guide · Beginner · API · 2026-05-16

Claude API Features Overview: A Practical Guide to Model Capabilities, Tools, and Context Management

Explore the five core areas of the Claude API: model capabilities, tools, context management, files, and tool infrastructure. Learn how to steer reasoning, use tools, and manage context efficiently.

Quick Answer

This guide breaks down the Claude API's five core feature areas: model capabilities (thinking, structured outputs, citations, batch processing), tools (web search, code execution, computer use), context management (prompt caching, compaction), files (PDF, images), and tool infrastructure (MCP, programmatic tool calling). You'll learn how to combine these for powerful, production-ready applications.

Tags: Claude API, Model Capabilities, Tools, Context Management, Batch Processing

Introduction

The Claude API offers a rich set of features that go far beyond simple text generation. Whether you're building a customer support agent, a code assistant, or a document analysis tool, understanding the API's five core areas will help you design more capable, cost-effective, and reliable applications.

This guide provides a practical overview of the Claude API's feature surface, organized into five areas:

Model capabilities – How to steer Claude's reasoning and output format
Tools – Letting Claude take actions on the web or in your environment
Tool infrastructure – Discovery and orchestration at scale
Context management – Keeping long-running sessions efficient
Files and assets – Managing documents and data

We'll cover what each area offers, when to use it, and include code examples to get you started.

1. Model Capabilities: Steering Claude's Output

Model capabilities control how Claude reasons and what it produces. These are the most fundamental features you'll use.

Extended Thinking (Adaptive Thinking)

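Extended thinking lets Claude reason through a problem in separate thinking blocks before it writes the visible answer. This is especially useful for complex reasoning tasks like math, code generation, or multi-step analysis. Newer models also offer an adaptive mode in which Claude decides for itself whether and how much to think; the example below uses the fixed-budget form.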

When to use: Any task requiring deep reasoning, chain-of-thought, or problem-solving.

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024
    },
    messages=[{"role": "user", "content": "Solve this equation step by step: 3x + 7 = 22"}]
)
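# With thinking enabled, content starts with a "thinking" block, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)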

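Key point: Pass the thinking parameter with a budget_tokens value of at least 1,024 that is smaller than max_tokens. Claude can spend up to that many tokens reasoning before it writes the visible response, and the reasoning comes back in thinking blocks ahead of the text block.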
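The budget rules are easy to break when max_tokens is computed at runtime, and the answer is no longer the first content block. The sketch below shows both concerns; thinking_params and split_thinking are hypothetical helpers (not part of the SDK), and they assume the 1,024-token minimum, the budget-below-max_tokens rule, and response content converted to plain dicts (for example with response.model_dump()).

```python
def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Build the `thinking` argument for messages.create, enforcing the budget rules.

    Hypothetical helper; assumes budget_tokens must be >= 1024 and < max_tokens.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def split_thinking(content: list[dict]) -> tuple[str, str]:
    """Separate reasoning from the answer in response content given as plain dicts."""
    reasoning = "\n".join(b["thinking"] for b in content if b["type"] == "thinking")
    answer = "\n".join(b["text"] for b in content if b["type"] == "text")
    return reasoning, answer


# Content shaped like response.model_dump()["content"] (values are illustrative)
params = thinking_params(max_tokens=2048, budget_tokens=1024)
reasoning, answer = split_thinking([
    {"type": "thinking", "thinking": "Subtract 7 from both sides, then divide by 3.", "signature": "..."},
    {"type": "text", "text": "x = 5"},
])
print(params)
print(answer)
```

Validating before the request turns a 400 error from the API into an immediate, descriptive exception in your own code.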

Structured Outputs

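When you need JSON that your code can parse, note that the Messages API does not accept an OpenAI-style response_format parameter. A dependable approach is to describe the shape you want as a tool's input_schema and force Claude to call that tool with tool_choice; the arguments come back already parsed as a dictionary. Newer models also support a dedicated structured outputs option that constrains responses to a JSON schema, so check the documentation for its current parameters.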

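# input_schema describes the JSON shape we want; "record_planets" is our own tool name
planet = {"type": "object", "properties": {"name": {"type": "string"}, "distance_au": {"type": "number"}},
          "required": ["name", "distance_au"]}
planet_tool = {
    "name": "record_planets",
    "description": "Record planets and their average distance from the Sun in AU.",
    "input_schema": {"type": "object", "properties": {"planets": {"type": "array", "items": planet}},
                     "required": ["planets"]}
}
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[planet_tool],
    tool_choice={"type": "tool", "name": "record_planets"},  # force Claude to call this tool
    messages=[{"role": "user", "content": "List three planets and their distances from the sun in AU."}]
)
data = next(b for b in response.content if b.type == "tool_use").input  # already a parsed dict
print(data)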

When to use: Any time you need to programmatically consume Claude's output—API integrations, data pipelines, UI components.
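One dependable way to get JSON out of the Messages API is a forced tool call, where the arguments arrive in a tool_use block. Since schema adherence is still worth checking before the data reaches a pipeline, here is a minimal sketch of extracting and checking those arguments. It assumes content converted to dicts with response.model_dump(); extract_tool_input and the record_planets tool name are hypothetical, and a schema validator such as the jsonschema package would also check types.

```python
def extract_tool_input(content: list[dict], tool_name: str, required: list[str]) -> dict:
    """Return the input of the first tool_use block named tool_name, checking required keys.

    content is response content converted to dicts, e.g. response.model_dump()["content"].
    """
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            data = block["input"]
            missing = [key for key in required if key not in data]
            if missing:
                raise ValueError(f"tool input is missing keys: {missing}")
            return data
    raise LookupError(f"no tool_use block named {tool_name!r}")


# Illustrative content from a forced call to a hypothetical record_planets tool
content = [{
    "type": "tool_use",
    "id": "toolu_example",
    "name": "record_planets",
    "input": {"planets": [{"name": "Mars", "distance_au": 1.52}]},
}]
data = extract_tool_input(content, "record_planets", required=["planets"])
print(data["planets"][0]["name"])
```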

Citations

Ground Claude's responses in source documents. With Citations, Claude can provide detailed references to the exact sentences it used to generate an answer.

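response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "The Eiffel Tower was completed in 1889. It is 330 meters tall."
                },
                "title": "Eiffel Tower Facts",
                "citations": {"enabled": True}
            },
            {"type": "text", "text": "When was the Eiffel Tower completed?"}
        ]
    }]
)
# The answer is split into text blocks; blocks that rely on the document carry citations
for block in response.content:
    if block.type == "text":
        print(block.text)
        for citation in block.citations or []:
            print(f'  cited from "{citation.document_title}": {citation.cited_text!r}')
Rather than inline markers like [0], each text block that draws on the source carries a citations list. Every citation records the cited_text plus the index and title of the source document, so your application can render footnotes however it likes.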

When to use: Legal, medical, research, or any domain where source attribution is critical.
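Because citations arrive as structured data, turning them into numbered footnotes is up to your application. A minimal sketch, assuming content converted to dicts (for example with response.model_dump()) and plain-text document citations that carry document_title and cited_text fields; render_with_footnotes is a hypothetical helper.

```python
def render_with_footnotes(content: list[dict]) -> str:
    """Join text blocks, appending [n] after each cited block and a footnote list at the end."""
    body, notes = [], []
    for block in content:
        if block.get("type") != "text":
            continue
        body.append(block["text"])
        for cite in block.get("citations") or []:
            notes.append(f'[{len(notes) + 1}] {cite["document_title"]}: "{cite["cited_text"].strip()}"')
            body.append(f"[{len(notes)}]")
    return "".join(body) + ("\n\n" + "\n".join(notes) if notes else "")


# Illustrative content shaped like a cited answer about the Eiffel Tower document
content = [
    {"type": "text", "text": "According to the document, ", "citations": None},
    {"type": "text", "text": "the Eiffel Tower was completed in 1889.", "citations": [{
        "type": "char_location",
        "cited_text": "The Eiffel Tower was completed in 1889. ",
        "document_index": 0,
        "document_title": "Eiffel Tower Facts",
        "start_char_index": 0,
        "end_char_index": 40,
    }]},
]
print(render_with_footnotes(content))
```

Keeping the marker numbering in your own code means you can switch to links, tooltips, or a sidebar without changing the request.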

Batch Processing

Send large volumes of requests asynchronously and save 50% on API costs. Batch API calls are ideal for offline processing, data enrichment, or content generation at scale.

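# Create a batch of messages
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Summarize: AI is transforming..."}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Translate to French: Hello world"}]
            }
        }
    ]
)
# Later, poll until processing has ended, then stream the results
import time
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)
for entry in client.messages.batches.results(batch.id):
    # Results may arrive in any order; match them to requests by custom_id
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)
    else:
        print(entry.custom_id, entry.result.type)  # errored, canceled, or expired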

When to use: Large-scale data processing, content generation, evaluation pipelines. Not suitable for real-time applications: a batch can take up to 24 hours to complete, although most finish much sooner.
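Since results can come back in a different order from the requests, custom_id is the only reliable way to match them up. A minimal sketch of building requests from prompts and sorting results into answers and failures, assuming result entries converted to dicts (for example with .model_dump()); the helper names are hypothetical, while the succeeded, errored, canceled, and expired result types follow the Message Batches API.

```python
def build_batch_requests(prompts: dict[str, str], model: str, max_tokens: int = 256) -> list[dict]:
    """Turn {custom_id: prompt} into Message Batches request entries."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


def collect_batch_text(results: list[dict]) -> tuple[dict[str, str], dict[str, str]]:
    """Map custom_id -> answer text for successes, and custom_id -> result type otherwise."""
    answers, failures = {}, {}
    for item in results:
        result = item["result"]
        if result["type"] == "succeeded":
            blocks = result["message"]["content"]
            answers[item["custom_id"]] = "".join(b["text"] for b in blocks if b["type"] == "text")
        else:  # "errored", "canceled", or "expired"
            failures[item["custom_id"]] = result["type"]
    return answers, failures


requests = build_batch_requests(
    {"req-001": "Summarize: AI is transforming...", "req-002": "Translate to French: Hello world"},
    model="claude-sonnet-4-20250514",
)
# Illustrative results, deliberately out of order
answers, failures = collect_batch_text([
    {"custom_id": "req-002", "result": {"type": "succeeded",
                                        "message": {"content": [{"type": "text", "text": "Bonjour le monde"}]}}},
    {"custom_id": "req-001", "result": {"type": "expired"}},
])
print(answers, failures)
```

Collecting failures separately makes it easy to resubmit only the expired or errored requests in a follow-up batch.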

2. Tools: Letting Claude Take Action

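Tools extend Claude's capabilities beyond text generation. Claude can call external functions, fetch web content, execute code, and even control a computer. Server tools such as web search and code execution run on Anthropic's infrastructure, while client tools such as your own functions and computer use are executed by your application, which sends the results back.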

Web Search Tool

Give Claude real-time access to the internet. Useful for current events, research, or fact-checking.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 5}],  # max_uses caps searches per request
    messages=[{"role": "user", "content": "What is the latest news about AI regulation in the EU?"}]
)
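Search results come back in the response content alongside Claude's answer, which is handy for logging or showing a source list. A minimal sketch, assuming the result shape described in the web search tool documentation (a web_search_tool_result block whose content is a list of web_search_result entries with title and url, or an error object instead of a list) and content converted to dicts; search_sources is a hypothetical helper and the URL below is a placeholder.

```python
def search_sources(content: list[dict]) -> list[tuple[str, str]]:
    """Collect (title, url) pairs from web_search_tool_result blocks, skipping errors."""
    sources = []
    for block in content:
        if block.get("type") != "web_search_tool_result":
            continue
        results = block.get("content")
        if not isinstance(results, list):  # an error result is an object, not a list
            continue
        for item in results:
            if item.get("type") == "web_search_result":
                sources.append((item["title"], item["url"]))
    return sources


# Illustrative content: the search request, its results, then Claude's answer
content = [
    {"type": "server_tool_use", "id": "srvtoolu_1", "name": "web_search",
     "input": {"query": "EU AI regulation news"}},
    {"type": "web_search_tool_result", "tool_use_id": "srvtoolu_1", "content": [
        {"type": "web_search_result", "title": "Example article", "url": "https://example.com/ai-act"},
    ]},
    {"type": "text", "text": "Here is a summary..."},
]
print(search_sources(content))
```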

Code Execution Tool

Let Claude write and run Python code in a sandboxed environment. Perfect for data analysis, math, or generating charts.

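# Code execution is a beta tool, so the request goes through client.beta with its beta flag
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["code-execution-2025-05-22"],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    messages=[{"role": "user", "content": "Calculate the Fibonacci sequence up to 20 and plot it."}]
)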
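The sandbox's output is also returned in the response content, so your application can show it or detect failures. A minimal sketch, assuming the code_execution_20250522 result shape (a code_execution_tool_result block whose content holds stdout, stderr, and return_code, or an error object with an error_code) and content converted to dicts; execution_output is hypothetical, and newer tool versions use different block types, so treat the field names as illustrative.

```python
def execution_output(content: list[dict]) -> list[dict]:
    """Summarize each code execution result block as {stdout, stderr, ok}."""
    runs = []
    for block in content:
        if block.get("type") != "code_execution_tool_result":
            continue
        result = block["content"]
        if result.get("type") != "code_execution_result":  # e.g. a tool error object
            runs.append({"stdout": "", "stderr": str(result.get("error_code")), "ok": False})
            continue
        runs.append({
            "stdout": result.get("stdout", ""),
            "stderr": result.get("stderr", ""),
            "ok": result.get("return_code") == 0,
        })
    return runs


# Illustrative content: the code Claude ran and the sandbox's result
content = [
    {"type": "server_tool_use", "id": "srvtoolu_2", "name": "code_execution",
     "input": {"code": "print(sum(range(10)))"}},
    {"type": "code_execution_tool_result", "tool_use_id": "srvtoolu_2",
     "content": {"type": "code_execution_result", "stdout": "45\n", "stderr": "", "return_code": 0}},
]
print(execution_output(content))
```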

Computer Use Tool

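Claude can operate a desktop environment by requesting actions such as screenshots, mouse clicks, and keystrokes. Your application performs each action in a sandboxed virtual machine or container and returns the result, usually a new screenshot. This is a beta feature.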

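# Computer use is a beta tool; the display size tells Claude the screen geometry
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["computer-use-2025-01-24"],
    tools=[{"type": "computer_20250124", "name": "computer", "display_width_px": 1024, "display_height_px": 768}],
    messages=[{"role": "user", "content": "Open the browser and go to wikipedia.org"}]
)
# The response contains tool_use blocks (screenshot, click, type, ...) for your code to carry out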

When to use: Automation of GUI-based workflows, testing, data entry.
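Because computer use is a client tool, your code has to turn each requested action into a real one and report back in a tool_result. A minimal dispatcher sketch: the action names (screenshot, left_click, type, key) follow the computer use tool's documented actions, but the handlers here are hypothetical stubs that only record what they would do; a real agent would drive a VM automation library and return screenshots as base64 images.

```python
def make_dispatcher(log: list[str]):
    """Return a function that executes one computer-use tool_use input (as a dict)."""
    handlers = {
        "screenshot": lambda a: log.append("screenshot"),
        "left_click": lambda a: log.append(f"click {a['coordinate'][0]},{a['coordinate'][1]}"),
        "type": lambda a: log.append(f"type {a['text']!r}"),
        "key": lambda a: log.append(f"key {a['text']}"),
    }

    def dispatch(tool_input: dict) -> str:
        action = tool_input["action"]
        if action not in handlers:
            # Report unsupported actions back to Claude instead of crashing the loop
            return f"unsupported action: {action}"
        handlers[action](tool_input)
        return "ok"

    return dispatch


log: list[str] = []
dispatch = make_dispatcher(log)
statuses = [dispatch(step) for step in [
    {"action": "screenshot"},
    {"action": "left_click", "coordinate": [512, 40]},
    {"action": "type", "text": "wikipedia.org"},
    {"action": "key", "text": "Return"},
    {"action": "scroll", "coordinate": [512, 400]},
]]
print(statuses)
print(log)
```

The returned status string is what you would place in the tool_result content for each call.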

Parallel Tool Use

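Claude can request several tool calls in a single response when they are independent of each other, such as looking up the weather in two cities. Your code runs them (concurrently if you like) and returns all the results in one message, which saves round trips.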

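weather_tool = {
    "name": "get_weather",  # a hypothetical client tool you implement yourself
    "description": "Get the current weather for a city.",
    "input_schema": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}
}
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What's the weather in Tokyo and in Paris right now?"}]
)
# Independent lookups can come back together as several tool_use blocks in one response
tool_calls = [block for block in response.content if block.type == "tool_use"]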
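The important rule is that all tool_result blocks for one turn go back together in a single user message, each tied to its call by tool_use_id. A minimal sketch of that step, assuming tool calls converted to dicts and a local Python function per tool; run_tool_calls and get_weather are hypothetical, and the thread pool is just one way to run independent calls concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(tool_calls: list[dict], handlers: dict) -> dict:
    """Run independent tool_use blocks concurrently; return one user message with every result.

    handlers maps tool name -> Python function taking the tool input as keyword arguments.
    """
    def run(call: dict) -> dict:
        output = handlers[call["name"]](**call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": str(output)}

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, tool_calls))  # map preserves the order of tool_calls
    return {"role": "user", "content": results}


def get_weather(location: str) -> str:  # hypothetical local implementation
    return f"Sunny in {location}"


message = run_tool_calls(
    [
        {"type": "tool_use", "id": "toolu_a", "name": "get_weather", "input": {"location": "Tokyo"}},
        {"type": "tool_use", "id": "toolu_b", "name": "get_weather", "input": {"location": "Paris"}},
    ],
    {"get_weather": get_weather},
)
print(message)
```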

3. Tool Infrastructure: Orchestration at Scale

When you have many tools, you need infrastructure to manage discovery, routing, and context.

MCP (Model Context Protocol)

MCP is an open protocol for connecting Claude to external tools and data sources. It standardizes how tools are discovered and invoked.

# Using the MCP connector (beta): the API connects to your remote MCP server for you
from anthropic import Anthropic
client = Anthropic()
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    betas=["mcp-client-2025-04-04"],
    # Tools are discovered from the server, so no tool schemas are listed here
    mcp_servers=[{"type": "url", "url": "https://my-mcp-server.com", "name": "my-mcp-server"}],
    messages=[{"role": "user", "content": "Query the database for recent orders."}]
)

When to use: Enterprise environments with many internal tools, databases, or APIs.
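Under the hood, MCP messages are JSON-RPC 2.0, and tool discovery and invocation map to the tools/list and tools/call methods. The sketch below only builds those two requests to show what the standardization looks like; the query_orders tool and its arguments are hypothetical, and the transport, initialization handshake, and responses are left out. With the MCP connector, the API sends these messages for you.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as used by MCP."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request("tools/call", {"name": "query_orders", "arguments": {"since": "2026-05-01"}})
print(list_tools)
print(call_tool)
```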

Programmatic Tool Calling

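For tools you define yourself (client tools), Claude decides which tool to call and with what arguments, but your code executes the call and sends the result back so Claude can finish its answer. Anthropic's documentation also uses the name "programmatic tool calling" for a separate beta feature in which Claude calls your tools from code running in the code execution tool; the example below is the standard client-side loop.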

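# Hypothetical client tool and local implementation; swap in your own weather service
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}
}
def my_weather_api(location):
    return {"location": location, "temp_c": 21, "conditions": "clear"}  # stub data
# Step 1: Get Claude's tool call
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool],
    messages=messages
)
# Step 2: Execute the tool yourself (the tool_use block may come after a text block)
tool_call = next(block for block in response.content if block.type == "tool_use")
weather_data = my_weather_api(tool_call.input["location"])
# Step 3: Return the full assistant turn, then the result linked by tool_use_id
messages += [
    {"role": "assistant", "content": response.content},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": tool_call.id, "content": str(weather_data)}]}
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool],
    messages=messages
)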
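Real tasks often need several rounds of tool calls before Claude can answer, so the three steps usually become a loop that runs until stop_reason is no longer "tool_use". A minimal sketch, assuming a create callable that takes the same keyword arguments as client.messages.create and returns a dict; run_tool_loop is hypothetical, and the scripted fake below stands in for the API so the control flow can be checked offline.

```python
def run_tool_loop(create, messages: list[dict], handlers: dict, max_rounds: int = 5, **kwargs) -> dict:
    """Call the model until it stops asking for tools, running each requested tool locally.

    handlers maps tool name -> Python function; extra kwargs (model, max_tokens, tools) pass through.
    """
    for _ in range(max_rounds):
        response = create(messages=messages, **kwargs)
        if response["stop_reason"] != "tool_use":
            return response
        messages.append({"role": "assistant", "content": response["content"]})
        results = [
            {"type": "tool_result", "tool_use_id": b["id"], "content": str(handlers[b["name"]](**b["input"]))}
            for b in response["content"] if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_rounds")


# Scripted stand-in for the API: the first reply asks for a tool, the second answers
scripted = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"location": "Tokyo"}}]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "It is 21C and clear in Tokyo."}]},
])


def fake_create(**kwargs):
    return next(scripted)


history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
final = run_tool_loop(fake_create, history, {"get_weather": lambda location: f"21C, clear in {location}"})
print(final["content"][0]["text"])
print(len(history))
```

With the real SDK you would wrap client.messages.create so that it returns dicts (for example via the response's model_dump()) and pass model, max_tokens, and the tool schemas as keyword arguments. The max_rounds cap keeps a misbehaving tool from looping forever.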

4. Context Management: Keeping Sessions Efficient

Long conversations or large documents can consume many tokens. Context management features help you stay within limits and reduce costs.

Prompt Caching

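Cache repeated system prompts or document chunks so they are not re-processed on every request. Cache hits reduce both latency and cost: cached input is billed at a fraction of the normal input price, and the default cache lifetime is five minutes, refreshed each time the cache is used.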

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Explain list comprehensions."}]
)

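When to use: Repeated system prompts, large reference documents, multi-turn conversations. A cached prefix must reach a minimum length (1,024 tokens for Claude Sonnet 4 and Opus 4), so the short system prompt above only illustrates the syntax; shorter prefixes are processed normally without caching.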
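To see why caching pays off for a large shared prefix, compare the input cost with and without it. A minimal sketch, assuming the five-minute cache pricing multipliers (cache writes at 1.25x and cache reads at 0.1x the base input price), that every request after the first hits the cache, and an illustrative base price you should replace with the current rate for your model; caching_savings is a hypothetical helper.

```python
def caching_savings(prefix_tokens: int, requests: int, price_per_mtok: float,
                    write_mult: float = 1.25, read_mult: float = 0.10) -> tuple[float, float]:
    """Return (cost without caching, cost with caching) for a prefix reused across requests.

    Assumes the first request writes the cache and every later request reads it.
    """
    per_token = price_per_mtok / 1_000_000
    uncached = prefix_tokens * requests * per_token
    cached = prefix_tokens * per_token * (write_mult + read_mult * (requests - 1))
    return uncached, cached


# Illustrative: a 10,000-token reference document reused by 100 requests at $3 per million input tokens
uncached, cached = caching_savings(10_000, 100, 3.00)
print(f"${uncached:.2f} vs ${cached:.2f}")
```

Under these assumptions the prefix costs about a ninth as much with caching; the saving shrinks if requests are spaced further apart than the cache lifetime.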

Context Compaction

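Reduce token usage by summarizing or compressing older parts of a conversation while preserving key information. messages.create does not take a compaction argument; Anthropic's server-side context management options are configured separately, so check the context management documentation for the current parameters. The same idea can also be implemented on the client with ordinary Messages API calls:

# Client-side compaction: summarize older turns, then resend the summary plus recent turns.
# Assumes the history alternates roles and starts and ends with a user turn, so the last five messages start with one too.
older, recent = long_conversation_history[:-5], long_conversation_history[-5:]
summary = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=older + [{"role": "user", "content": "Summarize our conversation so far, keeping key facts and decisions."}]
)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Summary of the earlier conversation:\n" + summary.content[0].text,
    messages=recent
)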

When to use: Very long conversations (hundreds of messages) where you need to stay within context windows.
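Choosing where to cut matters, because the messages you keep verbatim must still begin with a user turn. A minimal sketch, assuming a history of plain user/assistant message dicts; split_for_compaction is hypothetical, and in practice you would trigger it from a token count (for example the API's token counting endpoint) rather than a fixed number of messages.

```python
def split_for_compaction(history: list[dict], keep_recent: int) -> tuple[list[dict], list[dict]]:
    """Split history into (older, recent) so that `recent` begins with a user turn.

    The cut moves earlier when needed; older is empty when there is nothing to compact.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    cut = max(len(history) - keep_recent, 0)
    while cut > 0 and history[cut]["role"] != "user":
        cut -= 1
    return history[:cut], history[cut:]


# Four question/answer pairs plus a new question: nine messages in total
history = []
for i in range(4):
    history += [{"role": "user", "content": f"question {i}"}, {"role": "assistant", "content": f"answer {i}"}]
history.append({"role": "user", "content": "question 4"})
older, recent = split_for_compaction(history, keep_recent=4)
print(len(older), recent[0]["role"], len(recent))
```

Here the naive cut would start the kept tail on an assistant message, so the function moves it back one step and keeps five messages instead of four.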

5. Files and Assets: Working with Documents

Claude can process PDFs, images, and other file types directly.

PDF Support

Claude can read and analyze PDF documents, including text extraction and layout understanding.

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {"type": "text", "text": "Summarize this report."}
            ]
        }
    ]
)

Image and Vision

Claude can analyze images, charts, screenshots, and more.

with open("chart.png", "rb") as f:
    img_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": img_data
                    }
                },
                {"type": "text", "text": "What does this chart show?"}
            ]
        }
    ]
)
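The PDF and image requests differ only in the block type and media type, so a small helper can build either from raw bytes. A minimal sketch, assuming detection by file signature (magic bytes) for the formats the API accepts: JPEG, PNG, GIF, and WebP images plus PDF documents; file_block is a hypothetical helper.

```python
import base64

# File signatures for the supported formats, checked in order (WebP is handled separately)
SIGNATURES = [
    (b"%PDF-", "document", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image", "image/png"),
    (b"\xff\xd8\xff", "image", "image/jpeg"),
    (b"GIF87a", "image", "image/gif"),
    (b"GIF89a", "image", "image/gif"),
]


def file_block(data: bytes) -> dict:
    """Build a base64 image or document content block, inferring the media type from the bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        block_type, media_type = "image", "image/webp"
    else:
        for prefix, block_type, media_type in SIGNATURES:
            if data.startswith(prefix):
                break
        else:
            raise ValueError("unsupported file type")
    return {
        "type": block_type,
        "source": {"type": "base64", "media_type": media_type, "data": base64.b64encode(data).decode("ascii")},
    }


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8  # illustrative bytes, not a complete image
block = file_block(png_header)
print(block["type"], block["source"]["media_type"])
```

In a request, place the returned block before the text question in the user message content, as in the examples above.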

Feature Availability by Platform

Not all features are available on every platform. Here's a quick reference:

Feature	Claude API	AWS Bedrock	Vertex AI
Extended Thinking	✅ GA	✅ GA	✅ GA
Citations	✅ GA	✅ GA	✅ GA
Batch Processing	✅ GA	✅ GA	✅ GA
Web Search Tool	✅ GA	✅ GA	✅ GA
Computer Use	✅ Beta	✅ Beta	❌
Prompt Caching	✅ GA	✅ GA	✅ GA
PDF Support	✅ GA	✅ GA	✅ GA

Check the official documentation for the most up-to-date availability.

Key Takeaways

Start with model capabilities – Extended thinking, structured outputs, and citations are the most impactful features for improving response quality and reliability.
Use tools to extend Claude's reach – Web search, code execution, and computer use let Claude interact with the real world. Combine them with parallel tool calling for efficiency.
Manage context proactively – Prompt caching and context compaction are essential for cost-effective, long-running sessions. Use them from day one.
Batch processing saves 50% – For non-real-time workloads, batch API calls dramatically reduce costs without sacrificing quality.
Check feature availability per platform – Not all features are available on AWS Bedrock, Vertex AI, or Microsoft Foundry. Always verify before building.