Guide | Beginner | Pricing | 2026-05-22

Claude API Features Overview: A Practical Guide to Model Capabilities, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and files. Learn how to steer reasoning, use tools, and optimize costs with practical examples.

Quick Answer

This guide walks you through Claude's five API areas: model capabilities (extended thinking, structured outputs), tools (web search, code execution), tool infrastructure (tool runner, tool streaming), context management (prompt caching, compaction), and file handling. You'll learn how to control reasoning depth, reduce latency, and cut costs by 50% with batch processing.

Claude API | Extended Thinking | Tool Use | Context Management | Prompt Caching

Introduction

Claude's API is more than a simple text-in, text-out interface. It's a modular platform designed to give you fine-grained control over how Claude reasons, what actions it takes, and how it manages long conversations. Whether you're building a customer support bot, a code assistant, or an autonomous agent, understanding the five core areas of the API surface is essential.

This guide covers:

Model capabilities – Steering reasoning depth and output format
Tools – Letting Claude interact with the web, files, and your environment
Tool infrastructure – Discovery and orchestration at scale
Context management – Keeping long-running sessions efficient
Files and assets – Handling documents, images, and PDFs

By the end, you'll know which features to use for your use case and how to combine them for maximum performance and cost efficiency.

1. Model Capabilities: Steering Claude's Reasoning and Output

Claude's model capabilities let you control how it thinks and what it produces. These are the foundational building blocks.

Extended Thinking with Adaptive Thinking

Claude can decide dynamically when to "think" more deeply. With Adaptive Thinking (GA on the Claude API, Claude Platform on AWS, Bedrock, and Vertex AI), you set an effort level (low, medium, or high) to steer reasoning depth. With manual extended thinking, you instead cap reasoning with an explicit token budget. Both suit complex math, multi-step logic, and code generation tasks.

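Python example (manual thinking budget):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    # Manual extended thinking: budget_tokens must be at least 1024 and below max_tokens.
    # Effort is not a field of this object; on models that support it, it is set separately.
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
    },
    messages=[
        {"role": "user", "content": "Prove the Pythagorean theorem using geometry."}
    ]
)
# With thinking enabled, thinking blocks precede the answer, so content[0] is not the text.
for block in response.content:
    if block.type == "text":
        print(block.text)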
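
The budget rules and the mixed block types are easy to get wrong, so here is a minimal sketch of two helpers. It assumes the standard (non-interleaved) case, where the budget must be at least 1024 tokens and smaller than max_tokens, and it assumes the response blocks have already been converted to plain dicts (for example with block.model_dump()). The helper names build_thinking_config and split_blocks are hypothetical, not part of the SDK.

```python
def build_thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a manual extended-thinking config after validating the budget."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def split_blocks(content: list[dict]) -> tuple[str, str]:
    """Separate reasoning text from answer text in a list of content-block dicts."""
    thinking = "".join(
        block.get("thinking", "") for block in content if block.get("type") == "thinking"
    )
    answer = "".join(
        block.get("text", "") for block in content if block.get("type") == "text"
    )
    return thinking, answer


# Hypothetical response content, shaped like the blocks a thinking-enabled call returns.
sample_content = [
    {"type": "thinking", "thinking": "Consider a right triangle with legs a and b."},
    {"type": "text", "text": "Proof: rearrange four copies of the triangle."},
]
reasoning, answer = split_blocks(sample_content)
print(answer)
```

Validating the budget before the call turns a rejected request into a local error, which is cheaper to debug.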

Structured Outputs

For production systems, you often need Claude to return JSON that follows a strict schema. Structured outputs let you attach a JSON Schema to the request so that the response is constrained to it.

TypeScript example:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
// Assumes a model and SDK version that support structured outputs. The parameter
// has been renamed across releases (an earlier beta used a top-level output_format),
// so confirm the name against the current API reference.
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'List three planets and their distance from the sun.' }],
  output_config: {
    format: {
      type: 'json_schema',
      schema: {
        type: 'object',
        properties: {
          planets: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                name: { type: 'string' },
                distance_km: { type: 'number' }
              },
              required: ['name', 'distance_km'],
              additionalProperties: false
            }
          }
        },
        required: ['planets'],
        additionalProperties: false
      }
    }
  }
});
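
Even with a schema attached, it is worth parsing the returned text defensively, since a response cut off at max_tokens can leave incomplete JSON. The following Python sketch validates the planets shape from the example above using only the standard library. The function name parse_planets and the sample string are illustrative assumptions.

```python
import json


def parse_planets(raw_text: str) -> list[tuple[str, float]]:
    """Parse the model's JSON text and check it against the expected planets shape."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on truncated or invalid JSON
    planets = data["planets"]
    if not isinstance(planets, list):
        raise ValueError("'planets' must be a list")
    parsed = []
    for item in planets:
        name = item["name"]
        distance = item["distance_km"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(name, str):
            raise ValueError("'name' must be a string")
        if isinstance(distance, bool) or not isinstance(distance, (int, float)):
            raise ValueError("'distance_km' must be a number")
        parsed.append((name, float(distance)))
    return parsed


# Hypothetical response text, shaped like the schema in the example above.
sample_text = '{"planets": [{"name": "Mercury", "distance_km": 57900000}]}'
print(parse_planets(sample_text))
```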

Batch Processing for Cost Savings

If you have large volumes of non-urgent requests (e.g., data enrichment, content classification), use Batch Processing. Batch requests cost 50% less than standard API calls. You submit a list of requests in one call, each tagged with your own ID, then poll until processing finishes and fetch the results.

Note: Batch processing is not eligible for Zero Data Retention (ZDR).
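
As a rough sketch of the batch flow, the code below builds a request list for a classification job and estimates the discounted cost. The helper names, the prompt wording, and the dollar figure are illustrative assumptions. The submit step uses the SDK's client.messages.batches.create call and is defined but not invoked here, so the rest runs without network access.

```python
def build_batch_requests(texts: list[str], model: str = "claude-sonnet-4-20250514") -> list[dict]:
    """Wrap each input text in a batch request with a unique custom_id."""
    return [
        {
            "custom_id": f"item-{index}",  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": 64,
                "messages": [
                    {"role": "user", "content": f"Classify the sentiment of: {text}"}
                ],
            },
        }
        for index, text in enumerate(texts)
    ]


def submit_batch(requests: list[dict]) -> str:
    """Submit the batch and return its ID (requires the anthropic SDK and an API key)."""
    import anthropic  # imported lazily so the helpers above work without the SDK

    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=requests)
    return batch.id


def batch_cost(standard_cost: float) -> float:
    """Apply the 50% batch discount to a standard-price estimate."""
    return standard_cost * 0.5


requests = build_batch_requests(["Great product!", "Arrived broken."])
print(len(requests), batch_cost(10.0))
```

After submission, you would poll the batch by ID until it reports that processing has ended, then match each result to its input through custom_id.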

2. Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. Claude can call functions, search the web, execute code, and even control a computer.

Web Search Tool

Claude can search the web in real time to answer questions about current events, documentation, or any online content.

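response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # server tools use a versioned type string
        "name": "web_search",
        "max_uses": 3  # optional cap on the number of searches per request
    }],
    messages=[
        {"role": "user", "content": "What are the latest features in Claude 4?"}
    ]
)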

Code Execution Tool

Claude can run Python code in a sandboxed environment, making it ideal for data analysis, prototyping, or teaching.

Computer Use Tool (Beta)

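For advanced automation, Claude can control a virtual desktop environment through screenshots and mouse and keyboard actions. This is a beta feature; run it in an isolated environment, such as a dedicated virtual machine or container, with minimal privileges.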

Parallel Tool Use

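Claude can request multiple tool calls in a single turn. For example, one response can contain separate tool_use blocks for a web search, a file fetch, and a code snippet. For tools you host, your application can run those calls concurrently and return all results in one message, which reduces latency for complex multi-step tasks.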
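
For client-side tools, running the requested calls concurrently might look like the sketch below. The tool functions get_weather and get_time, the registry, and the sample blocks are hypothetical stand-ins. The tool_use and tool_result dict shapes follow the Messages API content-block format.

```python
from concurrent.futures import ThreadPoolExecutor


def get_weather(city: str) -> str:
    """Hypothetical tool: a real one would call a weather service."""
    return f"Sunny in {city}"


def get_time(timezone: str) -> str:
    """Hypothetical tool: a real one would look up the current time."""
    return f"12:00 in {timezone}"


TOOL_REGISTRY = {"get_weather": get_weather, "get_time": get_time}


def run_tool(block: dict) -> dict:
    """Execute one tool_use block and wrap the outcome in a tool_result block."""
    handler = TOOL_REGISTRY.get(block["name"])
    if handler is None:
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"Unknown tool: {block['name']}",
            "is_error": True,
        }
    try:
        output = handler(**block["input"])
    except Exception as exc:  # report failures to the model instead of crashing
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"Tool failed: {exc}",
            "is_error": True,
        }
    return {"type": "tool_result", "tool_use_id": block["id"], "content": str(output)}


def run_tools_in_parallel(tool_use_blocks: list[dict]) -> list[dict]:
    """Run all requested tools concurrently; results keep the request order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_tool, tool_use_blocks))


sample_blocks = [
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_2", "name": "get_time", "input": {"timezone": "UTC"}},
]
tool_results = run_tools_in_parallel(sample_blocks)
print(tool_results)
```

All of the resulting tool_result blocks go back to Claude together as the content of a single user message.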

3. Tool Infrastructure: Discovery and Orchestration

When you have many tools, you need infrastructure to manage them. Claude's API provides:

Tool Runner (SDK) – Automates the loop of calling tools and returning results.
Fine-grained Tool Streaming – Streams tool-call parameters as they are generated, without buffering, for real-time UIs.
Tool Combinations – Define which tools can be used together and in what order.
Programmatic Tool Calling – Lets Claude write code that calls your tools from inside a code execution environment, instead of making a model round trip for each call.

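Example: Combining Web Search and Code Execution in One Request

from anthropic import Anthropic

client = Anthropic()
# Server tools use versioned type strings and fixed names. Code execution may
# require a beta header, depending on the API version you target.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["code-execution-2025-05-22"],
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250522", "name": "code_execution"}
    ],
    tool_choice={"type": "auto"},  # let Claude decide which tools to call
    messages=[
        {"role": "user", "content": "Search for the latest GDP of Japan and calculate its growth rate compared to last year."}
    ]
)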
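
For tools you define yourself, the loop that a tool runner automates can be sketched as follows. This is a simplified illustration, not the SDK's implementation. The names run_tool_loop, create_message, and execute_tool are hypothetical, responses are modeled as plain dicts, and a stub stands in for the API so the sketch runs offline.

```python
from typing import Callable


def run_tool_loop(
    create_message: Callable[[list[dict]], dict],
    execute_tool: Callable[[str, dict], str],
    messages: list[dict],
    max_turns: int = 5,
) -> list[dict]:
    """Call the model, run any requested tools, and repeat until it stops asking."""
    history = list(messages)
    for _ in range(max_turns):
        response = create_message(history)
        history.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return history  # the model produced its final answer
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": execute_tool(block["name"], block["input"]),
            }
            for block in response["content"]
            if block["type"] == "tool_use"
        ]
        history.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_model(history: list[dict]) -> dict:
    """Stub model: requests one tool call, then answers using its result."""
    if len(history) == 1:
        return {
            "stop_reason": "tool_use",
            "content": [
                {"type": "tool_use", "id": "toolu_1", "name": "add", "input": {"a": 2, "b": 3}}
            ],
        }
    result = history[-1]["content"][0]["content"]
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": f"The sum is {result}."}]}


def fake_execute(name: str, tool_input: dict) -> str:
    """Stub tool executor that only knows the hypothetical 'add' tool."""
    return str(tool_input["a"] + tool_input["b"])


final_history = run_tool_loop(fake_model, fake_execute, [{"role": "user", "content": "What is 2 + 3?"}])
print(final_history[-1]["content"][0]["text"])
```

The max_turns cap is a design choice to stop a misbehaving loop from running indefinitely.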

4. Context Management: Keeping Long Sessions Efficient

Long conversations can fill the context window and increase costs. Claude provides several features to manage this.

Context Windows

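Claude supports context windows of up to 1 million tokens, depending on the model and platform; check the limit for the model you use. At the upper end, this is enough to process entire codebases, lengthy documents, or multi-hour conversations.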

Prompt Caching

Reduce latency and cost by caching repeated system prompts or large context blocks. Cached content is reused across requests without reprocessing.

Python example with caching:

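response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # In practice this block holds a large, stable prefix (for example, reference
            # docs); prompts shorter than the model's minimum cacheable length are not cached.
            "text": "You are a helpful assistant with knowledge of the entire Python documentation.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "How do I use asyncio?"}
    ]
)
# Non-zero values here show whether the prefix was written to or read from the cache.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)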
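
To see how caching affects cost, the sketch below weights the usage counts reported on a response. The multipliers (cache writes at 1.25x and cache reads at 0.1x the base input price) and the base price of 3.00 per million tokens are assumptions for illustration, so check the current pricing page before relying on them. The function name effective_input_cost is hypothetical.

```python
def effective_input_cost(
    usage: dict,
    base_price_per_mtok: float,
    write_multiplier: float = 1.25,  # assumed surcharge for writing to the cache
    read_multiplier: float = 0.10,   # assumed discount for reading from the cache
) -> float:
    """Estimate input cost from the usage fields, weighting cached tokens."""
    uncached = usage.get("input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    weighted_tokens = uncached + written * write_multiplier + read * read_multiplier
    return weighted_tokens * base_price_per_mtok / 1_000_000


# Hypothetical usage for a first request (cache write) and a follow-up (cache read).
first_call = {"input_tokens": 20, "cache_creation_input_tokens": 10_000, "cache_read_input_tokens": 0}
follow_up = {"input_tokens": 20, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 10_000}

first_cost = effective_input_cost(first_call, base_price_per_mtok=3.00)
follow_up_cost = effective_input_cost(follow_up, base_price_per_mtok=3.00)
print(first_cost > follow_up_cost)
```

Under these assumptions the first request costs slightly more than an uncached one, and every later cache hit costs far less, so caching pays off once a prefix is reused.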

Context Compaction

For very long sessions, you can compact the conversation history into a summary, reducing token usage while preserving key information.
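
A client-side version of this idea can be sketched as below: older turns are replaced by one summary message and the most recent turns are kept verbatim. This is an illustrative approach under stated assumptions, not the API's built-in mechanism. The names compact_history and summarize are hypothetical, and in practice summarize would itself be a call to Claude.

```python
from typing import Callable


def compact_history(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    keep_last: int = 4,
) -> list[dict]:
    """Replace all but the last keep_last messages with a single summary message."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(messages) <= keep_last:
        return list(messages)  # nothing to compact yet
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary_message = {
        "role": "user",
        "content": f"Summary of the earlier conversation: {summarize(older)}",
    }
    return [summary_message] + recent


def count_summary(older: list[dict]) -> str:
    """Stand-in summarizer for the demo; a real one would call the model."""
    return f"{len(older)} earlier messages"


# Hypothetical six-message history alternating user and assistant turns.
conversation = [
    {"role": "user" if index % 2 == 0 else "assistant", "content": f"message {index}"}
    for index in range(6)
]
compacted = compact_history(conversation, count_summary, keep_last=4)
print(len(compacted), compacted[0]["content"])
```

A real implementation should also avoid cutting between a tool call and its result, which this sketch does not handle.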

Token Counting

Use the token counting endpoint to estimate costs before sending a request.

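count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello, world!"}])
print(count.input_tokens)  # input tokens this request would consume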

5. Files and Assets: Working with Documents and Images

Claude can process files directly, including PDFs, images, and text documents.

PDF Support

Send PDFs as document content blocks and ask Claude to extract, summarize, or answer questions about their content.

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings of this report."
                }
            ]
        }
    ]
)

Images and Vision

Claude can analyze images, diagrams, and screenshots. This is useful for UI testing, medical imaging, or visual Q&A.
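
Images are sent as content blocks in the same way as the PDF above. The sketch below builds a base64 image block. The helper names, the allowed media-type list as written here, and the file name in the usage note are illustrative assumptions.

```python
import base64
import mimetypes

SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Build an image content block from raw bytes and a media type."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_path(path: str) -> dict:
    """Read an image file and build a content block, guessing the type from the extension."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"Cannot determine media type for: {path}")
    with open(path, "rb") as image_file:
        return image_block_from_bytes(image_file.read(), media_type)


# Placeholder bytes for the demo; a real call would pass actual image data.
demo_block = image_block_from_bytes(b"abc", "image/png")
print(demo_block["source"]["media_type"])
```

A block built with image_block_from_path("screenshot.png") would go in the user message's content list, followed by a text block containing the question.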

Feature Availability and Lifecycle

Not all features are available on every platform, and each feature carries a lifecycle classification:

Classification	Description
Beta	Preview features for feedback. May change or be discontinued. Not for production.
GA	Stable, fully supported, recommended for production. Covered by versioning guarantees.
Deprecated	Still functional but not recommended. Migration path provided.
Retired	No longer available.

Platforms include: Claude API (Anthropic), Claude Platform on AWS, Bedrock (AWS-operated), Vertex AI (Google-operated), and Microsoft Foundry.

Putting It All Together: A Practical Workflow

Here's a realistic workflow combining multiple features:

User uploads a PDF (Files API)
Claude extracts text and uses Extended Thinking to analyze it
Claude searches the web for related current information (Web Search Tool)
Claude runs a Python script to compute statistics (Code Execution Tool)
Results are returned as structured JSON (Structured Outputs)
The conversation is cached for follow-up questions (Prompt Caching)

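This workflow shows how the API areas combine in a single application: files supply the input, model capabilities shape the reasoning and output, tools gather and compute, and context management keeps follow-up questions cheap.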

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files. Master each to build sophisticated applications.
Use Extended Thinking for complex reasoning and Structured Outputs for predictable JSON responses. Batch processing cuts costs by 50%.
Tools let Claude act on the world: web search, code execution, and computer use are available. Use parallel tool calls to reduce latency.
Manage context efficiently with prompt caching, context compaction, and token counting to keep costs low and performance high.
Feature availability varies by platform – always check the classification (Beta, GA, Deprecated) before building for production.