Guide | Beginner | Agents | 2026-05-20

Claude API Feature Overview: A Practical Guide to Model Capabilities, Tools, and Context Management

Explore the five core areas of the Claude API: model capabilities, tools, context management, files, and tool infrastructure. Learn how to use each feature with code examples and best practices.

Quick Answer

This guide breaks down the Claude API into five feature areas: model capabilities (thinking, structured outputs), tools (web search, code execution), context management (prompt caching, compaction), files (PDF, images), and tool infrastructure (discovery, orchestration). You'll learn when and how to use each feature with practical code snippets.

Tags: Claude API, Extended Thinking, Tool Use, Context Management, Structured Outputs

Whether you're building a simple chatbot or a complex agentic system, understanding the Claude API's feature landscape is essential. The API surface is organized into five core areas: model capabilities, tools, tool infrastructure, context management, and files/assets. This guide walks through each area with practical advice and code examples to help you choose the right features for your use case.

1. Model Capabilities: Steering Claude's Output

Model capabilities control how Claude reasons and what it returns. These are the most fundamental features you'll use.

Extended Thinking & Adaptive Thinking

For complex reasoning tasks, Claude can "think" before responding. With extended thinking, you set a fixed thinking budget through budget_tokens, which must be at least 1,024 and smaller than max_tokens. Adaptive thinking (recommended for Opus 4.7) lets Claude decide dynamically how much to think, guided by the effort parameter.

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # Extended thinking budget
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)
# Access the thinking block
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
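
The budget constraint can be checked before a request is sent. The sketch below is a hypothetical helper (`thinking_config` is not part of the SDK) and assumes the rule that `budget_tokens` must be at least 1,024 and smaller than `max_tokens`.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum for budget_tokens


def thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Return a thinking parameter dict, or raise if the budget is invalid."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Same values as the request above: 2048 thinking tokens under a 4096-token cap.
config = thinking_config(budget_tokens=2048, max_tokens=4096)
```

The returned dict can be passed directly as the `thinking` argument, so an invalid budget fails locally instead of as an API error.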

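For adaptive thinking, set the thinking type to adaptive and pass the effort level alongside it:

response = client.messages.create(
    model="claude-opus-4-20250514",  # substitute a model that supports adaptive thinking
    max_tokens=4096,
    thinking={"type": "adaptive"},
    # Assumption: effort is passed through output_config; confirm in the API reference
    output_config={"effort": "high"},  # Options include low, medium, high
    messages=[{"role": "user", "content": "Explain quantum entanglement in simple terms"}]
)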

Structured Outputs

When you need Claude to return data in a specific shape, use structured outputs. Define a JSON Schema and Claude's response will conform to it.

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute a model that supports structured outputs
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract the invoice number, date, and total amount from this invoice: ..."}],
    # Assumption: the schema is passed through output_config.format; confirm in the API reference
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "date": {"type": "string"},
                    "total_amount": {"type": "number"}
                },
                "required": ["invoice_number", "date", "total_amount"],
                "additionalProperties": False
            }
        }
    }
)
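
Even with a schema in place, it is worth validating the returned JSON before passing it downstream. A minimal sketch, assuming the response text is a JSON object with the three fields from the schema above; `parse_invoice` is a hypothetical helper and the sample values are made up for illustration.

```python
import json

# Field name -> accepted Python type(s), mirroring the schema's required fields.
REQUIRED_FIELDS = {"invoice_number": str, "date": str, "total_amount": (int, float)}


def parse_invoice(raw_text: str) -> dict:
    """Parse the model's JSON text and check required fields and their types."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data


invoice = parse_invoice(
    '{"invoice_number": "INV-001", "date": "2026-01-15", "total_amount": 249.5}'
)
```

In a real call, `raw_text` would be the text of the first content block in the response.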

Citations

Ground Claude's responses in source documents. When you provide reference material, Claude can cite specific passages.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            # The document travels inside the message content, with citations enabled on it
            {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": "... full paper text ..."},
                "citations": {"enabled": True}
            },
            {"type": "text", "text": "Summarize the key findings from the attached research paper."}
        ]
    }]
)
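
Cited responses come back as a sequence of text blocks, some carrying a list of citations. The sketch below turns such blocks into footnoted text. It assumes the blocks have been converted to plain dictionaries and that each citation has a `cited_text` field; `render_with_citations` and the sample data are illustrative, not SDK features.

```python
def render_with_citations(content_blocks: list[dict]) -> str:
    """Join text blocks and append numbered footnotes for the cited passages."""
    body_parts = []
    footnotes = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        text = block["text"]
        # A block may have no citations key, or an explicit None.
        for citation in block.get("citations") or []:
            footnotes.append(citation["cited_text"])
            text += f"[{len(footnotes)}]"
        body_parts.append(text)
    body = "".join(body_parts)
    if not footnotes:
        return body
    notes = "\n".join(f'[{i}] "{quote}"' for i, quote in enumerate(footnotes, start=1))
    return f"{body}\n\n{notes}"


# Made-up blocks shaped like a cited response.
sample_blocks = [
    {
        "type": "text",
        "text": "The study found a measurable effect",
        "citations": [
            {"type": "char_location", "cited_text": "We observed a 12% improvement.", "document_index": 0}
        ],
    },
    {"type": "text", "text": " across all cohorts."},
]
rendered = render_with_citations(sample_blocks)
```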

2. Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. You can give Claude access to web search, code execution, file operations, and more.

Built-in Tools

Anthropic provides several pre-built tools:

Web Search Tool: Fetch real-time information from the web
Code Execution Tool: Run Python code in a sandboxed environment
Text Editor Tool: Read, write, and edit files
Computer Use Tool (beta): Control a virtual desktop

Example using the web search tool:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
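    tools=[{
        "type": "web_search_20250305",  # server tools use a versioned type string
        "name": "web_search",
        "max_uses": 5  # optional cap on searches per request
    }],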
    messages=[{"role": "user", "content": "What's the latest news on AI regulation in the EU?"}]
)

Custom Tool Definition

Define your own tools with a JSON schema:

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
# Handle tool use
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use":
            print(f"Tool called: {block.name}")
            print(f"Arguments: {block.input}")
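
Detecting a tool call is only half of the loop: the tool has to run, and its output goes back to Claude as a `tool_result` block in a user message. The sketch below shows that step with plain dictionaries. `get_weather` is a stub and `TOOL_HANDLERS` is a hypothetical registry, both invented for illustration.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical stand-in for a real weather lookup."""
    return {"location": location, "forecast": "unknown (stub)"}


# Maps tool names from the request's tools list to local functions.
TOOL_HANDLERS = {"get_weather": get_weather}


def build_tool_result_message(tool_use_blocks: list[dict]) -> dict:
    """Run each requested tool and package the results as one user message."""
    results = []
    for block in tool_use_blocks:
        handler = TOOL_HANDLERS.get(block["name"])
        if handler is None:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the id of the tool_use block
            "content": json.dumps(output),
        })
    return {"role": "user", "content": results}
```

The next request would then carry the original messages, the assistant turn containing the `tool_use` blocks, and this message, so Claude can produce its final answer.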

3. Context Management: Keeping Sessions Efficient

Long conversations or large documents can consume significant context. Claude offers several features to manage this.

Prompt Caching

Cache frequently used context (system prompts, document chunks) to reduce latency and cost.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with knowledge of our product documentation.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}]
)
# Check cache metrics (both stay at 0 when the cached prefix is shorter than the model's minimum cacheable length)
print(f"Cache created: {response.usage.cache_creation_input_tokens}")
print(f"Cache read: {response.usage.cache_read_input_tokens}")
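
The two usage counters above can be turned into a rough savings figure. This sketch assumes the commonly published multipliers, where writes to the 5-minute cache bill at 1.25x the base input price and cache reads at 0.1x. Treat both numbers as assumptions and check current pricing; the function names are hypothetical.

```python
WRITE_MULTIPLIER_PCT = 125  # assumed: cache writes bill at 1.25x base input price
READ_MULTIPLIER_PCT = 10    # assumed: cache reads bill at 0.1x base input price


def billed_input_equivalent(input_tokens: int, cache_creation_tokens: int,
                            cache_read_tokens: int) -> float:
    """Express all input usage as an equivalent number of uncached input tokens."""
    weighted = (100 * input_tokens
                + WRITE_MULTIPLIER_PCT * cache_creation_tokens
                + READ_MULTIPLIER_PCT * cache_read_tokens)
    return weighted / 100


def cache_savings_ratio(input_tokens: int, cache_creation_tokens: int,
                        cache_read_tokens: int) -> float:
    """Fraction of input cost saved compared with sending everything uncached."""
    total = input_tokens + cache_creation_tokens + cache_read_tokens
    if total == 0:
        return 0.0
    billed = billed_input_equivalent(input_tokens, cache_creation_tokens, cache_read_tokens)
    return 1 - billed / total
```

A negative ratio means the request paid the write premium without a later read to recover it, which is expected on the first call that populates the cache.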

Context Compaction

For long-running sessions, compact the conversation history to stay within context limits.

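# Client-side compaction sketch: summarize older turns and keep the latest ones verbatim.
# (Server-side compaction, where available, is a request option; see the API reference.)
older, recent = conversation_history[:-4], conversation_history[-4:]  # 4 is an arbitrary cutoff
summary = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=older + [{"role": "user", "content": "Summarize the conversation so far in a few paragraphs."}]
)
compacted = [{"role": "user", "content": "Summary of our earlier conversation: " + summary.content[0].text}] + recent
# Use the compacted version for the next request
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=compacted + [{"role": "user", "content": "Continue our discussion..."}]
)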
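
Deciding when to compact, and where to cut the history, can be done locally before any request. The sketch below is one possible approach, not an SDK feature: it uses a rough estimate of four characters per token (a heuristic, not a tokenizer) and an arbitrary threshold, and both helper names are hypothetical.

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough size estimate: about 4 characters per token (heuristic only)."""
    total_chars = sum(len(m["content"]) for m in messages if isinstance(m["content"], str))
    return total_chars // 4


def split_for_compaction(messages: list[dict], keep_recent: int = 4) -> tuple[list[dict], list[dict]]:
    """Return (older turns to summarize, recent turns to keep verbatim)."""
    if len(messages) <= keep_recent:
        return [], list(messages)
    cut = len(messages) - keep_recent
    # Move the cut earlier until the kept tail starts on a user turn.
    while 0 < cut < len(messages) and messages[cut]["role"] != "user":
        cut -= 1
    return list(messages[:cut]), list(messages[cut:])


# Made-up history: six alternating turns of 400 characters each.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": "x" * 400}
    for i in range(6)
]
older, recent = [], history
if estimate_tokens(history) > 500:  # 500 is an arbitrary threshold for illustration
    older, recent = split_for_compaction(history, keep_recent=3)
```

Starting the kept tail on a user turn keeps the role order predictable once a summary message is placed in front of it.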

4. Files and Assets: Working with Documents

Claude can process various file types directly.

PDF Support

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data
                }
            },
            {
                "type": "text",
                "text": "Summarize this PDF."
            }
        ]
    }]
)

Image Understanding

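import base64
with open("photo.jpg", "rb") as f:  # hypothetical local image file
    image_base64 = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(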
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_base64
                }
            },
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }]
)
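
The `media_type` has to match the actual file, so it helps to derive it from the filename instead of hard-coding it. A minimal sketch, assuming the supported image formats are JPEG, PNG, GIF, and WebP; `image_block` is a hypothetical helper.

```python
import base64
import mimetypes

# Assumed set of image media types accepted by the API.
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring media_type from the filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

The returned dictionary can be placed in a message's `content` list next to a text block, as in the request above.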

5. Tool Infrastructure: Orchestration at Scale

When building complex agents, you need infrastructure for tool discovery, routing, and execution.

MCP (Model Context Protocol)

MCP provides a standardized way to connect Claude to external tools and data sources. Use remote MCP servers for production deployments.

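# Configure the MCP connector (beta)
from anthropic import Anthropic
client = Anthropic()
# Tools are discovered dynamically from the MCP server
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    # Assumption: connector shape follows the mcp-client-2025-11-20 beta; confirm in the API reference
    betas=["mcp-client-2025-11-20"],
    mcp_servers=[{
        "type": "url",
        "url": "https://my-mcp-server.example.com/tools",
        "name": "my-mcp-server"  # illustrative label
    }],
    tools=[{"type": "mcp_toolset", "mcp_server_name": "my-mcp-server"}],
    messages=[{"role": "user", "content": "Query my database for recent orders."}]
)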

Parallel Tool Use

Claude can request several tool calls in a single response. This is the default behavior, so no extra flag is needed:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[weather_tool, news_tool, calendar_tool],  # tool definitions in the custom-tool format
    # Parallel tool use is on by default; to turn it off, pass
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[{"role": "user", "content": "What's the weather today and do I have any meetings?"}]
)
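
When a response contains several `tool_use` blocks, the calls can run concurrently on your side, and all results go back together in one user message. The sketch below uses a thread pool for that. The `handlers` mapping and the function name are hypothetical, and blocks are assumed to be plain dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools_concurrently(tool_use_blocks: list[dict], handlers: dict) -> dict:
    """Execute all requested tools in parallel and return one tool_result message."""

    def run_one(block: dict) -> dict:
        try:
            output = handlers[block["name"]](**block["input"])
            return {"type": "tool_result", "tool_use_id": block["id"], "content": str(output)}
        except Exception as exc:  # report failures to the model instead of crashing
            return {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Error: {exc}",
                "is_error": True,
            }

    with ThreadPoolExecutor(max_workers=max(1, len(tool_use_blocks))) as pool:
        # pool.map returns results in the same order as the input blocks.
        results = list(pool.map(run_one, tool_use_blocks))
    return {"role": "user", "content": results}
```

Threads suit I/O-bound tools such as HTTP lookups; CPU-heavy tools would need a different executor.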

Feature Availability by Platform

Not all features are available everywhere. Here's a quick reference:

Feature	Claude API	AWS Bedrock	Vertex AI
Extended Thinking	GA	GA	GA
Adaptive Thinking	GA	GA	GA
Structured Outputs	GA	GA	GA
Prompt Caching	GA	GA	Beta
Batch Processing	GA	GA	GA
Computer Use	Beta	Beta	-
MCP	Beta	Beta	-

GA = Generally Available, Beta = Preview with possible changes

Best Practices

Start simple: Begin with model capabilities (thinking, structured outputs) before adding tools.
Use caching for static context: Cache system prompts and reference documents to reduce costs.
Monitor token usage: Use the usage field in responses to track input/output tokens.
Handle tool calls gracefully: Always check stop_reason and handle tool_use blocks.
Use batch processing for high-volume workloads: When results are not needed immediately, the batch API processes requests asynchronously and saves 50% on costs.
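
For the batch processing tip, requests are submitted as a list in which each entry pairs a `custom_id` with ordinary Messages parameters. A minimal sketch of building that list; `build_batch_requests` is a hypothetical helper and the prompts are made up.

```python
def build_batch_requests(prompts: list[str], model: str, max_tokens: int = 1024) -> list[dict]:
    """Wrap each prompt in a batch entry with a unique custom_id."""
    return [
        {
            "custom_id": f"request-{index}",  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for index, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(
    ["Summarize document A.", "Summarize document B."],
    model="claude-sonnet-4-20250514",
)
# With a configured client, submission would look like:
# batch = client.messages.batches.create(requests=requests)
```

Results arrive asynchronously and are keyed by `custom_id`, so stable, unique ids matter more than their exact format.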

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
Use extended/adaptive thinking for complex reasoning and structured outputs for reliable data extraction.
Leverage built-in tools (web search, code execution) or define custom tools with JSON schemas.
Manage context efficiently with prompt caching and compaction to keep costs low and responses fast.
Check feature availability per platform (Claude API, Bedrock, Vertex AI) before building, as some features are in beta.