BeClaude
Guide · Beginner · Agents · 2026-05-22

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore the full Claude API surface—model capabilities, tools, context management, and files. Learn how to build powerful AI applications with practical code examples.

Quick Answer

This guide walks you through the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and file handling. You'll learn how to use extended thinking, structured outputs, citations, tool calling, prompt caching, and batch processing with practical Python examples.

Claude API · context management · tool use · extended thinking · batch processing

Introduction

The Claude API offers a rich surface area for building intelligent, production-ready applications. Whether you're creating a chatbot, an agent that browses the web, or a system that processes millions of documents, understanding the five core areas of the API is essential.

This guide covers:

  • Model capabilities – reasoning, structured outputs, citations
  • Tools – letting Claude act on the web or in your environment
  • Tool infrastructure – discovery and orchestration at scale
  • Context management – keeping long-running sessions efficient
  • Files and assets – managing documents and data

By the end, you'll know which features to use and when, and you'll have practical code snippets to get started.

1. Model Capabilities: Steering Claude's Output

Claude's model capabilities let you control how it reasons and formats responses. These are the building blocks for any application.

Extended Thinking and Adaptive Thinking

For complex reasoning tasks, Claude can "think" before responding. With Extended Thinking, you set a fixed thinking budget. With Adaptive Thinking (recommended for Opus 4.7), Claude decides how much to think dynamically.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)

# Access the thinking block
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
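A fixed thinking budget has to leave room for the visible answer. The sketch below checks a budget before you send the request. It assumes the documented minimum of 1,024 thinking tokens and that the budget must be smaller than max_tokens; thinking_config is a hypothetical helper name, not part of the SDK.

```python
MIN_THINKING_BUDGET = 1024  # assumption: documented minimum for budget_tokens


def thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Return a thinking config dict, or raise if the budget cannot work."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # Thinking tokens count toward max_tokens, so the answer needs headroom.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


config = thinking_config(budget_tokens=2048, max_tokens=4096)
print(config)
```

You would pass the result as thinking=config in client.messages.create.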

Structured Outputs

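Need JSON that matches a specific schema? Structured outputs constrain Claude's response to a JSON Schema you supply, so the result parses and validates against it. Other formats, such as YAML, still have to be requested through prompting.

# Assumption: the request parameter has changed across API versions. This sketch
# uses the output_config.format shape; confirm the parameter name and the list of
# supported models in the API reference before relying on it.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "List three planets and their moons as JSON"}
    ],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "planets": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "moons": {"type": "array", "items": {"type": "string"}}
                            },
                            "required": ["name", "moons"],
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["planets"],
                "additionalProperties": False
            }
        }
    }
)

print(response.content[0].text)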
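Whatever mechanism produces the JSON, it is worth parsing and checking the text before handing it to the rest of your application. A minimal sketch, assuming the planets schema above; parse_planets and the sample string are illustrative, not SDK features.

```python
import json


def parse_planets(raw_text: str) -> dict[str, list[str]]:
    """Parse the model's JSON text into {planet name: [moon names]}."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on invalid JSON
    result = {}
    for entry in data["planets"]:  # raises KeyError if the key is missing
        name, moons = entry["name"], entry["moons"]
        if not isinstance(name, str) or not all(isinstance(m, str) for m in moons):
            raise ValueError(f"unexpected types in entry: {entry!r}")
        result[name] = list(moons)
    return result


# Stand-in for response.content[0].text so the sketch runs offline.
sample = '{"planets": [{"name": "Earth", "moons": ["Moon"]}, {"name": "Mars", "moons": ["Phobos", "Deimos"]}]}'
planets = parse_planets(sample)
print(planets)
```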

Citations

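Ground Claude's responses in source documents. With Citations enabled on a document, Claude returns exact references to the passages that support each claim.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are passed as content blocks inside the user message
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Q3 revenue grew 15% year-over-year to $2.1B. Operating margin improved to 22%."
                    },
                    "title": "Q3 Earnings Report",
                    "context": "This is the company's quarterly earnings report.",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "Summarize the key findings from the attached report."}
            ]
        }
    ]
)

# With citations enabled, the answer can span several text blocks; print them all
for block in response.content:
    if block.type == "text":
        print(block.text, end="")

Citations come back as structured data attached to each text block (block.citations), including the quoted source text, rather than as [1]-style markers in the prose.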
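To show sources in your own UI, you can walk the response blocks and collect the citation data. A minimal sketch, assuming each citation exposes cited_text and document_title; collect_citations is a hypothetical helper, and the SimpleNamespace objects stand in for a real response so the example runs offline.

```python
from types import SimpleNamespace


def collect_citations(content_blocks) -> list[dict]:
    """Pair each cited claim with the source passage that supports it."""
    found = []
    for block in content_blocks:
        if getattr(block, "type", None) != "text":
            continue
        # Blocks without supporting passages have no citations (None or empty).
        for citation in getattr(block, "citations", None) or []:
            found.append({
                "claim": block.text,
                "quote": citation.cited_text,
                "document": citation.document_title,
            })
    return found


# Stand-in for response.content (assumed shape).
fake_content = [
    SimpleNamespace(type="text", text="Revenue grew 15%.", citations=[
        SimpleNamespace(
            cited_text="Q3 revenue grew 15% year-over-year to $2.1B.",
            document_title="Q3 Earnings Report",
        )
    ]),
    SimpleNamespace(type="text", text=" That is a strong quarter.", citations=None),
]
citations = collect_citations(fake_content)
print(citations)
```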

2. Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. Claude can call functions, browse the web, execute code, and more.

Defining Tools

You define tools as JSON schemas. Claude decides when to call them.

def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # In production, call a real weather API
    return f"The weather in {location} is sunny, 72°F."

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)

# Check if Claude wants to use a tool
for block in response.content:
    if block.type == "tool_use":
        print(f"Calling tool: {block.name}")
        print(f"Arguments: {block.input}")
        result = get_weather(block.input["location"])
        # Send result back to Claude...
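Sending the result back means appending two turns to the conversation: Claude's own tool_use response, then a user turn carrying a tool_result block that references the same tool-use ID. A minimal sketch of that step; tool_result_messages is a hypothetical helper and the ID shown is made up.

```python
def tool_result_messages(assistant_content, tool_use_id: str, result: str) -> list[dict]:
    """Build the two turns that follow a tool call."""
    return [
        # 1. Echo Claude's response (including its tool_use block) back as-is.
        {"role": "assistant", "content": assistant_content},
        # 2. Report the tool output, linked to the call by its ID.
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": result}
            ],
        },
    ]


# Offline demonstration with a hand-written tool_use block.
assistant_content = [
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"location": "Tokyo"}}
]
follow_up = tool_result_messages(
    assistant_content, "toolu_example", "The weather in Tokyo is sunny, 72°F."
)
print(follow_up[1]["content"][0]["tool_use_id"])
```

In a real exchange you would pass response.content and block.id, extend the original messages list with the result, and call client.messages.create again with the same tools so Claude can write its final answer.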

Built-in Tools

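Anthropic also provides several predefined tools. Some run on Anthropic's servers (web search, web fetch, code execution), while others, such as computer use, bash, and memory, are executed by your application: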

  • Web search tool – search the internet
  • Web fetch tool – fetch content from URLs
  • Code execution tool – run Python code in a sandbox
  • Computer use tool – control a virtual desktop
  • Bash tool – run shell commands
  • Memory tool – store and retrieve information across sessions
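
Built-in tools are enabled by listing them in the tools parameter, identified by a versioned type rather than an input schema. A minimal sketch for web search, assuming the web_search_20250305 version string and the max_uses and allowed_domains options; check the tool's documentation for the current version. web_search_tool is a hypothetical helper.

```python
from typing import Optional


def web_search_tool(max_uses: int = 3,
                    allowed_domains: Optional[list] = None) -> dict:
    """Build a web search tool entry for the tools parameter."""
    tool = {
        "type": "web_search_20250305",  # assumption: version string may change
        "name": "web_search",
        "max_uses": max_uses,           # cap on searches per request
    }
    if allowed_domains:
        tool["allowed_domains"] = allowed_domains
    return tool


print(web_search_tool(max_uses=2, allowed_domains=["example.com"]))
```

You would then pass tools=[web_search_tool(max_uses=2)] to client.messages.create; the search runs on Anthropic's side, so there is no tool result for your code to send back.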

Parallel Tool Use

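Claude can request several tool calls in a single response. This behavior is on by default, so no extra parameter is needed.

# weather_tool, stock_tool, and news_tool are tool definitions
# shaped like the get_weather definition above
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, stock_tool, news_tool],
    # To force one call at a time instead, pass:
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "What's the weather in London, the stock price of AAPL, and today's top news?"}
    ]
)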

3. Tool Infrastructure: Discovery and Orchestration

When building complex agents, you need more than just tool definitions. The Claude API provides infrastructure for:

  • Tool Runner (SDK) – automatically handles tool call loops
  • Strict tool use – guarantee that the inputs Claude passes to a tool match your schema
  • Tool search – let Claude discover tools dynamically
  • Fine-grained tool streaming – stream tool call parameters as they are generated
  • Tool combinations – define workflows that chain tools together

Tool Runner Example

from anthropic import Anthropic

client = Anthropic()

# Define a simple tool
weather_tool = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}

# A single request with automatic tool choice (conceptual - this does not
# invoke the Tool Runner itself; actual implementation may vary)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool],
    tool_choice={"type": "auto"},
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)

# The SDK's Tool Runner can automatically handle the tool call loop.
# See the Tool Runner documentation for details.
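
To make the idea of a tool call loop concrete, here is a hand-written version of the kind of loop a runner automates: call the model, execute every requested tool, return the results, and repeat until Claude stops asking for tools. This is a sketch, not the SDK's Tool Runner; run_tool_loop, the handlers mapping, and fake_create are all hypothetical names, and the fake stands in for client.messages.create so the example runs offline.

```python
from types import SimpleNamespace


def run_tool_loop(create, messages, tools, handlers, max_turns=5, **params):
    """Drive a conversation until the model stops requesting tools.

    create: a callable with the keyword interface of client.messages.create.
    handlers: maps tool names to Python functions.
    """
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_turns):
        response = create(messages=messages, tools=tools, **params)
        if response.stop_reason != "tool_use":
            return response
        results = []
        for block in response.content:  # may contain several tool_use blocks
            if block.type == "tool_use":
                output = handlers[block.name](**block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": str(output)})
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_create(messages, tools, **params):
    """Offline stand-in: request one tool call, then answer."""
    if len(messages) == 1:
        call = SimpleNamespace(type="tool_use", id="toolu_1",
                               name="get_weather", input={"location": "Paris"})
        return SimpleNamespace(stop_reason="tool_use", content=[call])
    tool_output = messages[-1]["content"][0]["content"]
    text = SimpleNamespace(type="text", text=f"Forecast: {tool_output}")
    return SimpleNamespace(stop_reason="end_turn", content=[text])


final = run_tool_loop(
    fake_create,
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[],
    handlers={"get_weather": lambda location: f"Sunny in {location}"},
)
print(final.content[0].text)  # Forecast: Sunny in Paris
```

With a real client you would pass client.messages.create as create, along with model and max_tokens as extra keyword arguments. The max_turns cap is a design choice to keep a misbehaving loop from running indefinitely.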

4. Context Management: Keeping Sessions Efficient

Long conversations or large documents require careful context management. Claude provides several features:

Context Windows

Context window size depends on the model. Some Claude models support up to 1 million tokens of context, enough to process entire codebases, lengthy books, or hours of conversation in a single request.
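
Before sending a very large prompt, you can ask the API how many input tokens it contains and compare that with your model's limit. A minimal sketch using the SDK's messages.count_tokens call; fits_in_context is a hypothetical helper, the limit and reserve values are illustrative, and the fake client stands in for anthropic.Anthropic() so the example runs offline.

```python
from types import SimpleNamespace


def fits_in_context(client, model, messages, context_limit, reserve_for_output):
    """Return True if the prompt plus reserved output space fits the window."""
    count = client.messages.count_tokens(model=model, messages=messages)
    return count.input_tokens + reserve_for_output <= context_limit


# Offline stand-in that always reports 150,000 input tokens.
fake_client = SimpleNamespace(
    messages=SimpleNamespace(
        count_tokens=lambda **kwargs: SimpleNamespace(input_tokens=150_000)
    )
)
prompt = [{"role": "user", "content": "...a very long document..."}]

print(fits_in_context(fake_client, "claude-sonnet-4-20250514", prompt,
                      context_limit=200_000, reserve_for_output=4_096))
```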

Prompt Caching

Cache frequently used context (system prompts, documents) to reduce latency and cost.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant that answers questions about our company policy.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is our vacation policy?"}
    ]
)

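# Check if cache was used. Prompts shorter than the model's minimum cacheable
# length (1,024 tokens or more, depending on the model) are processed without
# caching, so a short system prompt like this one reports no cached tokens.
print(f"Cache created: {response.usage.cache_creation_input_tokens or 0}")
print(f"Cache read: {response.usage.cache_read_input_tokens or 0}")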
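
To see why caching pays off, it helps to work through the arithmetic for a shared prefix. The sketch below assumes cache writes cost 1.25x the base input price and cache reads cost 0.1x, and that every call lands within the cache lifetime; the $3 per million tokens figure is purely illustrative. Check the pricing page for the actual multipliers of your model.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: cost of writing a prefix to cache
CACHE_READ_MULTIPLIER = 0.10   # assumption: cost of reading a cached prefix


def prefix_cost(prefix_tokens, calls, price_per_mtok, cached):
    """Estimate the input cost of sending the same prefix `calls` times."""
    per_call = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return per_call * calls
    # The first call writes the cache; the remaining calls read from it.
    return per_call * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))


uncached = prefix_cost(50_000, calls=20, price_per_mtok=3.0, cached=False)
cached = prefix_cost(50_000, calls=20, price_per_mtok=3.0, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under these assumptions, a 50,000-token prefix reused across 20 calls costs roughly $3.00 uncached versus about $0.47 cached.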

Context Compaction and Editing

For very long sessions, you can compact or edit the context to remove irrelevant information while preserving key facts.
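
One way to picture context editing is as a client-side pass over your message history that blanks out stale tool results while keeping the most recent ones. The sketch below is an approximation of the idea written in plain Python, not the API's own compaction or editing feature; clear_old_tool_results and the placeholder text are hypothetical.

```python
def clear_old_tool_results(messages, keep_last=2, placeholder="[tool result cleared]"):
    """Return a copy of messages with all but the newest tool results blanked."""
    positions = []
    for i, message in enumerate(messages):
        content = message["content"]
        if not isinstance(content, list):
            continue  # plain string content has no tool results
        for j, block in enumerate(content):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                positions.append((i, j))

    stale = set(positions[:-keep_last]) if keep_last > 0 else set(positions)

    compacted = []
    for i, message in enumerate(messages):
        content = message["content"]
        if isinstance(content, list):
            content = [
                {**block, "content": placeholder} if (i, j) in stale else block
                for j, block in enumerate(content)
            ]
        compacted.append({**message, "content": content})
    return compacted


history = [
    {"role": "user", "content": "Check two cities."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {"location": "Paris"}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "Paris: 18C"}]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t2", "name": "get_weather", "input": {"location": "Oslo"}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t2", "content": "Oslo: 9C"}]},
]
trimmed = clear_old_tool_results(history, keep_last=1)
print(trimmed[2]["content"][0]["content"])
```

The tool_use and tool_result blocks stay paired, so the conversation remains structurally valid; only the bulky payload is dropped.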

5. Files and Assets: Working with Documents

Claude can process various file types:

PDF Support

import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this PDF."
                }
            ]
        }
    ]
)

print(response.content[0].text)

Images and Vision

Claude can analyze images for visual understanding.

with open("diagram.png", "rb") as f:
    img_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": img_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram."
                }
            ]
        }
    ]
)

print(response.content[0].text)
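
If you send images often, a small helper that picks the media type from the file extension avoids hard-coding it. A minimal sketch, assuming the four commonly supported formats (JPEG, PNG, GIF, WebP); image_block is a hypothetical helper, not part of the SDK.

```python
import base64
from pathlib import Path

# Assumption: media types accepted for base64 image input.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path) -> dict:
    """Build an image content block from a local file."""
    path = Path(path)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image type: {path.suffix or '(none)'}")
    data = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

You would use it as one element of the content list, for example content=[image_block("diagram.png"), {"type": "text", "text": "Describe this diagram."}].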

6. Batch Processing: Cost-Effective Scale

For large volumes of requests that don't need an immediate response, use batch processing. Batches are processed asynchronously, and batch API calls cost 50% less than standard API calls.

# Create a batch of messages (the Batches API lives under client.messages)
batch_response = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to French: Hello, world!"}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to Spanish: Hello, world!"}]
            }
        }
    ]
)

print(f"Batch ID: {batch_response.id}")
print(f"Batch status: {batch_response.processing_status}")
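
Because batches finish asynchronously, you need to poll for completion and then read the results, matching each one to its custom_id. A minimal sketch, assuming the SDK's batches.retrieve and batches.results calls and the "ended" and "succeeded" status values; wait_for_batch, succeeded_texts, and the fake client (including the made-up batch ID) are hypothetical and exist only so the example runs offline.

```python
import time
from types import SimpleNamespace


def wait_for_batch(client, batch_id, poll_seconds=60.0, sleep=time.sleep):
    """Poll until the batch has finished processing, then return it."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        sleep(poll_seconds)


def succeeded_texts(client, batch_id):
    """Map custom_id to response text for requests that succeeded."""
    texts = {}
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type == "succeeded":
            texts[entry.custom_id] = entry.result.message.content[0].text
    return texts


class _FakeBatches:
    """Offline stand-in: reports in_progress once, then ended."""

    def __init__(self):
        self._statuses = iter(["in_progress", "ended"])

    def retrieve(self, batch_id):
        return SimpleNamespace(id=batch_id, processing_status=next(self._statuses))

    def results(self, batch_id):
        message = SimpleNamespace(content=[SimpleNamespace(text="Bonjour, le monde !")])
        return [
            SimpleNamespace(custom_id="req-001",
                            result=SimpleNamespace(type="succeeded", message=message)),
            SimpleNamespace(custom_id="req-002",
                            result=SimpleNamespace(type="errored")),
        ]


fake_client = SimpleNamespace(messages=SimpleNamespace(batches=_FakeBatches()))
finished = wait_for_batch(fake_client, "batch_123", poll_seconds=0, sleep=lambda s: None)
texts = succeeded_texts(fake_client, "batch_123")
print(finished.processing_status, texts)
```

Failed, canceled, or expired requests are skipped here; in production you would log them by custom_id and decide whether to retry.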

Feature Availability by Platform

Not all features are available everywhere. Here's a quick reference:

Feature             Claude API   AWS Bedrock   Vertex AI
Extended Thinking   GA           GA            GA
Structured Outputs  GA           GA            Beta
Citations           GA           GA            GA
Prompt Caching      GA           GA            GA
Batch Processing    GA           GA            GA
Computer Use        Beta         Beta          N/A
Web Search          GA           GA            GA

GA = Generally Available, Beta = In preview

Key Takeaways

  • Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files. Start with model capabilities and tools, then optimize with context management and batch processing.
  • Use Extended Thinking for complex reasoning and Structured Outputs for reliable JSON responses. Citations ground responses in source documents.
  • Leverage built-in tools (web search, code execution, computer use) to build powerful agents. Use parallel tool calls for efficiency.
  • Prompt caching reduces latency and cost for repeated context. Batch processing cuts costs by 50% for large workloads.
  • Check feature availability per platform before building. Some features are in beta or not available on all cloud platforms.

Ready to build? Start with the Quickstart guide and explore the API reference for complete details.