Guide | Beginner | API | 2026-05-12

Mastering Claude's API: A Complete Guide to Features, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and file handling. Learn practical implementation with code examples for building powerful AI applications.

Quick Answer

This guide walks you through Claude's five API areas—model capabilities, tools, tool infrastructure, context management, and file handling—with actionable code examples to build, optimize, and scale your AI applications.

Tags: Claude API, tools, context management, model capabilities, batch processing

Claude's API is more than just a text generation endpoint. It's a comprehensive platform designed to give developers fine-grained control over how Claude reasons, interacts with external systems, manages long conversations, and processes files. Whether you're building a simple chatbot or a complex agentic system, understanding the full API surface is key to unlocking Claude's potential.

This guide covers the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and files/assets. You'll learn what each area offers, when to use it, and how to implement it with practical code examples.

---

Understanding the Five API Areas

Claude's API surface is organized into five logical areas. Each addresses a different aspect of building with AI:

Area	Purpose
Model Capabilities	Control how Claude reasons, formats responses, and processes inputs
Tools	Let Claude take actions on the web or in your environment
Tool Infrastructure	Handle discovery and orchestration at scale
Context Management	Keep long-running sessions efficient
Files and Assets	Manage documents and data you provide to Claude

If you're new to the API, start with model capabilities and tools. Once you're ready to optimize cost, latency, or scale, dive into the other sections.

---

1. Model Capabilities: Steering Claude's Output

Model capabilities are the foundational controls for how Claude behaves. They include reasoning depth, response format, and input modalities.

Extended Thinking with Adaptive Thinking

Claude can reason step-by-step before responding. With Adaptive Thinking, Claude dynamically decides when and how much to think. This is the recommended mode for Claude Opus 4.7. You can also control thinking depth using the effort parameter.

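import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # tokens Claude may spend reasoning (minimum 1024)
        # Adaptive thinking and the effort parameter use separate request fields on
        # models that support them; check the model docs for the exact names.
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)
# With thinking enabled, the first block is a thinking block; print only the answer text
for block in response.content:
    if block.type == "text":
        print(block.text)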
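Two details of manual extended thinking are easy to trip over: budget_tokens has a documented minimum of 1,024 and must be smaller than max_tokens, and the response mixes thinking blocks with text blocks. The sketch below builds the thinking argument with those checks and separates the two block types. The helper names are hypothetical, and it assumes blocks expose `type` plus either `thinking` or `text`, as the Python SDK's response objects do.

```python
from typing import Any, Iterable

MIN_THINKING_BUDGET = 1024  # documented minimum value for budget_tokens


def thinking_params(max_tokens: int, budget_tokens: int) -> dict[str, Any]:
    """Build the `thinking` argument for manual extended thinking, rejecting invalid budgets."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # Without interleaved thinking, the budget has to fit inside max_tokens
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def split_thinking(content: Iterable[Any]) -> tuple[str, str]:
    """Separate a response's content blocks into (reasoning, answer) strings."""
    reasoning, answer = [], []
    for block in content:
        if block.type == "thinking":
            reasoning.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
    return "\n".join(reasoning), "\n".join(answer)
```

In use, pass `thinking=thinking_params(4096, 2048)` to `client.messages.create`, then call `split_thinking(response.content)` to log the reasoning separately from the answer you show users.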

Structured Outputs

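For production systems, you often need structured data that parses reliably. Structured Outputs constrains Claude's response to a JSON schema you supply; the exact request field and the list of supported models are in the API docs. Forcing a tool call whose input_schema is your target schema is a widely supported way to get the same schema-shaped output:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "invoice_extraction",
            "description": "Record the fields extracted from an invoice.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["name", "date", "amount"]
            }
        }
    ],
    # Forcing this tool makes Claude reply with a tool_use block whose input follows the schema
    tool_choice={"type": "tool", "name": "invoice_extraction"},
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: ..."}
    ]
)
# The structured data arrives as an already-parsed dict on the tool_use block
invoice = next(block.input for block in response.content if block.type == "tool_use")
print(invoice)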
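Whichever mechanism produces the JSON, a cheap check in your own code catches schema drift before bad data reaches downstream systems. The sketch below validates a parsed object against a flat schema like the invoice one above. It handles only top-level string, number, integer, and boolean properties, and the function name is hypothetical; for full JSON Schema support, a dedicated validator library is the better choice.

```python
from typing import Any

# Map JSON Schema primitive types to Python checks.
# bool is a subclass of int in Python, so it is excluded from number/integer explicitly.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate_flat(data: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the data matches the schema."""
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in data:
            check = _TYPE_CHECKS.get(spec.get("type"))
            if check is not None and not check(data[key]):
                errors.append(f"{key}: expected {spec['type']}, got {type(data[key]).__name__}")
    return errors


invoice_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "amount": {"type": "number"},
    },
    "required": ["name", "date", "amount"],
}
```

Returning a list of messages rather than raising on the first problem makes it easy to log every issue, or to send the list back to Claude in a retry prompt.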

Batch Processing for Cost Savings

If you have large volumes of requests that don't need an immediate response, use the Message Batches API. Batches are processed asynchronously (within 24 hours), and batch requests cost 50% less than standard API calls.

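# Create a batch of message requests; they are processed asynchronously
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Summarize this article: ..."}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Translate this to French: ..."}]
            }
        }
    ]
)

# Later, check the batch status; results are ready once processing has ended
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    # Results may arrive in any order, so match them by custom_id
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            print(entry.custom_id, entry.result.message.content[0].text)
        else:
            print(entry.custom_id, entry.result.type)  # errored, canceled, or expired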
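For more than a handful of prompts, it helps to generate the requests list programmatically. The sketch below builds entries in the same custom_id/params shape as the batch example above and numbers the IDs so each result can be matched back to its input. The helper name and the `req-001` ID format are illustrative choices; the only requirement assumed here is that each custom_id is unique within the batch.

```python
from typing import Any


def build_batch_requests(
    prompts: list[str], model: str, max_tokens: int = 256, prefix: str = "req"
) -> list[dict[str, Any]]:
    """Turn prompts into Message Batches request entries with unique custom_ids."""
    requests = []
    for i, prompt in enumerate(prompts, start=1):
        requests.append({
            "custom_id": f"{prefix}-{i:03d}",  # e.g. req-001; must be unique within the batch
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return requests
```

Pass the returned list as `requests=` to `client.messages.batches.create`, and keep a dict from custom_id to prompt so results can be joined back to their inputs.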

Citations for Grounded Responses

When Claude needs to reference source documents, use Citations. Claude will provide detailed references to exact sentences in your source material.

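response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": "<base64_encoded_pdf>"  # placeholder for your encoded PDF
                    },
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "What is the main finding of this document?"}
            ]
        }
    ]
)
# The answer is split into text blocks; cited blocks carry a list of citations
for block in response.content:
    if block.type == "text":
        print(block.text)
        for citation in block.citations or []:
            print("  ->", citation.cited_text)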
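Cited answers arrive as a sequence of text blocks, some of which carry a citations list containing the quoted source text (`cited_text`). A common way to present them is as footnotes. The sketch below assumes blocks expose `type`, `text`, and an optional `citations` list of objects with a `cited_text` attribute, as the Python SDK's response objects do; the function name and the `[n]` footnote format are hypothetical.

```python
from typing import Any, Iterable


def render_with_footnotes(content: Iterable[Any]) -> str:
    """Join text blocks, appending [n] markers and a numbered list of quoted sources."""
    body, notes = [], []
    for block in content:
        if block.type != "text":
            continue
        marker = ""
        for citation in getattr(block, "citations", None) or []:
            notes.append(citation.cited_text.strip())
            marker += f"[{len(notes)}]"
        body.append(block.text + marker)
    # Blocks can split mid-sentence, so join them without separators
    text = "".join(body)
    if notes:
        text += "\n\nSources:\n" + "\n".join(f"[{i}] {q}" for i, q in enumerate(notes, start=1))
    return text
```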

---

2. Tools: Let Claude Take Action

Tools extend Claude's capabilities beyond text generation. Claude can call external functions, search the web, execute code, and even control a computer.

How Tool Use Works

You define tools with a name, description, and input schema. Claude decides when to call them based on the conversation context.

def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # Your weather API logic here; this stub returns a fixed string
    return f"Sunny, 72°F in {location}"

weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
}
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool],
    messages=messages
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    # Find the tool_use block (it may come after a text block)
    tool_call = next(block for block in response.content if block.type == "tool_use")
    if tool_call.name == "get_weather":
        result = get_weather(tool_call.input["location"])
        # Send the result back, replaying the conversation with the same tool definitions
        final_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=[weather_tool],
            messages=messages + [
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": tool_call.id, "content": result}
                ]}
            ]
        )
        print(final_response.content[0].text)
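As the number of tools grows, an if-chain on `tool_call.name` gets unwieldy. A registry that maps tool names to Python functions keeps dispatch in one place. The sketch below runs every tool_use block in a response and builds the user message of tool_result blocks to send back, marking failures with `is_error` so Claude can see what went wrong. The registry and function names are hypothetical; blocks are assumed to expose `type`, `id`, `name`, and `input` as the SDK's do.

```python
from typing import Any, Callable, Iterable


def run_tool_calls(
    content: Iterable[Any], registry: dict[str, Callable[..., Any]]
) -> dict[str, Any]:
    """Execute each tool_use block and return one user message holding all tool_result blocks."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        func = registry.get(block.name)
        try:
            if func is None:
                raise ValueError(f"unknown tool: {block.name}")
            output = func(**block.input)  # tool inputs arrive as a dict of keyword arguments
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)})
        except Exception as exc:  # report the failure to Claude instead of crashing the loop
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"Error: {exc}",
                "is_error": True,
            })
    return {"role": "user", "content": results}
```

Usage: `run_tool_calls(response.content, {"get_weather": get_weather})` returns a message you can append after `{"role": "assistant", "content": response.content}`.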

Built-in Tools

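Anthropic also defines several tools with ready-made schemas, so you don't have to design them yourself:

Web Search Tool (server tool): Let Claude search the internet
Code Execution Tool (server tool): Run Python code in a sandbox
Computer Use Tool (client tool): Let Claude control a virtual desktop
Memory Tool (client tool): Persist information across conversations
Bash Tool (client tool): Execute shell commands

Server tools run on Anthropic's infrastructure, so you write no execution code. Client tools come with Anthropic-defined schemas, but your application carries out the actions and returns the results.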

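response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    betas=["code-execution-2025-05-22"],  # code execution is a beta tool
    tools=[
        # Server tools use versioned type strings and fixed names; check the docs for current versions.
        # Web search must also be enabled for your organization in the Console.
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
        {"type": "code_execution_20250522", "name": "code_execution"}
    ],
    messages=[
        {"role": "user", "content": "Search for the latest AI news and summarize it"}
    ]
)
# The reply mixes server tool blocks with text; print only the text
for block in response.content:
    if block.type == "text":
        print(block.text)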
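With server tools, one response can mix several block types: `server_tool_use` blocks record the calls Claude made, result blocks (such as `web_search_tool_result`) hold what came back, and text blocks carry the answer. The sketch below counts server tool calls by name and collects the answer text, which is handy for logging or usage monitoring. It relies only on the `type`, `name`, and `text` fields, and the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_server_tools(content: Iterable[Any]) -> tuple[Counter, str]:
    """Count server_tool_use calls by tool name and join the text blocks into one answer."""
    calls: Counter = Counter()
    text_parts = []
    for block in content:
        if block.type == "server_tool_use":
            calls[block.name] += 1
        elif block.type == "text":
            text_parts.append(block.text)
        # Other block types (e.g. tool result blocks) are ignored here
    return calls, "".join(text_parts)
```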

Parallel Tool Use

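Claude can request several independent tool calls in a single response. Your application can run them concurrently and return all the results together in the next user message.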

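# weather_tool, stock_tool, and news_tool are tool definitions (name, description,
# input_schema) built the same way as the get_weather tool above.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool, stock_tool, news_tool],
    # No flag is needed: Claude may return several tool_use blocks by default.
    # To limit it to one call per turn, pass
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "Get the weather in London, Apple's stock price, and today's top tech news"}
    ]
)
# Collect every tool call; return all results together in the next user message
tool_calls = [block for block in response.content if block.type == "tool_use"]
for call in tool_calls:
    print(call.name, call.input)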
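Because the tool calls in one response are independent, you can execute them concurrently and then return every result in a single user message. The sketch below uses a thread pool, which suits I/O-bound tools such as HTTP lookups. The registry and function names are hypothetical, and each call is assumed to expose `id`, `name`, and `input`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_parallel(
    tool_calls: list[Any], registry: dict[str, Callable[..., Any]], max_workers: int = 8
) -> dict[str, Any]:
    """Run tool calls concurrently and return one user message with results in call order."""
    def run_one(call: Any) -> dict[str, Any]:
        try:
            output = registry[call.name](**call.input)
            return {"type": "tool_result", "tool_use_id": call.id, "content": str(output)}
        except Exception as exc:  # surface failures to Claude as error results
            return {"type": "tool_result", "tool_use_id": call.id,
                    "content": f"Error: {exc}", "is_error": True}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if later calls finish first
        results = list(pool.map(run_one, tool_calls))
    return {"role": "user", "content": results}
```

Keeping results in call order is not strictly required (each result names its `tool_use_id`), but it makes logs easier to read.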

---

3. Tool Infrastructure: Orchestration at Scale

When you have many tools, you need infrastructure for discovery and orchestration. Claude's API provides:

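Tool Runner (SDK): Automates the tool-use loop of running tools and sending results back
Strict Tool Use: Guarantees that Claude's tool inputs match your schema exactly
Tool Search: Lets Claude find the right tool in a large catalog instead of loading every definition up front
Fine-grained Tool Streaming: Streams tool input parameters without buffering or JSON validation, reducing latency for large inputs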
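Written by hand, the loop that Tool Runner automates looks roughly like this: call the model, run any requested tools, append the results, and repeat until stop_reason is no longer "tool_use". The sketch below is not the SDK's Tool Runner API. It takes the request function as a parameter (in practice, `client.messages.create`) so it can be exercised without network access, and it caps the number of turns to avoid runaway loops. All names are hypothetical.

```python
from typing import Any, Callable


def tool_loop(
    create: Callable[..., Any],
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    registry: dict[str, Callable[..., Any]],
    max_turns: int = 10,
    **request_kwargs: Any,
) -> Any:
    """Call the model until it stops asking for tools; return the final response."""
    messages = list(messages)  # don't mutate the caller's list
    for _ in range(max_turns):
        response = create(messages=messages, tools=tools, **request_kwargs)
        if response.stop_reason != "tool_use":
            return response
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = registry[block.name](**block.input)
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)})
        # Replay Claude's turn, then answer every tool call in one user message
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")
```

Usage with the real client: `tool_loop(client.messages.create, messages, [weather_tool], {"get_weather": get_weather}, model="claude-sonnet-4-20250514", max_tokens=1024)`.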

---

4. Context Management: Keeping Sessions Efficient

Long conversations consume tokens. Claude offers several features to manage context efficiently.

Context Windows

Many Claude models have a standard context window of 200K tokens, and supported models extend this to 1 million tokens, enough to process entire codebases or lengthy documents.

Prompt Caching

Cache repeated system prompts or document chunks to reduce latency and cost. By default a cache entry lives for 5 minutes, and each cache hit refreshes that lifetime.

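response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # In practice this is a long, stable prefix; prompts shorter than the
            # model's minimum cacheable length (e.g. 1,024 tokens) are not cached
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}  # cache the prompt up to and including this block
        }
    ],
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
# Usage fields show whether this call wrote to or read from the cache
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)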
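To judge whether caching pays off, compare input costs with and without it. For the default 5-minute cache, writing a prefix costs 1.25× the base input price and reading it back costs 0.1×. The sketch below applies those multipliers to a repeated prefix. The function name is hypothetical, prices are per million tokens, and it assumes the best case where every call after the first is a cache hit.

```python
def prefix_input_cost(prefix_tokens: int, calls: int, base_price_per_mtok: float, cached: bool) -> float:
    """Dollar cost of sending the same prefix `calls` times, with or without prompt caching."""
    per_token = base_price_per_mtok / 1_000_000
    if not cached or calls <= 0:
        return prefix_tokens * per_token * max(calls, 0)
    write = prefix_tokens * per_token * 1.25  # first call writes the cache
    reads = prefix_tokens * per_token * 0.10 * (calls - 1)  # later calls read from it
    return write + reads
```

For a 10,000-token prefix sent 100 times at a hypothetical $3 per million input tokens, that is $3.00 without caching versus about $0.33 with it. Cache misses (for example, gaps longer than the cache lifetime) push the cached figure up.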

Context Compaction

Reduce token usage by summarizing or pruning older conversation turns.
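One client-side way to compact a conversation is to keep the most recent turns verbatim and replace everything older with a summary. The sketch below does that for plain-text turns. `summarize` is a function you supply (for example, one that asks Claude for a short recap), and the cut point moves earlier so the kept history still starts with a user message. It is a hypothetical helper, not the API's own compaction feature, and it does not handle tool_use/tool_result pairs, which must stay together.

```python
from typing import Callable


def compact(
    messages: list[dict[str, str]],
    keep_last: int,
    summarize: Callable[[list[dict[str, str]]], str],
) -> list[dict[str, str]]:
    """Keep at least the last `keep_last` messages; fold a summary of the rest into the first kept user turn."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)
    cut = len(messages) - keep_last
    # Move the cut earlier until the kept tail starts with a user message
    while cut > 0 and messages[cut]["role"] != "user":
        cut -= 1
    if cut == 0:
        return list(messages)  # nothing old enough to summarize
    summary = summarize(messages[:cut])
    first = messages[cut]
    merged = {
        "role": "user",
        "content": f"Summary of the earlier conversation: {summary}\n\n{first['content']}",
    }
    return [merged] + messages[cut + 1:]
```

Folding the summary into the first kept user turn, rather than adding a separate message, keeps the user/assistant alternation intact.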

Token Counting

Estimate token usage before making API calls.

token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)
print(f"Token count: {token_count.input_tokens}")
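A count is most useful when compared against the model's limits before you send the request. The input tokens plus max_tokens must fit in the context window, so the sketch below checks that sum. The 200,000-token default is an assumption (a common standard window; some models support more), and the function name is hypothetical. In practice, pass `token_count.input_tokens` from the count above.

```python
def check_budget(input_tokens: int, max_tokens: int, context_window: int = 200_000) -> int:
    """Return the headroom left in the context window, or raise if the request cannot fit."""
    needed = input_tokens + max_tokens
    if needed > context_window:
        raise ValueError(
            f"request needs {needed:,} tokens but the context window is {context_window:,}"
        )
    return context_window - needed
```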

---

5. Files and Assets: Working with Documents

Claude can process various file types, including PDFs, images, and code files.

PDF Support

Upload PDFs and ask Claude to extract information, summarize, or answer questions.

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this document in 3 bullet points."
                }
            ]
        }
    ]
)
print(response.content[0].text)
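Reading, encoding, and wrapping the file is the same every time, so it is worth a small helper. The sketch below builds a base64 document block from raw bytes and checks the `%PDF-` header so a mislabelled file fails early instead of at the API. The function name and the `cite` flag are hypothetical; the documented size and page limits for PDF requests are not checked here.

```python
import base64
from typing import Any


def pdf_document_block(pdf_bytes: bytes, cite: bool = False) -> dict[str, Any]:
    """Wrap raw PDF bytes in a base64 document content block."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    block: dict[str, Any] = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("utf-8"),
        },
    }
    if cite:
        block["citations"] = {"enabled": True}  # same flag as in the Citations section
    return block
```

Usage: `with open("report.pdf", "rb") as f: block = pdf_document_block(f.read())`, then place `block` before the text block in the message content.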

Images and Vision

Claude can analyze images for tasks like object detection, OCR, and visual question answering.

import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "What does this chart show?"}
            ]
        }
    ]
)
print(response.content[0].text)
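Claude accepts JPEG, PNG, GIF, and WebP images, and the media_type you send should match the actual bytes. Rather than trusting file extensions, the sketch below sniffs the standard file signatures for those four formats and builds the image block. The function names are hypothetical, and any other format raises an error.

```python
import base64
from typing import Any


def sniff_image_type(data: bytes) -> str:
    """Return the media type for a supported image format, based on its leading bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format")


def image_block(data: bytes) -> dict[str, Any]:
    """Build a base64 image content block whose media_type matches the bytes."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": sniff_image_type(data),
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```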

---

Feature Availability Across Platforms

Not all features are available on every platform. Here's a quick reference:

Feature	Claude API	AWS Bedrock	Vertex AI	Microsoft Foundry
Context Windows (1M tokens)	GA	GA	GA	Beta
Adaptive Thinking	GA	GA	GA	Beta
Batch Processing	GA	GA	GA	GA
Citations	GA	GA	GA	Beta
Prompt Caching	GA	GA	GA	Beta
Web Search Tool	Beta	Beta	Beta	Beta
Computer Use Tool	Beta	Beta	N/A	N/A

Features marked GA (Generally Available) are stable and production-ready. Beta features may change and are not guaranteed for production use.

---

Best Practices for Building with Claude

Start simple: Begin with model capabilities and one or two tools. Add complexity gradually.
Use structured outputs for production systems to ensure parseable responses.
Leverage batch processing for non-real-time workloads to save 50% on costs.
Cache prompts that are reused across many conversations.
Monitor token usage with the token counting endpoint to avoid surprises.
Handle tool calls properly: when stop_reason is "tool_use", return a tool_result for every tool_use block before expecting a final answer.

---

Key Takeaways

Claude's API has five core areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Each serves a distinct purpose in building AI applications.
Use Adaptive Thinking and Structured Outputs to control reasoning depth and response format for reliable, production-ready outputs.
Batch processing cuts costs by 50%—ideal for large-scale, non-real-time workloads like data extraction or content summarization.
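Anthropic-defined tools (web search, code execution, computer use) let Claude take real-world actions with little custom integration. Server tools run on Anthropic's side, while client tools such as computer use need your code to carry out the actions.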
Context management features like prompt caching and token counting help optimize both cost and performance in long-running sessions.