2026-04-30

Mastering Claude's API: A Practical Guide to Model Capabilities, Tools, and Context Management

Learn to navigate Claude's API surface across five key areas: model capabilities, tools, tool infrastructure, context management, and file handling. Includes code examples.

Quick Answer

This guide walks you through Claude's API surface—model capabilities, tools, context management, and file handling—with practical code examples and best practices for building production-ready applications.

Claude API · tool use · context management · extended thinking · prompt caching

Introduction

Claude's API is more than just a text generation endpoint. It's a comprehensive platform designed to give you fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you're building a simple chatbot or a complex agent that browses the web and executes code, understanding the five core areas of the API surface is essential.

This guide breaks down each area—model capabilities, tools, tool infrastructure, context management, and file handling—with practical code examples and best practices. By the end, you'll know exactly which features to use for your use case and how to combine them effectively.

The Five Pillars of Claude's API

Claude's API surface is organized into five areas:

  • Model capabilities – Control how Claude reasons and formats responses.
  • Tools – Let Claude take actions on the web or in your environment.
  • Tool infrastructure – Handles discovery and orchestration at scale.
  • Context management – Keeps long-running sessions efficient.
  • Files and assets – Manage the documents and data you provide to Claude.

If you're new, start with model capabilities and tools. Return to the other sections when you're ready to optimize cost, latency, or scale.

Model Capabilities: Steering Claude's Reasoning and Output

Model capabilities are the foundational layer. They let you control how Claude thinks, how much it thinks, and how it formats its responses.

Extended Thinking and Adaptive Thinking

Extended Thinking lets Claude reason step-by-step before producing a final answer. This is critical for complex math, code generation, or multi-step analysis. With Adaptive Thinking (recommended for Opus 4.7), Claude dynamically decides when and how much to think. You control the depth using the effort parameter.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "effort": "high"  # Options: low, medium, high
    },
    messages=[
        {"role": "user", "content": "Solve this: A train leaves Station A at 60 mph. Another train leaves Station B at 80 mph. They are 300 miles apart. When do they meet?"}
    ]
)

print(response.content[0].text)

Best practice: Use effort to balance reasoning depth against latency. For simple tasks, use "low"; for complex reasoning, use "high".

Structured Outputs

Structured outputs ensure Claude's responses follow a specific schema—ideal for extracting data, generating JSON, or populating templates.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and total from this invoice: Invoice #1234, John Doe, 2025-03-15, $450.00"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "total": {"type": "number"}
                },
                "required": ["name", "date", "total"]
            }
        }
    }
)

print(response.content[0].text)
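Because the response text conforms to the schema, you can parse and validate it directly. A minimal sketch, using a hard-coded sample string in place of a live `response.content[0].text`:

```python
import json

# Stand-in for response.content[0].text, shaped like the invoice schema above
raw = '{"name": "John Doe", "date": "2025-03-15", "total": 450.0}'

invoice = json.loads(raw)

# Validate the required fields before handing the data downstream
for field in ("name", "date", "total"):
    assert field in invoice, f"missing field: {field}"

print(invoice["total"])
```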

Citations for Grounded Responses

Citations let Claude reference exact passages from source documents. This is invaluable for legal, medical, or research applications where verifiability is paramount.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the key findings from the Q4 report?"}
    ],
    documents=[
        {
            "type": "text",
            "title": "Q4 Earnings Report",
            "content": "Revenue grew 12% year-over-year...",
            "citations": {"enabled": True}
        }
    ]
)

print(response.content[0].text)
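The cited passages come back attached to the response's content blocks. Here's a minimal sketch of pulling them out, using plain dicts to stand in for the response object (the exact block shape is an assumption for illustration, not taken from this guide):

```python
# Stand-in for response.content: text blocks that may carry a list of citations
content_blocks = [
    {
        "type": "text",
        "text": "Revenue grew 12% year-over-year.",
        "citations": [
            {"document_title": "Q4 Earnings Report", "cited_text": "Revenue grew 12% year-over-year..."}
        ],
    },
]

def extract_citations(blocks):
    """Collect (document_title, cited_text) pairs from every text block."""
    pairs = []
    for block in blocks:
        for cite in block.get("citations") or []:
            pairs.append((cite["document_title"], cite["cited_text"]))
    return pairs

for title, passage in extract_citations(content_blocks):
    print(f"{title}: {passage}")
```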

Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. Claude can call functions, fetch web pages, execute code, and even control a computer.

Defining a Custom Tool

You define tools using a JSON schema. Claude decides when to call them based on the conversation context.

def get_weather(location: str) -> str:
    # Simulated weather lookup
    return f"The weather in {location} is sunny, 72°F."

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g., San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Austin, TX?"}
    ]
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    tool_call = response.content[-1]
    if tool_call.name == "get_weather":
        result = get_weather(tool_call.input["location"])
        print(result)

Parallel Tool Use

Claude can call multiple tools in a single response, which is great for tasks that require independent lookups.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool, stock_tool, news_tool],
    parallel_tool_calls=True,
    messages=[
        {"role": "user", "content": "Get the weather in Tokyo, the current price of Apple stock, and today's top tech news."}
    ]
)
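When Claude returns several tool_use blocks in one response, you run each one and send all the results back together. A minimal dispatch sketch, using plain dicts to stand in for the response blocks (the handler functions here are hypothetical):

```python
# Hypothetical handlers standing in for real lookups
handlers = {
    "get_weather": lambda args: f"Weather in {args['location']}: sunny",
    "get_stock_price": lambda args: f"{args['ticker']}: $210.00",
}

# Stand-in for response.content: two tool_use blocks from a single response
blocks = [
    {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {"location": "Tokyo"}},
    {"type": "tool_use", "id": "t2", "name": "get_stock_price", "input": {"ticker": "AAPL"}},
]

def run_tool_calls(blocks, handlers):
    """Execute every tool_use block and build the tool_result blocks to send back."""
    results = []
    for block in blocks:
        if block["type"] != "tool_use":
            continue
        output = handlers[block["name"]](block["input"])
        results.append({"type": "tool_result", "tool_use_id": block["id"], "content": output})
    return results

tool_results = run_tool_calls(blocks, handlers)
```

Each `tool_result` carries the `id` of the `tool_use` block it answers, so Claude can match results to calls when you append them to the conversation.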

Built-in Tools

Claude provides several built-in tools you can enable with a single flag:

  • Web search tool – Fetch real-time information from the web.
  • Code execution tool – Run Python code in a sandboxed environment.
  • Computer use tool – Let Claude control a virtual desktop (beta).
  • Memory tool – Persist information across conversations.

For example, enabling the web search tool:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[{"type": "web_search"}],
    messages=[
        {"role": "user", "content": "What are the latest AI research papers from 2025?"}
    ]
)

Tool Infrastructure: Discovery and Orchestration at Scale

When you have dozens or hundreds of tools, you need a way to manage them. Claude's tool infrastructure includes:

  • Tool Runner (SDK) – Automates the tool call loop (invoke tool, return result, continue).
  • Strict tool use – Forces Claude to use a specific tool when needed.
  • Tool search – Dynamically discover relevant tools based on the user's query.
  • Fine-grained tool streaming – Stream tool calls and results incrementally.
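Strict tool use is driven by the tool_choice parameter. Building the request body as a plain dict makes the options easy to compare side by side (a sketch; the tool definitions are assumed from the earlier examples):

```python
# tool_choice options:
#   {"type": "auto"}                         - Claude decides whether to call a tool
#   {"type": "any"}                          - Claude must call some tool
#   {"type": "tool", "name": "get_weather"}  - Claude must call this specific tool
request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [{"role": "user", "content": "How's Austin looking today?"}],
}

print(request["tool_choice"])
```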

Example: Tool Runner with the SDK

from anthropic import Anthropic

client = Anthropic()

# The SDK's Tool Runner handles the loop automatically
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, calculator_tool, database_tool],
    tool_choice={"type": "auto"},
    messages=[
        {"role": "user", "content": "What's the average temperature in cities where our top 3 customers are located?"}
    ]
)

Context Management: Keeping Long Conversations Efficient

Long-running sessions can consume large context windows. Claude provides several features to manage this:

Context Windows

Claude supports up to 1 million tokens of context—enough to process entire codebases or lengthy documents. However, larger contexts increase latency and cost.
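Before sending a large document, it helps to check whether it fits your budget. A rough sketch using the common ~4-characters-per-token heuristic (an approximation for English text, not the tokenizer the API actually uses):

```python
def rough_token_estimate(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fits_in_budget(text: str, budget_tokens: int, reserve_for_output: int = 4096) -> bool:
    """Check whether a document plus output headroom fits within a context budget."""
    return rough_token_estimate(text) + reserve_for_output <= budget_tokens

doc = "Revenue grew 12% year-over-year. " * 1000
print(fits_in_budget(doc, budget_tokens=200_000))
```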

Prompt Caching

Prompt caching stores frequently used context (system prompts, few-shot examples, document chunks) so you don't have to resend it on every request. For cached content, this can cut costs by up to 90% and latency by up to 85%.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a legal assistant specializing in contract law.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Review this non-disclosure agreement..."}
    ]
)

Context Editing and Compaction

  • Context editing – Remove or modify parts of the conversation history without restarting.
  • Compaction – Summarize older messages to free up tokens while preserving key information.
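A minimal client-side sketch of compaction: keep the most recent messages verbatim and fold everything older into a single summary message. (Here the summary is a simple placeholder; in practice you would ask Claude itself to write it.)

```python
def compact_history(messages, keep_recent=4):
    """Replace all but the last `keep_recent` messages with one summary message."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[Summary of {len(older)} earlier messages omitted for brevity]"
    return [{"role": "user", "content": summary}] + recent

history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
compacted = compact_history(history, keep_recent=4)
print(len(compacted))  # 5: one summary message plus the four most recent
```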

Files and Assets: Managing Documents and Data

Claude can process various file types, including PDFs, images, and code files.

PDF Support

import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this report and highlight key financial metrics."
                }
            ]
        }
    ]
)

Image and Vision

Claude can analyze images for tasks like object detection, chart reading, and document scanning.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's the chart showing?"
                }
            ]
        }
    ]
)

Putting It All Together: A Practical Workflow

Here's a real-world example combining multiple features: a customer support agent that reads a PDF, searches the web, and responds with citations.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": "You are a helpful support agent. Always cite your sources.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {"type": "web_search"},
        {
            "name": "get_order_status",
            "description": "Get the status of a customer order",
            "input_schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"}
                },
                "required": ["order_id"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "My order #12345 is delayed. Can you check the status and find the latest shipping policy?"
                }
            ]
        }
    ]
)

Feature Availability and Lifecycle

Not all features are available everywhere. Claude uses a classification system:

  • Beta – Preview features for feedback. May have limited availability. Not for production.
  • Generally Available (GA) – Stable, fully supported, recommended for production.
  • Deprecated – Still functional but not recommended. Migration path provided.
  • Retired – No longer available.

Always check the feature's page for the latest availability on your platform (API, Amazon Bedrock, Google Vertex AI, or Microsoft Foundry).

Key Takeaways

  • Start with model capabilities and tools – They cover 80% of common use cases. Add context management and file handling as your application grows.
  • Use Adaptive Thinking for complex reasoning – Let Claude decide how much to think using the effort parameter. Start with "medium" and adjust based on results.
  • Leverage prompt caching for production apps – Cache system prompts and few-shot examples to reduce latency and cost significantly.
  • Combine tools for powerful agents – Use parallel tool calls and built-in tools (web search, code execution) to build autonomous agents.
  • Always check feature availability – Features in beta may change. Use GA features for production workloads.