Guide | Beginner | Agents | 2026-05-13

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore the full Claude API surface: model capabilities, tool use, context management, and file handling. Learn practical implementation with code examples and best practices.

Quick Answer

This guide walks you through the five core areas of the Claude API: model capabilities (thinking, structured outputs), tools (web search, code execution), tool infrastructure (Tool Runner, MCP), context management (prompt caching, compaction), and file handling. You'll learn how to combine these features for production-ready applications.

Claude API, tool use, context management, structured outputs, prompt caching

Introduction

Claude's API is more than just a text-in, text-out interface. It's a rich ecosystem of features organized into five core areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Whether you're building a simple chatbot or a complex agentic system, understanding how these pieces fit together is essential.

This guide provides a practical, feature-by-feature walkthrough of the Claude API surface. You'll learn what each area offers, how to use it in code, and when to apply it for maximum impact.

1. Model Capabilities: Steering Claude's Outputs

Model capabilities control how Claude reasons and what it produces. These are the foundational levers you'll pull most often.

Extended Thinking & Adaptive Thinking

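Claude can "think" before responding, improving reasoning on complex tasks. With Adaptive Thinking (GA on Claude API and AWS), you let Claude decide when and how much to think, and the effort parameter controls depth. The example below uses the explicit form of extended thinking instead, where budget_tokens caps how many tokens Claude may spend thinking: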

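import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Analyze the pros and cons of quantum computing for cryptography."}
    ]
)

# With thinking enabled, the first content block is a thinking block,
# so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)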

When to use it: Complex analysis, multi-step reasoning, code generation, and any task where Claude might benefit from "thinking out loud."
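
The thinking budget has to fit inside the response limit: budget_tokens must be smaller than max_tokens, and the API enforces a minimum budget. The guard below is a minimal sketch of that check. The helper name thinking_config and the MIN_BUDGET value of 1,024 are assumptions for illustration, not part of the SDK, so verify the minimum against the current documentation.

```python
MIN_BUDGET = 1024  # assumed API minimum for budget_tokens; verify against current docs


def thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a thinking parameter dict, or raise if the budget cannot fit."""
    if budget_tokens < MIN_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


config = thinking_config(max_tokens=4096, budget_tokens=2048)
print(config)
```

The returned dict can be passed as the thinking argument of client.messages.create, which surfaces a bad budget locally before the request is sent.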

Structured Outputs

Need JSON, not prose? Use structured outputs to enforce a schema:

# Assumes an SDK version and model that support structured outputs through
# output_config; older SDK releases exposed this as a beta output_format parameter.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "List three famous scientists and their discoveries."}],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "scientists": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "discovery": {"type": "string"},
                                "year": {"type": "integer"}
                            },
                            "required": ["name", "discovery", "year"],
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["scientists"],
                "additionalProperties": False
            }
        }
    }
)
print(response.content[0].text)

When to use it: API integrations, data extraction, form filling, and any downstream system that expects structured data.
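
Even with a schema in place, downstream code usually parses the returned text and checks it before use. The following is a minimal sketch of that step using only the standard library. The function name parse_scientists and the sample string are illustrative; the sample stands in for response.content[0].text and is not real model output.

```python
import json


def parse_scientists(raw: str) -> list[dict]:
    """Parse the model's JSON text and check the fields the schema requires."""
    data = json.loads(raw)
    scientists = data["scientists"]
    for entry in scientists:
        missing = {"name", "discovery", "year"} - entry.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        if not isinstance(entry["year"], int):
            raise ValueError("year must be an integer")
    return scientists


# Hypothetical response text, standing in for response.content[0].text
sample = '{"scientists": [{"name": "Marie Curie", "discovery": "Radium", "year": 1898}]}'
for s in parse_scientists(sample):
    print(f"{s['name']}: {s['discovery']} ({s['year']})")
```

Failing loudly on a missing field keeps a malformed response from propagating into the systems that consume it.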

Citations

Ground Claude's responses in source documents. Claude will reference exact sentences from your provided text:

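response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are passed as content blocks inside the user message
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Our data retention policy states that user logs are kept for 90 days..."
                    },
                    "title": "Company Policy",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "What does the document say about data retention policies?"}
            ]
        }
    ]
)
# The answer may be split across several text blocks, each carrying its own citations
for block in response.content:
    if block.type == "text":
        print(block.text, end="")
print()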

When to use it: Legal analysis, research summaries, customer support with knowledge bases, and any application requiring verifiable answers.
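
A cited answer arrives as a sequence of text blocks, some of which carry a citations list pointing back at the source passage. The sketch below shows one way to turn that into footnoted text. It works on plain dicts and assumes each citation exposes document_title and cited_text; the sample blocks are hypothetical, not captured API output.

```python
def render_with_citations(blocks: list[dict]) -> str:
    """Join text blocks and append numbered footnotes for cited passages."""
    parts, notes = [], []
    for block in blocks:
        if block.get("type") != "text":
            continue
        text = block["text"]
        for citation in block.get("citations") or []:
            title = citation["document_title"]
            quote = citation["cited_text"]
            notes.append(f'[{len(notes) + 1}] {title}: "{quote}"')
            text += f" [{len(notes)}]"
        parts.append(text)
    return "".join(parts) + "\n" + "\n".join(notes)


# Hypothetical blocks shaped like a cited response
blocks = [
    {"type": "text", "text": "According to the policy, "},
    {
        "type": "text",
        "text": "user logs are kept for 90 days.",
        "citations": [{
            "type": "char_location",
            "cited_text": "user logs are kept for 90 days",
            "document_title": "Company Policy",
        }],
    },
]
rendered = render_with_citations(blocks)
print(rendered)
```

SDK responses are objects instead of dicts, so they would need converting first; how to do that depends on the SDK version in use.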

2. Tools: Let Claude Take Action

Tools extend Claude beyond text generation. They let Claude interact with the world—fetch web pages, run code, use your APIs, and more.

Built-in Tools

Claude offers several first-party tools:

Web Search Tool – Fetch real-time information from the web.
Code Execution Tool – Run Python, JavaScript, or bash in a sandbox.
Web Fetch Tool – Retrieve content from a specific URL.
Memory Tool – Persist information across conversations.
Computer Use Tool – Control a virtual desktop environment (beta).

Example: Using the web search tool to answer a question:

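response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            # Server tools are identified by a versioned type string
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 3  # optional cap on searches per request
        }
    ],
    messages=[
        {"role": "user", "content": "What is the latest news about AI regulation in the EU?"}
    ]
)
# The response interleaves search blocks with text, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text, end="")
print()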

Custom Tools (Function Calling)

Define your own tools to let Claude interact with your backend:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)
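# Handle the tool call
if response.stop_reason == "tool_use":
    # A text block may precede the tool call, so look the block up by type
    tool_use = next(block for block in response.content if block.type == "tool_use")
    # Execute your function and return the result in a tool_result block
    print(f"Claude wants to call: {tool_use.name}")
    print(f"With arguments: {tool_use.input}")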

Pro tip: Claude can request several tool calls in a single turn (parallel tool use). Use strict tool use to guarantee that tool inputs match your input_schema, and set tool_choice to force Claude to call a specific tool.
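
After Claude requests a tool, your code runs it and sends the output back in a tool_result block that echoes the tool_use id. The sketch below shows that round trip without calling the API. The get_weather stub, the TOOL_REGISTRY mapping, and the "toolu_example" id are hypothetical placeholders, and the temperature is left empty so that no data is invented.

```python
import json


def get_weather(city: str, units: str = "celsius") -> dict:
    """Hypothetical stand-in for a real weather lookup."""
    return {"city": city, "units": units, "temperature": None}


TOOL_REGISTRY = {"get_weather": get_weather}


def build_tool_result(tool_use_id: str, name: str, tool_input: dict) -> dict:
    """Run the requested tool and wrap its output as a user message."""
    try:
        output = json.dumps(TOOL_REGISTRY[name](**tool_input))
        is_error = False
    except Exception as exc:  # report failures to Claude instead of crashing
        output = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": output,
            "is_error": is_error,
        }],
    }


message = build_tool_result("toolu_example", "get_weather", {"city": "Tokyo"})
print(message["content"][0]["content"])
```

In a real exchange you would append the assistant turn containing the tool_use block, then this message, and call client.messages.create again so Claude can write its final answer.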

3. Tool Infrastructure: Orchestration at Scale

When you have many tools, you need infrastructure to manage discovery, routing, and context.

Tool Runner (SDK)

The Tool Runner SDK handles the orchestration loop automatically:

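import anthropic
from anthropic import beta_tool

client = anthropic.Anthropic()

# Define a custom tool as a decorated function (hypothetical stub implementation)
@beta_tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Weather data for {city} would be returned here."

# The tool runner is a beta SDK helper; check your SDK version for availability
runner = client.beta.messages.tool_runner(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[get_weather],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
final_message = runner.until_done()  # runs the tool loop until Claude stops calling tools
for block in final_message.content:
    if block.type == "text":
        print(block.text)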
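
To make the "orchestration loop" concrete, here is a rough sketch of what such a loop does, written against plain dicts and a stubbed model so that it runs without network access. The names run_tool_loop, send, and execute are illustrative and not SDK APIs; a real implementation would have send wrap client.messages.create.

```python
from typing import Callable


def run_tool_loop(send: Callable[[list], dict], execute: Callable[[str, dict], str],
                  messages: list, max_turns: int = 5) -> dict:
    """Call the model until it stops requesting tools or max_turns is reached."""
    for _ in range(max_turns):
        response = send(messages)
        if response["stop_reason"] != "tool_use":
            return response
        # Echo the assistant turn, then answer every tool_use block in one user turn
        messages.append({"role": "assistant", "content": response["content"]})
        results = [
            {"type": "tool_result", "tool_use_id": block["id"],
             "content": execute(block["name"], block["input"])}
            for block in response["content"] if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


# Stubbed model: asks for one tool call, then answers (hypothetical scripted replies)
scripted = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "echo", "input": {"value": "hi"}}]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "done"}]},
])
history = [{"role": "user", "content": "Say hi through the tool."}]
final = run_tool_loop(lambda msgs: next(scripted), lambda name, args: args["value"], history)
print(final["content"][0]["text"], len(history))
```

The max_turns cap is a design choice worth keeping in any hand-rolled loop, since a model that keeps requesting tools would otherwise run indefinitely.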

MCP (Model Context Protocol)

MCP connects Claude to external servers for tool discovery. You can set up remote MCP servers that Claude queries dynamically:

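# Configure the MCP connector per request (beta feature; the beta flag and
# field names below follow the mcp-client-2025-04-04 release and may change)
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    betas=["mcp-client-2025-04-04"],
    mcp_servers=[
        {
            "type": "url",
            "url": "https://your-mcp-server.com",
            "name": "your-mcp-server",
            "authorization_token": "your-token"
        }
    ],
    messages=[{"role": "user", "content": "What tools do you have available?"}]
)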

When to use it: Large-scale agent systems, enterprise tool ecosystems, and scenarios where tools change frequently.

4. Context Management: Keeping Conversations Efficient

Long conversations consume tokens and slow down responses. Claude provides several mechanisms to manage context.

Context Windows

Claude supports up to 1M tokens of context (GA on most platforms). That's enough for entire codebases or lengthy documents.

Prompt Caching

Cache frequently used context (system prompts, knowledge bases) to reduce latency and cost:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with expertise in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Explain decorators in Python."}
    ]
)
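# Check if the cache was used. Prompts below the model's minimum cacheable
# length (1,024 tokens on many models) are not cached, so this short
# system prompt will report False; real savings need a longer prefix.
print(f"Cache hit: {(response.usage.cache_read_input_tokens or 0) > 0}")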
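
A single boolean hides how much of the prompt was actually served from the cache. As a rough sketch, the helper below computes a hit ratio from the three input-token counters in the usage object, treating them as plain dict fields. The function name cache_hit_ratio and the token counts are hypothetical, chosen only to illustrate a cache write followed by a cache read.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens served from the cache for one request."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + written + uncached
    return read / total if total else 0.0


# Hypothetical usage figures for a first call (cache write) and a repeat call (cache read)
first_call = {"input_tokens": 50, "cache_creation_input_tokens": 1950, "cache_read_input_tokens": 0}
repeat_call = {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1950}
print(cache_hit_ratio(first_call), cache_hit_ratio(repeat_call))
```

Tracking this ratio over time makes it easier to notice when a change to the system prompt has silently invalidated the cached prefix.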

Context Compaction & Editing

For long-running sessions, you can compact or edit the conversation history to stay within context limits:

# Compaction example (conceptual)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize our conversation so far."}
    ],
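    # Conceptual only: this call just asks for a summary. Server-side compaction
    # and context editing are opt-in and must be configured on the request;
    # see the API reference for the current parameters.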
)

When to use it: Customer support chatbots, code review assistants, and any application with long user sessions.
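
Compaction can also be approximated on the client by replacing older turns with a summary and keeping only the most recent ones. The sketch below illustrates that idea and is not the API's own compaction feature. The function name compact_history, the keep_last default, and the wording of the summary turns are assumptions, and it expects message content to be plain strings.

```python
def compact_history(messages: list[dict], summary: str, keep_last: int = 4) -> list[dict]:
    """Replace older turns with a summary pair and keep the most recent turns."""
    if len(messages) <= keep_last:
        return list(messages)
    tail = messages[-keep_last:]
    # Drop leading assistant turns so the kept tail starts with a user turn
    while tail and tail[0]["role"] != "user":
        tail = tail[1:]
    summary_turns = [
        {"role": "user", "content": f"Summary of the earlier conversation: {summary}"},
        {"role": "assistant", "content": "Understood. I will use that summary as context."},
    ]
    return summary_turns + tail


# Hypothetical six-turn conversation with alternating roles
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(6)
]
compacted = compact_history(history, "The user asked about order status.", keep_last=3)
print([m["role"] for m in compacted])
```

The summary itself would typically come from an earlier model call. Inserting it as a user/assistant pair keeps the roles alternating, which matters when the retained tail begins with a user turn.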

5. Files and Assets: Working with Documents

Claude can process PDFs, images, and other file types directly.

PDF Support

Upload and analyze PDFs:

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings from this report."
                }
            ]
        }
    ]
)
print(response.content[0].text)

Images and Vision

Claude can analyze images:

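import base64

# "photo.jpg" is a placeholder path; media_type below must match the file format
with open("photo.jpg", "rb") as f:
    base64_image_string = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(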
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": base64_image_string
                    }
                },
                {
                    "type": "text",
                    "text": "Describe what you see in this image."
                }
            ]
        }
    ]
)
print(response.content[0].text)
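
The media_type in an image block has to match the actual file format, which is easy to get wrong when paths come from user input. The helper below is a small sketch that infers the type from the file extension and rejects anything outside JPEG, PNG, GIF, and WebP. The name image_block is illustrative, and the demo writes placeholder bytes to a temporary file instead of using a real image.

```python
import base64
import mimetypes
import os
import tempfile
from pathlib import Path

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(path: str) -> dict:
    """Build an image content block, inferring media_type from the file extension."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for {path}: {media_type}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"type": "image", "source": {"type": "base64", "media_type": media_type, "data": data}}


# Demo with placeholder bytes in a temporary file (not a real image)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.png")
    Path(sample_path).write_bytes(b"placeholder bytes, not a real image")
    block = image_block(sample_path)
print(block["source"]["media_type"])
```

The returned dict can be placed directly in the content list of a user message, alongside a text block carrying the question.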

Putting It All Together: A Practical Example

Here's a complete example combining multiple features—thinking, structured output, tool use, and prompt caching—to build a research assistant:

import anthropic
client = anthropic.Anthropic()
# Step 1: Cache a system prompt with research guidelines
system_prompt = [
    {
        "type": "text",
        "text": "You are a research assistant. Always cite sources. Use structured JSON for final summaries.",
        "cache_control": {"type": "ephemeral"}
    }
]
# Step 2: Define tools
tools = [
    {"type": "web_search_20250305", "name": "web_search"},
    {
        "name": "save_note",
        "description": "Save a research note to the database",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "content": {"type": "string"},
                "tags": {"type": "array", "items": {"type": "string"}}
            },
            "required": ["title", "content"]
        }
    }
]
# Step 3: Send a request with thinking enabled
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    system=system_prompt,
    tools=tools,
    messages=[
        {"role": "user", "content": "Research the impact of AI on climate change mitigation. Provide a structured summary."}
    ],
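    # Assumes an SDK version and model that support structured outputs via output_config
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "key_findings": {"type": "array", "items": {"type": "string"}},
                    "sources": {"type": "array", "items": {"type": "string"}},
                    "confidence_score": {"type": "number"}
                },
                "required": ["key_findings", "sources", "confidence_score"],
                "additionalProperties": False
            }
        }
    }
)
# Thinking and tool blocks can precede the answer, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)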

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Master each to build sophisticated applications.
Use Adaptive Thinking for complex tasks and Structured Outputs when you need machine-readable responses. Combine them for maximum reliability.
Leverage built-in tools (web search, code execution) and custom function calling to give Claude real-world agency. Use the Tool Runner SDK for complex orchestration.
Optimize cost and latency with prompt caching, context compaction, and batch processing (50% cost savings).
Feature availability varies by platform (Claude API, AWS, Bedrock, Vertex AI, Foundry). Always check the GA/Beta status before building for production.