Guide · Beginner · Best Practices · 2026-05-22

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore Claude's API surface including model capabilities, tools, context management, and file handling. Learn how to build powerful AI applications with practical code examples.

Quick Answer

This guide covers Claude's five API areas: model capabilities (thinking, citations), tools (web search, code execution), tool infrastructure (tool runner, parallel tool use), context management (prompt caching, compaction), and file handling. You'll learn how to use each with practical Python examples.

Tags: Claude API, Tools, Context Management, Model Capabilities, Best Practices

Introduction

Claude's API is more than just a text-in, text-out interface. It's a comprehensive platform designed to give developers fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you're building a customer support bot, a code assistant, or an agent that browses the web, understanding the full API surface is essential.

This guide breaks down the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and files/assets. You'll learn what each area offers, when to use it, and see practical code examples you can adapt immediately.

1. Model Capabilities: Steering Claude's Reasoning

Model capabilities control how Claude processes input and what it outputs. These are the foundational building blocks for any application.

Extended Thinking and Adaptive Thinking

Claude can "think" before responding, which improves reasoning on complex tasks. With Extended Thinking, you set a fixed thinking budget. With Adaptive Thinking (recommended for Opus 4.7), Claude decides dynamically how much to think based on the task.

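Python example: Extended thinking with a fixed budget

import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # upper bound on tokens spent thinking
    },
    messages=[
        {"role": "user", "content": "Analyze the pros and cons of quantum computing for cryptography."}
    ]
)
# Adaptive thinking and the effort level (low, medium, high) are configured with
# separate parameters, not inside this object; see the API reference for your model.
# The response starts with thinking blocks, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)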

Citations

When Claude needs to reference source documents, use the Citations feature. Claude returns the exact quoted passages together with their location in the documents you provide (character ranges for plain text, page numbers for PDFs).

Python example: Citations

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # Documents are passed as content blocks, not as a top-level parameter
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "All refunds must be requested within 30 days of purchase..."
                    },
                    "title": "Company Policy",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "What is the refund policy?"}
            ]
        }
    ]
)
# Citations are attached to the text blocks in response.content
for block in response.content:
    if block.type == "text":
        for citation in block.citations or []:
            print(f"Source: {citation.document_title}, quote: {citation.cited_text!r}")

Structured Outputs

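For programmatic consumption, you can constrain Claude to output valid JSON that conforms to a schema.

import json
# Assumes a model that supports structured outputs; check the current model list.
# The parameter shape below follows the structured outputs documentation at the
# time of writing; verify it against the API reference for your SDK version.
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract the name, date, and amount from this invoice."}],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["name", "date", "amount"],
                "additionalProperties": False
            }
        }
    }
)
# The JSON document arrives as the text of the first content block
invoice = json.loads(response.content[0].text)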

2. Tools: Letting Claude Act in the World

Tools extend Claude's capabilities beyond text generation. Claude can call functions, browse the web, execute code, and more.

Built-in Tools

Claude offers several first-party tools:

Tool	Purpose
Web search tool	Fetch real-time information from the internet
Code execution tool	Run Python code in a sandbox
Computer use tool	Control a virtual desktop (beta)
Memory tool	Persist information across conversations
Bash tool	Execute shell commands

Python example: Web search tool

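response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[
        {
            "type": "web_search_20250305",  # server tools use a versioned type string
            "name": "web_search",
            "max_uses": 3  # optional cap on searches per request
        }
    ],
    messages=[
        {"role": "user", "content": "What is the current population of Tokyo?"}
    ]
)
# Claude will use the web_search tool if needed. The response then mixes
# search blocks with text blocks, so print only the text.
for block in response.content:
    if block.type == "text":
        print(block.text)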

Custom Tools (Function Calling)

You can define your own tools using a JSON schema. Claude will request tool calls, and you execute them.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in London?"}
    ]
)
# Check for tool use
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        # Execute the tool and send result back
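The last comment above leaves the follow-up step implicit. A minimal sketch of how the result might be sent back, assuming a local `get_weather` function with hard-coded output. Both `get_weather` and `build_tool_result_turns` are hypothetical helpers for illustration, not part of the SDK.

```python
def get_weather(location: str, unit: str = "celsius") -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"15 degrees {unit} and cloudy in {location}"


def build_tool_result_turns(assistant_content, tool_use_id, result_text):
    """Return the two turns to append after Claude requests a tool.

    assistant_content is the content list of the response that contained the
    tool_use block; it is echoed back unchanged as the assistant turn.
    """
    return [
        {"role": "assistant", "content": assistant_content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,  # must match the id of the tool_use block
                    "content": result_text,
                }
            ],
        },
    ]
```

To continue the conversation, append these two turns to the original messages list and call client.messages.create again with the same tools list.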

3. Tool Infrastructure: Orchestration at Scale

When you have many tools, you need a way to manage discovery, routing, and execution. Claude's tool infrastructure includes:

Tool Runner (SDK): Automates the tool-calling loop
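Strict tool use: Guarantees that the inputs Claude generates for a tool conform to its JSON schema
Parallel tool use: Lets Claude request multiple tool calls in a single response
Fine-grained tool streaming: Streams tool input parameters as they are generated instead of buffering them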

Tool Runner Example

from anthropic import Anthropic
client = Anthropic()
# Define tools
weather_tool = {
    "name": "get_weather",
    "description": "Get weather",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}
# A single messages.create call stops at the first tool request.
# The SDK's Tool Runner wraps this call in a loop that executes the tools and sends the results back.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool],
    tool_choice={"type": "auto"},
    messages=[{"role": "user", "content": "What's the weather in Paris and Tokyo?"}]
)
# With parallel tool use, Claude may request both lookups in one response
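To see what a tool-calling loop involves, here is a minimal hand-written sketch. It is not the SDK's Tool Runner: `run_tool_loop` and the `handlers` mapping (tool name to Python callable) are hypothetical names, and `client` is assumed to be an `anthropic.Anthropic()` instance or any object with the same `messages.create` interface.

```python
def run_tool_loop(client, model, tools, handlers, messages,
                  max_tokens=1024, max_turns=5):
    """Call the model until it stops requesting tools, then return the final response."""
    messages = list(messages)  # avoid mutating the caller's list
    for _ in range(max_turns):
        response = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return response
        # Echo the assistant turn, then answer every tool_use block in one user turn.
        # Parallel tool calls show up as several tool_use blocks in the same response.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = handlers[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(output),
                })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")
```

The max_turns cap is a design choice of this sketch: it keeps a misbehaving tool or prompt from looping indefinitely.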

4. Context Management: Keeping Conversations Efficient

Long conversations consume tokens and increase latency. Claude provides several features to manage context.

Context Windows

Context window size depends on the model and platform, with some models supporting up to 1 million tokens. That's enough to process entire codebases or hundreds of pages of documents.

Prompt Caching

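Cache frequently used system prompts or document chunks to reduce latency and cost. The cache_control marker sets the end of the reusable prefix. Prefixes shorter than the model's minimum cacheable length (1,024 tokens on many models) are processed normally without caching, so a real cached system prompt would be much longer than the one shown here.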

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with knowledge of our company policy.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is the vacation policy?"}
    ]
)
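To confirm that caching is taking effect, you can inspect the usage object on each response. The sketch below assumes the usage fields input_tokens, cache_creation_input_tokens, and cache_read_input_tokens; `cache_summary` is a hypothetical helper name.

```python
def cache_summary(usage):
    """Summarize prompt-cache activity from a response's usage object."""
    # Missing or None fields are treated as zero
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0
    total = read + written + uncached
    return {
        "read": read,          # tokens served from the cache
        "written": written,    # tokens written to the cache by this request
        "uncached": uncached,  # tokens processed without caching
        "hit_ratio": read / total if total else 0.0,
    }
```

Calling cache_summary(response.usage) after each request should show written tokens on the first call and read tokens on later calls that reuse the same prefix.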

Context Compaction

When conversations grow too long, you can compact the context—summarizing earlier turns while preserving essential information.
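As an illustration of the idea, here is a client-side sketch of compaction. It is not the API's built-in feature: `compact_messages` is a hypothetical helper, and `summarize` is an assumed callable that you supply (for example, a function that asks Claude for a summary of the older turns).

```python
def compact_messages(messages, summarize, keep_recent=4):
    """Replace older turns with one summary turn; keep the newest turns verbatim."""
    split = len(messages) - keep_recent
    if split <= 0:
        return list(messages)  # nothing old enough to compact
    older, recent = messages[:split], messages[split:]
    summary_turn = {
        "role": "user",
        "content": "Summary of the earlier conversation:\n" + summarize(older),
    }
    # Caveat: choose keep_recent so that a tool_use block and its matching
    # tool_result are not separated by the split.
    return [summary_turn] + list(recent)
```

Keeping the most recent turns verbatim is a deliberate trade-off here: the summary loses detail, so the turns most likely to matter for the next reply stay untouched.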

Token Counting

Always check token usage before sending large payloads.

token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)
# count_tokens returns an object; the count is in its input_tokens field
print(f"Token count: {token_count.input_tokens}")
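Once you have the count, a simple pre-flight check can decide whether a request is safe to send. In this sketch `fits_context` is a hypothetical helper, and the context window size is a value you look up for your model rather than something the function knows.

```python
def fits_context(input_tokens, max_output_tokens, context_window):
    """Return True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window
```

For example, fits_context(token_count.input_tokens, 1024, 200_000) checks a request against an assumed 200,000-token window before it is sent.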

5. Files and Assets: Working with Documents

Claude can process various file types natively.

PDF Support

Claude processes both the extracted text and an image of each page, so it can answer questions about a PDF's text, tables, and charts.

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this report."
                }
            ]
        }
    ]
)

Images and Vision

Claude can analyze images for tasks like OCR, object detection, and visual question answering.
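Images are sent the same way as PDFs, as a content block next to the text prompt. A minimal sketch of building such a request body follows; `image_block` and `image_question` are hypothetical helper names, and the default media type is an assumption you should match to your file.

```python
import base64


def image_block(image_bytes, media_type="image/png"):
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(image_bytes).decode("ascii"),
        },
    }


def image_question(image_bytes, question, media_type="image/png"):
    """Build a messages list that asks a question about one image."""
    return [
        {
            "role": "user",
            "content": [
                image_block(image_bytes, media_type),
                {"type": "text", "text": question},
            ],
        }
    ]
```

The returned list can be passed as the messages argument of client.messages.create, for example with the bytes read from a local PNG file.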

Feature Availability by Platform

Not all features are available everywhere. Here's a quick reference:

Feature	Claude API	Amazon Bedrock	Vertex AI
Extended Thinking	GA	GA	GA
Citations	GA	GA	Beta
Web Search	GA	GA	GA
Prompt Caching	GA	GA	GA
Batch Processing	GA	GA	GA

GA = Generally Available; Beta = Preview

Best Practices

Start with model capabilities, then add tools as needed. Don't over-engineer.
Use adaptive thinking for Opus 4.7—it saves tokens on simple tasks and spends more on complex ones.
Cache your system prompts to reduce costs on repeated calls.
Use structured outputs when consuming responses programmatically—avoid parsing free text.
Monitor token usage with the count_tokens endpoint before sending large payloads.

Key Takeaways

Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
Use adaptive thinking to let Claude decide how much to reason—saves tokens on simple tasks.
Tools extend Claude into the real world: web search, code execution, custom functions, and more.
Prompt caching and context compaction keep long-running conversations efficient and cost-effective.
Always check feature availability per platform—some features are still in beta on certain providers.