Guide | Beginner | Pricing | 2026-05-22

Mastering Claude’s API: A Practical Guide to Features, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and files. Learn how to steer reasoning, use tools, and optimize costs with practical code examples.

Quick Answer

This guide walks you through the five areas of Claude’s API: model capabilities (thinking, citations, batch processing), tools (web search, code execution), tool infrastructure, context management (prompt caching, compaction, token counting), and file handling. Each area comes with Python code examples and best practices for production.

Tags: API, tools, context management, batch processing, citations

Introduction

Claude’s API is more than just a text-in, text-out interface. It’s a rich ecosystem of features designed to give you fine-grained control over how Claude reasons, what actions it can take, and how you manage long-running conversations. Whether you’re building a customer support bot, a code assistant, or a document analysis tool, understanding the full API surface will help you build faster, cheaper, and more reliably.

This guide covers the five main areas of the Claude API:

Model capabilities – controlling reasoning depth, response format, and input modalities
Tools – letting Claude interact with the web, files, and your environment
Tool infrastructure – discovery and orchestration at scale
Context management – keeping long sessions efficient
Files and assets – managing documents and data

We’ll focus on the features that are Generally Available (GA) and ready for production, with practical code examples in Python.

Model Capabilities

Model capabilities are the core levers you pull to shape Claude’s output. Here are the most impactful ones.

Extended Thinking & Adaptive Thinking

Claude can now decide when and how much to think before responding. This is especially useful for complex reasoning tasks like math, code generation, or multi-step planning.

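Adaptive thinking (recommended for Opus 4.7) lets Claude dynamically allocate thinking time, and you control the depth via the effort parameter. On models that use a fixed thinking budget, such as the Opus 4 snapshot below, you enable extended thinking and cap it with budget_tokens instead.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2000,  # upper bound on thinking tokens (minimum 1024)
        # effort is not a field of this dict; it is configured separately on models that support it
    },
    messages=[
        {"role": "user", "content": "Solve this: A train leaves New York at 3 PM traveling 60 mph. Another train leaves Boston at 4 PM traveling 70 mph. The distance is 200 miles. When do they meet?"}
    ]
)
# Thinking blocks come before the answer, so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)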

Tip: Use a low effort setting (or a small thinking budget) for simple Q&A to save tokens, and a high setting (or a larger budget) for math, logic, or code generation.
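A fixed thinking budget has two constraints worth checking before you send a request: it must be at least 1,024 tokens, and it must be smaller than max_tokens. The helper below is a minimal sketch of that check; thinking_config is a hypothetical name, not part of the SDK.

```python
MIN_THINKING_BUDGET = 1024  # minimum accepted value for budget_tokens


def thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Build a `thinking` argument, rejecting budgets the API would refuse."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


config = thinking_config(max_tokens=4096, budget_tokens=2000)
print(config)
```

You would pass the returned dictionary as the thinking argument of client.messages.create.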

Citations

Citations ground Claude’s responses in source documents. When you provide a document, Claude can return exact references to the relevant passages.

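response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are content blocks; citations are enabled per document.
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Refunds are available within 30 days of purchase...",
                    },
                    "title": "Refund Policy",
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "What is the refund policy?"},
            ],
        }
    ],
)
# Each text block may carry citations pointing at the quoted passages.
for block in response.content:
    if block.type == "text":
        print(block.text, block.citations)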

Citations are GA on the Claude API and work with PDFs and text files.
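A cited response arrives as several text blocks, some of which carry a citations list. The sketch below stitches those blocks into one string with numbered markers. It assumes the blocks are plain dictionaries shaped like the JSON the API returns; format_citations and the sample data are hypothetical and exist only for illustration.

```python
def format_citations(content_blocks):
    """Join text blocks and append [n] markers for each citation."""
    text_parts = []
    sources = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        markers = ""
        for cite in block.get("citations") or []:
            number = len(sources) + 1
            sources.append(f'[{number}] {cite.get("document_title")}: "{cite["cited_text"]}"')
            markers += f"[{number}]"
        text_parts.append(block["text"] + markers)
    return "".join(text_parts), sources


# Hypothetical response content, hand-written for this example.
sample_blocks = [
    {"type": "text", "text": "Refunds are available "},
    {
        "type": "text",
        "text": "within 30 days of purchase",
        "citations": [
            {
                "type": "char_location",
                "cited_text": "Refunds are available within 30 days of purchase",
                "document_index": 0,
                "document_title": "Refund Policy",
                "start_char_index": 0,
                "end_char_index": 48,
            }
        ],
    },
    {"type": "text", "text": "."},
]

answer, sources = format_citations(sample_blocks)
print(answer)
for line in sources:
    print(line)
```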

Batch Processing

If you have thousands of requests (e.g., classifying support tickets, translating content), use Batch Processing to save 50% on API costs. Requests are processed asynchronously.

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Classify this: 'My order is late'"}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Classify this: 'I love the new feature'"}]
            }
        }
    ]
)
# Check results later: once processing has ended, stream the per-request results.
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    for entry in client.messages.batches.results(batch.id):
        print(entry.custom_id, entry.result.type)

Note: Batch processing is not eligible for Zero Data Retention (ZDR). Use it for non-sensitive workloads.
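With thousands of inputs you will want to generate the request list rather than write it by hand. Here is a minimal sketch; build_batch_requests is a hypothetical helper, and the only requirement it relies on is that each custom_id is unique within the batch so you can match results back to inputs.

```python
def build_batch_requests(texts, model="claude-sonnet-4-20250514", max_tokens=256):
    """Turn a list of input strings into batch request entries."""
    return [
        {
            "custom_id": f"req-{index:03d}",  # unique ID used to match results later
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": f"Classify this: '{text}'"}],
            },
        }
        for index, text in enumerate(texts, start=1)
    ]


tickets = ["My order is late", "I love the new feature"]
requests = build_batch_requests(tickets)
print([request["custom_id"] for request in requests])
```

The resulting list can be passed as the requests argument when you create the batch.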

Tools: Let Claude Take Action

Tools extend Claude’s capabilities beyond text generation. Claude can call functions, fetch web pages, execute code, and more.

Web Search Tool

Give Claude real-time web access. Perfect for research, news, or fact-checking.

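response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "type": "web_search_20250305",  # server tools use versioned type strings
            "name": "web_search",
            "max_uses": 3  # optional cap on searches per request
        }
    ],
    messages=[
        {"role": "user", "content": "What is the current population of Tokyo?"}
    ]
)
# The response interleaves search blocks with text, so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)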

Code Execution Tool

Let Claude write and run Python code in a sandboxed environment. Great for data analysis, calculations, or prototyping.

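response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["code-execution-2025-05-22"],  # beta header; check the docs for the current value
    tools=[
        {
            "type": "code_execution_20250522",  # versioned type string; newer versions may exist
            "name": "code_execution"
        }
    ],
    messages=[
        {"role": "user", "content": "Calculate the compound interest on $10,000 at 5% for 10 years."}
    ]
)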
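For reference, this is roughly the calculation you would expect Claude to run in the sandbox for that prompt. It is a sketch that assumes annual compounding, which the prompt does not specify.

```python
def compound_interest(principal: float, annual_rate: float, years: int) -> float:
    """Interest earned with annual compounding: P * ((1 + r)^n - 1)."""
    return principal * ((1 + annual_rate) ** years - 1)


interest = compound_interest(10_000, 0.05, 10)
print(f"Interest earned: ${interest:,.2f}")  # prints "Interest earned: $6,288.95"
```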

Parallel Tool Use

Claude can call multiple tools at once, reducing latency. For example, fetching weather data and calendar events simultaneously.

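response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {
            # Custom tools take a name, a description, and a JSON Schema for their inputs.
            "name": "get_calendar_events",
            "description": "Get today's events",
            "input_schema": {"type": "object", "properties": {}}
        }
    ],
    # Parallel tool use is on by default; set disable_parallel_tool_use to True to turn it off.
    tool_choice={"type": "auto", "disable_parallel_tool_use": False},
    messages=[
        {"role": "user", "content": "What's the weather and do I have any meetings today?"}
    ]
)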
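When Claude requests several client-side tools in one turn, the response contains one tool_use block per call, and you answer with a single user message holding one tool_result per tool_use ID. The sketch below shows that round trip on plain dictionaries. run_tool_calls, get_weather, and the sample blocks are hypothetical; server tools such as web search are executed by the API and need no handler.

```python
def get_weather(city):
    """Hypothetical tool implementation."""
    return f"Sunny in {city}"


def get_calendar_events():
    """Hypothetical tool implementation."""
    return ["Standup at 10:00"]


def run_tool_calls(content_blocks, handlers):
    """Run every tool_use block and collect the results into one user message."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue
        handler = handlers.get(block["name"])
        if handler is None:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": str(output),
        })
    return {"role": "user", "content": results}


# Hypothetical assistant content with two parallel tool calls.
assistant_blocks = [
    {"type": "text", "text": "Let me check both."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_calendar_events", "input": {}},
]
handlers = {"get_weather": get_weather, "get_calendar_events": get_calendar_events}
reply = run_tool_calls(assistant_blocks, handlers)
print(reply)
```

Append the assistant message and this reply to messages, then call the API again so Claude can write its final answer.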

Context Management

Long conversations consume tokens and increase latency. Claude provides several tools to keep sessions efficient.

Prompt Caching

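Cache frequently used context (system prompts, few-shot examples, large documents) to reduce cost and latency. A cached prefix is reused by later requests that begin with the same content; by default it lives for 5 minutes and is refreshed on each hit. Prefixes shorter than the model's minimum cacheable length (1,024 tokens for Sonnet 4) are processed without caching, so a prompt as short as the placeholder below would not actually be cached.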

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant for Acme Corp. Our products include...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is your return policy?"}
    ]
)
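To see where the savings come from, it helps to work through the arithmetic. With the default 5-minute cache, writing a prefix to the cache is billed at 1.25x the base input price and reading it back at 0.1x. The sketch below compares the two cases; the $3 per million tokens price and the assumption that every request after the first hits the cache are illustrative, so substitute your own numbers.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # first request writes the prefix to the cache
CACHE_READ_MULTIPLIER = 0.10   # later requests read it back


def prefix_cost(prefix_tokens, requests, price_per_mtok=3.00, cached=True):
    """Input cost in dollars for sending the same prefix `requests` times."""
    if cached:
        billed = prefix_tokens * (
            CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (requests - 1)
        )
    else:
        billed = prefix_tokens * requests
    return billed / 1_000_000 * price_per_mtok


uncached = prefix_cost(50_000, 100, cached=False)
cached = prefix_cost(50_000, 100, cached=True)
print(f"Uncached: ${uncached:.2f}")
print(f"Cached:   ${cached:.2f}")
print(f"Savings:  {1 - cached / uncached:.0%}")
```

With a 50,000-token prefix sent 100 times, the saving on those prefix tokens comes out just under 90%, and it approaches 90% as the number of cache hits grows.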

Context Compaction

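When a conversation grows too long, compact it: summarize the older messages while preserving key information, then continue from the summary. The sketch below does this client-side with an ordinary Messages request.

# history is a placeholder for the earlier turns of your conversation.
history = [
    {"role": "user", "content": "We need a launch plan for the Q3 release."},
    {"role": "assistant", "content": "Here is a draft plan with three milestones..."},
]
summary = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=history + [
        {"role": "user", "content": "Summarize our conversation so far. Keep every decision and open question."}
    ],
)
# Send summary_text in place of the older turns on the next request.
summary_text = summary.content[0].text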
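Once you have a summary of the older turns, the remaining step is to swap it into the message list. Here is a minimal client-side sketch; compact_history and keep_last are hypothetical names, and the summary text is assumed to come from an earlier summarization request.

```python
def compact_history(messages, summary, keep_last=4):
    """Replace all but the last `keep_last` messages with a summary message."""
    if len(messages) <= keep_last:
        return list(messages)  # nothing worth compacting yet
    recent = messages[-keep_last:]
    summary_message = {
        "role": "user",
        "content": f"Summary of the earlier conversation:\n{summary}",
    }
    # The summary goes first so the list still starts with a user message.
    return [summary_message] + recent


history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(10)
]
compacted = compact_history(history, "We agreed on a three-milestone plan.", keep_last=4)
print(len(history), "->", len(compacted))
```

Keeping the last few turns verbatim preserves the immediate context, while the summary stands in for everything older.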

Token Counting

Always check token usage before sending large payloads. Use the token counting endpoint to estimate costs.

count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, can you help me with..."}
    ]
)
print(f"Input tokens: {count.input_tokens}")
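The count is most useful as a pre-flight check. The sketch below decides whether a request leaves room for the reply; fits_context is a hypothetical helper, and the 200,000-token context window is an assumption you should replace with the limit for your model.

```python
def fits_context(input_tokens, max_output_tokens, context_window=200_000):
    """Return (ok, headroom): whether input plus reserved output fits the window."""
    headroom = context_window - input_tokens - max_output_tokens
    return headroom >= 0, headroom


ok, headroom = fits_context(input_tokens=150_000, max_output_tokens=4_096)
print(ok, headroom)  # prints "True 45904"

ok_large, headroom_large = fits_context(input_tokens=199_000, max_output_tokens=4_096)
print(ok_large, headroom_large)  # prints "False -3096"
```

In practice you would pass count.input_tokens from the counting call as input_tokens and trim or compact the conversation when the check fails.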

Files and Assets

Claude can process PDFs, images, and text files directly.

PDF Support

Upload PDFs for analysis, summarization, or question-answering. Claude reads both the extracted text and an image of each page, so it can use charts, tables, and layout as well as the prose.

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this report."
                }
            ]
        }
    ]
)
print(response.content[0].text)

Images and Vision

Claude can analyze images for object detection, OCR, or visual reasoning.

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What does this chart show?"
                }
            ]
        }
    ]
)
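The media_type must match the actual file format, and only JPEG, PNG, GIF, and WebP images are accepted. The sketch below builds the image block from a filename and raw bytes, inferring the type with the standard library; image_block is a hypothetical helper, and the bytes in the example are a stand-in for real file contents.

```python
import base64
import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring media_type from the filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# Placeholder bytes (the PNG signature) instead of a real file read.
block = image_block("chart.png", b"\x89PNG\r\n\x1a\n")
print(block["source"]["media_type"])  # prints "image/png"
```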

Best Practices for Production

Start with model capabilities – master thinking, citations, and batch before adding tools.
Use prompt caching for system prompts and large context: cache reads are billed at a fraction of the normal input price, which can cut the cost of those tokens by up to 90%.
Monitor token usage with the counting endpoint to avoid surprises.
Let Claude call tools in parallel (the default behavior) when it needs multiple pieces of information, and return all tool results in a single message.
Use batch processing for non-urgent, high-volume tasks to save 50%.

Key Takeaways

Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
Adaptive thinking lets Claude dynamically allocate reasoning depth – use effort to control it.
Batch processing cuts costs by 50% for asynchronous workloads.
Prompt caching and context compaction keep long sessions efficient and affordable.
Tools like web search and code execution let Claude take real-world actions, and parallel tool calls reduce latency.