Guide | Beginner | Agents | 2026-05-15

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore the full Claude API surface: model capabilities, tools, context management, and file handling. Learn practical implementation with code examples and best practices.

Quick Answer

This guide walks you through the five core areas of the Claude API: model capabilities (thinking, structured outputs, streaming, batch processing), tools (web search, code execution), tool infrastructure (discovery and orchestration), context management (prompt caching, compaction), and file handling (PDF, images). You'll learn how to combine these features for production-ready applications.

Claude API | Extended Thinking | Tool Use | Context Management | Prompt Caching

Claude's API is more than just a text-in, text-out interface. It's a rich ecosystem of capabilities designed to handle complex reasoning, tool orchestration, long-running conversations, and multimodal inputs. Whether you're building a customer support agent, a code assistant, or a document analysis pipeline, understanding the full API surface is key to unlocking Claude's potential.

This guide covers the five core areas of the Claude API:

Model capabilities – reasoning depth, structured outputs, streaming
Tools – letting Claude act on the web or in your environment
Tool infrastructure – discovery and orchestration at scale
Context management – keeping long sessions efficient
Files and assets – managing documents and data

We'll also touch on availability (Beta vs. GA) and practical code examples so you can start building immediately.

---

1. Model Capabilities: Steering Claude's Output

Claude's model capabilities let you control how it reasons and what it returns. These are the building blocks for any application.

Extended Thinking & Adaptive Thinking

For complex tasks like math proofs, code generation, or multi-step reasoning, Claude can "think" before responding. The Extended Thinking feature allocates internal tokens for reasoning, improving accuracy on hard problems.

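Adaptive Thinking (recommended for Opus 4.5+) lets Claude decide dynamically how much to think, with an effort setting to steer depth on models that support it. The example below uses the fixed-budget form of Extended Thinking, where budget_tokens caps the reasoning:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # upper bound on reasoning tokens
    },
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
)
# Look blocks up by type rather than by position
for block in response.content:
    if block.type == "thinking":
        print(block.thinking)  # The reasoning chain
    elif block.type == "text":
        print(block.text)      # The final answer

Tip: On Opus 4.5+, prefer Adaptive Thinking so Claude decides the thinking depth itself. This saves tokens on simple queries and allocates more for complex ones. The parameter names for adaptive thinking and effort depend on the model and API version, so confirm them in the current API reference.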
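A thinking request fails if the budget and the output limit disagree, so it can help to validate the pair before sending it. The sketch below is a hypothetical helper (thinking_params is not part of the SDK), and the 1,024-token minimum is an assumption taken from the API documentation at the time of writing.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum; check the current API reference


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return keyword arguments for a fixed-budget thinking request, or raise."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_params(max_tokens=4096, budget_tokens=2048)
print(params)
```

The returned dict can be unpacked into the call, for example client.messages.create(model=..., messages=..., **params).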

Structured Outputs

Claude can return responses in a structured format (JSON, XML, or custom schemas). This is critical for programmatic consumption:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a data extraction assistant. Always respond in valid JSON.",
    messages=[{
        "role": "user",
        "content": "Extract the name, date, and total amount from this invoice: Invoice #1234, dated 2025-03-15, amount $2,450.00"
    }]
)
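A system prompt that asks for JSON does not guarantee valid JSON, so the reply should be parsed defensively before it reaches downstream code. The following is a minimal sketch; parse_json_reply is a hypothetical helper, and the reply string is hand-written sample text standing in for response.content[0].text.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker
FENCED_JSON = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Parse a model reply that is expected to contain a single JSON object."""
    cleaned = text.strip()
    # Models sometimes wrap JSON in a Markdown code fence; unwrap it if present.
    match = FENCED_JSON.match(cleaned)
    if match:
        cleaned = match.group(1)
    data = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Hand-written sample reply, for illustration only
reply = FENCE + 'json\n{"name": "Invoice #1234", "date": "2025-03-15", "total": 2450.0}\n' + FENCE
invoice = parse_json_reply(reply)
print(invoice["date"], invoice["total"])
```

Catching json.JSONDecodeError at the call site gives you a natural place to retry the request or log the raw reply.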

Streaming & Refusals

Streaming lets you receive tokens as they're generated, reducing perceived latency. Streaming refusals allow you to detect content policy violations mid-stream.

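stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    stream=True,
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
)
for event in stream:
    # Only text deltas carry a .text field; other delta types are skipped
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)
    # A refusal is reported as the stop reason, not as a separate event type
    elif event.type == "message_delta" and event.delta.stop_reason == "refusal":
        print("\n[Refusal detected]")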
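Stream handling is easier to unit-test when it is separated from the network call. The sketch below assumes the event shapes used by the streaming API (content_block_delta events carrying a text_delta, and a message_delta event carrying the stop reason); collect_stream is a hypothetical helper, and the events are hand-built stand-ins rather than real SDK objects.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate streamed text and report whether the stream ended in a refusal."""
    parts = []
    refused = False
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            parts.append(event.delta.text)
        elif event.type == "message_delta" and event.delta.stop_reason == "refusal":
            refused = True
    return "".join(parts), refused


# Hand-built stand-ins for SDK events, for illustration only
fake_events = [
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Roses ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="are red")),
    SimpleNamespace(type="message_delta",
                    delta=SimpleNamespace(stop_reason="end_turn")),
]
text, refused = collect_stream(fake_events)
print(text, refused)
```

When refused comes back True, any partial text already shown to the user should be treated as incomplete rather than as a final answer.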

Batch Processing

For high-volume, non-real-time tasks, the Batch API offers 50% cost savings. Send up to 100,000 requests per batch:

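# Batches live under the messages namespace in the Python SDK
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize: ..."}]
            }
        },
        # ... more requests
    ]
)
# Poll for completion
result = client.messages.batches.retrieve(batch.id)
print(result.processing_status)  # "in_progress" until the batch has ended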

Note: Batch processing is not ZDR eligible – data may be retained for processing. Use standard API for sensitive data.
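Building the request list by hand gets error-prone once you have thousands of prompts. The following sketch generates unique custom_id values and splits the list into chunks; build_batch_requests is a hypothetical helper, and max_per_batch is left as a required argument so that you supply the limit that applies to your account.

```python
def build_batch_requests(prompts, model, max_tokens, max_per_batch):
    """Turn prompts into batch request dicts, split into chunks of at most max_per_batch."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    requests = [
        {
            "custom_id": f"req-{i:05d}",  # unique ID used to match results to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]
    return [requests[i:i + max_per_batch] for i in range(0, len(requests), max_per_batch)]


chunks = build_batch_requests(
    ["Summarize: A", "Summarize: B", "Summarize: C"],
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    max_per_batch=2,
)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be passed as the requests argument of one batch creation call.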

---

2. Tools: Letting Claude Act in the World

Tools extend Claude's capabilities beyond text. Claude can call functions, fetch web pages, execute code, and even control a computer.

Web Search & Web Fetch

Claude can search the web or fetch specific URLs to ground responses in real-time information:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[{
        "type": "web_search_20250305",  # server tools use versioned type strings
        "name": "web_search",
        "max_uses": 5  # optional cap on searches per request
    }],
    messages=[{"role": "user", "content": "What's the latest news about Claude 4?"}]
)

Code Execution Tool

Claude can write and execute Python code in a sandboxed environment. Perfect for data analysis, calculations, or generating visualizations:

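# The tool type and beta flag below are the versioned strings from the tool's
# launch; check the current docs for the latest version.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["code-execution-2025-05-22"],
    tools=[{
        "type": "code_execution_20250522",
        "name": "code_execution"
    }],
    messages=[{"role": "user", "content": "Calculate the Fibonacci sequence up to 100 and plot it."}]
)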

Computer Use (Beta)

For advanced automation, Claude can control a virtual desktop environment – clicking buttons, typing text, and navigating UIs. This is ideal for testing or legacy system integration.

Parallel Tool Use

Claude can call multiple tools simultaneously to speed up workflows:

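# Versioned tool definitions; check the current docs for the latest versions
web_search_tool = {"type": "web_search_20250305", "name": "web_search"}
code_execution_tool = {"type": "code_execution_20250522", "name": "code_execution"}
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["code-execution-2025-05-22"],
    tools=[web_search_tool, code_execution_tool],
    # Parallel tool use is on by default. To turn it off, pass
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[{"role": "user", "content": "Find today's stock prices for AAPL and TSLA, then calculate their P/E ratios."}]
)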
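Web search and code execution run on the server, but when Claude calls several of your own tools in parallel, every tool_use block needs a matching tool_result, and all of them go back in a single user message. The sketch below works on plain dicts shaped like the API's content blocks; run_tool_calls and get_price are hypothetical names, and the price lookup returns placeholder text rather than real data.

```python
def run_tool_calls(content_blocks, handlers):
    """Run every tool_use block and return one user message holding all results."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and other block types
        handler = handlers.get(block["name"])
        if handler is None:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to its call
            "content": str(output),
        })
    return {"role": "user", "content": results}


def get_price(ticker):
    """Hypothetical tool; returns placeholder text instead of a real quote."""
    return f"placeholder price for {ticker}"


blocks = [
    {"type": "text", "text": "Let me look both up."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_price", "input": {"ticker": "AAPL"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_price", "input": {"ticker": "TSLA"}},
]
message = run_tool_calls(blocks, {"get_price": get_price})
print(message)
```

Appending this message after the assistant turn and calling the API again lets Claude continue with both results in hand.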

---

3. Tool Infrastructure: Discovery & Orchestration

When you have many tools, managing them becomes a challenge. Claude's tool infrastructure handles:

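Tool search – let Claude find the right tool in a large catalog instead of loading every definition up front
Tool combinations – chain multiple tools together
Fine-grained tool streaming – stream tool-call parameters as they are generated rather than waiting for the complete input
Programmatic tool calling – let Claude invoke your tools from code it runs in the code execution environment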
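To make the idea of tool discovery concrete, here is a deliberately naive client-side stand-in: it ranks tool definitions by word overlap with the user's query and forwards only the best matches. This is not the API's tool search feature; select_tools and the three tool definitions are hypothetical.

```python
def select_tools(query, tools, limit=3):
    """Rank tool definitions by word overlap between the query and each tool's text."""
    query_words = set(query.lower().split())
    scored = []
    for tool in tools:
        text = tool["name"].replace("_", " ") + " " + tool["description"]
        score = len(query_words & set(text.lower().split()))
        if score:
            scored.append((score, tool["name"], tool))
    # Highest overlap first; ties broken alphabetically by tool name
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:limit]]


# Hypothetical tool catalog, for illustration only
catalog = [
    {"name": "get_weather", "description": "Get the current weather for a city"},
    {"name": "search_orders", "description": "Search customer orders by email"},
    {"name": "convert_currency", "description": "Convert an amount between two currencies"},
]
chosen = select_tools("what is the weather in Paris", catalog)
print([tool["name"] for tool in chosen])
```

A production system would more likely use embeddings or the platform's own tool search, but the shape is the same: narrow the catalog before the request is sent.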

Strict Tool Use

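For safety-critical applications, enable strict tool use so that the inputs Claude passes to a tool always conform to that tool's JSON schema. Pair it with tool_choice when a tool call is mandatory:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    # Set "strict": True inside each tool definition; it is not a request-level
    # argument, and it requires a model that supports strict tool schemas
    tools=[...],
    tool_choice={"type": "any"},  # Claude must use a tool
    messages=[...]
)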

---

4. Context Management: Keeping Sessions Efficient

Long conversations consume tokens. Claude provides several mechanisms to manage context windows efficiently.

Context Windows

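Claude supports large context windows, up to 1M tokens on supported models, which is enough to process entire codebases or lengthy documents. But bigger contexts cost more. One way to keep them small is context compaction: ask Claude to summarize older turns, then send the summary in place of the full history: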

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Compress the conversation history into a concise summary, preserving key facts and decisions.",
    messages=[
        {"role": "user", "content": "Here is the full conversation log..."}
    ]
)
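The summarization call is only half of compaction; you also need to decide which turns to fold into the summary and which to keep verbatim. The sketch below is one way to do that split, assuming message content is plain text. compact_history is a hypothetical helper, and summarize is any callable you supply, such as a wrapper around the request above.

```python
def compact_history(messages, keep_last, summarize):
    """Split history into a summary of older turns plus the most recent turns."""
    if len(messages) <= keep_last:
        return None, list(messages)
    cut = len(messages) - keep_last
    # Move the cut forward so the kept tail starts with a user turn.
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    older, recent = messages[:cut], messages[cut:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in older)
    return summarize(transcript), recent


def fake_summarize(transcript):
    """Stand-in for a real summarization call; only counts the turns it was given."""
    return f"{len(transcript.splitlines())} earlier turns summarized"


history = [
    {"role": "user", "content": "My order 1234 arrived damaged."},
    {"role": "assistant", "content": "Sorry to hear that. Refund or replacement?"},
    {"role": "user", "content": "Replacement, please."},
    {"role": "assistant", "content": "Done. It ships tomorrow."},
    {"role": "user", "content": "Can you confirm the address?"},
]
summary, recent = compact_history(history, keep_last=2, summarize=fake_summarize)
print(summary)
print(recent)
```

The summary can go into the system prompt of the next request, with recent passed as messages.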

Prompt Caching

Prompt caching reduces latency and cost by reusing common prefixes (system prompts, few-shot examples) across multiple requests:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Explain decorators."}]
)

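Tip: Cache the system prompt and the first few user messages for maximum savings, placing cache_control at the end of the stable prefix so everything before it is reused. Cache hits reduce latency by up to 80%. Prefixes shorter than the model's minimum cacheable length are processed without caching, so a one-sentence system prompt like the one above will not produce cache hits on its own.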
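To check whether caching is actually working, compare the cache counters in each response's usage data. The sketch below assumes the usage field names input_tokens, cache_creation_input_tokens, and cache_read_input_tokens; cache_hit_ratio is a hypothetical helper, and the numbers are made up for illustration.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens that were served from the cache for one response."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + written + uncached
    return read / total if total else 0.0


# Made-up usage numbers, shaped like the usage object on a response
first_call = {"input_tokens": 100, "cache_creation_input_tokens": 300, "cache_read_input_tokens": 0}
second_call = {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 300}
print(cache_hit_ratio(first_call), cache_hit_ratio(second_call))
```

A ratio that stays at zero across repeated requests usually means the prefix changed between calls or was too short to cache.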

Context Editing

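For interactive applications, context editing lets the API clear stale content, such as old tool results, as a conversation grows. You can also insert, delete, or replace entries in the messages list yourself, since each request carries the full history.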

---

5. Files and Assets: Working with Documents

Claude supports multiple input modalities:

PDF support – extract text, tables, and layout
Images – analyze diagrams, screenshots, or photos
Files API – upload and reference documents

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data
                }
            },
            {
                "type": "text",
                "text": "Summarize the key findings from this report."
            }
        ]
    }]
)
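PDFs and images use different content block types, so a small helper that picks the right one from the file name keeps call sites tidy. The following is a sketch under the assumption that you only need base64 sources; make_content_block is a hypothetical helper, and the set of image types should be checked against the current documentation.

```python
import base64
import mimetypes

# Image media types assumed to be accepted; verify against the current docs
IMAGE_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def make_content_block(data: bytes, filename: str) -> dict:
    """Build a base64 document or image content block based on the file extension."""
    media_type, _ = mimetypes.guess_type(filename)
    encoded = base64.b64encode(data).decode("ascii")
    source = {"type": "base64", "media_type": media_type, "data": encoded}
    if media_type == "application/pdf":
        return {"type": "document", "source": source}
    if media_type in IMAGE_TYPES:
        return {"type": "image", "source": source}
    raise ValueError(f"unsupported file type for {filename!r}: {media_type}")


# Placeholder bytes stand in for real file contents
pdf_block = make_content_block(b"hi", "report.pdf")
png_block = make_content_block(b"hi", "chart.png")
print(pdf_block["type"], png_block["type"])
```

The returned dict drops straight into the content list of a user message, next to a text block that carries the question.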

Citations

Claude can cite exact sentences from source documents, making it ideal for legal, academic, or compliance use cases:

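response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                # pdf_data: base64-encoded PDF, loaded as in the previous example
                "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},
                "citations": {"enabled": True}  # citations are enabled per document
            },
            {"type": "text", "text": "What does the contract say about termination clauses?"}
        ]
    }]
)
# Citations are attached to the text blocks of the response
for block in response.content:
    if block.type == "text":
        for citation in block.citations or []:
            print(f"Source: {citation.document_title}, Page {citation.start_page_number}")
            print(f"Quote: {citation.cited_text}")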
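For display, citations often read better as numbered footnotes than as a separate dump. The sketch below works on plain dicts shaped like the response's text blocks, assuming each citation carries document_title and cited_text; render_with_footnotes is a hypothetical helper, and the contract text is invented sample data.

```python
def render_with_footnotes(blocks):
    """Join text blocks, adding [n] markers and a footnote list for their citations."""
    pieces, notes = [], []
    for block in blocks:
        if block.get("type") != "text":
            continue
        markers = ""
        for citation in block.get("citations") or []:
            number = len(notes) + 1
            notes.append(f'[{number}] {citation["document_title"]}: "{citation["cited_text"]}"')
            markers += f"[{number}]"
        pieces.append(block["text"] + markers)
    body = "".join(pieces)
    return body if not notes else body + "\n\n" + "\n".join(notes)


# Invented sample blocks, for illustration only
sample_blocks = [
    {
        "type": "text",
        "text": "Either party may terminate with 30 days notice.",
        "citations": [
            {"document_title": "Contract", "cited_text": "terminate upon thirty (30) days written notice"}
        ],
    },
    {"type": "text", "text": " No other exit terms are listed."},
]
rendered = render_with_footnotes(sample_blocks)
print(rendered)
```

To feed real responses into it, convert each block to a dict first, for example with the SDK objects' model_dump() method.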

---

Feature Availability at a Glance

Not all features are available everywhere. Here's a quick reference:

Feature	Claude API	AWS	Bedrock	Vertex AI
Extended Thinking	GA	GA	GA	GA
Batch Processing	GA	GA	GA	GA
Prompt Caching	GA	GA	GA	GA
Computer Use	Beta	Beta	Beta	Beta
Citations	GA	GA	GA	GA
Code Execution	GA	GA	GA	GA

Beta features may change significantly. Use GA features for production workloads.

---

Putting It All Together: A Practical Example

Let's build a research assistant that searches the web, reads a PDF, and generates a structured report:

import anthropic
import base64

client = anthropic.Anthropic()

# Load PDF
with open("research_paper.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

# Tool types and the beta flag are versioned strings; check the current docs
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["code-execution-2025-05-22"],
    system="You are a research assistant. Search the web for recent developments, then analyze the provided PDF.",
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250522", "name": "code_execution"}
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_b64
                    },
                    "citations": {"enabled": True}  # citations are enabled per document
                },
                {
                    "type": "text",
                    "text": "Search for the latest research on this topic, then summarize the PDF and the search results in a structured JSON report."
                }
            ]
        }
    ]
)
# The response interleaves tool and text blocks; print only the text
for block in response.content:
    if block.type == "text":
        print(block.text)

---

Key Takeaways

Claude's API is organized into five pillars: model capabilities, tools, tool infrastructure, context management, and file handling. Master each to build sophisticated applications.
Use Extended Thinking for complex reasoning and Adaptive Thinking for Opus 4.5+ to save tokens on simple tasks.
Leverage tools like web search and code execution to give Claude real-world agency, but use strict tool mode for safety.
Prompt caching and context compaction are essential for cost-effective long-running sessions – cache system prompts and frequent prefixes.
Check feature availability before building – GA features are production-ready, while Beta features may change. Batch processing saves 50% but isn't ZDR eligible.