Guide | Beginner | Agents | 2026-05-19

Mastering Claude’s API: A Practical Guide to Features, Tools, and Context Management

Learn how to build with Claude's API using model capabilities, tools, context management, and files. Includes code examples, feature availability, and best practices for production.

Quick Answer

This guide walks you through Claude’s API surface—model capabilities, tools, context management, and file handling—with practical code examples and feature availability details to help you build smarter, faster applications.

Claude API | Extended Thinking | Tool Use | Context Windows | Prompt Caching

Introduction

Claude’s API is more than just a text generation endpoint. It’s a rich platform designed to give you fine-grained control over how Claude reasons, what actions it can take, and how you manage long-running conversations. Whether you’re building a customer support bot, a code assistant, or a research tool, understanding the five core areas of the API surface will help you get the most out of Claude.

This guide covers:

Model capabilities – reasoning depth, response format, and input modalities
Tools – letting Claude act on the web or in your environment
Tool infrastructure – discovery and orchestration at scale
Context management – keeping long sessions efficient
Files and assets – managing documents and data

By the end, you’ll know which features to use for your use case and how to combine them effectively.

1. Model Capabilities: Steering Claude’s Output

Claude’s model capabilities let you control how it reasons and formats responses. The key features include:

Extended Thinking & Adaptive Thinking

Claude can “think” before responding, which improves reasoning on complex tasks. With Adaptive Thinking (GA on Claude API, AWS, Bedrock, and Vertex AI), Claude dynamically decides when and how much to think. Alternatively, you can cap thinking at a fixed token budget with budget_tokens; the separate effort parameter steers how many tokens Claude spends on a response overall.

Example: Setting a fixed thinking budget in Python

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Solve this step by step: 23 * 47"}],
    thinking={"type": "enabled", "budget_tokens": 2000},
)
# The response starts with a thinking block; print only the final text
for block in response.content:
    if block.type == "text":
        print(block.text)
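Two details commonly trip people up with a fixed budget: budget_tokens must be at least 1,024 and smaller than max_tokens, and the response begins with a thinking block rather than text. The helpers below are a minimal sketch; `thinking_params` and `split_thinking` are hypothetical names, not SDK functions, and the demo uses stand-in objects that assume the SDK's content blocks expose `.type`, `.thinking`, and `.text`, so it runs without an API key.

```python
from types import SimpleNamespace


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Build the max_tokens and thinking kwargs for messages.create, checking the budget."""
    # Thinking budgets have a 1,024-token floor and must leave room for the final answer
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


def split_thinking(content) -> tuple:
    """Separate Claude's reasoning from its final answer in a response's content list."""
    thoughts, answer = [], []
    for block in content:
        if block.type == "thinking":
            thoughts.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
        # Other block types (such as redacted_thinking) are ignored in this sketch
    return "\n".join(thoughts), "".join(answer)


# Offline demo: stand-in objects shaped like the SDK's thinking and text blocks
demo = [
    SimpleNamespace(type="thinking", thinking="23 * 47 = 23 * 50 - 23 * 3 = 1150 - 69"),
    SimpleNamespace(type="text", text="1081"),
]
reasoning, answer = split_thinking(demo)
print(thinking_params(4096, 2000))
print(answer)  # 1081
```

With a real client, the same helpers would be used as `client.messages.create(model=..., messages=..., **thinking_params(4096, 2000))` followed by `split_thinking(response.content)`.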

Structured Outputs

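Claude can return structured data (JSON) that your application can parse directly. The dedicated Structured Outputs feature constrains responses to a JSON schema you supply; a long-standing alternative that works with any tool-capable model is to force a call to a tool whose input_schema describes the shape you want.

Example: Requesting JSON output through a forced tool call

fruit_tool = {
    "name": "record_fruits",
    "description": "Record fruits with their name and color",
    # The input_schema describes the JSON shape you want back
    "input_schema": {"type": "object", "properties": {"fruits": {"type": "array"}}, "required": ["fruits"]},
}
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "List three fruits with their name and color."}],
    tools=[fruit_tool],
    tool_choice={"type": "tool", "name": "record_fruits"},  # force a call to this tool
)
print(response.content[0].input)  # the tool input is already a parsed dict
# Example shape: {"fruits": [{"name": "Apple", "color": "Red"}, ...]}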
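Unless strict tool use or schema-constrained Structured Outputs is enabled, Claude's JSON is very likely but not guaranteed to match your schema, so it can be worth checking before use. The sketch below is a hypothetical helper that checks a small subset of JSON Schema (`type`, `properties`, `required`, `items`) using only the standard library; for full validation, a dedicated library such as `jsonschema` is the better choice.

```python
# Map JSON Schema type names to the Python types json.loads produces
JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def schema_errors(value, schema: dict, path: str = "$") -> list:
    """Check value against a small subset of JSON Schema; return a list of problems."""
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(value, JSON_TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types explicitly
        if expected in ("number", "integer") and isinstance(value, bool):
            wrong_type = True
        if wrong_type:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(schema_errors(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(schema_errors(item, schema["items"], f"{path}[{i}]"))
    return errors


fruit_schema = {
    "type": "object",
    "properties": {
        "fruits": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "color": {"type": "string"}},
                "required": ["name", "color"],
            },
        }
    },
    "required": ["fruits"],
}

good = {"fruits": [{"name": "Apple", "color": "Red"}]}
bad = {"fruits": [{"name": "Banana"}]}
print(schema_errors(good, fruit_schema))  # []
print(schema_errors(bad, fruit_schema))   # ["$.fruits[0]: missing 'color'"]
```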

Streaming & Batch Processing

Streaming – Get tokens as they’re generated for real-time UX.
Batch Processing – Send large volumes of requests asynchronously at 50% lower cost than standard requests (not eligible for zero data retention, ZDR).
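Batch requests are submitted as a list of entries, each pairing a `custom_id` with the same parameters you would pass to `messages.create`. Results are not guaranteed to come back in submission order, so the `custom_id` is what ties each result to its request. The helper below is a hypothetical sketch that builds that list offline; the SDK calls named in the closing comments are the Python SDK's `messages.batches` methods, which you should verify against the SDK version you use.

```python
def build_batch_requests(prompts: dict, model: str, max_tokens: int = 1024) -> list:
    """Turn {custom_id: prompt} pairs into request entries for a Message Batch."""
    requests = []
    for custom_id, prompt in prompts.items():
        requests.append({
            # custom_id matches each result back to its request,
            # since results may arrive in any order
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return requests


reqs = build_batch_requests(
    {
        "review-1": "Summarize: great product, fast shipping.",
        "review-2": "Summarize: arrived late but works well.",
    },
    model="claude-sonnet-4-20250514",
)
print(len(reqs), reqs[0]["custom_id"])

# To submit with a real client (Python SDK):
#   batch = client.messages.batches.create(requests=reqs)
#   poll client.messages.batches.retrieve(batch.id) until processing_status == "ended"
#   then iterate client.messages.batches.results(batch.id), matching on entry.custom_id
```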

Streaming example:

stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)
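for event in stream:
    # Only text deltas carry .text; other delta types (e.g. tool input JSON) do not
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)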
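In a real UI you usually want both the live output and the assembled result. The sketch below (the `collect_stream` name is hypothetical) prints each text delta as it arrives, joins the deltas into the full reply, and records the stop reason, which the stream reports on a `message_delta` event. The demo feeds it stand-in events shaped like the SDK's raw stream events so it runs offline; with a real client you would pass the `stream` object from the example above.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def collect_stream(events) -> Tuple[str, Optional[str]]:
    """Print text as it streams in, then return the full text and the stop reason."""
    parts, stop_reason = [], None
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
            parts.append(event.delta.text)
        elif event.type == "message_delta":
            # The stop reason arrives on a message_delta event near the end of the stream
            stop_reason = event.delta.stop_reason
    print()  # finish the streamed line
    return "".join(parts), stop_reason


# Offline demo: stand-in events shaped like the SDK's raw stream events
demo_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="Once upon ")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="a time.")),
    SimpleNamespace(type="message_delta", delta=SimpleNamespace(stop_reason="end_turn")),
    SimpleNamespace(type="message_stop"),
]
text, reason = collect_stream(demo_events)
```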

2. Tools: Letting Claude Take Action

Tools extend Claude’s capabilities beyond text. You can define custom tools (functions) or use built-in tools like web search, code execution, and computer use.

How Tool Use Works

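You define a tool with a name, a description, and an input schema written in JSON Schema.
Claude decides whether to call the tool based on the conversation; if it does, the response contains a tool_use block and its stop_reason is "tool_use".
You execute the tool yourself and send the result back to Claude in a tool_result block.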

Example: Defining a weather tool

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
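# Check if Claude wants to call a tool
if response.stop_reason == "tool_use":
    # The tool_use block may follow a text block, so find it by type
    tool_call = next(block for block in response.content if block.type == "tool_use")
    print(f"Tool called: {tool_call.name}")
    print(f"Arguments: {tool_call.input}")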
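The third step, running the tool and returning its output, can be sketched as follows. `get_weather` here is a hypothetical stand-in that returns canned data, and the demo uses stand-in blocks instead of a live response. Each result is a `tool_result` block whose `tool_use_id` matches the `id` of the call it answers; setting `is_error` lets Claude know a call failed.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    # Hypothetical stand-in: a real version would call a weather service
    return {"location": location, "forecast": "sunny", "temp_c": 22}


TOOL_REGISTRY = {"get_weather": get_weather}


def tool_results_message(content, registry=TOOL_REGISTRY) -> dict:
    """Run every tool_use block in a response and package the results as one user message."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        try:
            output = registry[block.name](**block.input)
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": json.dumps(output)})
        except Exception as exc:
            # is_error tells Claude the call failed, so it can retry or explain the problem
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Error: {exc!r}", "is_error": True})
    return {"role": "user", "content": results}


# Offline demo: stand-in blocks shaped like a response that asked for the weather
demo_content = [
    SimpleNamespace(type="text", text="Let me check the weather."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather", input={"location": "Tokyo"}),
]
msg = tool_results_message(demo_content)
print(msg)
```

To continue the conversation, append `{"role": "assistant", "content": response.content}` and then the returned message to your `messages` list, and call `client.messages.create` again with the same `tools`.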

Built-in Tools (Beta)

Web Search Tool – Claude can search the web for up-to-date information.
Code Execution Tool – Run Python code in a sandboxed environment.
Computer Use Tool – Claude can interact with a virtual desktop (beta: research preview).

3. Tool Infrastructure: Discovery & Orchestration

When you have many tools, you need a way to manage them. Claude’s tool infrastructure includes:

Tool Runner (SDK) – Automates tool execution and result injection.
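Strict Tool Use – Guarantees that Claude's tool inputs match your input schema. (To force Claude to call a specific tool, use tool_choice.)
Parallel Tool Use – Claude can call multiple tools in a single response.
Tool Search – Dynamically find the right tool for a task.
Fine-grained Tool Streaming – Stream tool input parameters as they are generated, without waiting for the full JSON to be validated, to reduce latency.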
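What a tool runner automates is essentially the loop below: call Claude, run whatever tools it asks for, send the results back, and stop once `stop_reason` is no longer `tool_use`. This is a hand-rolled sketch, not the SDK's Tool Runner (see the SDK documentation for its actual interface). `send` is a hypothetical callable wrapping `client.messages.create`, `fake_send` is an offline stand-in for it, and the `max_turns` cap guards against runaway loops.

```python
import json
from types import SimpleNamespace


def run_agent(send, messages, registry, max_turns=5):
    """Call Claude, run requested tools, and repeat until it stops asking for tools.

    `send` takes a messages list and returns a response with .stop_reason and .content.
    """
    messages = list(messages)
    for _ in range(max_turns):
        response = send(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response, messages
        # Answer every tool call from this turn in a single user message
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": json.dumps(registry[block.name](**block.input))}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_send(messages):
    """Offline stand-in for client.messages.create: asks for one tool, then answers."""
    if len(messages) == 1:
        call = SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather",
                               input={"location": "Paris"})
        return SimpleNamespace(stop_reason="tool_use", content=[call])
    return SimpleNamespace(stop_reason="end_turn",
                           content=[SimpleNamespace(type="text", text="It is sunny in Paris.")])


registry = {"get_weather": lambda location: {"location": location, "forecast": "sunny"}}
final, history = run_agent(fake_send, [{"role": "user", "content": "Weather in Paris?"}], registry)
print(final.content[0].text)
```

With a real client, `send` could be `lambda msgs: client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=msgs)`.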

Example: Parallel tool use

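response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Get the weather in Paris and London."}],
    tools=tools,  # the get_weather tool defined above
    # Parallel tool use is on by default; add "disable_parallel_tool_use": True to turn it off
    tool_choice={"type": "auto"},
)
# Claude may return several tool_use blocks in one response
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)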
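When Claude requests several tools in one response, your code can also execute them concurrently. The sketch below uses a thread pool from the standard library; `run_tools_concurrently` and `slow_weather` are hypothetical names, and `slow_weather` sleeps to simulate network latency. All results go back in a single user message, one `tool_result` per call, which is the format Anthropic's tool-use documentation recommends for parallel calls.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_tools_concurrently(content, registry, max_workers=8) -> dict:
    """Execute all tool_use blocks from one response on a thread pool."""
    calls = [block for block in content if block.type == "tool_use"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns outputs in call order even if calls finish out of order
        outputs = list(pool.map(lambda b: registry[b.name](**b.input), calls))
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": b.id, "content": json.dumps(out)}
            for b, out in zip(calls, outputs)
        ],
    }


def slow_weather(location):
    time.sleep(0.3)  # simulate network latency
    return {"location": location, "forecast": "cloudy"}


demo = [
    SimpleNamespace(type="tool_use", id="toolu_a", name="get_weather", input={"location": "Paris"}),
    SimpleNamespace(type="tool_use", id="toolu_b", name="get_weather", input={"location": "London"}),
]
start = time.perf_counter()
msg = run_tools_concurrently(demo, {"get_weather": slow_weather})
elapsed = time.perf_counter() - start  # about 0.3 s instead of 0.6 s sequentially
print([r["tool_use_id"] for r in msg["content"]], round(elapsed, 2))
```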

4. Context Management: Keeping Sessions Efficient

Long conversations can consume many tokens. Claude provides several features to manage context:

Context Windows

Claude supports up to 1 million tokens of context on supported models (availability varies by model and platform). That is enough to process entire books, large codebases, or long chat histories in a single request.

Prompt Caching

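Cache repeated system prompts or large context blocks to reduce latency and cost. A cache_control marker caches the prompt up to and including that block; later requests that begin with the identical prefix read it from the cache, which is faster and billed at a fraction of the normal input-token price. By default a cache entry lives for five minutes and is refreshed each time it is used.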

Example: Using prompt caching

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
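    system=[
        {
            "type": "text",
            # Placeholder text; only prefixes above a model-specific minimum
            # length (1,024 tokens for Claude Sonnet 4) are actually cached
            "text": "You are a legal assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize this contract."}]
)
# usage reports cache writes and cache reads separately from uncached input tokens
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)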
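A quick way to judge whether caching pays off is to compare input-token cost with and without it. The sketch below assumes the multipliers for the default five-minute cache at the time of writing (cache writes at 1.25x and cache reads at 0.1x the base input price; check the current pricing page) and assumes every request arrives before the cache expires, so only the first one writes. `prompt_cost_units` is a hypothetical helper that reports cost in "base-price tokens".

```python
def prompt_cost_units(prefix_tokens: int, requests: int, cached: bool,
                      write_multiplier: float = 1.25, read_multiplier: float = 0.1) -> float:
    """Input cost of a prefix repeated across requests, in base-price token units."""
    if not cached:
        return prefix_tokens * requests
    first = prefix_tokens * write_multiplier  # first request writes the cache
    rest = prefix_tokens * read_multiplier * (requests - 1)  # later requests read it
    return first + rest


print(prompt_cost_units(10_000, 10, cached=False))  # 100000
print(prompt_cost_units(10_000, 10, cached=True))   # 21500.0
```

For ten requests sharing a 10,000-token prefix, that is 21,500 base-price tokens instead of 100,000, about a 78.5% saving on the repeated part. A prefix that is only ever sent once costs more with caching (12,500), since nothing is read back.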

Context Editing & Compaction

Context Editing – Automatically clear stale content, such as old tool results, from the conversation history as it grows.
Compaction – Automatically summarize older parts of the conversation to save tokens.
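Compaction can also be approximated on the client side. The sketch below is hand-rolled, not the API's compaction feature: `compact` is a hypothetical helper, and `summarize` is any callable you supply (for example, a separate Claude call) that turns older messages into a summary string. The cut point is moved forward so the kept tail starts at a plain user turn, never at a `tool_result` whose matching `tool_use` would be summarized away.

```python
def _is_plain_user(msg: dict) -> bool:
    """True for a user turn that is not carrying tool_result blocks."""
    if msg["role"] != "user":
        return False
    content = msg["content"]
    if isinstance(content, str):
        return True
    return not any(isinstance(b, dict) and b.get("type") == "tool_result" for b in content)


def compact(messages: list, keep_last: int, summarize) -> list:
    """Replace all but roughly the last `keep_last` messages with a summary exchange."""
    split = max(len(messages) - keep_last, 0)
    # Move the cut forward until the kept tail starts with a plain user turn,
    # so roles still alternate and no tool_result loses its matching tool_use
    while split < len(messages) and not _is_plain_user(messages[split]):
        split += 1
    if split == 0 or split >= len(messages):
        return list(messages)  # nothing can be compacted safely
    summary = summarize(messages[:split])
    return [
        {"role": "user", "content": f"Summary of the earlier conversation:\n{summary}"},
        {"role": "assistant", "content": "Understood. I'll continue from that summary."},
        *messages[split:],
    ]


# Offline demo: three question/answer rounds with a stubbed-out summarizer
history = []
for i in range(3):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})
compacted = compact(history, keep_last=2, summarize=lambda msgs: f"{len(msgs)} earlier messages")
for m in compacted:
    print(m["role"], "|", m["content"])
```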

5. Files and Assets: Working with Documents

Claude can process files directly, including PDFs, images, and text documents.

PDF Support

You can send PDFs to Claude for analysis. Claude reads both the text extracted from each page and an image of the page, so it can reason about charts, tables, and layout as well as prose.

Example: Sending a PDF

import base64
with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {"type": "text", "text": "Summarize this document."}
            ]
        }
    ]
)

Images & Vision

Claude can analyze images (photos, diagrams, screenshots) and answer questions about them.
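Images use the same content-block structure as the PDF example, with a `type` of `image`. The helper below is a hypothetical sketch that base64-encodes image bytes and picks the media type from the file extension, limited to the JPEG, PNG, GIF, and WebP formats the API accepts. The demo uses stand-in bytes rather than a real image and places the image before the question, as Anthropic's vision documentation recommends.

```python
import base64
from pathlib import Path

# Image formats accepted for base64 image blocks
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring the media type from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


content = [
    image_block("diagram.PNG", b"\x89PNG\r\n\x1a\n"),  # stand-in bytes, not a real image
    {"type": "text", "text": "What does this diagram show?"},
]
print(content[0]["source"]["media_type"])  # image/png
```

With a real client, pass `messages=[{"role": "user", "content": content}]` to `client.messages.create`.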

Feature Availability Quick Reference

Not all features are available on every platform. Here’s a summary:

Feature	Claude API	AWS	Bedrock	Vertex AI
Extended Thinking	GA	GA	GA	GA
Batch Processing	GA	GA	GA	GA
Prompt Caching	GA	GA	GA	GA
Web Search Tool	Beta	Beta	Beta	Beta
Computer Use	Beta	Beta	Beta	Beta
Structured Outputs	GA	GA	GA	GA

GA = Generally Available, Beta = Preview (may change)

Best Practices for Production

Start with model capabilities – Get your core logic working before adding tools.
Use prompt caching for system prompts and large context blocks to reduce costs.
Monitor token usage with the token counting API to avoid surprises.
Handle stop reasons – Check stop_reason in responses to detect tool calls, max tokens, or end of turn.
Use streaming for a more responsive user experience; it is also the safer choice for long-running requests, which can hit network timeouts when not streamed.
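For the stop-reason check in particular, a small dispatcher keeps the handling in one place. The sketch below maps the `stop_reason` values documented at the time of writing to a next step; `next_action` and its return codes are hypothetical, and the fallback branch covers values such as `refusal` or any added later.

```python
from types import SimpleNamespace


def next_action(response) -> str:
    """Decide what to do next based on why Claude stopped generating."""
    reason = response.stop_reason
    if reason == "tool_use":
        return "run_tools"  # execute tool_use blocks, reply with tool_result blocks
    if reason == "max_tokens":
        return "continue_or_raise_limit"  # output was cut off mid-response
    if reason == "pause_turn":
        return "resume"  # a long server-tool turn paused; send the response back as-is
    if reason in ("end_turn", "stop_sequence"):
        return "done"  # natural end, or one of your stop_sequences was hit
    return "inspect"  # e.g. "refusal", or a value added in a later API version


for reason in ["end_turn", "tool_use", "max_tokens", "refusal"]:
    print(reason, "->", next_action(SimpleNamespace(stop_reason=reason)))
```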

Key Takeaways

Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
Use Extended Thinking for complex reasoning tasks and Structured Outputs for JSON integration.
Tools let Claude interact with external systems; use Parallel Tool Use for efficiency.
Prompt Caching and Context Windows help manage long sessions cost-effectively.
Check feature availability per platform—some features are still in beta on certain providers.