Mastering the Claude API: A Complete Guide to Features, Tools, and Best Practices
Explore the full Claude API surface—model capabilities, tools, context management, and files. Learn practical implementation with code examples and expert tips for production use.
This guide walks you through the five pillars of the Claude API: model capabilities, tools, tool infrastructure, context management, and files. You'll learn how to control reasoning, use tools, manage long sessions, and handle documents—with code examples and best practices for each area.
Introduction
Building with Claude means tapping into a rich API surface designed for everything from simple chat to complex, multi-step agentic workflows. The Claude API is organized into five core areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Each area solves a specific problem—whether you need to control reasoning depth, let Claude browse the web, manage long-running conversations, or process PDFs.
This guide covers every area in detail, with practical code examples and best practices. By the end, you'll know exactly which features to use for your use case and how to implement them.
The Five Pillars of the Claude API
1. Model Capabilities
Model capabilities are the direct ways you steer Claude's reasoning and output. They include:
- Extended thinking – Claude can "think" before responding, improving accuracy on complex tasks.
- Adaptive thinking – Claude dynamically decides when and how much to think. Use the `effort` parameter to control depth.
- Structured outputs – Force Claude to output valid JSON or follow a strict schema.
- Streaming – Receive tokens as they're generated for real-time UX.
- Batch processing – Send large volumes of requests asynchronously at 50% lower cost.
- Multilingual support – Claude works in dozens of languages natively.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 2048, "effort": "high"},
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) from 0 to pi"}
    ]
)
print(response.content[0].text)
```
Best practice: Use `effort` to balance speed vs. depth: "low" for quick tasks, "high" for complex reasoning.
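As a sanity check on the prompt in the example above: the integral Claude is asked to solve has the closed form ∫₀^π x² sin(x) dx = π² − 4 (integrate by parts twice), which a short numerical check confirms:

```python
import math

def integrate(f, a, b, n=100_000):
    """Numerically integrate f on [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n))
    return total * h

approx = integrate(lambda x: x**2 * math.sin(x), 0.0, math.pi)
exact = math.pi**2 - 4  # closed form from integration by parts
print(approx, exact)  # both ≈ 5.8696
```

Checks like this are useful when evaluating whether a higher `effort` setting actually improves answer quality on your workload.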
2. Tools
Tools let Claude take actions in the real world—browse the web, execute code, run shell commands, or interact with your application. The tool system includes:
- Web search tool – Claude can search the internet for up-to-date information.
- Web fetch tool – Retrieve content from a specific URL.
- Code execution tool – Run Python or JavaScript in a sandboxed environment.
- Computer use tool – Claude can control a virtual desktop (beta).
- Bash tool – Execute shell commands.
- Text editor tool – Edit files programmatically.
- Memory tool – Store and retrieve information across sessions.
- Custom tools – Define your own functions via the `tools` parameter.
```python
import anthropic

client = anthropic.Anthropic()

def get_weather(location: str) -> str:
    # In production, call a real weather API
    return f"The weather in {location} is sunny, 72°F."

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Austin, TX?"}
    ]
)

# Handle tool call
if response.stop_reason == "tool_use":
    tool_use = response.content[-1]
    if tool_use.name == "get_weather":
        result = get_weather(tool_use.input["location"])
        # Send result back to Claude
        final_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[
                {"role": "user", "content": "What's the weather in Austin, TX?"},
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [{"type": "tool_result", "tool_use_id": tool_use.id, "content": result}]}
            ]
        )
        print(final_response.content[0].text)
```
Best practice: Use parallel tool calls to let Claude invoke multiple tools simultaneously for efficiency.
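With parallel tool use, a single assistant turn can contain several `tool_use` blocks; you execute each one and return all results in a single `user` message. A minimal sketch of that collection step, using plain dicts in place of SDK objects (the block shapes follow the Messages API; `get_weather` and `get_time` are hypothetical local handlers):

```python
# Hypothetical local handlers standing in for real tool implementations
HANDLERS = {
    "get_weather": lambda inp: f"Sunny in {inp['location']}",
    "get_time": lambda inp: f"12:00 in {inp['location']}",
}

def run_tool_calls(assistant_content):
    """Execute every tool_use block and build one tool_result message."""
    results = []
    for block in assistant_content:
        if block["type"] == "tool_use":
            output = HANDLERS[block["name"]](block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
            })
    return {"role": "user", "content": results}

# Mock assistant turn containing two parallel tool calls
content = [
    {"type": "text", "text": "I'll check both."},
    {"type": "tool_use", "id": "tu_1", "name": "get_weather", "input": {"location": "Paris"}},
    {"type": "tool_use", "id": "tu_2", "name": "get_time", "input": {"location": "Paris"}},
]
message = run_tool_calls(content)
```

The key point is that all results go back in one message, keyed by `tool_use_id`, rather than one round trip per tool.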
3. Tool Infrastructure
When you have many tools, you need discovery and orchestration. The Claude API provides:
- Tool Runner (SDK) – Automates the tool-calling loop.
- Strict tool use – Force Claude to always use a specific tool.
- Tool search – Dynamically find the right tool for a user request.
- Fine-grained tool streaming – Stream tool calls and results incrementally.
- Tool combinations – Group tools into logical sets.
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.tools.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  tools: [weatherTool, calculatorTool],
  tool_choice: { type: 'auto' },
  messages: [
    { role: 'user', content: 'What is 2+2 and the weather in Paris?' }
  ]
});

// Tool Runner handles the loop automatically
console.log(response.content);
```
Best practice: Use Tool Runner for complex agents with multiple tool calls. It handles retries and error recovery.
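Under the hood, a tool runner is essentially a dispatch loop: keep calling the model while `stop_reason` is `tool_use`, route each call through a registry, and feed the results back. A simplified, model-free sketch of that loop (`fake_model` is a local stand-in, not the real SDK):

```python
def tool_runner(model_step, registry, messages, max_turns=5):
    """Generic tool loop: call the model, dispatch tool calls, repeat."""
    for _ in range(max_turns):
        reply = model_step(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply["stop_reason"] != "tool_use":
            return reply
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": registry[b["name"]](b["input"])}
            for b in reply["content"] if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("max turns exceeded")

# Stand-in model: asks for the calculator once, then answers
turns = [
    {"stop_reason": "tool_use",
     "content": [{"type": "tool_use", "id": "t1", "name": "add", "input": {"a": 2, "b": 2}}]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "2+2 is 4."}]},
]
fake_model = lambda msgs: turns[sum(m["role"] == "assistant" for m in msgs)]
final = tool_runner(fake_model, {"add": lambda i: str(i["a"] + i["b"])},
                    [{"role": "user", "content": "What is 2+2?"}])
```

The real Tool Runner adds retries and error recovery on top of this skeleton; the `max_turns` guard here is the minimum you'd want to avoid infinite loops.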
4. Context Management
Long conversations consume tokens and can degrade performance. Claude offers:
- Context windows – Up to 1M tokens for processing large documents.
- Compaction – Summarize or compress old messages to save tokens.
- Context editing – Remove or modify parts of the conversation history.
- Prompt caching – Cache system prompts or large context blocks to reduce latency and cost.
- Token counting – Estimate token usage before sending a request.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with deep knowledge of Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Explain Python decorators with examples."}
    ]
)
print(response.content[0].text)
```
Best practice: Cache system prompts and large reference documents. Cache hits reduce latency by up to 80%.
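Compaction strategies vary, but the simplest is to fold old turns into a summary message and keep only the recent tail. A minimal sketch under that assumption (the default `summarize` here is a placeholder; in practice you would ask Claude itself to write the summary):

```python
def compact(messages, keep_last=4,
            summarize=lambda ms: f"[summary of {len(ms)} earlier messages]"):
    """Replace all but the last keep_last messages with one summary turn."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "user", "content": summarize(old)}
    return [summary] + recent

# Ten alternating user/assistant turns
history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
           for i in range(10)]
compacted = compact(history)
```

Note that compaction trades recall for token budget: anything not captured by the summary is gone, so keep the tail long enough to preserve the active task state.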
5. Files and Assets
Claude can process documents, images, and other files directly:
- PDF support – Extract text and layout from PDFs.
- Images and vision – Claude can analyze images (photos, diagrams, screenshots).
- Files API – Upload and reference files in conversations.
```python
import anthropic
import base64  # needed to encode the PDF below

client = anthropic.Anthropic()

with open("report.pdf", "rb") as f:
    pdf_data = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": base64.b64encode(pdf_data).decode()
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings in this PDF."
                }
            ]
        }
    ]
)
print(response.content[0].text)
```
Best practice: For large PDFs, use the Citations feature to get grounded, verifiable responses with exact sentence references.
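The base64 document block recurs whenever you send a PDF inline, so it is worth factoring into a helper. A small sketch (`build_pdf_block` is our own name, not an SDK function):

```python
import base64

def build_pdf_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in the base64 document content block shape."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode(),
        },
    }

block = build_pdf_block(b"%PDF-1.4 fake minimal bytes")
```

A helper like this also gives you one place to enforce size limits or switch to the Files API when documents grow too large to inline.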
Feature Availability and Maturity
Not all features are generally available (GA). The API uses these classifications:
| Classification | Description |
|---|---|
| Beta | Preview features for feedback. May change or be discontinued. Not for production. |
| Generally Available (GA) | Stable, fully supported, recommended for production. |
| Deprecated | Still functional but not recommended. Migration path provided. |
| Retired | No longer available. |
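Beta features are opted into per request via the `anthropic-beta` HTTP header (the Python SDK exposes this through `extra_headers`, or `betas=` on beta endpoints). A sketch of assembling that header; the beta token strings below are hypothetical, so check the docs for the real values:

```python
def beta_headers(*features: str) -> dict:
    """Build the comma-separated anthropic-beta header for opted-in features."""
    return {"anthropic-beta": ",".join(features)} if features else {}

# Hypothetical beta token names, for illustration only
headers = beta_headers("files-api-2025-01-01", "computer-use-2025-01-01")
```

Keeping the opt-in list in one place makes it easy to audit which beta surfaces your production code depends on.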
Putting It All Together: A Multi-Step Agent
Here's a practical example combining tools, context management, and files:
```python
import anthropic
import base64  # needed to encode the PDF below

client = anthropic.Anthropic()

# Step 1: Upload a PDF and ask Claude to analyze it
with open("financial_report.pdf", "rb") as f:
    pdf_data = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": "You are a financial analyst. Use the web search tool to verify any data points.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {
            "name": "web_search",
            "description": "Search the web for current information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": base64.b64encode(pdf_data).decode()
                    }
                },
                {
                    "type": "text",
                    "text": "Analyze this financial report and verify the revenue figures using web search."
                }
            ]
        }
    ]
)
print(response.content)
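If Claude decides to verify a figure, the call above stops with `stop_reason == "tool_use"`, and production code must execute the search and continue the conversation. The shape of that continuation, shown with mock data (`search_web` is a hypothetical local function, and the company name is made up):

```python
def continue_after_search(messages, assistant_content, search_web):
    """Append Claude's tool request and our search results to the transcript."""
    messages.append({"role": "assistant", "content": assistant_content})
    results = [
        {"type": "tool_result", "tool_use_id": b["id"],
         "content": search_web(b["input"]["query"])}
        for b in assistant_content
        if b.get("type") == "tool_use" and b.get("name") == "web_search"
    ]
    messages.append({"role": "user", "content": results})
    return messages

# Mock data: Claude asked to verify one revenue figure
transcript = [{"role": "user", "content": "Analyze this report."}]
assistant = [{"type": "tool_use", "id": "s1", "name": "web_search",
              "input": {"query": "Acme Corp 2024 revenue"}}]
transcript = continue_after_search(transcript, assistant, lambda q: f"results for: {q}")
```

The updated transcript then goes back into `client.messages.create` (with the same `tools` and cached `system` block) until Claude returns `end_turn`.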
Key Takeaways
- Start with model capabilities and tools – They cover 80% of use cases. Add context management and files as your application grows.
- Use adaptive thinking for complex tasks – The `effort` parameter lets you balance speed and depth without manual tuning.
- Leverage prompt caching for production – Cache system prompts and large reference documents to reduce latency and cost significantly.
- Batch process for cost savings – Use the Batch API for non-real-time workloads and save 50% on API costs.
- Check feature maturity – Always verify whether a feature is GA or Beta before building production systems. Beta features can change without notice.
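The batch discount is easy to reason about with a toy cost model; the per-token prices below are placeholders, not published rates, so plug in the current pricing for your model:

```python
def estimate_cost(input_tokens, output_tokens, in_price, out_price, batch=False):
    """Rough request cost in dollars; batch requests bill at half price."""
    cost = input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price
    return cost / 2 if batch else cost

# Placeholder prices: $3 per 1M input tokens, $15 per 1M output tokens
realtime = estimate_cost(100_000, 10_000, 3.0, 15.0)          # 0.45
batched = estimate_cost(100_000, 10_000, 3.0, 15.0, batch=True)  # 0.225
```

For any workload that tolerates asynchronous turnaround, halving the bill this way usually dwarfs other optimizations.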