Claude Guide
2026-05-05

Mastering the Claude API: A Complete Guide to Features, Tools, and Workflows

Explore Claude's API surface—model capabilities, tools, context management, and files. Learn practical patterns with code examples to build production-ready AI applications.

Quick Answer

This guide walks you through Claude's five API areas: model capabilities, tools, tool infrastructure, context management, and files. You'll learn how to steer reasoning, use tools, manage long sessions, and process documents—with code examples for each.

Claude API, tool use, context management, model capabilities, batch processing

Claude's API is more than a simple text-in, text-out interface. It's a rich platform organized into five core areas that let you control reasoning, take actions, manage context, and handle files. Whether you're building a chatbot, an agent, or a document analysis pipeline, understanding these areas is key to getting the most out of Claude.

This guide covers each area with practical examples and best practices. By the end, you'll know how to choose the right features for your use case and how to combine them for production-ready applications.

The Five Pillars of the Claude API

Claude's API surface is organized into:

  • Model capabilities – Control how Claude reasons and formats responses.
  • Tools – Let Claude take actions on the web or in your environment.
  • Tool infrastructure – Handle discovery and orchestration at scale.
  • Context management – Keep long-running sessions efficient.
  • Files and assets – Manage documents and data you provide to Claude.

If you're new, start with model capabilities and tools. Return to the other sections when you're ready to optimize cost, latency, or scale.

Model Capabilities: Steering Claude's Output

Model capabilities are the direct levers you pull to shape Claude's responses. They include reasoning depth, response format, and input modalities.

Extended Thinking and Adaptive Thinking

Claude can "think" before it responds. With extended thinking, you set a fixed budget for reasoning tokens. With adaptive thinking (recommended for Opus 4.7), Claude decides dynamically how much to think based on the task.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048  # Fixed budget for extended thinking
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)

# The response includes both the thinking and the final answer
print(response.content[0].thinking)
print(response.content[1].text)

For adaptive thinking, use the effort parameter instead:

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "effort": "high"  # Options: low, medium, high
    },
    messages=[
        {"role": "user", "content": "Analyze the ethical implications of autonomous vehicles."}
    ]
)

Structured Outputs

You can force Claude to output valid JSON or follow a specific schema using the structured_outputs parameter:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: ..."}
    ],
    structured_outputs={
        "json_schema": {
            "name": "invoice",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["name", "date", "amount"]
            }
        }
    }
)
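Because strict schema enforcement guarantees the output matches the schema, you can parse the response text directly without defensive error handling. A minimal sketch; the JSON string here is a hypothetical response conforming to the invoice schema above:

```python
import json

# Hypothetical response text matching the invoice schema
raw = '{"name": "Acme Corp", "date": "2026-03-14", "amount": 1249.50}'

invoice = json.loads(raw)
print(f"{invoice['name']} billed {invoice['amount']} on {invoice['date']}")
```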

Citations for Grounded Responses

When you need Claude to reference specific passages in source documents, use the Citations feature. It returns exact sentence-level references:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "...full research paper text..."
                    },
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings from the attached research paper."
                }
            ]
        }
    ]
)

# Citations are attached to the text blocks that use them
for block in response.content:
    if block.type == "text" and block.citations:
        for citation in block.citations:
            print(f"Cited: {citation.cited_text}")
            print(f"From document: {citation.document_index}")
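A common pattern is rendering the cited passages as numbered footnotes. The sketch below runs on dict-shaped stand-ins for the API's text blocks; the data is hypothetical, with field names mirroring cited_text and document_index:

```python
# Stand-ins for text blocks returned with citations enabled (hypothetical data)
blocks = [
    {"text": "The study reports a 40% throughput gain",
     "citations": [{"cited_text": "a 40% improvement in throughput", "document_index": 0}]},
    {"text": " across all trials.", "citations": []},
]

footnotes = []
answer = ""
for block in blocks:
    answer += block["text"]
    for c in block["citations"]:
        footnotes.append(f'[{len(footnotes) + 1}] "{c["cited_text"]}" (doc {c["document_index"]})')
        answer += f"[{len(footnotes)}]"

print(answer)        # the answer text with footnote markers inserted
for note in footnotes:
    print(note)
```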

Tools: Letting Claude Take Action

Tools are how Claude interacts with the outside world—performing web searches, running code, or calling your own APIs.

Built-in Tools

Claude comes with several server-side tools you can enable with a single flag:

  • Web search tool – Search the internet for up-to-date information.
  • Code execution tool – Run Python code in a sandboxed environment.
  • Computer use tool – Control a virtual desktop (beta).
  • Memory tool – Store and retrieve information across conversations.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[
        {"type": "web_search"},
        {"type": "code_execution"}
    ],
    messages=[
        {"role": "user", "content": "What's the current population of Tokyo? Also, calculate the area of a circle with radius 5."}
    ]
)

Custom Tools (Function Calling)

You can define your own tools using a JSON schema. Claude will decide when to call them:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)

# Handle the tool call
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use":
            tool_name = block.name
            tool_input = block.input
            print(f"Claude wants to call {tool_name} with {tool_input}")
            # Execute your function and return the result
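Claude doesn't execute custom tools itself; after running the function locally, you send the result back in a tool_result block on the next request. A sketch, where get_weather is a hypothetical local implementation and the tool_use_id would come from block.id in the real response:

```python
# Hypothetical local implementation of the get_weather tool
def get_weather(city, units="celsius"):
    return f"18 degrees {units} and cloudy in {city}"

tool_use_id = "toolu_01A"  # in real code: block.id from the tool_use block
result = get_weather(city="Paris")

# The result goes back to Claude as a tool_result content block
follow_up = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }
    ],
}
```

You then append follow_up to the messages list and call client.messages.create again so Claude can compose its final answer from the tool output.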

Parallel Tool Use

Claude can call multiple tools in a single turn, which is great for independent operations. Parallel tool use is on by default; you only need to provide the tools and a request that warrants them:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, news_tool, stock_tool],
    # Parallel calls are the default; to force one-at-a-time calls, pass
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "Get the weather in London, the latest tech news, and Apple's stock price."}
    ]
)
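When Claude issues several tool_use blocks in one turn, all the results must go back together in a single user message, one tool_result per call. A sketch with hypothetical tool calls and a local dispatch table:

```python
# Hypothetical tool_use blocks from one parallel-tool-use response
tool_calls = [
    {"id": "toolu_01", "name": "get_weather", "input": {"city": "London"}},
    {"id": "toolu_02", "name": "get_stock", "input": {"symbol": "AAPL"}},
]

# Local implementations (hypothetical)
handlers = {
    "get_weather": lambda city: f"Rainy, 12C in {city}",
    "get_stock": lambda symbol: f"{symbol} last traded at $190.12",
}

# One user message carrying one tool_result per tool_use id
results_message = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": handlers[call["name"]](**call["input"]),
        }
        for call in tool_calls
    ],
}
```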

Context Management: Keeping Sessions Efficient

Long conversations consume tokens. Claude provides several mechanisms to manage context windows efficiently.

Context Windows

Claude supports up to 1 million tokens of context—enough to process entire codebases or lengthy documents. Use the Models API to enumerate available models; each model's context and output limits are listed in its documentation:

models = client.models.list()
for model in models:
    print(f"{model.id}: {model.display_name}")

Prompt Caching

Reduce latency and cost by caching frequently used context (like system prompts or document chunks):

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Explain decorators in Python."}
    ]
)

# Check whether the cache was created or read (fields live on response.usage)
print(f"Cache created: {response.usage.cache_creation_input_tokens}")
print(f"Cache read: {response.usage.cache_read_input_tokens}")

Batch Processing for Cost Savings

For large volumes of requests, use batch processing. Batch API calls cost 50% less than standard calls:

# Submit a batch of requests
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize this document..."}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate this to French..."}]
            }
        }
    ]
)

# Poll for completion, then stream the results
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    for result in client.messages.batches.results(batch.id):
        print(result.custom_id, result.result.type)
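In practice you rarely write the requests list by hand; you generate it from your inputs and rely on custom_id to match results (which can arrive in any order) back to them. A minimal sketch with hypothetical documents:

```python
# Hypothetical input documents
docs = ["First quarterly report...", "Second quarterly report..."]

requests = [
    {
        "custom_id": f"req-{i:03d}",  # used to pair results with inputs
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Summarize this document: {doc}"}],
        },
    }
    for i, doc in enumerate(docs, start=1)
]
print([r["custom_id"] for r in requests])  # ['req-001', 'req-002']
```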

Files and Assets: Working with Documents

Claude can process PDFs, images, and other file types directly.

PDF Support

import base64

with open("report.pdf", "rb") as f: pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create( model="claude-sonnet-4-20250514", max_tokens=4096, messages=[ { "role": "user", "content": [ { "type": "document", "source": { "type": "base64", "media_type": "application/pdf", "data": pdf_data } }, { "type": "text", "text": "Summarize the key findings from this report." } ] } ] )

Images and Vision

Claude can analyze images for tasks like OCR, object detection, and visual reasoning:

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe what this chart shows."
                }
            ]
        }
    ]
)

Putting It All Together: A Practical Workflow

Here's a real-world example that combines multiple features: a research assistant that searches the web, reads PDFs, and provides cited answers.

import anthropic
import base64

client = anthropic.Anthropic()

# Step 1: Load a PDF document
with open("research_paper.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

# Step 2: Ask Claude to search for related information and analyze the PDF
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[
        {"type": "web_search"},
        {
            "name": "get_paper_metadata",
            "description": "Get metadata for a research paper",
            "input_schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "authors": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["title"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    },
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": "Summarize this paper and find recent news about the same topic."
                }
            ]
        }
    ]
)

# Step 3: Process the response
for block in response.content:
    if block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print(f"Tool called: {block.name}")
        print(f"Input: {block.input}")

Feature Availability and Lifecycle

Features on the Claude Platform go through stages:

Classification           | Description
Beta                     | Preview features for feedback. May change or be discontinued. Not for production.
Generally Available (GA) | Stable, fully supported, recommended for production.
Deprecated               | Still functional but not recommended. Migration path provided.
Retired                  | No longer available.

Check each feature's documentation for its current status and any platform-specific limitations (e.g., available on the Claude API but not yet on Amazon Bedrock).

Key Takeaways

  • Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files. Start with the first two.
  • Use extended thinking or adaptive thinking for complex reasoning tasks. Adaptive thinking with the effort parameter is recommended for Opus 4.7.
  • Leverage built-in tools (web search, code execution) and custom tools for agentic workflows. Enable parallel tool calls for independent operations.
  • Optimize cost and latency with prompt caching, batch processing (50% cost reduction), and context compaction for long sessions.
  • Process files natively – Claude supports PDFs, images, and other formats. Use Citations for verifiable, grounded responses with exact source references.