Guide | Beginner | Pricing | 2026-05-23

Mastering Claude’s API: A Complete Guide to Features, Tools, and Context Management

Explore Claude's API surface: model capabilities, tools, context management, and files. Learn how to steer reasoning, use citations, and optimize costs with practical code examples.

Quick Answer

This guide walks you through Claude’s five API areas: model capabilities (thinking, structured outputs), tools (web search, code execution), tool infrastructure (discovery and orchestration), context management (prompt caching, compaction), and file handling. You’ll learn practical patterns for reasoning depth, cost savings, and scaling.

Claude API | Extended Thinking | Prompt Caching | Tool Use | Context Windows

Introduction

Claude’s API is more than a simple text-in, text-out interface. It’s a modular platform designed to give you fine-grained control over reasoning, tool orchestration, context efficiency, and file handling. Whether you’re building a research assistant, a coding agent, or a customer support bot, understanding the five core areas of the API will help you ship faster and cheaper.

This guide covers:

Model capabilities – how to control reasoning depth and response format
Tools – letting Claude act on the web or in your environment
Tool infrastructure – discovery and orchestration at scale
Context management – keeping long sessions efficient
Files and assets – managing documents and data

We’ll include practical Python and TypeScript snippets so you can start using these features immediately.

1. Model Capabilities: Steering Claude’s Reasoning

Claude’s reasoning and output can be tuned with several parameters. The most impactful are extended thinking, adaptive thinking, and structured outputs.

Extended Thinking

Extended thinking lets Claude “think” before responding, producing better results on complex math, logic, and multi-step reasoning tasks. You set the thinking budget with the budget_tokens field of the thinking parameter; it must be at least 1,024 tokens and smaller than max_tokens.

Python example:

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Solve this step by step: 23 * 47 + 156 / 4"}
    ]
)
# Thinking blocks appear in response.content alongside the text blocks
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
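
Before sending a request, it can help to validate the thinking budget locally. The sketch below assumes the documented constraints that budget_tokens is at least 1,024 and smaller than max_tokens; the helper name thinking_config is hypothetical and not part of the SDK.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum for budget_tokens


def thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Return a value for the `thinking` parameter, or raise if the budget is out of range."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # The budget is carved out of max_tokens, so it has to leave room for the answer.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

The returned dict can be passed as thinking=thinking_config(2048, 4096) in the call above.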

Adaptive Thinking (Recommended for Opus 4.7)

Adaptive thinking lets Claude decide when and how much to think, rather than you setting a fixed budget. Set the thinking type to adaptive and use the effort setting to control depth.

TypeScript example:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const response = await client.messages.create({
  // Model ID assumed for Opus 4.7; adaptive thinking needs a model that supports it.
  model: 'claude-opus-4-7',
  max_tokens: 8192,
  thinking: { type: 'adaptive' }, // no fixed budget: Claude decides whether and how long to think
  output_config: { effort: 'high' }, // 'low' | 'medium' | 'high'
  messages: [{ role: 'user', content: 'Design a distributed cache system.' }]
});

Structured Outputs

You can constrain Claude’s response to a specific JSON schema by passing a json_schema format in the output_config parameter. This is ideal for extracting data or feeding downstream systems.

response = client.messages.create(
    # Structured outputs need a model that supports them.
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract the invoice number, date, and total amount from this invoice: ..."}],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "date": {"type": "string"},
                    "total_amount": {"type": "number"}
                },
                "required": ["invoice_number", "date", "total_amount"],
                "additionalProperties": False
            }
        }
    }
)
# The JSON arrives as a string in the first text block
print(response.content[0].text)
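
Even with a schema in place, a response cut off at max_tokens may not be complete JSON, so downstream code usually parses defensively. The following is a minimal sketch under that assumption; parse_invoice and the sample payload are hypothetical, and the field names are the ones from the schema above.

```python
import json

REQUIRED_FIELDS = ("invoice_number", "date", "total_amount")


def parse_invoice(text: str) -> dict:
    """Parse the JSON text of a response and check the fields the schema requires."""
    data = json.loads(text)  # raises json.JSONDecodeError on truncated or invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    amount = data["total_amount"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("total_amount must be a number")
    return data


# Hypothetical payload, shaped like the text block of a successful response.
sample = '{"invoice_number": "INV-001", "date": "2026-01-15", "total_amount": 149.5}'
invoice = parse_invoice(sample)
```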

2. Tools: Letting Claude Act

Claude can use tools to interact with external systems. The API supports several built-in tools and custom tool definitions.

Built-in Tools

Tool	Description
`web_search`	Search the web for current information
`code_execution`	Run Python code and shell commands in a sandbox
`computer_use`	Control a virtual desktop (beta)
`bash`	Execute shell commands
`text_editor`	Read/write files
`memory`	Store and retrieve information across sessions

Custom Tools (Function Calling)

Define your own tools with a JSON schema. Claude will decide when to call them.

Python example – tool definition:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

When Claude calls a tool, the response includes a tool_use content block. You execute the tool, then return the result.
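
The execute-and-return step can be sketched as a small dispatcher. This is an illustration rather than the SDK's own mechanism: the get_weather implementation is a hypothetical stand-in, and the content blocks are handled as plain dicts (SDK response objects would need converting first, for example with model_dump()).

```python
from typing import Any, Callable, Dict, List


def get_weather(city: str, units: str = "celsius") -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"Weather in {city}: 18 degrees {units}"


# Map tool names from the tool definitions to local functions.
HANDLERS: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def build_tool_results(content_blocks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run every tool_use block and return the follow-up user message."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and thinking blocks
        handler = HANDLERS.get(block["name"])
        if handler is None:
            # Report unknown tools back to Claude instead of raising.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the id of the tool_use block
            "content": handler(**block["input"]),
        })
    return {"role": "user", "content": results}
```

To continue the conversation, append the assistant turn (with its tool_use blocks) and then this user message to messages, and call client.messages.create again with the same tools.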

3. Tool Infrastructure: Discovery and Orchestration

For complex agents, you need more than a single tool call. Claude’s tool infrastructure includes:

Tool Runner (SDK) – automatically executes tool calls and returns results
Strict tool use – guarantees that tool inputs match your JSON schema (to force a specific tool, use tool_choice)
Parallel tool use – call multiple tools in one turn
Fine-grained tool streaming – stream tool calls as they happen
Programmatic tool calling – lets Claude call your tools from code it runs in the code execution sandbox, cutting round trips
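
Whether Claude must call a tool, and which one, is controlled by the tool_choice request parameter. Here is a minimal sketch of the shapes it accepts; the helper name make_tool_choice is hypothetical, and only the auto, any, and tool modes are covered.

```python
from typing import Optional


def make_tool_choice(mode: str = "auto", name: Optional[str] = None,
                     allow_parallel: bool = True) -> dict:
    """Build a value for the `tool_choice` request parameter.

    auto: Claude decides whether to call a tool.
    any:  Claude must call one of the provided tools.
    tool: Claude must call the tool called `name`.
    """
    if mode not in {"auto", "any", "tool"}:
        raise ValueError(f"unsupported mode: {mode}")
    if mode == "tool" and not name:
        raise ValueError("mode 'tool' requires a tool name")
    choice = {"type": mode}
    if mode == "tool":
        choice["name"] = name
    if not allow_parallel:
        # Restrict Claude to at most one tool call per turn.
        choice["disable_parallel_tool_use"] = True
    return choice
```

For example, passing tool_choice=make_tool_choice("tool", "get_weather") to client.messages.create forces a get_weather call.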

Parallel tool use example:

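# weather_tool and stock_tool are tool definitions shaped like get_weather above.
# Parallel tool use is on by default: Claude may return several tool_use blocks in one turn.
# To turn it off, pass tool_choice={"type": "auto", "disable_parallel_tool_use": True}.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[weather_tool, stock_tool],
    messages=[{"role": "user", "content": "What's the weather in London and the current price of AAPL?"}]
)
# Collect every tool call requested in this turn
tool_calls = [block for block in response.content if block.type == "tool_use"]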

4. Context Management: Efficiency at Scale

Long conversations consume tokens. Claude provides several mechanisms to manage context efficiently.

Prompt Caching

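Cache repeated system prompts or large context blocks to reduce cost and latency. Cache reads are billed at about 10% of the base input price, while writing to the cache costs about 25% more than a normal input token (for the default 5-minute lifetime).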

Python example:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Hello!"}]
)
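
To see whether caching pays off, compare the usage fields returned with each response. The sketch below assumes cache writes cost 1.25x and cache reads 0.1x the base input price (the default 5-minute cache), and takes the base price as a parameter; the function name and the price in the example are illustrative, not current list prices.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed surcharge for writing to the 5-minute cache
CACHE_READ_MULTIPLIER = 0.10   # assumed discount for reading from the cache


def estimate_input_cost(usage: dict, base_price_per_mtok: float) -> float:
    """Estimate input-side cost from a response's usage fields.

    `usage` mirrors response.usage as a dict; `base_price_per_mtok` is the
    model's base input price per million tokens, supplied by the caller.
    """
    uncached = usage.get("input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    weighted_tokens = (
        uncached
        + written * CACHE_WRITE_MULTIPLIER
        + read * CACHE_READ_MULTIPLIER
    )
    return weighted_tokens * base_price_per_mtok / 1_000_000


# First call writes a 10,000-token prefix to the cache; later calls read it back.
first_call = {"input_tokens": 100, "cache_creation_input_tokens": 10_000, "cache_read_input_tokens": 0}
later_call = {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 10_000}
first_cost = estimate_input_cost(first_call, base_price_per_mtok=3.0)  # illustrative price
later_cost = estimate_input_cost(later_call, base_price_per_mtok=3.0)
```

Under these assumptions the first call costs slightly more than an uncached one, and every later call that hits the cache costs far less, so caching pays off once a prefix is reused.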

Context Compaction

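When a conversation grows too long, you can compact it: summarize earlier turns while preserving key information. Server-side compaction is configured through the context management options on the Messages request; you can also compact on the client by replacing older turns with a summary.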
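
A client-side version of compaction can be sketched in a few lines. This is an illustration under stated assumptions, not the API's built-in mechanism: summarize is a hypothetical callable you supply (for example, another Claude call), and the cut point is assumed not to separate a tool_use block from its tool_result.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]  # {"role": "user" | "assistant", "content": ...}


def compact_history(messages: List[Message],
                    summarize: Callable[[List[Message]], str],
                    keep_last: int = 4) -> List[Message]:
    """Replace all but the last `keep_last` messages with a single summary turn."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(messages) <= keep_last:
        return list(messages)  # nothing to compact
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary_turn: Message = {
        "role": "user",
        "content": "Summary of the earlier conversation:\n" + summarize(older),
    }
    # If `recent` also starts with a user turn, the API combines consecutive
    # same-role turns, so the summary can simply be placed in front.
    return [summary_turn] + recent
```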

Context Windows

Claude supports up to 1 million tokens in a single context window (on supported models). This lets you process entire codebases, long books, or multi-hour transcripts.

5. Files and Assets

Claude can accept files as part of the message content. Supported formats include:

PDF – extract text and layout
Images – vision understanding (JPEG, PNG, GIF, WebP)
Text files – plain text, markdown, code

Send a PDF inline (base64-encoded) for analysis:

import base64
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this report."
                }
            ]
        }
    ]
)
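
The same pattern extends to the other formats listed above. The sketch below picks a content block type from the file extension; file_to_content_block and the extension map are hypothetical conveniences, and anything that is not a PDF or a supported image is assumed to be UTF-8 text.

```python
import base64
from pathlib import Path

# Extension-to-media-type map for the formats listed above (an illustrative subset).
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def file_to_content_block(path: str) -> dict:
    """Turn a local file into a message content block."""
    file_path = Path(path)
    media_type = MEDIA_TYPES.get(file_path.suffix.lower())
    if media_type is None:
        # Plain text, markdown, and code go in as an ordinary text block.
        return {"type": "text", "text": file_path.read_text(encoding="utf-8")}
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    block_type = "document" if media_type == "application/pdf" else "image"
    return {
        "type": block_type,
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }
```

A user message can then be assembled as a list of such blocks followed by a text block containing the question.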

6. Cost Optimization Tips

Batch processing: Use the Batch API for non-urgent workloads – it costs 50% less than standard API calls.
Prompt caching: Cache system prompts and large reference documents.
Adaptive thinking: Let Claude decide thinking depth – you pay only for what’s needed.
Streaming: Use streaming to reduce perceived latency and avoid paying for full responses you might cancel.
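
For the batch processing tip, each request in a batch pairs a custom_id with the same parameters a normal Messages call would take. The helper below is a hypothetical sketch that only builds that list; submitting it is left as a comment because it needs network access.

```python
from typing import Dict, List


def build_batch_requests(prompts: List[str], model: str,
                         max_tokens: int = 1024) -> List[Dict[str, object]]:
    """Build the `requests` list for a message batch, one entry per prompt."""
    return [
        {
            # custom_id is how each result is matched back to its prompt.
            "custom_id": f"request-{index}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for index, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(
    ["Summarize document A.", "Summarize document B."],
    model="claude-sonnet-4-20250514",
)
# Submit with: batch = client.messages.batches.create(requests=requests)
```

Batches run asynchronously, so results are fetched later and matched back to prompts by custom_id.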

Key Takeaways

Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files – each with specific optimization levers.
Extended thinking and adaptive thinking let you control reasoning depth; structured outputs enforce JSON schemas.
Built-in tools (web search, code execution, computer use) and custom function calling let Claude act autonomously.
Prompt caching and batch processing are the two biggest levers for reducing cost – use them early.
Context windows up to 1M tokens and context compaction make long-running agents feasible without losing history.