BeClaude
Guide | Beginner | Best Practices | 2026-05-14

Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management

Explore Claude's API surface—model capabilities, tools, context management, and more. Learn how to build powerful AI applications with practical examples.

Quick Answer

This guide walks you through Claude's five core API areas: model capabilities, tools, tool infrastructure, context management, and file handling. You'll learn how to use each area with code examples and best practices for production.


Introduction

Claude's API is designed to give developers fine-grained control over how the model reasons, interacts with external systems, and manages long-running conversations. Whether you're building a simple chatbot or a complex agent that browses the web and executes code, understanding the API's structure is essential.

This guide covers the five main areas of Claude's API surface:

  • Model capabilities – How Claude reasons and formats responses
  • Tools – Letting Claude take actions on the web or in your environment
  • Tool infrastructure – Discovery and orchestration at scale
  • Context management – Keeping long-running sessions efficient
  • Files and assets – Managing documents and data you provide to Claude
By the end, you'll have a clear mental model of the API and practical code snippets to get started.

Model Capabilities: Steering Claude's Output

Model capabilities are the core ways you control Claude's reasoning depth, response format, and input modalities. Here are the key features:

Context Windows

Some Claude models support context windows of up to 1 million tokens, which is enough to process an entire codebase, a lengthy document, or a long conversation in a single request. Support for the 1M window differs by model and platform (Claude API, AWS Bedrock, Vertex AI), and some models have required a beta header to enable it, so check the documentation for the model you use.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this 500-page document..."}
    ]
)

Adaptive Thinking

With Adaptive Thinking, Claude dynamically decides when and how much to "think" before responding. This is the recommended mode for Opus 4.7. You control thinking depth using the effort parameter.

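response = client.messages.create(
    model="claude-opus-4-20250514",  # use a model that supports adaptive thinking
    max_tokens=8192,
    # Adaptive mode sets no budget_tokens; Claude decides how much to think
    thinking={"type": "adaptive"},
    # effort lives outside the thinking object (in output_config); extra_body
    # passes it through even if your SDK version has no typed argument for it
    extra_body={"output_config": {"effort": "high"}},  # Options: low, medium, high
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step..."}
    ]
)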
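With thinking enabled, the response can contain thinking blocks ahead of the final text. The sketch below separates the two, assuming the Python SDK's block shapes (a `type` field, with the content in `thinking` or `text`). `split_thinking` is a hypothetical helper name, and the `SimpleNamespace` objects stand in for a real `response.content` so it runs offline.

```python
from types import SimpleNamespace


def split_thinking(content_blocks):
    """Separate Claude's reasoning from its final answer.

    Assumes SDK-style blocks: "thinking" blocks carry `.thinking` and "text"
    blocks carry `.text`; any other block type is ignored.
    """
    reasoning, answer = [], []
    for block in content_blocks:
        if block.type == "thinking":
            reasoning.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
    return "\n".join(reasoning), "".join(answer)


# Stand-in blocks; in real code pass response.content from the request above.
blocks = [
    SimpleNamespace(type="thinking", thinking="Factor the quadratic first."),
    SimpleNamespace(type="text", text="x = 2 or x = 3"),
]
reasoning, answer = split_thinking(blocks)
print(answer)
```

Keeping the reasoning separate lets you log it for debugging while showing users only the answer.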

Structured Outputs

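For applications that need consistent, parseable responses, define a JSON schema for the output. A common pattern, used in the example below, is to describe the schema as a tool's input_schema and force Claude to call that tool with tool_choice; the tool's input is then your structured result. The dedicated Structured Outputs feature goes further and constrains generation so the output always matches the schema; check the docs for model and platform support.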

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "generate_report",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "summary": {"type": "string"},
                "key_points": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["title", "summary", "key_points"]
        }
    }],
    tool_choice={"type": "tool", "name": "generate_report"},
    messages=[
        {"role": "user", "content": "Create a report on Q3 earnings."}
    ]
)
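With the forced-tool pattern, the structured result lives in the `input` of the `tool_use` block. A minimal sketch for pulling it out and checking the required keys before using it; `extract_structured` is a hypothetical helper, and the stand-in block (with placeholder values) replaces a real `response.content`.

```python
from types import SimpleNamespace


def extract_structured(content_blocks, tool_name, required):
    """Return the input Claude passed to `tool_name`, checking required keys.

    Forced tool use usually matches the schema, but a cheap check before
    using the data catches surprises early.
    """
    for block in content_blocks:
        if block.type == "tool_use" and block.name == tool_name:
            missing = [key for key in required if key not in block.input]
            if missing:
                raise ValueError(f"missing required fields: {missing}")
            return block.input
    raise LookupError(f"no tool_use block for {tool_name!r}")


# Stand-in for response.content from the request above (placeholder values).
content = [
    SimpleNamespace(
        type="tool_use",
        id="toolu_demo",
        name="generate_report",
        input={"title": "Q3 Earnings", "summary": "placeholder", "key_points": ["point one"]},
    )
]
report = extract_structured(content, "generate_report", ["title", "summary", "key_points"])
print(report["title"])
```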

Streaming

Stream the response as it is generated so users see text immediately instead of waiting for the complete reply:

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
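`text_stream` hides the underlying server-sent events. If you call `client.messages.create(..., stream=True)` instead, you iterate raw events and assemble the text yourself. A minimal sketch, assuming the documented event shapes (text arrives in `content_block_delta` events whose `delta.type` is `"text_delta"`); `collect_text` is a hypothetical helper, and the stand-in events replace a live stream so the example runs offline.

```python
from types import SimpleNamespace


def collect_text(events):
    """Join the text deltas from a raw Messages API event stream."""
    parts = []
    for event in events:
        # Text arrives in content_block_delta events with a text_delta payload;
        # other events (message_start, content_block_stop, ...) are skipped.
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            parts.append(event.delta.text)
    return "".join(parts)


# Stand-in events shaped like the stream from client.messages.create(..., stream=True)
events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_start"),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="Roses ")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="are red")),
    SimpleNamespace(type="content_block_stop"),
    SimpleNamespace(type="message_stop"),
]
print(collect_text(events))
```

Working with raw events is mainly useful when you also need non-text events, such as tool-use input deltas, from the same stream.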

Tools: Letting Claude Take Action

Tools extend Claude's capabilities beyond text generation. They allow Claude to interact with external systems—search the web, execute code, use a calculator, or even control a computer.

Defining a Tool

Tools are defined using a JSON schema. Here's an example of a simple weather lookup tool:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g., 'San Francisco'"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)

Handling Tool Calls

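When Claude decides to use a tool, the response's stop_reason is "tool_use" and its content includes one or more tool_use blocks. Your code must execute each tool and send the results back in tool_result blocks that reference each call's id: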

import json

def handle_tool_call(tool_name, tool_input):
    if tool_name == "get_weather":
        location = tool_input["location"]
        # Call your weather API here; this stub returns fixed data
        return {"temperature": 22, "condition": "sunny"}
    return {"error": "Unknown tool"}

# After receiving a response whose stop_reason is "tool_use":
tool_results = []
for content in response.content:
    if content.type == "tool_use":
        result = handle_tool_call(content.name, content.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": content.id,
            "content": json.dumps(result)
        })

# Send every result back to Claude in a single user message
follow_up = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": tool_results}
    ]
)
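Real conversations often need several rounds of tool calls before Claude gives a final answer. The loop below is a minimal sketch of that pattern: it keeps calling the model while `stop_reason` is `"tool_use"`, answers every `tool_use` block in one user message, and flags unknown tools with `is_error`. `run_tool_loop` and its `create`/`handlers` parameters are hypothetical names, and `fake_create` stands in for `client.messages.create` so the example runs without an API key.

```python
import json
from types import SimpleNamespace


def run_tool_loop(create, messages, handlers, max_turns=5):
    """Keep calling the model until it stops requesting tools.

    `create` is any callable accepting `messages=` and returning a response
    (for example client.messages.create with model, max_tokens and tools
    pre-bound); `handlers` maps tool names to functions taking the input dict.
    """
    for _ in range(max_turns):
        response = create(messages=messages)
        if response.stop_reason != "tool_use":
            return response
        # Answer every tool_use block from this turn.
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            handler = handlers.get(block.name)
            if handler is None:
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": f"Unknown tool: {block.name}", "is_error": True})
            else:
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": json.dumps(handler(block.input))})
        # Echo Claude's turn, then return all results in one user message.
        messages = messages + [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": results},
        ]
    raise RuntimeError("tool loop did not finish within max_turns")


# Offline stand-in for client.messages.create: asks for one tool, then answers.
calls = []


def fake_create(messages):
    calls.append(messages)
    if len(calls) == 1:
        return SimpleNamespace(stop_reason="tool_use", content=[
            SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather",
                            input={"location": "Tokyo"})])
    return SimpleNamespace(stop_reason="end_turn",
                           content=[SimpleNamespace(type="text", text="It is sunny.")])


final = run_tool_loop(
    fake_create,
    [{"role": "user", "content": "What's the weather in Tokyo?"}],
    {"get_weather": lambda args: {"temperature": 22, "condition": "sunny"}},
)
print(final.content[0].text)
```

With a live client, you might pass `functools.partial(client.messages.create, model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools)` as `create`. The `max_turns` cap keeps a misbehaving loop from running up costs.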

Built-in Tools

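Claude provides several Anthropic-defined tools you can enable. Server tools, such as web search and code execution, run on Anthropic's side; client tools, such as computer use, bash, and memory, return tool_use blocks that your application executes: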

  • Web search tool – Let Claude search the internet
  • Code execution tool – Run Python code in a sandbox
  • Computer use tool – Control a virtual desktop
  • Memory tool – Store and retrieve information across sessions
  • Bash tool – Execute shell commands
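Each is enabled by adding its versioned type to the tools list. For example, to enable the computer use tool and the text editor tool:
tools = [
    {
        # Computer use is in beta: send this via client.beta.messages.create
        # with betas=["computer-use-2025-01-24"]
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768
    },
    {
        # Tool versions are model-specific; check the docs for the type and
        # name that match your model
        "type": "text_editor_20250124",
        "name": "str_replace_editor"
    }
]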

Tool Infrastructure: Orchestration at Scale

When you have many tools, you need infrastructure to manage discovery, routing, and execution. Key components:

  • Tool Runner (SDK) – Automates the loop of executing tools and returning results
  • Strict tool use – Guarantees that Claude's tool inputs conform exactly to your schema
  • Parallel tool use – Lets Claude request several tool calls in a single response
  • Fine-grained tool streaming – Streams tool-use parameters as they are generated, without buffering
  • Tool search – Dynamically discovers relevant tools from a large catalog based on context

Parallel Tool Use Example

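# weather_tool, stock_tool, and news_tool are tool definitions like get_weather above
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, stock_tool, news_tool],
    # Parallel tool use is on by default; there is no parallel_tool_calls flag.
    # To turn it off: tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "What's the weather in London and the current stock price of Apple?"}
    ]
)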
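Because Claude may return several `tool_use` blocks in one response, your application can execute them concurrently. A sketch using Python's standard `ThreadPoolExecutor`, assuming the handlers are thread-safe and I/O-bound; `run_tools_concurrently` and the two stub handlers (including the `get_stock_price` tool name) are hypothetical. All results go back together in one user message.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_tools_concurrently(content_blocks, handlers, max_workers=4):
    """Run every tool_use block from one response on a thread pool.

    Returns one user message containing a tool_result for each call, in the
    same order Claude requested them.
    """
    calls = [block for block in content_blocks if block.type == "tool_use"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outputs line up with calls
        outputs = list(pool.map(lambda block: handlers[block.name](block.input), calls))
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(output)}
            for block, output in zip(calls, outputs)
        ],
    }


# Hypothetical stub handlers; real ones would call weather and stock APIs.
handlers = {
    "get_weather": lambda args: {"location": args["location"], "condition": "cloudy"},
    "get_stock_price": lambda args: {"ticker": args["ticker"], "price": None},
}
blocks = [
    SimpleNamespace(type="text", text="Let me check both."),
    SimpleNamespace(type="tool_use", id="toolu_a", name="get_weather", input={"location": "London"}),
    SimpleNamespace(type="tool_use", id="toolu_b", name="get_stock_price", input={"ticker": "AAPL"}),
]
reply = run_tools_concurrently(blocks, handlers)
print([item["tool_use_id"] for item in reply["content"]])
```

Threads suit tools that wait on network calls; CPU-heavy tools would gain little from this approach.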

Context Management: Keeping Conversations Efficient

Long-running sessions require careful context management to stay within token limits and control costs.

Context Windows

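Some Claude models support up to 1M tokens. The window has to hold both the input and the output: max_tokens caps the length of the response, and on recent models a request whose input plus max_tokens exceeds the context window is rejected with a validation error rather than silently truncated.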
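The arithmetic is simple but easy to get wrong: whatever the prompt uses is unavailable for output. A minimal sketch, where `output_budget` is a hypothetical helper and `input_tokens` would come from the token counting endpoint; the 200,000-token window in the usage lines is an illustrative assumption, not a statement about any particular model.

```python
def output_budget(input_tokens, context_window, requested_max_tokens):
    """Return a max_tokens value that fits beside the prompt in the window."""
    room = context_window - input_tokens
    if room <= 0:
        raise ValueError("the prompt alone fills the context window")
    return min(requested_max_tokens, room)


# Illustrative numbers: the 200,000-token window is an assumption, and
# input_tokens would come from client.messages.count_tokens(...).input_tokens
print(output_budget(input_tokens=10_000, context_window=200_000, requested_max_tokens=4096))
print(output_budget(input_tokens=198_000, context_window=200_000, requested_max_tokens=4096))
```

The first call leaves the requested 4,096 tokens intact; the second shrinks the budget to the 2,000 tokens that actually remain.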

Compaction

When a conversation grows too long, you can compact it—summarize earlier turns while preserving key information:

# Pseudocode: compact_conversation is a placeholder for your own summarization step
compacted_history = compact_conversation(conversation_history)
# Send the compacted history in the next request instead of the full one
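One way to implement the compaction step is to summarize everything except the most recent turns. The sketch below assumes plain-text messages that alternate between user and assistant; `compact_conversation` here is a hypothetical implementation, and `summarize` is a caller-supplied function (in practice perhaps another Claude request) replaced by a lambda in the demo. It moves the split so the kept tail starts on a user turn and inserts a summary exchange so roles still alternate.

```python
def compact_conversation(history, summarize, keep_last=4):
    """Replace older turns with a summary, keeping roughly the last `keep_last`.

    Assumes plain-text, alternating turns. A tool_result must stay next to
    the assistant tool_use it answers, which this sketch does not check.
    """
    split = max(len(history) - keep_last, 0)
    # Move the split back so the kept tail starts on a user turn.
    while split > 0 and history[split]["role"] != "user":
        split -= 1
    if split == 0:
        return list(history)  # nothing old enough to compact
    summary = summarize(history[:split])
    return [
        {"role": "user", "content": f"Summary of our earlier conversation:\n{summary}"},
        {"role": "assistant", "content": "Understood. I'll continue from that summary."},
    ] + history[split:]


# Demo: four question/answer pairs and a stand-in summarizer.
history = []
for i in range(4):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

compacted = compact_conversation(history, lambda msgs: f"{len(msgs)} earlier messages", keep_last=4)
print(len(compacted))
```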

Prompt Caching

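Cache large, stable prefixes such as long system prompts, tool definitions, or reference documents to reduce latency and cost on repeated requests. Prefixes shorter than a model-specific minimum (1,024 tokens on many models) are not cached, so the short prompt below only illustrates the syntax: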

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
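To confirm that caching works, inspect the `usage` object on each response: `cache_creation_input_tokens` counts tokens written to the cache, `cache_read_input_tokens` counts tokens served from it, and `input_tokens` covers the uncached remainder. The helper below is a hypothetical sketch that turns those fields into a hit ratio; the `SimpleNamespace` stands in for `response.usage`, and the `or 0` guards cover fields that come back as `None`.

```python
from types import SimpleNamespace


def cache_report(usage):
    """Summarize prompt-cache activity from a response's usage object."""
    written = usage.cache_creation_input_tokens or 0
    read = usage.cache_read_input_tokens or 0
    uncached = usage.input_tokens
    total = written + read + uncached
    return {
        "written": written,
        "read": read,
        "uncached": uncached,
        "hit_ratio": read / total if total else 0.0,
    }


# Stand-in for response.usage on a request that reused a cached prefix.
cache_stats = cache_report(
    SimpleNamespace(input_tokens=50, cache_creation_input_tokens=0, cache_read_input_tokens=1950)
)
print(cache_stats["hit_ratio"])
```

A first request typically shows tokens in `written`; identical follow-ups within the cache lifetime should shift them to `read`.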

Token Counting

Estimate token usage before sending a request:

# Using the token counting endpoint
response = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)
print(f"Input tokens: {response.input_tokens}")

Files and Assets: Managing Documents and Data

Claude can process various file types, including PDFs, images, and code files.

PDF Support

import base64

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {"type": "text", "text": "Summarize this document."}
            ]
        }
    ]
)
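If you send many files, a small builder keeps the content blocks consistent. A sketch with a hypothetical `pdf_block` helper that checks the `%PDF` signature before encoding; the in-memory bytes stand in for a real file so the example runs offline.

```python
import base64


def pdf_block(pdf_bytes):
    """Build a base64 document content block from raw PDF bytes."""
    # Real PDF files begin with the "%PDF" signature.
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("data does not look like a PDF")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("utf-8"),
        },
    }


# Stand-in bytes; with a real file, pass the bytes read from open("document.pdf", "rb")
content = [
    pdf_block(b"%PDF-1.7 stand-in bytes"),
    {"type": "text", "text": "Summarize this document."},
]
print(content[0]["source"]["media_type"])
```

The resulting `content` list drops straight into the `messages` payload of the request above.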

Image and Vision

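Claude can analyze images. Send them as image content blocks; here image_data is assumed to hold a base64-encoded JPEG, read and encoded the same way as the PDF above: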

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's in this image?"
                }
            ]
        }
    ]
)
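The `media_type` must match the actual image data. This sketch infers it from the file's leading bytes for JPEG, PNG, GIF, and WebP, the formats the API documents as supported; `sniff_media_type` and `image_block` are hypothetical helpers, and the byte strings are minimal stand-in headers rather than real images.

```python
import base64

# Leading bytes ("magic numbers") for the supported image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_media_type(image_bytes):
    """Guess the media_type from the first bytes of the file."""
    for prefix, media_type in SIGNATURES:
        if image_bytes.startswith(prefix):
            return media_type
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format")


def image_block(image_bytes):
    """Build a base64 image content block with a matching media_type."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": sniff_media_type(image_bytes),
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }


# Minimal stand-in header, not a complete image file.
png_block = image_block(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
print(png_block["source"]["media_type"])
```

Sniffing bytes rather than trusting file extensions avoids errors when a `.jpg` file is really a PNG.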

Feature Availability by Platform

Not all features are available everywhere. Here's a quick reference:

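| Feature | Claude API | AWS Bedrock | Vertex AI |
|---|---|---|---|
| Context windows (1M tokens) | GA | GA | GA |
| Adaptive Thinking | GA | GA | GA |
| Batch processing | GA | GA | GA |
| Citations | GA | GA | GA |
| Prompt caching | GA | GA | Beta |
| Computer use tool | Beta | Beta | N/A |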
Check the official docs for the latest availability.

Key Takeaways

  • Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Start with model capabilities and tools, then explore the others as you scale.
  • Use Adaptive Thinking for complex reasoning – Let Claude decide when to think deeply, controlled by the effort parameter. This is especially powerful with Opus 4.7.
  • Tools extend Claude beyond text – Define custom tools with JSON schemas, or use built-in tools for web search, code execution, and computer control. Claude can request several tools in one turn by default, so return all of their results together.
  • Manage context proactively – Use prompt caching, compaction, and token counting to keep long-running sessions efficient and cost-effective. Batch processing can save 50% on API costs.
  • Feature availability varies by platform – Always check the documentation for GA vs. Beta status, especially when building for production. Use beta headers for features in preview.