Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
This guide walks you through Claude's five API areas—model capabilities, tools, tool infrastructure, context management, and file handling—with practical code examples and best practices for building production-ready applications.
Introduction
Claude’s API is more than just a text-in, text-out interface. It’s a rich ecosystem designed to give you fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you’re building a simple chatbot, a complex agent, or an enterprise-grade application, understanding the five core areas of the API surface is essential.
This guide covers:
- Model capabilities – controlling reasoning depth and output format
- Tools – letting Claude act on the web or in your environment
- Tool infrastructure – discovery and orchestration at scale
- Context management – keeping long sessions efficient
- Files and assets – managing documents and data
1. Model Capabilities: Steering Claude’s Reasoning and Output
Claude offers several ways to control how it processes input and generates responses. The most important capabilities include:
Extended Thinking & Adaptive Thinking
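Claude can “think” before responding, which improves reasoning on complex tasks. With Adaptive Thinking (recommended for Opus 4.7), Claude decides dynamically how much to think. You can also set a fixed effort parameter to control depth. The example below instead uses extended thinking with an explicit budget_tokens value, which must be at least 1,024 and smaller than max_tokens.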
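import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step: integrate x^2 * sin(x) dx"}
    ]
)

# With thinking enabled, the response starts with thinking blocks,
# so print only the text blocks that hold the final answer.
for block in response.content:
    if block.type == "text":
        print(block.text)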
Structured Outputs
Use the stop_sequences request parameter to control when Claude stops generating; the stop_reason and stop_sequence fields in the response report why generation ended. For structured data, combine this with tools or system prompts.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=500,
    system="You are a data extraction assistant. Always output JSON.",
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: ..."}
    ]
)
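A system prompt alone does not guarantee valid JSON, so it helps to parse the reply defensively. The following is a minimal sketch, assuming the reply may be wrapped in a Markdown code fence or surrounded by a little prose; `parse_json_reply` is a hypothetical helper, not part of the SDK.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_reply(text: str):
    """Parse a model reply that is expected to contain one JSON object.

    Hypothetical helper: strips an optional Markdown code fence, then
    falls back to the outermost {...} span if extra prose surrounds it.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag)
        # and everything from the closing fence onward.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in reply")
        return json.loads(cleaned[start : end + 1])
```

In the extraction example above, this would be called as `parse_json_reply(response.content[0].text)`. When the schema must be enforced strictly, defining a tool whose input_schema describes the desired fields is usually the more reliable route.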
Streaming & Batch Processing
- Streaming: Receive tokens as they’re generated for real-time UX.
- Batch Processing: Send large volumes of requests asynchronously at 50% lower cost.
# Streaming example
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
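Batch processing, mentioned in the list above, can be sketched in a similar way. The Message Batches API takes a list of requests, each with a `custom_id` and a `params` object holding the usual Messages arguments. The `build_batch_requests` helper and the `doc-{i}` ID scheme are illustrative assumptions, and the submission call is left as a comment so the snippet runs without network access.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Turn a list of prompt strings into Message Batches request entries."""
    return [
        {
            "custom_id": f"doc-{i}",  # hypothetical ID scheme; must be unique within a batch
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(["Summarize report A.", "Summarize report B."])

# Submit with the SDK, then poll until processing ends and match
# each result back to its request by custom_id:
# batch = client.messages.batches.create(requests=requests)
```

Because results are not guaranteed to come back in request order, the `custom_id` is what ties each output to its input.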
2. Tools: Letting Claude Take Action
Claude can use tools to interact with external systems. Tools are defined as JSON schemas and can be called in parallel or sequentially.
Defining a Tool
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
Handling Tool Calls
When Claude decides to use a tool, the response contains a tool_use content block. You must execute the tool and return the result.
# Assumes get_weather(location) is a function defined in your application.
if response.stop_reason == "tool_use":
    tool_results = []
    for content in response.content:
        if content.type == "tool_use":
            # Execute your function with the arguments Claude supplied
            result = get_weather(content.input["location"])
            tool_results.append(
                {"type": "tool_result", "tool_use_id": content.id, "content": str(result)}
            )

    # Send every result back in a single user message
    follow_up = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results}
        ]
    )
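Once there is more than one tool, a small dispatcher keeps the handling code flat. The sketch below assumes a `registry` dictionary that maps tool names to plain Python callables; `run_tool_calls` is a hypothetical helper. It reports failures through the `is_error` field of the tool result so Claude can react to them instead of the request loop crashing.

```python
def run_tool_calls(content_blocks, registry):
    """Execute every tool_use block and return the matching tool_result blocks.

    `registry` maps tool names to callables (an assumption of this sketch).
    """
    results = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip text and other non-tool blocks
        handler = registry.get(block.name)
        try:
            if handler is None:
                raise KeyError(f"unknown tool: {block.name}")
            output, is_error = str(handler(**block.input)), False
        except Exception as exc:
            # Report the failure to the model rather than raising.
            output, is_error = f"{type(exc).__name__}: {exc}", True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
                "is_error": is_error,
            }
        )
    return results
```

The returned list can be sent as the content of the next user message, for example `{"role": "user", "content": run_tool_calls(response.content, {"get_weather": get_weather})}`.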
Built-in Tools
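Anthropic provides several built-in tools you can enable. Some, such as web search and code execution, run on Anthropic’s servers; others, such as computer use, bash, and memory, are defined by Anthropic but executed by your application: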
- Web search tool – fetch live information
- Code execution tool – run Python/JavaScript in a sandbox
- Computer use tool – control a virtual desktop (beta)
- Memory tool – persist information across conversations
- Bash tool – execute shell commands
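A built-in tool is enabled by adding a typed entry to `tools` rather than a JSON schema of your own. The sketch below builds the request arguments for web search. The `web_search_20250305` type string and the `max_uses` field reflect the tool version known at the time of writing and should be checked against the current documentation; the prompt is only an illustration.

```python
request_kwargs = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": [
        {
            "type": "web_search_20250305",  # versioned tool type; verify before use
            "name": "web_search",
            "max_uses": 3,  # upper bound on searches within this request
        }
    ],
    "messages": [{"role": "user", "content": "What changed in the latest Python release?"}],
}

# With a configured client, the request would be sent as:
# response = client.messages.create(**request_kwargs)
```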
3. Tool Infrastructure: Discovery and Orchestration
For production systems with many tools, you need infrastructure to manage discovery, context, and scaling.
Tool Runner (SDK)
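The Anthropic Python SDK includes a tool runner (in beta at the time of writing) that handles tool call execution and result injection automatically. Tools are declared as decorated Python functions; confirm the helper names below against your installed SDK version.
from anthropic import Anthropic, beta_tool

client = Anthropic()

@beta_tool
def get_weather(location: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {location}"  # placeholder result

runner = client.beta.messages.tool_runner(
    model="claude-sonnet-4-20250514", max_tokens=1024, tools=[get_weather],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)
final_message = runner.until_done()  # runs tool calls until Claude returns a final response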
Prompt Caching with Tools
Cache frequently used tool definitions to reduce latency and cost. Setting cache_control on the last tool definition caches the entire tools array up to and including that entry.
tools = [
    {
        "name": "search_database",
        "description": "Search internal database",
        "input_schema": {...},
        "cache_control": {"type": "ephemeral"}
    }
]
Tool Combinations & Search
You can combine multiple tools in a single request. Claude will decide which tool to use based on the user’s intent. Use tool search to let Claude dynamically discover tools from a registry.
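To make the idea of discovery concrete, here is a client-side stand-in: a keyword filter that picks a few relevant tool definitions from a larger registry before the request is sent. This is not the API’s tool search feature, only a rough sketch of the same goal (keeping the tool list small), and `select_tools` is a hypothetical helper.

```python
import re


def _words(text):
    """Lowercase alphanumeric words in a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(query, registry, limit=3):
    """Rank tool definitions by word overlap with the query and keep the best few."""
    query_words = _words(query)
    scored = []
    for tool in registry:
        overlap = len(query_words & _words(f"{tool['name']} {tool['description']}"))
        if overlap:
            scored.append((overlap, tool["name"], tool))
    # Highest overlap first; ties are broken alphabetically by tool name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:limit]]
```

Word overlap is crude (it has no notion of synonyms), so a production registry would more likely use embeddings or the built-in tool search.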
4. Context Management: Keeping Long Sessions Efficient
Claude supports up to 1M token context windows (on supported models). Managing this context efficiently is critical for cost and performance.
Context Windows
- Standard: 200K tokens
- Extended: Up to 1M tokens (available on select models)
Compaction
When a conversation grows too long, use context compaction to summarize or prune older messages while preserving key information.
# Manual compaction: ask Claude to summarize the conversation so far.
# `conversation` is your existing message list, ending with an assistant turn.
summary = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=500,
    messages=conversation + [
        {"role": "user", "content": "Summarize this conversation so far, keeping all important facts and decisions."}
    ]
)
# Then start a new conversation with summary.content[0].text as the system prompt
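The second half of that step, starting over with the summary, can be sketched as a pure function. `compact_history` is a hypothetical helper: it keeps the most recent turns verbatim and returns the summary as a system prompt, following the approach described above.

```python
def compact_history(messages, summary, keep_last=4):
    """Return (system_prompt, recent_messages) for a fresh, shorter conversation."""
    recent = list(messages[-keep_last:]) if keep_last > 0 else []
    # Drop leading assistant turns so the retained tail starts with a user message.
    while recent and recent[0]["role"] != "user":
        recent.pop(0)
    system_prompt = f"Summary of the conversation so far:\n{summary}"
    return system_prompt, recent
```

The two return values map directly onto the `system` and `messages` arguments of the next request. If the history contains tool calls, take care that the cut does not separate a tool_use block from its tool_result.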
Prompt Caching
Cache system prompts, tool definitions, and large context blocks to reduce latency and cost. Cache hits can reduce input token costs by up to 90%.
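response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        # Placeholder text: in practice, cache a long prompt, since very short
        # prompts fall below the minimum cacheable length.
        {"type": "text", "text": "You are a helpful assistant.", "cache_control": {"type": "ephemeral"}}
    ],
    messages=[...]
)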
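A quick calculation shows where the savings come from. The sketch below assumes cache reads are billed at 10% of the base input price (consistent with the 90% figure above) and cache writes at 125%; both multipliers, and the $3 per million tokens price used in the note, are assumptions to check against current pricing. It also assumes every call arrives before the cache entry expires.

```python
def uncached_input_cost(prefix_tokens, calls, price_per_mtok):
    """Input cost of resending the same prefix on every call without caching."""
    return prefix_tokens / 1_000_000 * price_per_mtok * calls


def cached_input_cost(prefix_tokens, calls, price_per_mtok,
                      write_multiplier=1.25, read_multiplier=0.10):
    """Input cost when the first call writes the cache and later calls read it."""
    base = prefix_tokens / 1_000_000 * price_per_mtok
    return base * write_multiplier + base * read_multiplier * (calls - 1)
```

For a 50,000-token prefix sent 20 times at a hypothetical $3 per million input tokens, this works out to roughly $0.47 with caching versus $3.00 without. With a single call, caching costs slightly more because of the write surcharge.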
Token Counting
Always count tokens before sending large requests to avoid hitting limits.
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(token_count.input_tokens)  # e.g., 11
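The count is most useful as a guard before sending. A minimal sketch, assuming the 200K standard window listed earlier as the default; `fits_context` is a hypothetical helper, and the check reserves room for the output because the prompt and the response share the same window.

```python
def fits_context(input_tokens, max_tokens, context_window=200_000):
    """Return True if the prompt plus the reserved output budget fits the window."""
    return input_tokens + max_tokens <= context_window
```

A caller might run `fits_context(token_count.input_tokens, 1024)` and, when it returns False, compact the history or trim attachments before retrying.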
5. Files and Assets: Managing Documents and Data
Claude can process files directly via the Files API or by embedding content in messages.
PDF Support
Claude can read and reason over PDF documents. Send the PDF inline as base64, or upload it once through the Files API and reference it in later requests.
import base64

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data}},
                {"type": "text", "text": "Summarize this document."}
            ]
        }
    ]
)
Images and Vision
Claude can analyze images. Pass them as base64-encoded data.
import base64

with open("chart.png", "rb") as f:
    img_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": img_data}},
                {"type": "text", "text": "What does this chart show?"}
            ]
        }
    ]
)
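When images arrive in mixed formats, hard-coding `image/png` becomes a source of errors. The sketch below infers the media type from the filename and rejects anything outside JPEG, PNG, GIF, and WebP; `image_block` is a hypothetical helper, and the supported set should be confirmed against the current vision documentation.

```python
import base64
import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(filename, data):
    """Build a base64 image content block, inferring the media type from the filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

The result drops into a message’s content list in place of the hand-written image dictionary above. Guessing from the extension trusts the filename, so inspecting the file’s leading bytes would be a stricter check.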
Feature Availability & Lifecycle
Not all features are available on every platform. Claude’s API uses these classifications:
| Classification | Description |
|---|---|
| Beta | Preview features, may change or be discontinued. Not for production. |
| GA | Stable, fully supported, recommended for production. |
| Deprecated | Still functional but not recommended. Migration timeline provided. |
| Retired | No longer available. |
Best Practices Summary
- Start with model capabilities and tools – these are the building blocks.
- Use streaming for real-time UX, batch processing for cost savings.
- Cache prompts and tool definitions to reduce latency and cost.
- Monitor token usage with the Count Tokens endpoint.
- Handle tool calls explicitly – always check stop_reason and return results.
- Use context compaction for long-running conversations.
- Check feature availability on your target platform before building.
Key Takeaways
- Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
- Use Adaptive Thinking and Structured Outputs to control reasoning depth and response format.
- Tools let Claude interact with external systems; use the SDK’s tool runner for automatic orchestration.
- Prompt caching and context compaction are essential for cost-effective long sessions.
- Always check feature availability (Beta vs. GA) and platform support before building production applications.