Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
This guide walks you through Claude's five core API areas—model capabilities, tools, context management, files, and tool infrastructure—with actionable code snippets and best practices for building production-ready AI applications.
Claude's API is more than just a text generation endpoint. It's a rich ecosystem designed to give you fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you're building a customer support bot, a code assistant, or a data analysis tool, understanding the full API surface is key to unlocking Claude's potential.
This guide covers the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and files/assets. You'll learn what each area offers, when to use it, and see practical code examples to get started.
Understanding the API Surface
Claude's API is organized into five logical areas. Each addresses a different aspect of building intelligent applications:
| Area | Purpose |
|---|---|
| Model Capabilities | Control how Claude reasons, formats responses, and handles input modalities. |
| Tools | Let Claude take actions on the web or in your environment (e.g., search, code execution). |
| Tool Infrastructure | Handle discovery, orchestration, and scaling of tools at an enterprise level. |
| Context Management | Keep long-running sessions efficient with prompt caching, compaction, and editing. |
| Files and Assets | Manage documents, images, and other data you provide to Claude. |
Model Capabilities: Steering Claude's Output
Model capabilities are the foundation. They let you control how Claude thinks and what it produces.
Extended Thinking and Adaptive Thinking
Claude can reason step by step before responding. With Extended Thinking, you set a fixed thinking budget through budget_tokens, which must be smaller than max_tokens. With Adaptive Thinking (recommended for Opus 4.7), Claude decides how much to think based on the complexity of the task. The example below uses the fixed-budget form.
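import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    # Fixed thinking budget; it must be smaller than max_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
    },
    messages=[
        {"role": "user", "content": "Solve this math problem step by step: 23 * 47"}
    ],
)

# With thinking enabled, the first content block is a thinking block,
# so print the text blocks instead of indexing content[0].
for block in response.content:
    if block.type == "text":
        print(block.text)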
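When thinking is enabled, the response content is a list of blocks in which the reasoning comes before the final answer, so it helps to separate the two rather than assume the first block is text. The following is a minimal sketch that works on plain dictionaries shaped like those blocks. The helper names check_thinking_budget and split_blocks are hypothetical, and the 1,024-token floor is an assumption to confirm against the API reference.

```python
from typing import Dict, List, Tuple

MIN_THINKING_BUDGET = 1024  # assumed minimum for budget_tokens; confirm in the API reference


def check_thinking_budget(max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError if the thinking budget cannot work with max_tokens."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")


def split_blocks(content: List[Dict]) -> Tuple[str, str]:
    """Return (thinking, answer) text joined from a list of content blocks."""
    thinking = "".join(b.get("thinking", "") for b in content if b["type"] == "thinking")
    answer = "".join(b.get("text", "") for b in content if b["type"] == "text")
    return thinking, answer


if __name__ == "__main__":
    check_thinking_budget(max_tokens=4096, budget_tokens=2048)
    sample = [
        {"type": "thinking", "thinking": "23 * 47 = 23 * 40 + 23 * 7 = 920 + 161"},
        {"type": "text", "text": "23 * 47 = 1081"},
    ]
    print(split_blocks(sample))
```

Running the budget check before the request turns a rejected API call into a local error, which is cheaper to debug.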
Structured Outputs
For production applications, you often need Claude to return data in a specific format. Use Structured Outputs to enforce JSON schemas.
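response = client.messages.create(
    # Structured Outputs only works on models that support it; check the model list
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the date, amount, and vendor from this invoice."}
    ],
    # The Messages API has no response_format parameter. The schema goes under
    # output_config.format; older SDK releases exposed it as a beta output_format
    # parameter, so match the name to your SDK version.
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                    "vendor": {"type": "string"},
                },
                "required": ["date", "amount", "vendor"],
                "additionalProperties": False,
            },
        }
    },
)
# The first content block holds the JSON document as text
print(response.content[0].text)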
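Even with a schema in place, it is worth parsing and checking the returned text before passing it downstream. The sketch below does that for the invoice fields used above. The function name parse_invoice and the REQUIRED_FIELDS table are illustrative, not part of the SDK.

```python
import json
from typing import Any, Dict

# Field names come from the invoice schema above; the Python types are the check we apply
REQUIRED_FIELDS = {"date": str, "amount": (int, float), "vendor": str}


def parse_invoice(raw_text: str) -> Dict[str, Any]:
    """Parse the model's JSON text and verify the fields named in the schema."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data


if __name__ == "__main__":
    sample = '{"date": "2024-01-31", "amount": 129.5, "vendor": "Acme"}'
    print(parse_invoice(sample))
```

In an application you would call parse_invoice on the text of the response and handle ValueError (or json.JSONDecodeError) by retrying or flagging the document for review.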
Batch Processing
When you have large volumes of requests, use Batch Processing to send them asynchronously. Batch API calls cost 50% less than standard API calls.
# Create a batch of messages (the Batches API lives under client.messages)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize this document."}],
            },
        },
        # Add more requests...
    ]
)

# Check the batch later; results can be fetched once processing has ended
batch = client.messages.batches.retrieve(batch.id)
print(batch.processing_status)
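Each request in a batch carries a custom_id, which is how you match results back to inputs, since results are not guaranteed to come back in submission order. A small helper can generate the request list from your documents. This is a sketch: the helper name build_batch_requests and the req-NNN numbering scheme are assumptions chosen for illustration.

```python
from typing import Dict, List


def build_batch_requests(documents: List[str], model: str, max_tokens: int = 1024) -> List[Dict]:
    """Build one batch request per document, each with a unique custom_id."""
    requests = []
    for index, document in enumerate(documents, start=1):
        requests.append({
            "custom_id": f"req-{index:03d}",  # unique within the batch
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"Summarize this document.\n\n{document}"}
                ],
            },
        })
    return requests


if __name__ == "__main__":
    for request in build_batch_requests(["First report", "Second report"], "claude-sonnet-4-20250514"):
        print(request["custom_id"])
```

The returned list can be passed as the requests argument when creating the batch.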
Tools: Letting Claude Act in the World
Tools extend Claude's capabilities beyond text generation. Claude can call functions, search the web, execute code, and even control a computer.
How Tool Use Works
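You define tools as JSON schemas. Claude decides when to call them based on the conversation, but it does not run them itself: the response contains a tool_use block with the tool name and arguments, your application executes the function, and you send the output back in a tool_result block so Claude can finish its answer.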
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
)
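The request above is only half of the exchange. When Claude wants the weather, the response contains a tool_use block, and your code has to run the function and send the output back. The sketch below builds that follow-up message from plain dictionaries shaped like the response blocks. The get_weather body is a stand-in and the HANDLERS registry is a hypothetical convention, not part of the SDK.

```python
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Stand-in implementation; a real one would call a weather service."""
    return f"Sunny in {city}"


# Maps tool names (as declared in the tools list) to local functions
HANDLERS: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def build_tool_results(content: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run every tool_use block and return the user message carrying the results."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue  # skip text blocks
        handler = HANDLERS.get(block["name"])
        if handler is None:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must echo the id from the tool_use block
            "content": handler(**block["input"]),
        })
    return {"role": "user", "content": results}


if __name__ == "__main__":
    sample = [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}},
    ]
    print(build_tool_results(sample))
```

To continue the conversation, append the assistant response and this user message to the messages list and call the API again with the same tools.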
Built-in Tools
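Anthropic provides several built-in tools. Web search and code execution run on Anthropic's servers; computer use, memory, and the text editor are defined by Anthropic but executed by your application: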
- Web Search Tool – Fetch real-time information from the web.
- Code Execution Tool – Run Python code in a sandboxed environment.
- Computer Use Tool – Let Claude interact with a virtual desktop.
- Memory Tool – Store and retrieve information across sessions.
- Text Editor Tool – Edit files programmatically.
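Built-in tools are declared in the same tools list as your own, but with a versioned type instead of an input schema. As a sketch, the helper below appends a web search declaration to an existing list. The helper name with_web_search is hypothetical, and the version string web_search_20250305 is the one assumed here; newer versions may exist, so check the tool reference before relying on it.

```python
from typing import Dict, List


def with_web_search(tools: List[Dict], max_uses: int = 3) -> List[Dict]:
    """Return a new tools list that also enables the server-side web search tool."""
    web_search = {
        "type": "web_search_20250305",  # assumed version identifier; confirm in the docs
        "name": "web_search",
        "max_uses": max_uses,  # cap on searches per request
    }
    return [*tools, web_search]


if __name__ == "__main__":
    custom_tool = {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    }
    print([tool["name"] for tool in with_web_search([custom_tool])])
```

Because web search runs on Anthropic's side, there is no handler to write for it; the search results arrive as part of the response.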
Parallel Tool Use
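Claude can request several tool calls in a single response. Your application can then run them concurrently and return all the results in one user message, which speeds up complex workflows.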
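# get_weather_tool is the definition from earlier; get_time_tool is an illustrative second tool
get_weather_tool = tools[0]
get_time_tool = {
    "name": "get_time",
    "description": "Get the current local time for a city",
    "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
}
# Claude may request both tools in one response if appropriate
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[get_weather_tool, get_time_tool],
    messages=[
        {"role": "user", "content": "What's the weather and current time in London?"}
    ],
)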
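When a response contains more than one tool_use block, the calls are independent and can run at the same time on your side. One way to do that, sketched here with a thread pool, is to map each block to its handler and collect every result into a single user message. The handler bodies are stand-ins and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    return f"Rainy in {city}"  # stand-in for a real weather lookup


def get_time(city: str) -> str:
    return f"14:00 in {city}"  # stand-in for a real time lookup


HANDLERS: Dict[str, Callable[..., str]] = {"get_weather": get_weather, "get_time": get_time}


def run_tool(block: Dict[str, Any]) -> Dict[str, Any]:
    """Execute one tool_use block and wrap the output as a tool_result block."""
    output = HANDLERS[block["name"]](**block["input"])
    return {"type": "tool_result", "tool_use_id": block["id"], "content": output}


def run_tools_concurrently(content: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run all tool_use blocks in parallel threads and return one user message."""
    tool_calls = [block for block in content if block["type"] == "tool_use"]
    # max_workers must be at least 1, even when there is nothing to run
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        results = list(pool.map(run_tool, tool_calls))  # map preserves input order
    return {"role": "user", "content": results}


if __name__ == "__main__":
    sample = [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "London"}},
        {"type": "tool_use", "id": "toolu_02", "name": "get_time", "input": {"city": "London"}},
    ]
    print(run_tools_concurrently(sample))
```

Threads suit handlers that wait on network calls; for CPU-heavy tools a process pool or a task queue may be a better fit.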
Context Management: Keeping Conversations Efficient
Long conversations can become expensive and slow. Claude's context management features help you stay efficient.
Prompt Caching
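Cache frequently used context (like system prompts or large documents) to reduce costs and latency. Caching only takes effect once the marked prefix reaches the model's minimum cacheable length, so the short system prompt below illustrates the syntax rather than a prompt worth caching.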
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant.",
            # Marks the end of the prefix to cache
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
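To see whether caching is paying off, compare the cache counters in the usage data returned with each response. The sketch below computes the share of input tokens served from cache, taking a plain dictionary with the usage field names. The function name cache_hit_ratio is hypothetical.

```python
from typing import Dict


def cache_hit_ratio(usage: Dict[str, int]) -> float:
    """Return the fraction of input tokens that were read from the cache."""
    read = usage.get("cache_read_input_tokens", 0)         # served from cache
    written = usage.get("cache_creation_input_tokens", 0)  # written to cache on this call
    uncached = usage.get("input_tokens", 0)                # processed normally
    total = read + written + uncached
    return read / total if total else 0.0


if __name__ == "__main__":
    first_call = {"input_tokens": 100, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 0}
    second_call = {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 900}
    print(cache_hit_ratio(first_call), cache_hit_ratio(second_call))
```

A ratio that stays near zero across repeated calls usually means the cached prefix is changing between requests or is too short to be cached.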
Context Compaction and Editing
- Compaction – Summarize older parts of a conversation to fit within context windows.
- Editing – Remove or modify specific turns in the conversation history.
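To make the idea of compaction concrete, here is a client-side approximation: keep the most recent turns verbatim and replace everything older with a summary. This is a sketch of the general technique, not the API's built-in feature. The function name compact_history and the summarize callback are hypothetical; in practice the callback might be another Claude call.

```python
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> Tuple[Optional[str], List[Message]]:
    """Return (summary of older turns, recent turns kept verbatim)."""
    if len(messages) <= keep_last:
        return None, list(messages)  # nothing to compact
    cut = len(messages) - keep_last
    # Move the cut forward so the kept tail starts with a user turn
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    older, recent = messages[:cut], messages[cut:]
    return summarize(older), recent


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Explain caching."},
        {"role": "assistant", "content": "Caching stores a prompt prefix."},
        {"role": "user", "content": "And compaction?"},
    ]
    summary, recent = compact_history(history, keep_last=2, summarize=lambda m: f"{len(m)} earlier messages")
    print(summary, recent)
```

The summary can be placed in the system prompt of the next request, followed by the recent turns as the messages list.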
Token Counting
Count tokens before sending a large request so you can confirm it fits in the context window and estimate its cost.
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(token_count.input_tokens)  # e.g., 12
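The count is most useful as a pre-flight check: the input tokens plus the output you request have to fit in the model's context window. A minimal sketch of that check follows. The function name fits_context is hypothetical, and the 200,000-token default is an assumption that you should replace with the limit for your model.

```python
DEFAULT_CONTEXT_WINDOW = 200_000  # assumed window size; check the limit for your model


def fits_context(input_tokens: int, max_tokens: int, context_window: int = DEFAULT_CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested output fits in the window."""
    return input_tokens + max_tokens <= context_window


def remaining_tokens(input_tokens: int, context_window: int = DEFAULT_CONTEXT_WINDOW) -> int:
    """Return how many tokens are left for the response (never negative)."""
    return max(0, context_window - input_tokens)


if __name__ == "__main__":
    print(fits_context(input_tokens=12, max_tokens=1024))
    print(remaining_tokens(input_tokens=150_000))
```

If the check fails, you can shorten the input, lower max_tokens, or compact the conversation history before retrying.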
Files and Assets: Working with Documents and Images
Claude can process a variety of file types, including PDFs, images, and code files.
PDF Support
Send a PDF as a base64-encoded document block and Claude will read and understand its content.
import base64

# Read the PDF and encode it as base64 text
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarize this report.",
                },
            ],
        }
    ],
)
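If you send PDFs from several places in your code, wrapping the encoding step in a helper keeps the message construction readable. The sketch below also rejects data that does not start with the PDF file signature. The function name pdf_document_block is hypothetical.

```python
import base64
from typing import Dict


def pdf_document_block(data: bytes) -> Dict:
    """Build a document content block from raw PDF bytes."""
    # PDF files begin with the signature "%PDF-"
    if not data.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


if __name__ == "__main__":
    block = pdf_document_block(b"%PDF-1.7 example bytes")
    print(block["source"]["media_type"], len(block["source"]["data"]))
```

The returned block goes into the content list of a user message, next to the text block with your question.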
Images and Vision
Claude can analyze images for tasks like object detection, OCR, or visual question answering.
# Read the image and encode it as base64 text (base64 was imported above)
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "What does this diagram show?",
                },
            ],
        }
    ],
)
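The media_type has to match the actual image format, which is easy to get wrong when file names vary. As a sketch, the helper below picks the media type from the file extension and builds the image block. The function name image_block and the extension table are illustrative.

```python
import base64
from pathlib import Path
from typing import Dict

# Extensions mapped to the image media types used in image blocks
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> Dict:
    """Build an image content block, choosing media_type from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image extension: {suffix or '(none)'}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


if __name__ == "__main__":
    print(image_block("diagram.PNG", b"abc")["source"])
```

Checking the extension is a convenience, not validation; inspecting the file's leading bytes would be more reliable for untrusted uploads.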
Feature Availability and Lifecycle
Not all features are available on every platform. Anthropic labels each feature with one of these lifecycle stages:
- Beta – Preview features for testing. May change or be discontinued.
- Generally Available (GA) – Stable and recommended for production.
- Deprecated – Still functional but with a migration path.
- Retired – No longer available.
Putting It All Together: A Practical Workflow
Here's a realistic example combining multiple API features:
import base64

import anthropic

client = anthropic.Anthropic()

# Step 1: Read and encode the PDF
with open("contract.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

# Step 2: Use structured output + tools + caching
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": "You are a legal document analyzer. Extract key clauses and check for risks.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    tools=[
        {
            "name": "check_legal_compliance",
            "description": "Check a clause against known regulations",
            "input_schema": {
                "type": "object",
                "properties": {
                    "clause_text": {"type": "string"}
                },
                "required": ["clause_text"],
            },
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Analyze this contract and extract all key clauses. Flag any risky ones.",
                },
            ],
        }
    ],
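    # The Messages API has no response_format parameter. The schema goes under
    # output_config.format; older SDK releases exposed it as a beta output_format
    # parameter, so match the name to your SDK version.
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "clauses": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "title": {"type": "string"},
                                "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
                                "recommendation": {"type": "string"},
                            },
                            "required": ["title", "risk_level", "recommendation"],
                            "additionalProperties": False,
                        },
                    }
                },
                "required": ["clauses"],
                "additionalProperties": False,
            },
        }
    },
)
# If Claude asked to call check_legal_compliance, run the tool and continue the
# conversation first; otherwise the text block holds the JSON analysis.
for block in response.content:
    if block.type == "text":
        print(block.text)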
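Once the analysis comes back as JSON matching the schema above, the application still has to act on it. As a small sketch of that last step, the helper below pulls out the titles of clauses marked high risk. The function name high_risk_clauses and the sample data are illustrative only.

```python
import json
from typing import List


def high_risk_clauses(raw_text: str) -> List[str]:
    """Return the titles of clauses whose risk_level is "high"."""
    analysis = json.loads(raw_text)
    return [
        clause["title"]
        for clause in analysis["clauses"]
        if clause["risk_level"] == "high"
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "clauses": [
            {"title": "Termination", "risk_level": "high", "recommendation": "Negotiate a notice period."},
            {"title": "Governing law", "risk_level": "low", "recommendation": "No change needed."},
        ]
    })
    print(high_risk_clauses(sample))
```

A real pipeline would likely route these titles to a reviewer or a ticketing system rather than print them.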
Key Takeaways
- Claude's API is modular – Focus on model capabilities and tools first, then optimize with context management and file handling.
- Use structured outputs for production applications to get reliable, parseable responses.
- Batch processing cuts costs by 50% – Ideal for large-scale offline tasks.
- Prompt caching reduces latency and cost – Cache system prompts and large context blocks.
- Check feature availability per platform (Claude API, Bedrock, Vertex AI) before building, as not all features are GA everywhere.