Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
Explore the full Claude API surface: model capabilities, tools, context management, and file handling. Learn practical implementation with code examples and best practices.
This guide walks you through the five core areas of the Claude API: model capabilities (thinking, structured outputs, citations, batch processing), tools (web fetch, code execution), tool infrastructure, context management (prompt caching, compaction), and files and assets (PDFs, images). You'll learn how to combine these features for production-ready applications.
Claude's API surface is more than just a chat endpoint. It's a comprehensive platform designed to give you fine-grained control over how Claude reasons, interacts with external systems, manages long conversations, and handles complex data. Whether you're building a simple Q&A bot or a sophisticated agent that browses the web and executes code, understanding these five core areas is essential.
This guide breaks down each area with practical code examples, availability notes, and best practices so you can start building with confidence.
The Five Pillars of the Claude API
Claude's API is organized into five interconnected areas:
- Model Capabilities – Control how Claude reasons and formats responses.
- Tools – Let Claude take actions on the web or in your environment.
- Tool Infrastructure – Handle discovery and orchestration at scale.
- Context Management – Keep long-running sessions efficient.
- Files and Assets – Manage documents and data you provide to Claude.
1. Model Capabilities: Steering Claude's Output
Model capabilities are the direct levers you pull to shape Claude's responses. They include reasoning depth, response format, and input modalities.
Extended Thinking and Adaptive Thinking
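Claude can "think" before responding, which improves performance on complex reasoning tasks. With extended thinking, you set an explicit token budget for that reasoning. With Adaptive Thinking, Claude dynamically decides when and how much to think. This is the recommended mode for Opus 4.7. The example below uses extended thinking with a fixed budget.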
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step: integrate x^2 * sin(x) dx"}
    ]
)

# The response will contain a thinking block before the final answer
print(response.content)
Key parameters:
- budget_tokens: Maximum number of tokens Claude can use for thinking.
- effort: Controls thinking depth (low, medium, high).
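Because the thinking budget is spent out of the same response as the final answer, it helps to check the two numbers against each other before sending a request. The following is a minimal sketch, not part of the SDK: `build_thinking_config` and `MIN_THINKING_BUDGET` are hypothetical names, and the 1,024-token floor is an assumption that should be confirmed against the current API documentation.

```python
MIN_THINKING_BUDGET = 1024  # assumed lower bound; confirm in the current API docs


def build_thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` dict, or raise ValueError if the budget cannot fit.

    Hypothetical helper: it only validates the numbers locally and does not
    call the API.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be at least {MIN_THINKING_BUDGET}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            "budget_tokens must be smaller than max_tokens so the answer has room"
        )
    return {"type": "enabled", "budget_tokens": budget_tokens}


thinking = build_thinking_config(max_tokens=4096, budget_tokens=2048)
```

The returned dict can be passed as the `thinking` argument shown in the example above.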
Structured Outputs
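For applications that need consistent, parseable responses, use structured outputs. Claude can return JSON, XML, or any schema you define. The example below requests JSON through the system prompt. Because that is an instruction rather than a guarantee, parse and validate the result before using it.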
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the key entities from this text: 'Apple acquired the startup for $500 million in 2023.'"}
    ],
    system="Always respond in valid JSON with keys: company, amount, year, acquisition_type"
)
print(response.content[0].text)
# Example output: {"company": "Apple", "amount": 500000000, "year": 2023, "acquisition_type": "acquisition"}
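A prompt-level format request can still produce malformed or incomplete JSON, so downstream code usually parses defensively. The sketch below shows one way to do that with the standard library. `parse_entities` and `REQUIRED_KEYS` are illustrative names, and the key set simply mirrors the system prompt above.

```python
import json

# Keys requested in the system prompt of the example above
REQUIRED_KEYS = {"company", "amount", "year", "acquisition_type"}


def parse_entities(raw_text: str) -> dict:
    """Parse the model's text as JSON and check that every requested key is present."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


sample = '{"company": "Apple", "amount": 500000000, "year": 2023, "acquisition_type": "acquisition"}'
entities = parse_entities(sample)
```

In a real application, `sample` would be `response.content[0].text`, and a `ValueError` would typically trigger a retry or a fallback.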
Citations
When Claude needs to ground responses in source documents, use the Citations feature. Claude will reference exact sentences from your provided documents.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Source documents are passed as content blocks, with citations enabled per document
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Either party may terminate this agreement with 30 days written notice..."
                    },
                    "title": "Service Agreement",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "What does the contract say about termination notice period?"}
            ]
        }
    ]
)
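Once a response comes back, the cited passages are attached to individual text blocks rather than returned as one list. The sketch below gathers them into simple pairs. It runs against stand-in objects built with `SimpleNamespace`, on the assumption that cited text blocks expose a `citations` list whose entries carry `document_title` and `cited_text`. `collect_citations` is a hypothetical helper name.

```python
from types import SimpleNamespace


def collect_citations(content_blocks) -> list:
    """Return (document_title, cited_text) pairs from text blocks that carry citations."""
    pairs = []
    for block in content_blocks:
        if getattr(block, "type", None) != "text":
            continue
        # Blocks without citations may have the attribute set to None
        for citation in getattr(block, "citations", None) or []:
            pairs.append((citation.document_title, citation.cited_text))
    return pairs


# Stand-in for response.content; the shape is an assumption for illustration
sample_content = [
    SimpleNamespace(type="text", text="The agreement can be ended with ", citations=None),
    SimpleNamespace(
        type="text",
        text="30 days written notice.",
        citations=[
            SimpleNamespace(
                document_title="Service Agreement",
                cited_text="Either party may terminate this agreement with 30 days written notice...",
            )
        ],
    ),
]

citations = collect_citations(sample_content)
```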
Batch Processing
For high-volume, non-real-time workloads, use batch processing. Batch API calls cost 50% less than standard API calls.
import time

# Create a batch of messages
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Summarize this article..."}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Translate this to French..."}]
            }
        }
    ]
)

# Poll until processing has ended
while batch.processing_status != "ended":
    time.sleep(5)
    batch = client.messages.batches.retrieve(batch.id)

# Each entry reports its own outcome; only succeeded entries contain a message
for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)
    else:
        print(entry.custom_id, entry.result.type)
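Batch results are matched back to their inputs through `custom_id`, so it is worth generating the request list programmatically instead of writing each entry by hand. A minimal sketch follows. `build_batch_requests` is a hypothetical helper, and the `req-001` numbering scheme just follows the example above.

```python
def build_batch_requests(prompts, model, max_tokens=256):
    """Turn a list of prompt strings into batch request entries with unique custom_ids."""
    requests = []
    for index, prompt in enumerate(prompts, start=1):
        requests.append(
            {
                "custom_id": f"req-{index:03d}",
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
        )
    return requests


batch_requests = build_batch_requests(
    ["Summarize this article...", "Translate this to French..."],
    model="claude-sonnet-4-20250514",
)
```

The resulting list can be passed as the `requests` argument when creating a batch. Keeping a dictionary from `custom_id` to the original prompt makes it easy to pair results with inputs afterwards.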
2. Tools: Let Claude Take Action
Tools extend Claude's capabilities beyond text generation. Claude can call functions, fetch web pages, execute code, and more.
Web Fetch Tool
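The web fetch tool lets Claude retrieve the content of web pages at URLs that appear in the conversation, so it can answer from live sources rather than from training data alone.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    # The tool type and beta flag are versioned identifiers and may have changed;
    # confirm the current values in the API docs
    betas=["web-fetch-2025-09-10"],
    tools=[{
        "type": "web_fetch_20250910",
        "name": "web_fetch"
    }],
    messages=[
        # example.com is a placeholder; supply the page you want Claude to read
        {"role": "user", "content": "Summarize the latest news about AI regulation in the EU from https://example.com/eu-ai-regulation"}
    ]
)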
Code Execution Tool
Claude can write and execute Python code in a sandboxed environment.
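response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    # The tool type and beta flag are versioned identifiers and may have changed;
    # confirm the current values in the API docs
    betas=["code-execution-2025-08-25"],
    tools=[{
        "type": "code_execution_20250825",
        "name": "code_execution"
    }],
    messages=[
        {"role": "user", "content": "Calculate the Fibonacci sequence up to 100 and plot it."}
    ]
)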
Parallel Tool Use
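Claude can request several tool calls in a single response, which speeds up tasks whose steps do not depend on each other.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    # Tool types and beta flags are versioned identifiers and may have changed;
    # confirm the current values in the API docs
    betas=["web-fetch-2025-09-10", "code-execution-2025-08-25"],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch"},
        {"type": "code_execution_20250825", "name": "code_execution"}
    ],
    messages=[
        {"role": "user", "content": "Fetch the current stock price of AAPL and calculate its 30-day moving average."}
    ]
)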
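Web fetch and code execution run on the server side, but with tools you define yourself, your code is responsible for executing each requested call and returning the results. The following sketch shows one way to handle several `tool_use` blocks from a single response concurrently. It uses plain dicts as stand-ins for the SDK's block objects, and `run_tool_calls` and the `handlers` mapping are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(tool_use_blocks, handlers):
    """Run each requested tool concurrently; return tool_result blocks in request order."""

    def run_one(block):
        output = handlers[block["name"]](**block["input"])
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to the request
            "content": str(output),
        }

    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of the input blocks
        return list(pool.map(run_one, tool_use_blocks))


# Illustrative handlers and requests, not real API output
handlers = {"add": lambda a, b: a + b, "upper": lambda text: text.upper()}
blocks = [
    {"type": "tool_use", "id": "toolu_1", "name": "add", "input": {"a": 2, "b": 3}},
    {"type": "tool_use", "id": "toolu_2", "name": "upper", "input": {"text": "hi"}},
]
tool_results = run_tool_calls(blocks, handlers)
```

All results go back to Claude together, as the content of a single user message.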
3. Context Management: Keep Conversations Efficient
Long-running sessions can consume many tokens. Context management features help you stay within limits and reduce costs.
Prompt Caching
Cache frequently used system prompts or document chunks to avoid reprocessing them on every request.
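response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # Caching only takes effect once the cached prefix reaches the model's minimum
            # cacheable length (1,024 tokens on many models), so a real prompt would be longer
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "How do I use async/await in Python?"}
    ]
)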
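To confirm that caching is actually working, you can look at the token counts reported with each response. The sketch below computes the share of input tokens served from cache. It assumes the usage data exposes `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`, and it takes a plain dict so it can run without an API call. `cache_hit_ratio` is a hypothetical helper.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of all input tokens that were read from the cache (0.0 when no input)."""
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    total = read + written + uncached
    return read / total if total else 0.0


# Illustrative numbers: the first request writes the cache, the second reads it
first_request = {"input_tokens": 100, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 0}
second_request = {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 900}

first_ratio = cache_hit_ratio(first_request)
second_ratio = cache_hit_ratio(second_request)
```

A ratio that stays at zero across repeated requests suggests the cached prefix is changing between calls or is too short to be cached.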
Context Compaction
When a conversation grows too long, use compaction to summarize and reduce token usage without losing critical information.
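# After many turns, compact the conversation.
# `long_conversation` is the list of message dicts accumulated over earlier turns.
# The compaction call shown here is illustrative; confirm the exact method name
# and parameters in the current SDK reference before relying on it.
compacted = client.messages.compact(
    messages=long_conversation,
    model="claude-sonnet-4-20250514",
    max_compaction_tokens=4096
)

# Continue with the compacted context
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=compacted.messages + [
        {"role": "user", "content": "Based on our discussion, what's the next step?"}
    ]
)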
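The idea behind compaction can also be sketched on the client side: replace older turns with a summary and keep the most recent turns verbatim. This is an illustration of the concept, not the API's compaction feature. `compact_history` is a hypothetical helper, and `summarize` is an injected callable that in practice might itself call Claude.

```python
def compact_history(messages, keep_last, summarize):
    """Replace all but the last `keep_last` messages with one summary message."""
    if len(messages) <= keep_last:
        return list(messages)
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    summary_message = {
        "role": "user",
        "content": f"Summary of the earlier conversation: {summarize(older)}",
    }
    return [summary_message] + recent


# Illustrative conversation and a trivial stand-in summarizer
history = [
    {"role": "user", "content": "Let's plan the migration."},
    {"role": "assistant", "content": "Step one is an inventory of services."},
    {"role": "user", "content": "Inventory is done."},
    {"role": "assistant", "content": "Next, pick a pilot service."},
]
compacted_history = compact_history(
    history,
    keep_last=2,
    summarize=lambda older: f"{len(older)} earlier messages about migration planning",
)
```

How many recent turns to keep is a trade-off: more turns preserve detail, fewer turns save tokens.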
4. Files and Assets: Work with Documents and Images
Claude can process PDFs, images, and other file types directly.
PDF Support
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings from this report."
                }
            ]
        }
    ]
)
Image and Vision
import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What trends do you see in this chart?"
                }
            ]
        }
    ]
)
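The PDF and image examples differ only in the block type and media type, so a small helper can build either block from a filename. The sketch below is one way to do that. `bytes_to_content_block` and `file_to_content_block` are hypothetical names, and the list of accepted image types is an assumption to check against the current documentation.

```python
import base64
import mimetypes
from pathlib import Path

# Assumed set of supported image types; confirm in the current docs
IMAGE_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def bytes_to_content_block(filename: str, data: bytes) -> dict:
    """Build a base64 `document` or `image` content block based on the file extension."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type == "application/pdf":
        block_type = "document"
    elif media_type in IMAGE_TYPES:
        block_type = "image"
    else:
        raise ValueError(f"unsupported file type for {filename!r}: {media_type}")
    return {
        "type": block_type,
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def file_to_content_block(path: str) -> dict:
    """Read a file from disk and wrap it in the matching content block."""
    return bytes_to_content_block(path, Path(path).read_bytes())


pdf_block = bytes_to_content_block("report.pdf", b"%PDF-1.4 example bytes")
png_block = bytes_to_content_block("chart.png", b"\x89PNG example bytes")
```

The returned dict can be placed directly in the `content` list of a user message, next to a text block containing the question.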
5. Tool Infrastructure: Orchestration at Scale
For production systems, you need more than just tool definitions. Claude's tool infrastructure includes:
- Tool Runner (SDK): Automates the loop of calling Claude, executing the tools it requests, and returning the results.
- Strict Tool Use: Guarantees that Claude's tool calls conform to the input schema you define.
- Fine-grained Tool Streaming: Streams tool call parameters as they are generated instead of buffering them.
- Tool Search: Lets Claude discover and load the right tool for a given task from a large catalog.
- MCP (Model Context Protocol): Connect Claude to remote servers and external data sources.
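To make the orchestration concrete, here is a minimal sketch of the request, execute, return loop that a tool runner takes off your hands. It is not the SDK's Tool Runner. `run_tool_loop` and `scripted_model` are hypothetical names, responses are plain dicts, and the model call is injected so the loop can run without network access.

```python
def run_tool_loop(call_model, handlers, messages, max_turns=5):
    """Call the model until it stops requesting tools, feeding results back each turn."""
    for _ in range(max_turns):
        response = call_model(messages)
        if response["stop_reason"] != "tool_use":
            return response
        # Echo the assistant turn, then answer every tool request in one user turn
        messages.append({"role": "assistant", "content": response["content"]})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(handlers[block["name"]](**block["input"])),
            }
            for block in response["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def scripted_model(messages):
    """Hypothetical stand-in for an API call: requests one tool, then answers."""
    last = messages[-1]
    if isinstance(last["content"], str):
        return {
            "stop_reason": "tool_use",
            "content": [
                {"type": "tool_use", "id": "toolu_1", "name": "add", "input": {"a": 2, "b": 3}}
            ],
        }
    tool_output = last["content"][0]["content"]
    return {
        "stop_reason": "end_turn",
        "content": [{"type": "text", "text": f"The sum is {tool_output}."}],
    }


final = run_tool_loop(
    scripted_model,
    handlers={"add": lambda a, b: a + b},
    messages=[{"role": "user", "content": "What is 2 + 3?"}],
)
```

The `max_turns` cap is a design choice that keeps a misbehaving tool or prompt from looping indefinitely.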
Feature Availability by Platform
Not all features are available everywhere. Here's a quick reference:
| Feature | Claude API | AWS | Bedrock | Vertex AI |
|---|---|---|---|---|
| Context Windows (1M tokens) | GA | GA | GA | GA |
| Adaptive Thinking | GA | GA | GA | GA |
| Batch Processing | GA | GA | GA | GA |
| Citations | GA | GA | GA | GA |
| Prompt Caching | GA | GA | GA | GA |
| Web Fetch Tool | GA | GA | GA | Beta |
| Code Execution Tool | Beta | Beta | — | — |
| Structured Outputs | GA | GA | GA | GA |
Best Practices for Production
- Start with model capabilities – Master thinking and structured outputs before adding tools.
- Use prompt caching for system prompts and static documents to reduce latency and cost.
- Batch non-urgent requests – Save 50% on costs for tasks that don't need real-time responses.
- Monitor token usage – Use the token counting API to stay within limits.
- Handle tool errors gracefully – Always validate tool outputs before passing them back to Claude.
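For the last point, one common pattern is to catch exceptions from your own tool handlers and report them back to Claude as an error result rather than letting the request loop crash. The sketch below assumes the `tool_result` block accepts an `is_error` flag. `safe_tool_result` is a hypothetical helper.

```python
def safe_tool_result(tool_use_id, handler, tool_input):
    """Run a tool handler and always return a tool_result block, flagging failures."""
    try:
        output = handler(**tool_input)
    except Exception as exc:  # report any handler failure back to the model
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": str(output)}


def divide(a, b):
    """Illustrative tool that fails on division by zero."""
    return a / b


ok_result = safe_tool_result("toolu_1", divide, {"a": 6, "b": 3})
error_result = safe_tool_result("toolu_2", divide, {"a": 1, "b": 0})
```

Returning the error text gives Claude a chance to correct its input or explain the failure to the user.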
Key Takeaways
- Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
- Adaptive Thinking lets Claude dynamically decide when to reason deeply – ideal for complex tasks.
- Batch processing cuts costs by 50% for asynchronous workloads.
- Prompt caching and context compaction are essential for long-running, cost-efficient sessions.
- Tools like web fetch and code execution turn Claude into an autonomous agent capable of real-world actions.
- Always check feature availability per platform (Claude API, AWS, Bedrock, Vertex AI) before building.