Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
Learn to navigate Claude's API surface—model capabilities, tools, context management, and file handling—with practical code examples and best practices for building scalable, cost-effective AI applications.
Introduction
Claude's API is more than just a text generation endpoint. It's a comprehensive platform designed to give developers fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you're building a simple chatbot, a complex agent, or a document analysis tool, understanding the five core areas of the API surface is essential.
This guide walks you through each area—model capabilities, tools, tool infrastructure, context management, and files/assets—with practical code examples and best practices. By the end, you'll know how to combine these features to build production-ready applications.
The Five Pillars of the Claude API
Claude's API surface is organized into five areas:
- Model capabilities – Control how Claude reasons and formats responses.
- Tools – Let Claude take actions on the web or in your environment.
- Tool infrastructure – Handle discovery and orchestration at scale.
- Context management – Keep long-running sessions efficient.
- Files and assets – Manage the documents and data you provide to Claude.
Model Capabilities: Steering Claude's Output
Model capabilities give you direct control over Claude's reasoning depth, response format, and input modalities. Here are the key features you should know.
Context Windows (Up to 1M Tokens)
Claude supports context windows of up to 1 million tokens on supported models, allowing you to process entire books, extensive codebases, or long conversation histories in a single request.
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize the key themes in this 500-page novel."}
    ],
    # The system prompt and user message together count toward the context window
    # (up to 1M tokens on supported models)
    system="You are an expert literary analyst.",
)
print(response.content[0].text)
Best practice: Use prompt caching to reduce costs when reusing large context blocks across multiple requests.
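Because the prompt and the generated output share one context window, it can help to check a request's budget before sending it. The following is a minimal sketch: `fits_in_window` and its default window size are hypothetical, not part of the SDK, and the input token count is assumed to come from a token-counting call or your own estimate.

```python
def fits_in_window(input_tokens: int, max_tokens: int, window: int = 1_000_000) -> bool:
    """Return True if the prompt plus the requested output budget fits in the window."""
    if input_tokens < 0 or max_tokens <= 0:
        raise ValueError("input_tokens must be >= 0 and max_tokens must be > 0")
    return input_tokens + max_tokens <= window


# Hypothetical numbers: a 900k-token prompt with a 4,096-token output budget
print(fits_in_window(900_000, 4096))
```

Pass the window size of the model you actually call, since not every model offers the largest window.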
Adaptive Thinking
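Adaptive thinking lets Claude decide dynamically when and how much to "think" before responding. It is the recommended thinking mode for Opus 4.7, and the effort parameter controls thinking depth. The example below shows the explicit fixed-budget configuration instead: thinking is enabled with a budget_tokens cap, which must be smaller than max_tokens.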
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 4096
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem step by step."}
    ]
)
When to use: Complex reasoning tasks, multi-step problem solving, or any scenario where you want Claude to "show its work."
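With thinking enabled, the response content can contain thinking blocks ahead of the text blocks, so reading `response.content[0].text` is not reliable. The sketch below separates the two; `split_blocks` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real response blocks so the example runs offline.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from answer text in a list of content blocks."""
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
        # Other block types (for example redacted thinking) are ignored here
    return "\n".join(thinking), "\n".join(answer)


# Stand-in objects shaped like response.content
sample = [
    SimpleNamespace(type="thinking", thinking="2 + 2 is 4."),
    SimpleNamespace(type="text", text="The answer is 4."),
]
reasoning, final = split_blocks(sample)
print(final)
```

In real code you would call `split_blocks(response.content)` on the response from the request above.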
Structured Outputs
Claude can return responses in structured formats like JSON, making it easy to integrate with your application logic.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice."}
    ],
    system="Always respond with valid JSON. Use this schema: {\"name\": string, \"date\": string, \"amount\": number}"
)
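A system-prompt instruction alone does not guarantee well-formed JSON, so it is worth validating the reply before your application uses it. The sketch below checks a reply against the schema from the prompt; `parse_invoice` is a hypothetical helper and the sample string stands in for `response.content[0].text`.

```python
import json


def parse_invoice(raw: str) -> dict:
    """Parse the model's reply and check it against the schema given in the prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"name": str, "date": str, "amount": (int, float)}
    for key, kind in expected.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(data[key], bool) or not isinstance(data[key], kind):
            raise ValueError(f"wrong type for field: {key}")
    return data


# Stand-in for the text of a real response
invoice = parse_invoice('{"name": "Acme Corp", "date": "2024-01-15", "amount": 129.5}')
print(invoice["amount"])
```

On a validation failure, a common approach is to retry the request or log the raw reply for inspection.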
Batch Processing (50% Cost Savings)
For high-volume workloads, use the Batch API to process requests asynchronously. Batch calls cost 50% less than standard API calls.
# Submit a batch of messages (batches live under client.messages in the Python SDK)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to French: Hello"}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to Spanish: Goodbye"}]
            }
        }
    ]
)
Note: Batch processing is not eligible for Zero Data Retention (ZDR).
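For larger workloads, the request list is usually generated rather than written by hand. The sketch below builds entries in the same shape as the example above; `build_batch_requests` and its `req-NNN` ID scheme are hypothetical choices, not SDK features.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Turn a list of prompt strings into batch request entries with unique custom_ids."""
    return [
        {
            "custom_id": f"req-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


requests = build_batch_requests(
    ["Translate to French: Hello", "Translate to Spanish: Goodbye"]
)
print([r["custom_id"] for r in requests])
```

Keep a mapping from each custom_id back to your own records, since the custom_id is what ties a result to the request that produced it.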
Tools: Let Claude Take Action
Tools extend Claude's capabilities beyond text generation. Claude can call functions, fetch web pages, execute code, and even control a computer.
Defining a Tool
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g., San Francisco"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)
Handling Tool Calls
When Claude decides to use a tool, the response contains a tool_use content block. You must execute the tool and return the result.
import json

# After receiving the response, run every tool Claude asked for
tool_results = []
for content in response.content:
    if content.type == "tool_use":
        # Execute the tool (your implementation)
        result = execute_tool(content.name, content.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": content.id,
            "content": json.dumps(result)
        })

# Send all results back to Claude in a single user turn
if tool_results:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results}
        ]
    )
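The `execute_tool` function used above is left to your implementation. One possible shape is a small registry that maps tool names to Python functions, sketched below. The `TOOL_REGISTRY` name and the placeholder `get_weather` body are assumptions for illustration; a real handler would call an actual weather service.

```python
def get_weather(location: str) -> dict:
    """Placeholder implementation; a real version would call a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps the tool names declared in `tools` to the functions that implement them
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool(tool_name: str, tool_input: dict) -> dict:
    """Dispatch a tool call, returning an error payload instead of raising."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        return {"error": f"unknown tool: {tool_name}"}
    try:
        return handler(**tool_input)
    except Exception as exc:
        # Report the failure in the result rather than crashing the caller
        return {"error": str(exc)}


print(execute_tool("get_weather", {"location": "Tokyo"}))
```

Returning an error payload gives Claude a chance to correct its input or explain the failure, which is often preferable to aborting the conversation.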
Built-in Tools
Claude provides several pre-built tools you can enable with minimal configuration:
- Web search tool – Fetch real-time information from the web
- Code execution tool – Run Python code in a sandboxed environment
- Computer use tool – Control a virtual desktop (beta)
- Memory tool – Store and retrieve information across sessions
- Text editor tool – Edit files programmatically
Context Management: Keeping Sessions Efficient
Long-running conversations can consume significant tokens. Claude's context management features help you stay within limits and control costs.
Prompt Caching
Prompt caching lets you reuse large context blocks across multiple requests, reducing latency and cost.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a customer support agent. Here is our product manual: ...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "How do I reset my password?"}
    ]
)
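To confirm that caching is taking effect, you can inspect the usage figures returned with each response. The sketch below summarizes them; `cache_summary` is a hypothetical helper, and the `SimpleNamespace` object stands in for `response.usage` so the example runs offline.

```python
from types import SimpleNamespace


def cache_summary(usage) -> dict:
    """Summarize how much of the prompt was written to or read from the cache."""
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    uncached = usage.input_tokens
    total = written + read + uncached
    return {
        "cache_written": written,
        "cache_read": read,
        "uncached": uncached,
        "read_fraction": read / total if total else 0.0,
    }


# Stand-in for response.usage on a follow-up request that hit the cache
usage = SimpleNamespace(
    input_tokens=20,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=1980,
)
summary = cache_summary(usage)
print(summary["cache_read"], summary["uncached"])
```

A first request typically shows tokens written to the cache, and later requests with the same prefix show tokens read from it. If both stay at zero, the cached block may be too short to qualify or the prefix may have changed between requests.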
Context Compaction
When a conversation grows too long, use context compaction to summarize earlier turns while preserving essential information.
# After many turns, compact the history.
# `history` is assumed to be your existing list of message dicts for this conversation.
compacted = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=history + [
        {"role": "user", "content": "Summarize our conversation so far, keeping all key decisions and user preferences."}
    ]
)
# Use the summary as the new system prompt
new_system_prompt = f"Previous conversation summary: {compacted.content[0].text}"
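Once you have a summary, the older turns can be dropped while the most recent ones are kept verbatim. The sketch below shows one way to do that; `compact_history` and its `keep_last` parameter are hypothetical, and the sample history is made up for illustration.

```python
def compact_history(messages, summary, keep_last=4):
    """Replace all but the most recent messages with a summary.

    Returns a (system_prompt, trimmed_messages) pair.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    system_prompt = f"Previous conversation summary: {summary}"
    recent = list(messages[-keep_last:])
    # Drop leading assistant turns so the trimmed history starts with a user message
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system_prompt, recent


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
    {"role": "assistant", "content": "a3"},
]
system_prompt, trimmed = compact_history(history, "User prefers metric units.", keep_last=3)
print([m["content"] for m in trimmed])
```

Keeping a few recent turns verbatim preserves the immediate context that a summary tends to flatten.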
Files and Assets: Working with Documents
Claude can process various file types, including PDFs, images, and code files.
PDF Support
import base64

# Read the PDF and base64-encode it for the request body
with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this PDF."
                }
            ]
        }
    ]
)
Images and Vision
Claude can analyze images for tasks like object detection, OCR, and visual reasoning.
# Reuses the base64 import from the PDF example
with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's in this image?"
                }
            ]
        }
    ]
)
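The media_type in the request has to match the actual image format, which is easy to get wrong when the file type varies. The sketch below builds an image block from a filename and raw bytes; `image_block` and `MEDIA_TYPES` are hypothetical names, and choosing the type from the file extension is a simplifying assumption.

```python
import base64
from pathlib import Path

# Extension-to-media-type map for common image formats
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, image_bytes: bytes) -> dict:
    """Build an image content block, choosing media_type from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image extension: {suffix or '(none)'}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }


# Two bytes of placeholder data stand in for a real file's contents
block = image_block("photo.JPG", b"\xff\xd8")
print(block["source"]["media_type"])
```

The returned dict can be placed directly in the content list of a user message, next to a text block with your question.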
Feature Availability and Lifecycle
Features on the Claude Platform go through a lifecycle: Beta → Generally Available (GA) → Deprecated → Retired. Not all features pass through every stage.
- Beta – Preview features for feedback. May have limited availability and breaking changes.
- GA – Stable, fully supported, recommended for production.
- Deprecated – Still functional but not recommended. Migration path provided.
- Retired – No longer available.
Putting It All Together: A Production-Ready Agent
Here's a complete example that combines multiple features:
import anthropic

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "read_pdf",
        "description": "Read and extract text from a PDF file",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string"}
            },
            "required": ["file_path"]
        }
    }
]

# Use caching for the system prompt
system_prompt = [
    {
        "type": "text",
        "text": "You are a research assistant. Use the available tools to answer questions accurately.",
        "cache_control": {"type": "ephemeral"}
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system=system_prompt,
    tools=tools,
    messages=[
        {"role": "user", "content": "Find the latest research on quantum computing and summarize it."}
    ]
)

# The reply may contain tool_use blocks as well as text, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)
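The request above is a single turn. An agent normally keeps calling the model until it stops requesting tools. The sketch below shows one way to write that loop. `run_agent`, its parameters, and the `max_turns` cap are hypothetical; the client and the `execute_tool` callable are passed in, so the function can be exercised with stand-ins as well as with a real `anthropic.Anthropic()` client.

```python
import json


def run_agent(client, execute_tool, user_prompt, *, model, tools,
              system=None, max_tokens=4096, max_turns=5):
    """Call the model repeatedly, feeding tool results back, until it stops asking for tools."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        kwargs = dict(model=model, max_tokens=max_tokens, tools=tools, messages=messages)
        if system is not None:
            kwargs["system"] = system
        response = client.messages.create(**kwargs)

        if response.stop_reason != "tool_use":
            # No more tool calls: return the concatenated text blocks
            return "".join(b.text for b in response.content if b.type == "text")

        # Echo the assistant turn, then answer every tool_use block in one user turn
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": json.dumps(execute_tool(b.name, b.input)),
            }
            for b in response.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})

    raise RuntimeError("agent did not finish within max_turns")
```

The turn cap is a guard against a model that keeps requesting tools indefinitely. Choose a limit that suits your workload, and consider logging each tool call for later debugging.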
Key Takeaways
- Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Start with capabilities and tools, then optimize with context management.
- Use adaptive thinking for complex reasoning and structured outputs for reliable JSON responses. Batch processing cuts costs by 50% for high-volume workloads.
- Tools extend Claude beyond text – define custom functions or use built-in tools for web search, code execution, and computer control.
- Prompt caching and context compaction are essential for managing long-running sessions efficiently and controlling token costs.
- Always check feature availability – features in Beta may have breaking changes, while GA features are safe for production use.