Claude API Feature Overview: A Practical Guide to Model Capabilities, Tools, and Context Management
This guide breaks down the Claude API into five areas: model capabilities (thinking, structured outputs), tools (web search, code execution), context management (prompt caching, compaction), files (PDF, images), and tool infrastructure (MCP, orchestration). You'll learn how to use each area with practical code examples.
Introduction
Claude's API surface is organized into five core areas: Model capabilities, Tools, Tool infrastructure, Context management, and Files and assets. Each area gives you different levers to control how Claude reasons, interacts with external systems, and handles long-running conversations. This guide walks through each area with practical code examples and explains how features map to availability (Beta, GA, or Deprecated) across platforms like Claude API, AWS Bedrock, Vertex AI, and Microsoft Foundry.
1. Model Capabilities: Steering Claude's Reasoning and Output
Model capabilities control how Claude processes input and formats responses. Key features include:
- Extended Thinking & Adaptive Thinking: Let Claude reason step-by-step before answering. With Adaptive Thinking (GA on Claude API), you can set the effort parameter to let Claude dynamically decide how much to think.
- Structured Outputs: Enforce JSON or other structured formats for machine-readable responses.
- Citations: Ground responses in source documents with exact sentence references.
- Multilingual Support: Claude works across dozens of languages.
- Zero Data Retention (ZDR): Many features are ZDR-eligible, meaning that under a ZDR arrangement the data sent to them isn't stored.
Example: Using Adaptive Thinking with Effort Parameter
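import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,  # upper bound on tokens spent thinking
    },
    # Effort is not a key of the thinking object. On models that support
    # adaptive thinking it is set separately from the thinking budget;
    # check the current API reference for the exact request shape.
    messages=[
        {"role": "user", "content": "Analyze the pros and cons of using microservices vs monoliths for a startup."}
    ],
)

# With thinking enabled, thinking blocks come before the answer,
# so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)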
Batch Processing for Cost Savings
Batch API calls cost 50% less than standard API calls. Use this for large volumes of non-urgent requests.
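import anthropic

client = anthropic.Anthropic()

# Batches live under client.messages.batches. Results are produced
# asynchronously and retrieved later using the batch ID.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",  # your own ID for matching results to requests
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Summarize this article."}],
            },
        },
        # Add more requests...
    ]
)
print(f"Batch ID: {batch.id}")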
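To make the 50% figure concrete, the arithmetic can be sketched as follows. The request count and per-request price are made-up placeholders, not real rates; only the 50% discount comes from the text above.

```python
# Only the 50% discount is taken from the guide; every other number
# here is a hypothetical placeholder for illustration.
BATCH_DISCOUNT = 0.50


def batch_cost(standard_cost: float, discount: float = BATCH_DISCOUNT) -> float:
    """Return what a workload costs when sent through the Batch API."""
    return standard_cost * (1.0 - discount)


# Assume 10,000 requests that would cost $0.002 each at standard pricing.
standard_total = 10_000 * 0.002
batch_total = batch_cost(standard_total)
print(f"Standard: ${standard_total:.2f}, batch: ${batch_total:.2f}")
```

The saving scales linearly with volume, so the trade-off is mostly about whether the workload can wait for asynchronous results.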
2. Tools: Let Claude Take Actions
Tools extend Claude's capabilities to interact with the outside world. The API supports:
- Web Search Tool: Let Claude search the web for real-time information.
- Code Execution Tool: Run Python code in a sandboxed environment.
- Computer Use Tool: Control a virtual desktop environment.
- Memory Tool: Store and retrieve information across conversations.
- Bash Tool: Execute shell commands.
- Text Editor Tool: Read, write, and edit files.
- Advisor Tool: Get guidance on complex tasks.
Example: Using the Web Search Tool
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            # Server-side tool: identified by a versioned type string,
            # so no description field is needed.
            "type": "web_search_20250305",
            "name": "web_search",
        }
    ],
    messages=[
        {"role": "user", "content": "What is the latest news about Claude AI?"}
    ],
)

# Claude will automatically decide when to call the tool. The response can
# mix text blocks with search-related blocks, so print only the text.
for block in response.content:
    if block.type == "text":
        print(block.text)
Tool Use Best Practices
- Parallel Tool Use: Claude can call multiple tools simultaneously for efficiency.
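- Strict Tool Use: Guarantee that the inputs Claude passes to a tool conform to the tool's JSON schema. (Forcing Claude to call a specific tool is a separate setting, tool_choice.)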
- Tool Runner (SDK): Automate tool execution with built-in SDK helpers.
- Fine-grained Tool Streaming: Stream tool-call parameters as they are generated instead of waiting for the complete, validated input.
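When Claude uses tools in parallel, a single response contains several tool_use blocks, and the results go back together in one user message. The sketch below shows that dispatch step on plain dictionaries without calling the API. The tool names, the registry, and the sample blocks are hypothetical; only the tool_use and tool_result block shapes follow the Messages API.

```python
from typing import Any, Callable


# Hypothetical local tools; names and behavior are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def add(a: float, b: float) -> str:
    return str(a + b)


TOOL_REGISTRY: dict[str, Callable[..., str]] = {"get_weather": get_weather, "add": add}


def run_tool_calls(content_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute every tool_use block and build the matching tool_result blocks."""
    results: list[dict[str, Any]] = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and any other block types
        result: dict[str, Any] = {"type": "tool_result", "tool_use_id": block["id"]}
        handler = TOOL_REGISTRY.get(block["name"])
        try:
            if handler is None:
                raise KeyError(f"Unknown tool: {block['name']}")
            result["content"] = handler(**block["input"])
        except Exception as exc:
            # Report the failure back to Claude instead of crashing the loop.
            result["content"] = str(exc)
            result["is_error"] = True
        results.append(result)
    return results


# A response with two tool calls in one turn, written as plain dicts.
sample_blocks = [
    {"type": "text", "text": "Let me check both."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "add", "input": {"a": 2, "b": 3}},
]
tool_results = run_tool_calls(sample_blocks)

# All results are returned together in a single user message.
follow_up_message = {"role": "user", "content": tool_results}
```

The SDK's tool runner helpers automate this loop; writing it by hand is mainly useful when you need custom error handling or logging around each call.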
3. Tool Infrastructure: Discovery and Orchestration at Scale
When you have many tools, you need infrastructure to manage them. Claude supports:
- MCP (Model Context Protocol): A standard for connecting Claude to external tools and data sources.
- Remote MCP Servers: Connect to tools hosted on remote servers.
- MCP Connector: Connect to remote MCP servers directly from a Messages API request, without running a separate MCP client.
- Tool Search: Let Claude discover the right tool from a large catalog.
- Tool Combinations: Chain multiple tools together for complex workflows.
- Programmatic Tool Calling: Let Claude call your tools from code it writes in the code execution environment, rather than spending one model round trip per tool call.
Example: Setting Up a Remote MCP Server
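import Anthropic from "@anthropic-ai/sdk";

// The SDK does not export an MCP client class. Remote MCP servers are
// attached to a Messages request through the MCP connector (beta).
const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const response = await client.beta.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  // Beta flag and field names follow the MCP connector beta; verify them
  // against the current documentation before use.
  betas: ["mcp-client-2025-04-04"],
  mcp_servers: [
    {
      type: "url",
      url: "https://my-mcp-server.example.com",
      name: "my-mcp-server",
      authorization_token: process.env.MCP_SERVER_TOKEN,
    },
  ],
  // Claude decides when to call the server's tools (for example, database_query).
  messages: [{ role: "user", content: "List the first 10 users in the database." }],
});

console.log(response.content);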
4. Context Management: Keeping Long Sessions Efficient
Long conversations can consume many tokens. Context management features help:
- Context Windows: Up to 1M tokens for processing large documents and codebases.
- Compaction: Reduce token usage by summarizing or pruning older messages.
- Context Editing: Clear stale content, such as old tool results, from the conversation history according to rules you configure.
- Prompt Caching: Reuse cached prompts across requests to reduce latency and cost.
- Token Counting: Estimate token usage before sending a request.
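The idea behind pruning older messages can be sketched client-side. This is not the API's compaction feature: it is a simple stand-in that keeps the newest messages within a token budget. The four-characters-per-token estimate is a rough assumption, and the helper names are hypothetical; for exact numbers, use the Token Counting feature instead.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def prune_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    # A conversation should start with a user turn, so drop any
    # leading assistant messages left over after pruning.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
    {"role": "assistant", "content": "d" * 40},
    {"role": "user", "content": "e" * 40},
]
trimmed = prune_history(history, budget=50)
print([m["role"] for m in trimmed])
```

Dropping messages outright loses information; summarizing the dropped span into a single message is the other approach the Compaction bullet describes.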
Example: Using Prompt Caching
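import anthropic

client = anthropic.Anthropic()

# Cache a system prompt for reuse
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=512,
    system=[
        {
            "type": "text",
            # A real cached prefix needs to be much longer than this: prompts
            # below the model's minimum cacheable length are not cached.
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"},  # Cache everything up to this block
        }
    ],
    messages=[
        {"role": "user", "content": "Explain list comprehensions."}
    ],
)
print(response.content[0].text)
# The usage object reports how many input tokens were written to or read from the cache.
print(response.usage)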
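A rough way to reason about when caching pays off is to compare the cost of a shared prompt prefix with and without it. The multipliers below (cache writes at 1.25x the base input price, cache reads at 0.1x) are assumptions to check against the current pricing page, and the sketch assumes every request arrives while the cache entry is still alive.

```python
# Assumed multipliers relative to the base input-token price; verify
# them against current pricing before relying on the numbers.
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.10


def prefix_cost_units(prefix_tokens: int, num_requests: int, cached: bool) -> float:
    """Cost of the shared prefix across requests, in base-price token units."""
    if num_requests <= 0:
        return 0.0
    if not cached:
        return float(prefix_tokens * num_requests)
    first = prefix_tokens * CACHE_WRITE_MULTIPLIER  # first request writes the cache
    rest = prefix_tokens * CACHE_READ_MULTIPLIER * (num_requests - 1)  # later ones read it
    return first + rest


# Hypothetical workload: a 10,000-token prefix reused by 20 requests.
uncached = prefix_cost_units(10_000, 20, cached=False)
with_cache = prefix_cost_units(10_000, 20, cached=True)
print(f"Uncached: {uncached:.0f} units, cached: {with_cache:.0f} units")
```

Under these assumptions a single request costs more with caching than without, so caching only helps when the same prefix is reused at least once.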
5. Files and Assets: Managing Documents and Data
Claude can process various file types:
- PDF Support: Extract text and layout from PDFs.
- Images and Vision: Analyze images with multimodal models.
- Files API: Upload and reference files in conversations.
Example: Processing a PDF with Citations
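import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": "<base64-encoded-pdf>",
                    },
                    "citations": {"enabled": True},  # without this, no citations are returned
                },
                {
                    "type": "text",
                    "text": "Summarize this document and cite key statistics.",
                },
            ],
        }
    ],
)

# With citations enabled, the answer is split across several text blocks,
# each of which may carry a citations list pointing into the document.
for block in response.content:
    if block.type == "text":
        print(block.text, end="")
print()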
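The data field in the example above is a placeholder. A minimal sketch of producing it from a file on disk might look like this; the helper names are hypothetical, and the block shape simply mirrors the request above.

```python
import base64
from pathlib import Path


def encode_pdf_bytes(data: bytes) -> str:
    """Return raw PDF bytes as a base64 string suitable for the data field."""
    return base64.standard_b64encode(data).decode("utf-8")


def pdf_document_block(path: str) -> dict:
    """Build a document content block from a PDF file on disk."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": encode_pdf_bytes(Path(path).read_bytes()),
        },
    }
```

Base64 inflates the payload by roughly a third, so for large or frequently reused documents, uploading once through the Files API and referencing the file may be the better fit.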
Feature Availability Across Platforms
Not all features are available everywhere. Here's a quick reference:
| Feature | Claude API | AWS Bedrock | Vertex AI | Microsoft Foundry |
|---|---|---|---|---|
| Context Windows (1M tokens) | GA | GA | GA | Beta |
| Adaptive Thinking | GA | GA | GA | Beta |
| Batch Processing | GA | GA | GA | GA |
| Citations | GA | GA | GA | Beta |
| Web Search Tool | Beta | Beta | Beta | Beta |
| Code Execution Tool | Beta | Beta | Beta | Beta |
| Prompt Caching | GA | GA | GA | Beta |
| Structured Outputs | GA | GA | GA | GA |
Key Takeaways
- Claude's API is organized into five areas: Model capabilities, tools, tool infrastructure, context management, and files. Start with model capabilities and tools, then explore the others for optimization.
- Use Adaptive Thinking for complex reasoning: Set the effort parameter to control thinking depth without manual tuning.
- Batch processing cuts costs by 50%: Use the Batch API for large, non-urgent workloads.
- Leverage tools for real-world actions: Web search, code execution, and memory tools let Claude interact with external systems.
- Manage context efficiently: Use prompt caching, compaction, and context editing to keep long sessions fast and cost-effective.