Building with Claude: A Practical Guide to the API Feature Surface
Explore Claude's five core API areas: model capabilities, tools, tool infrastructure, context management, and files. Learn how to steer reasoning, integrate tools, and optimize production workflows.
This guide walks you through Claude's API feature surface—model capabilities, tools, tool infrastructure, context management, and files—so you can build reliable, scalable applications with Claude.
Introduction
Claude's API is not just a single endpoint—it's a rich ecosystem of features designed to give you fine-grained control over how Claude reasons, acts, and interacts with your data. Whether you're building a customer support bot, a code assistant, or a document analysis pipeline, understanding the five core areas of Claude's API surface will help you ship faster and scale smarter.
This guide breaks down each area with practical examples and actionable advice. By the end, you'll know exactly which features to reach for and when.
The Five Pillars of the Claude API
Claude's API surface is organized into five areas:
- Model capabilities – Control how Claude reasons and formats responses.
- Tools – Let Claude take actions on the web or in your environment.
- Tool infrastructure – Handle discovery and orchestration at scale.
- Context management – Keep long-running sessions efficient.
- Files and assets – Manage the documents and data you provide to Claude.
1. Model Capabilities: Steering Claude's Reasoning and Output
Model capabilities are the most direct way to influence Claude's behavior. They include response format, reasoning depth, and input modalities.
Extended Thinking and Adaptive Thinking
Extended Thinking lets Claude reason step-by-step before producing a final answer. This is ideal for complex math, multi-step logic, or code generation.
Adaptive Thinking (recommended for newer Opus models) lets Claude dynamically decide when and how much to think. You control the depth with the effort parameter.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "effort": "high"  # Controls thinking depth
    },
    messages=[
        {"role": "user", "content": "Solve this differential equation: dy/dx = y * sin(x)"}
    ]
)

print(response.content[0].thinking)  # The reasoning chain
print(response.content[1].text)      # The final answer
```
Structured Outputs
Structured outputs force Claude to return responses in a specific format—JSON, for example. This is critical for programmatic consumption.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and total from this invoice: ..."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "total": {"type": "number"}
                },
                "required": ["name", "date", "total"]
            }
        }
    }
)

print(response.content[0].text)
```
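Because the schema guarantees the shape of the response, downstream parsing is a one-liner. A minimal sketch, with a made-up invoice payload standing in for response.content[0].text:

```python
import json

# Sample payload standing in for response.content[0].text (values are illustrative)
raw = '{"name": "Acme Corp", "date": "2025-01-15", "total": 1249.5}'

invoice = json.loads(raw)
print(invoice["total"])  # 1249.5
```

Since the schema marks all three fields as required, you can index into the parsed dict without defensive checks.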
Batch Processing
Batch processing lets you send large volumes of requests asynchronously at 50% lower cost than standard API calls. Use it for offline data enrichment, bulk classification, or nightly report generation.
```python
# Create a batch of messages
batch = client.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Summarize: ..."}]
            }
        },
        # ... more requests
    ]
)

# Poll for results
result = client.batches.retrieve(batch.id)
```
Note: Batch processing is not eligible for Zero Data Retention (ZDR).
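In practice you poll until processing finishes rather than retrieving once. A minimal polling helper: the processing_status field and its "ended" value follow Anthropic's Message Batches API, while wait_for_batch and the stand-in retriever below are hypothetical names that exist only to make the sketch runnable:

```python
import time

def wait_for_batch(retrieve, batch_id, interval=30.0, max_polls=100):
    """Call retrieve(batch_id) until the batch reports processing has ended."""
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch["processing_status"] == "ended":
            return batch
        time.sleep(interval)
    raise TimeoutError(f"Batch {batch_id} did not finish in time")

# Stand-in retriever that "ends" on the second poll, for illustration
_states = iter(["in_progress", "ended"])
def fake_retrieve(batch_id):
    return {"id": batch_id, "processing_status": next(_states)}

result = wait_for_batch(fake_retrieve, "batch-123", interval=0.0)
print(result["processing_status"])  # ended
```

With the real SDK you would pass a thin wrapper around client.batches.retrieve instead of the fake, and use an interval of tens of seconds, since batches are designed for offline workloads.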
2. Tools: Let Claude Take Action
Tools are how Claude interacts with the outside world—your database, a web API, or even the user's file system.
How Tool Use Works
You define tools as JSON schemas. Claude decides when to call them based on the conversation context. The API returns a tool_use stop reason, and you execute the tool, then return the result.
```python
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    tool_call = response.content[-1]
    print(f"Calling {tool_call.name} with {tool_call.input}")
```
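Detecting the call is only half the loop: you then execute the tool yourself and send its output back as a tool_result block in the next user message, so Claude can produce a final answer. A sketch with a plain dict standing in for the SDK's tool_use content block; build_tool_result_turn is a hypothetical helper name:

```python
def build_tool_result_turn(tool_call, result):
    """Package a tool's output as the user turn that follows a tool_use response."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_call["id"],  # must match the tool_use block's id
                "content": str(result),
            }
        ],
    }

# Stand-in for a tool_use block pulled from response.content
tool_call = {"id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}}
weather = {"temp_c": 18, "conditions": "clear"}  # output of your get_weather implementation

follow_up = build_tool_result_turn(tool_call, weather)
# Append the assistant turn and follow_up to the messages list, then call
# client.messages.create(...) again; Claude uses the result to answer.
```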
Parallel Tool Use
Claude can call multiple tools in a single turn, which is great for gathering independent data points simultaneously.
Claude might call get_weather for both Tokyo and London in one response; the API returns one tool_use block per call.
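When handling parallel calls, run every tool and return all of the tool_result blocks together in a single user message, one per tool_use_id. A sketch with plain dicts standing in for the SDK's content-block objects; collect_parallel_results is a hypothetical helper name:

```python
def collect_parallel_results(content_blocks, run_tool):
    """Run every tool_use block and bundle the results into one user turn."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # skip text or thinking blocks
        output = run_tool(block["name"], block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": str(output),
        })
    return {"role": "user", "content": results}

blocks = [
    {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {"city": "Tokyo"}},
    {"type": "tool_use", "id": "t2", "name": "get_weather", "input": {"city": "London"}},
]
turn = collect_parallel_results(blocks, lambda name, args: f"weather for {args['city']}")
print(len(turn["content"]))  # 2
```

Because the calls are independent, the run_tool invocations are also a natural place to add concurrency (e.g. a thread pool) for slow I/O-bound tools.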
Strict Tool Use
Strict tool use forces Claude to call a specific tool every time—useful for routing or guardrails.
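One way to express this is the tool_choice request parameter. The sketch below only builds the request arguments rather than sending them; the {"type": "tool"} form follows Anthropic's tool_choice parameter, but verify against the current docs before relying on it:

```python
# Force Claude to call get_weather on every turn, e.g. as a routing guardrail
request_kwargs = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": {"type": "tool", "name": "get_weather"},
    "messages": [{"role": "user", "content": "What's it like in Tokyo?"}],
}
# response = client.messages.create(**request_kwargs)
```

Other tool_choice modes cover the remaining cases: "auto" lets Claude decide, and "any" requires some tool call without pinning a specific one.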
3. Tool Infrastructure: Discovery and Orchestration at Scale
When you have dozens or hundreds of tools, you need infrastructure to manage them. Claude's platform provides:
- Tool search – Let Claude discover relevant tools dynamically.
- Tool combinations – Chain tools together (e.g., search → fetch → summarize).
- Programmatic tool calling – Bypass Claude's decision-making and call tools directly.
- Fine-grained tool streaming – Stream tool calls and results token by token for real-time UX.
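Even before reaching for platform-level tool search, you can cut prompt size by filtering a large tool registry down to the few tools relevant to the current request. A minimal keyword-overlap sketch; the scoring is illustrative and is not Anthropic's tool-search algorithm:

```python
def search_tools(registry, query, top_k=3):
    """Rank tool definitions by keyword overlap with the user's query."""
    words = set(query.lower().split())

    def score(tool):
        text = f"{tool['name']} {tool['description']}".lower()
        return sum(1 for w in words if w in text)

    ranked = sorted(registry, key=score, reverse=True)
    return [t for t in ranked[:top_k] if score(t) > 0]

registry = [
    {"name": "get_weather", "description": "Get current weather for a city"},
    {"name": "lookup_order", "description": "Look up an order by ID"},
    {"name": "send_email", "description": "Send an email to a customer"},
]
relevant = search_tools(registry, "what's the weather in Tokyo?")
print([t["name"] for t in relevant])  # ['get_weather']
```

Passing only the matching definitions in the tools parameter keeps each request small; a production version would use embeddings or the platform's tool-search feature rather than substring matching.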
Server Tools (MCP)
Model Context Protocol (MCP) servers let you expose external services as tools. Claude can connect to remote MCP servers for database queries, API calls, or custom business logic.
```python
# Configure a remote MCP server
mcp_server = {
    "url": "https://my-mcp-server.example.com",
    "headers": {"Authorization": "Bearer my-token"}
}

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{"type": "mcp", "server": mcp_server}],
    messages=[{"role": "user", "content": "Find all orders from last week"}]
)
```
4. Context Management: Keeping Sessions Efficient
Long conversations can consume large context windows. Claude provides tools to manage this.
Context Windows
Claude supports up to 1 million tokens of context—enough to process entire codebases or book-length documents. But bigger context means higher cost and latency.
Prompt Caching
Prompt caching lets you reuse common prefixes (system prompts, tool definitions) across multiple requests, reducing both cost and latency.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Hello"}]
)
Context Editing and Compaction
- Context editing – Remove or modify parts of the conversation history.
- Compaction – Summarize older turns to fit within context limits.
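Compaction can be approximated client-side: keep the most recent turns verbatim and collapse everything older into a single summary message. A sketch under stated assumptions: compact_history and keep_recent are hypothetical names, the summarize callable stands in for a cheap summarization request (e.g. to a smaller model), and real histories would alternate user/assistant roles:

```python
def compact_history(messages, keep_recent=4, summarize=lambda msgs: "(summary)"):
    """Collapse all but the last keep_recent turns into one summary message."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary_turn = {
        "role": "user",
        "content": f"[Summary of earlier conversation: {summarize(older)}]",
    }
    return [summary_turn] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact_history(history, keep_recent=4)
print(len(compacted))  # 5
```

Triggering this once the history approaches a token budget keeps long-running sessions inside the context window at a predictable cost.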
5. Files and Assets: Working with Documents and Data
Claude can process files directly—PDFs, images, code files, and more.
PDF Support
Claude can extract text and layout from PDFs, making it ideal for document analysis.
```python
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {"type": "text", "text": "Summarize this report"}
            ]
        }
    ]
)
```
Images and Vision
Claude can analyze images for tasks like OCR, object detection, or visual Q&A.
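Images use the same content-block pattern as the PDF example above, with an image block instead of a document block. A sketch that builds the user-message content; image_block is a hypothetical helper and the byte string is fake stand-in data, not a real PNG:

```python
import base64

def image_block(image_bytes, media_type="image/png"):
    """Build a base64 image content block for a Messages API request."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }

content = [
    image_block(b"fake image bytes for illustration"),
    {"type": "text", "text": "What text appears in this image?"},
]
# Pass content as the user message content in client.messages.create(...)
```

In real use you would read the bytes from disk (as in the PDF example) and set media_type to match the file format, e.g. image/jpeg or image/webp.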
Feature Availability: Understanding the Lifecycle
Not all features are created equal. Claude's platform uses these classifications:
| Classification | Description |
|---|---|
| Beta | Preview features for feedback. May change or be discontinued. Not for production. |
| Generally Available (GA) | Stable, fully supported, production-ready. |
| Deprecated | Still functional but not recommended. Migration path provided. |
| Retired | No longer available. |
Check the Availability column in the docs before building a production dependency on a feature.
Putting It All Together: A Production-Ready Pattern
Here's a pattern that combines multiple features for a robust application:
```python
import anthropic

client = anthropic.Anthropic()

# 1. Cache your system prompt
system_prompt = {
    "type": "text",
    "text": "You are a support agent. Use tools to look up orders and return structured JSON.",
    "cache_control": {"type": "ephemeral"}
}

# 2. Define tools
tools = [
    {
        "name": "lookup_order",
        "description": "Look up an order by ID",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"]
        }
    }
]

# 3. Use structured output and extended thinking
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system=[system_prompt],
    tools=tools,
    thinking={"type": "enabled", "budget_tokens": 1024},
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "support_response",
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "action_taken": {"type": "string"},
                    "order_status": {"type": "string"}
                },
                "required": ["summary", "action_taken", "order_status"]
            }
        }
    },
    messages=[
        {"role": "user", "content": "My order #12345 hasn't arrived"}
    ]
)

print(response.content)
```
This pattern combines prompt caching (cost savings), tool use (action), extended thinking (reasoning), and structured outputs (reliability).
Key Takeaways
- Start with model capabilities and tools – they give you the most control over Claude's behavior and output.
- Use structured outputs and extended thinking for production apps that need reliable, well-reasoned responses.
- Leverage prompt caching and batch processing to reduce costs—batch calls are 50% cheaper.
- Check feature availability before building dependencies; beta features may change without notice.
- Combine features strategically – caching + tools + structured outputs + thinking creates a powerful, production-ready stack.