Mastering the Claude API: A Complete Guide to Features, Tools, and Infrastructure
Explore Claude's API surface including model capabilities, tools, context management, and file handling. Learn practical tips for building with Claude effectively.
This guide walks you through Claude's five core API areas: model capabilities, tools, tool infrastructure, context management, and file handling. You'll learn how to steer reasoning, use tools, manage long sessions, and optimize costs.
Introduction
Claude's API covers everything from basic text generation to complex agentic workflows. Whether you're building a simple chatbot or a tool-using agent, understanding its five core areas will help you get the most out of Claude.
This guide breaks down each area (model capabilities, tools, tool infrastructure, context management, and file handling) with practical advice and code examples.
The Five Pillars of the Claude API
Claude's API surface is organized into five areas:
- Model capabilities: Control how Claude reasons and formats responses.
- Tools: Let Claude take actions on the web or in your environment.
- Tool infrastructure: Handles discovery and orchestration at scale.
- Context management: Keeps long-running sessions efficient.
- Files and assets: Manage the documents and data you provide to Claude.
Model Capabilities: Steering Claude's Output
Model capabilities are the core ways you influence Claude's reasoning and output. Here are the most important ones:
Extended Thinking and Adaptive Thinking
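Extended thinking lets Claude reason through complex problems before responding. With adaptive thinking, Claude decides dynamically when and how much to think, which is ideal for Opus 4.7, and you control the depth using the effort parameter. With manual extended thinking, as in the example below, you set an explicit budget_tokens value that must be smaller than max_tokens.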
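import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        # effort is set separately from `thinking`; see the API reference
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ],
)

# With thinking enabled, thinking blocks come before the answer,
# so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)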
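The budget rules are easy to get wrong, so a small guard can catch them before a request is sent. The sketch below assumes the documented constraints that budget_tokens is at least 1,024 and smaller than max_tokens. Both `thinking_config` and `final_text` are hypothetical helpers, not part of the SDK, and the response blocks are stand-ins built with `SimpleNamespace`.

```python
from types import SimpleNamespace

MIN_THINKING_BUDGET = 1024  # assumed minimum; confirm against the API reference


def thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` request parameter, or raise if the budget cannot fit."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def final_text(content_blocks) -> str:
    """Join the text blocks of a response, skipping thinking blocks."""
    return "".join(b.text for b in content_blocks if b.type == "text")


# Stand-in blocks shaped like a response with thinking enabled.
blocks = [
    SimpleNamespace(type="thinking", thinking="(reasoning would appear here)"),
    SimpleNamespace(type="text", text="Final answer goes here."),
]
print(thinking_config(max_tokens=4096, budget_tokens=2048))
print(final_text(blocks))
```

Passing the result of `thinking_config` as the `thinking` argument keeps the two limits in one place instead of scattering them across call sites.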
Structured Outputs
For production applications, you often need Claude to return data in a specific format. Use structured outputs to enforce JSON schemas.
# Assumes an SDK version that exposes structured outputs as
# output_config.format (earlier releases used a beta output_format
# parameter) and a model that supports the feature.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, age, and email from this text: John Doe, 34, john.doe@example.com"}
    ],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string", "format": "email"},
                },
                "required": ["name", "age", "email"],
                "additionalProperties": False,
            },
        }
    },
)
print(response.content[0].text)
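Even with a schema in place, it is reasonable to validate the reply before passing it downstream. The following is a minimal sketch using only the standard library. `parse_person` and `REQUIRED_FIELDS` are hypothetical names, and `sample` stands in for the text of a real response.

```python
import json

# Field names mirror the person_info schema; the mapping itself is illustrative.
REQUIRED_FIELDS = {"name": str, "age": int, "email": str}


def parse_person(raw_text: str) -> dict:
    """Parse a JSON reply and check it against the fields the schema requires."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data


# Stand-in for response.content[0].text
sample = '{"name": "John Doe", "age": 34, "email": "john.doe@example.com"}'
person = parse_person(sample)
print(person["name"], person["age"])
```

A failed check raises `ValueError`, which gives the caller one place to decide whether to retry the request or surface the error.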
Citations for Grounded Responses
Citations let Claude reference specific passages from source documents, making outputs more verifiable and trustworthy.
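response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # Source documents are passed as document content blocks
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Revenue grew 15% in Q4...",
                    },
                    "title": "Q4 Report",
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "Summarize the key findings from the attached report."},
            ],
        }
    ],
)

# Citations are attached to the individual text blocks of the response
for block in response.content:
    if block.type == "text":
        print(block.text, block.citations)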
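To show citations to a reader, you can walk the text blocks of a response and turn each citation into a numbered footnote. The sketch below assumes that text blocks expose a `citations` list whose entries carry `document_title` and `cited_text`. `render_with_citations` is a hypothetical helper, and the blocks are stand-ins rather than real API output.

```python
from types import SimpleNamespace


def render_with_citations(content_blocks) -> str:
    """Append numbered markers to cited text and list the quoted sources."""
    body, sources = [], []
    for block in content_blocks:
        if getattr(block, "type", None) != "text":
            continue
        text = block.text
        for citation in getattr(block, "citations", None) or []:
            sources.append(
                f'[{len(sources) + 1}] {citation.document_title}: "{citation.cited_text}"'
            )
            text += f" [{len(sources)}]"
        body.append(text)
    footnotes = "\n\n" + "\n".join(sources) if sources else ""
    return "".join(body) + footnotes


# Stand-in for response.content
blocks = [
    SimpleNamespace(
        type="text",
        text="Revenue grew 15% in Q4.",
        citations=[
            SimpleNamespace(document_title="Q4 Report", cited_text="Revenue grew 15% in Q4...")
        ],
    )
]
rendered = render_with_citations(blocks)
print(rendered)
```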
Tools: Giving Claude Actions
Tools allow Claude to interact with the outside world—fetching web pages, running code, or controlling a computer.
Built-in Tools
Claude provides several pre-built tools:
- Web search tool: Search the internet for current information.
- Web fetch tool: Retrieve content from specific URLs.
- Code execution tool: Run Python or JavaScript code in a sandbox.
- Computer use tool: Control a virtual desktop environment.
- Memory tool: Store and retrieve information across conversations.
- Bash tool: Execute shell commands.
- Text editor tool: Read and write files.
Using Tools in the API
Here's how to enable the web search tool:
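response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "type": "web_search_20250305",  # server tools use a versioned type
            "name": "web_search",
        }
    ],
    messages=[
        {"role": "user", "content": "What are the latest AI news headlines today?"}
    ],
)

# The response interleaves search blocks with text, so print only the text
for block in response.content:
    if block.type == "text":
        print(block.text)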
Custom Tools (Function Calling)
You can define your own tools for Claude to call:
def get_weather(location: str) -> str:
    # Your weather API logic here
    return f"The weather in {location} is sunny, 72°F"

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., San Francisco"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)

# Handle the tool call in your application
for content in response.content:
    if content.type == "tool_use":
        result = get_weather(content.input["location"])
        print(f"Tool result: {result}")
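For Claude to use a tool's output, your application sends it back as a tool_result block that references the id of the original tool_use block. The sketch below shows one way to do that with error handling. `TOOL_HANDLERS` and `run_tool_calls` are hypothetical names, and the tool_use blocks are stand-ins for real response content.

```python
from types import SimpleNamespace


def get_weather(location: str) -> str:
    # Placeholder logic, as in the example above
    return f"The weather in {location} is sunny, 72°F"


TOOL_HANDLERS = {"get_weather": get_weather}  # hypothetical name-to-function registry


def run_tool_calls(content_blocks) -> list:
    """Execute each tool_use block and return matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        result = {"type": "tool_result", "tool_use_id": block.id}
        try:
            result["content"] = TOOL_HANDLERS[block.name](**block.input)
        except Exception as exc:  # report the failure to Claude instead of crashing
            result["content"] = f"Tool failed: {exc!r}"
            result["is_error"] = True
        results.append(result)
    return results


# Stand-ins for blocks in response.content
calls = [
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather", input={"location": "Tokyo"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="unknown_tool", input={}),
]
tool_results = run_tool_calls(calls)
print(tool_results)
```

To continue the conversation, append the assistant's response to the message list, then add a user message whose content is the list of tool_result blocks.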
Tool Infrastructure: Scaling Tool Use
When you have many tools, you need infrastructure to manage them. Claude provides:
- Tool Runner (SDK): Automates tool execution and result handling.
- Strict tool use: Guarantees that the inputs Claude passes to a tool conform to the tool's schema.
- Parallel tool use: Claude can request multiple tool calls in a single response.
- Fine-grained tool streaming: Stream tool input parameters as they are generated instead of waiting for the complete call.
- Tool search: Dynamically discover relevant tools for a given task.
Example: Parallel Tool Use
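# weather_tool, stock_price_tool, and news_tool are tool definitions
# (name, description, input_schema) assumed to be defined elsewhere.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool, stock_price_tool, news_tool],
    # Parallel tool use is on by default. To turn it off, pass
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "What's the weather in London, the stock price of AAPL, and the latest tech news?"}
    ],
)

# Claude may request all three tools in a single response
tool_calls = [block for block in response.content if block.type == "tool_use"]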
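When a response contains several independent tool calls, your application can run them concurrently and return all the results together. This is a minimal sketch using a thread pool. `execute_in_parallel`, the handler stubs, and the ids are hypothetical, and the calls are stand-ins for tool_use blocks.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def get_weather(city: str) -> str:  # hypothetical stub
    return f"Sunny in {city}"


def get_stock_price(symbol: str) -> str:  # hypothetical stub, no real market data
    return f"{symbol}: (stub price)"


HANDLERS = {"get_weather": get_weather, "get_stock_price": get_stock_price}


def execute_in_parallel(tool_calls, handlers) -> list:
    """Run independent tool calls concurrently; results keep the call order."""

    def run(call):
        return {
            "type": "tool_result",
            "tool_use_id": call.id,
            "content": str(handlers[call.name](**call.input)),
        }

    # max_workers must be positive even when there are no calls
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(run, tool_calls))


# Stand-ins for the tool_use blocks of one response
calls = [
    SimpleNamespace(id="toolu_01", name="get_weather", input={"city": "London"}),
    SimpleNamespace(id="toolu_02", name="get_stock_price", input={"symbol": "AAPL"}),
]
parallel_results = execute_in_parallel(calls, HANDLERS)
print(parallel_results)
```

All results for one response belong in a single user message, one tool_result block per tool_use id.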
Context Management: Keeping Sessions Efficient
Long conversations can become expensive. Claude offers several features to manage context:
- Context windows: Up to 1M tokens on supported models for processing large documents.
- Compaction: Summarize and compress conversation history.
- Context editing: Remove or modify parts of the context.
- Prompt caching: Cache frequently used prompts to reduce costs.
- Token counting: Estimate token usage before sending requests.
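To make the idea of bounding context concrete, here is a client-side sketch that drops the oldest turns once a rough token budget is exceeded. It is not the API's compaction or context editing feature. `estimate_tokens` is a crude four-characters-per-token assumption rather than the real tokenizer, and `trim_history` is a hypothetical helper that expects message content to be a plain string.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not the API's tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages: list, budget: int) -> list:
    """Keep the most recent messages whose estimated size fits within budget."""
    kept, used = [], 0
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # A conversation sent to the API should start with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, budget=120)
print(len(trimmed))
```

For accurate numbers, replace the heuristic with counts from the token counting feature listed above.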
Using Prompt Caching
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Tell me about quantum computing."}
    ]
)
# The system prompt is cached for subsequent requests
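To check whether caching is taking effect, you can inspect the usage figures returned with each response. The sketch below assumes the usage object exposes `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. `cache_hit_ratio` is a hypothetical helper, and the usage values are made-up stand-ins, not measurements.

```python
from types import SimpleNamespace


def cache_hit_ratio(usage) -> float:
    """Fraction of all input tokens that were read from the cache."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    total = usage.input_tokens + read + written
    return read / total if total else 0.0


# Illustrative stand-ins for response.usage on a first and a repeat request
first_call = SimpleNamespace(
    input_tokens=50, cache_creation_input_tokens=1950, cache_read_input_tokens=0
)
repeat_call = SimpleNamespace(
    input_tokens=50, cache_creation_input_tokens=0, cache_read_input_tokens=1950
)
print(cache_hit_ratio(first_call), cache_hit_ratio(repeat_call))
```

A ratio that stays at zero across repeated requests usually means the cached prefix changed between calls or was too short to be cached.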
Files and Assets: Working with Documents
Claude can process various file types:
- PDF support: Extract text and analyze documents.
- Images: Process images for vision tasks.
- Files API: Upload and manage files programmatically.
PDF Processing Example
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this PDF."
                }
            ]
        }
    ]
)
print(response.content[0].text)
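If you send more than one PDF, wrapping the encoding step in a function avoids repeating the block structure. The following is a small sketch under that assumption. `pdf_document_block` is a hypothetical helper, and the header check is only a basic sanity test, not full PDF validation.

```python
import base64


def pdf_document_block(pdf_bytes: bytes) -> dict:
    """Build a base64 document content block from raw PDF bytes."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF file")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("ascii"),
        },
    }


# Tiny stand-in bytes; a real call would use open(path, "rb").read()
fake_pdf = b"%PDF-1.7\n(placeholder bytes)"
pdf_block = pdf_document_block(fake_pdf)
print(pdf_block["source"]["media_type"])
```

The returned dictionary can be placed in a message's content list ahead of the text block that asks the question.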
Feature Availability and Lifecycle
Features on the Claude Platform go through stages:
| Classification | Description |
|---|---|
| Beta | Preview features for feedback. May change significantly. Not for production. |
| Generally Available (GA) | Stable, fully supported, recommended for production. |
| Deprecated | Still functional but not recommended. Migration path provided. |
| Retired | No longer available. |
Check the Availability column in the documentation to see which platforms support a feature.
Best Practices for Building with Claude
- Start simple: Begin with model capabilities and tools before adding infrastructure.
- Use structured outputs: For production, enforce JSON schemas to get predictable data.
- Leverage caching: Use prompt caching for repeated system prompts to reduce costs.
- Monitor token usage: Use token counting to avoid surprises.
- Handle tool calls properly: Always implement fallback logic for tool execution failures.
Key Takeaways
- Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
- Use extended thinking and structured outputs to control reasoning depth and response format.
- Tools let Claude interact with external systems—use built-in tools or define custom ones.
- Prompt caching and context compaction help manage costs in long-running sessions.
- Always check feature availability (Beta vs. GA) before using a feature in production.