Mastering Claude's API: A Practical Guide to Model Capabilities, Tools, and Context Management
Learn to navigate Claude's API surface across five key areas: model capabilities, tools, tool infrastructure, context management, and file handling. Includes code examples.
Introduction
Claude's API is more than just a text generation endpoint. It's a comprehensive platform designed to give you fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you're building a simple chatbot or a complex agent that browses the web and executes code, understanding the five core areas of the API surface is essential.
This guide breaks down each area—model capabilities, tools, tool infrastructure, context management, and file handling—with practical code examples and best practices. By the end, you'll know exactly which features to use for your use case and how to combine them effectively.
The Five Pillars of Claude's API
Claude's API surface is organized into five areas:
- Model capabilities – Control how Claude reasons and formats responses.
- Tools – Let Claude take actions on the web or in your environment.
- Tool infrastructure – Handles discovery and orchestration at scale.
- Context management – Keeps long-running sessions efficient.
- Files and assets – Manage the documents and data you provide to Claude.
Model Capabilities: Steering Claude's Reasoning and Output
Model capabilities are the foundational layer. They let you control how Claude thinks, how much it thinks, and how it formats its responses.
Extended Thinking and Adaptive Thinking
Extended Thinking lets Claude reason step-by-step before producing a final answer. This is critical for complex math, code generation, or multi-step analysis. With Adaptive Thinking (recommended for Opus 4.7), Claude dynamically decides when and how much to think. You control the depth using the effort parameter.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
        "effort": "high"  # Options: low, medium, high
    },
    messages=[
        {"role": "user", "content": "Solve this: A train leaves Station A at 60 mph. Another train leaves Station B at 80 mph. They are 300 miles apart. When do they meet?"}
    ]
)

# With thinking enabled, the response starts with thinking blocks;
# the final answer is the last content block.
print(response.content[-1].text)
```
Best practice: Use `effort` to balance reasoning depth against latency. For simple tasks, use `"low"`; for complex reasoning, use `"high"`.
Structured Outputs
Structured outputs ensure Claude's responses follow a specific schema—ideal for extracting data, generating JSON, or populating templates.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and total from this invoice: Invoice #1234, John Doe, 2025-03-15, $450.00"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "total": {"type": "number"}
                },
                "required": ["name", "date", "total"]
            }
        }
    }
)
print(response.content[0].text)
```
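With a schema enforced, the client-side handling is simple JSON parsing plus a sanity check. A minimal sketch (the `sample` string stands in for `response.content[0].text`, and `parse_invoice` is a hypothetical helper, not part of the SDK):

```python
import json

REQUIRED_FIELDS = {"name", "date", "total"}

def parse_invoice(raw: str) -> dict:
    """Parse the model's JSON output and verify the schema's required fields."""
    invoice = json.loads(raw)
    missing = REQUIRED_FIELDS - invoice.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return invoice

# Example payload matching the schema above
sample = '{"name": "John Doe", "date": "2025-03-15", "total": 450.0}'
invoice = parse_invoice(sample)
print(invoice["total"])
```

Even with structured outputs, a check like this guards downstream code against surprises.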
Citations for Grounded Responses
Citations let Claude reference exact passages from source documents. This is invaluable for legal, medical, or research applications where verifiability is paramount.
In the Messages API, source documents are passed as `document` content blocks inside the message, with citations enabled per document:

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Revenue grew 12% year-over-year..."
                    },
                    "title": "Q4 Earnings Report",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "What are the key findings from the Q4 report?"}
            ]
        }
    ]
)
print(response.content[0].text)
```
Tools: Letting Claude Take Action
Tools extend Claude's capabilities beyond text generation. Claude can call functions, fetch web pages, execute code, and even control a computer.
Defining a Custom Tool
You define tools using a JSON schema. Claude decides when to call them based on the conversation context.
```python
def get_weather(location: str) -> str:
    # Simulated weather lookup
    return f"The weather in {location} is sunny, 72°F."

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g., San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Austin, TX?"}
    ]
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    tool_call = response.content[-1]
    if tool_call.name == "get_weather":
        result = get_weather(tool_call.input["location"])
        print(result)
```
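Printing the result is only half the loop: to let Claude finish its answer, the tool output goes back as a `tool_result` block in a follow-up user message, keyed by the `id` of the `tool_use` block Claude returned. A minimal sketch (the helper and the literal `tool_use_id` are illustrative):

```python
def build_tool_result_message(tool_use_id: str, result: str) -> dict:
    """Package a tool's output as the user-turn content the Messages API expects."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": result,
            }
        ],
    }

follow_up = build_tool_result_message("toolu_abc123", "The weather in Austin, TX is sunny, 72°F.")
# Append `follow_up` to the messages list (after the assistant turn that
# requested the tool) and call client.messages.create again.
print(follow_up["content"][0]["type"])
```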
Parallel Tool Use
Claude can call multiple tools in a single response, which is great for tasks that require independent lookups. Parallel tool use is on by default; set `disable_parallel_tool_use` in `tool_choice` if you need strictly sequential calls.

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool, stock_tool, news_tool],
    tool_choice={"type": "auto"},  # parallel tool use is enabled by default
    messages=[
        {"role": "user", "content": "Get the weather in Tokyo, the current price of Apple stock, and today's top tech news."}
    ]
)
```
Built-in Tools
Claude provides several built-in tools you can enable with a single flag:
- Web search tool – Fetch real-time information from the web.
- Code execution tool – Run Python code in a sandboxed environment.
- Computer use tool – Let Claude control a virtual desktop (beta).
- Memory tool – Persist information across conversations.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[
        {"role": "user", "content": "What are the latest AI research papers from 2025?"}
    ]
)
```
Tool Infrastructure: Discovery and Orchestration at Scale
When you have dozens or hundreds of tools, you need a way to manage them. Claude's tool infrastructure includes:
- Tool Runner (SDK) – Automates the tool call loop (invoke tool, return result, continue).
- Strict tool use – Forces Claude to use a specific tool when needed.
- Tool search – Dynamically discover relevant tools based on the user's query.
- Fine-grained tool streaming – Stream tool calls and results incrementally.
Example: Tool Runner with the SDK
```python
from anthropic import Anthropic

client = Anthropic()

# The SDK's Tool Runner handles the loop automatically
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, calculator_tool, database_tool],
    tool_choice={"type": "auto"},
    messages=[
        {"role": "user", "content": "What's the average temperature in cities where our top 3 customers are located?"}
    ]
)
```
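For context, here is the loop the Tool Runner automates, sketched with a pluggable `create` callable and plain dicts so the control flow is visible. This is a simplified model of the protocol, not the SDK's implementation; `run_tool_loop` and `fake_create` are illustrative names:

```python
def run_tool_loop(create, messages, tools, handlers, max_turns=5):
    """The loop a tool runner automates: call the model, execute any
    requested tools, feed the results back, repeat until a final answer."""
    for _ in range(max_turns):
        response = create(messages=messages, tools=tools)
        if response["stop_reason"] != "tool_use":
            return response  # final answer, no more tool calls
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = handlers[block["name"]](**block["input"])
                results.append({"type": "tool_result",
                                "tool_use_id": block["id"],
                                "content": output})
        # Echo the assistant turn, then return the tool results as a user turn
        messages.append({"role": "assistant", "content": response["content"]})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not converge")

# Demo with a stubbed model: first turn requests a tool, second turn answers.
def fake_create(messages, tools):
    if len(messages) == 1:
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "toolu_1",
                             "name": "get_weather",
                             "input": {"location": "Austin, TX"}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": "It's sunny in Austin."}]}

final = run_tool_loop(fake_create,
                      [{"role": "user", "content": "Weather?"}],
                      tools=[],
                      handlers={"get_weather": lambda location: f"Sunny in {location}"})
print(final["content"][0]["text"])
```

Writing this loop once by hand makes it clear what the SDK is saving you: the bookkeeping of echoing assistant turns and pairing each `tool_result` with its `tool_use_id`.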
Context Management: Keeping Long Conversations Efficient
Long-running sessions can consume large context windows. Claude provides several features to manage this:
Context Windows
Claude supports up to 1 million tokens of context—enough to process entire codebases or lengthy documents. However, larger contexts increase latency and cost.
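Before sending a large payload, it helps to budget tokens. The API exposes an exact counter (`client.messages.count_tokens`); for quick offline budgeting, a rough four-characters-per-token heuristic is often enough. Note this is an approximation, not Claude's real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

doc = "Revenue grew 12% year-over-year across all segments. " * 100
print(estimate_tokens(doc))
```

Use the estimate to decide whether a document fits comfortably, then confirm with `count_tokens` before paying for a large request.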
Prompt Caching
Prompt caching stores frequently used context (system prompts, few-shot examples, document chunks) so you don't have to resend them. On cache hits this reduces costs by up to 90% and latency by up to 85%.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a legal assistant specializing in contract law.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Review this non-disclosure agreement..."}
    ]
)
```
Context Editing and Compaction
- Context editing – Remove or modify parts of the conversation history without restarting.
- Compaction – Summarize older messages to free up tokens while preserving key information.
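A minimal client-side compaction sketch, assuming you summarize on your side of the API. The default `summarize` here is a placeholder; in practice you would ask Claude itself to summarize the older turns:

```python
def compact(messages, keep_last=4,
            summarize=lambda msgs: f"[summary of {len(msgs)} earlier messages]"):
    """Replace all but the most recent messages with a single summary turn."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary_turn = {"role": "user", "content": summarize(older)}
    return [summary_turn] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary turn plus the last four messages
```

The design choice to keep recent turns verbatim matters: Claude's answers depend most on the latest exchange, so compaction should only touch the older tail.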
Files and Assets: Managing Documents and Data
Claude can process various file types, including PDFs, images, and code files.
PDF Support
```python
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this report and highlight key financial metrics."
                }
            ]
        }
    ]
)
```
Image and Vision
Claude can analyze images for tasks like object detection, chart reading, and document scanning.
```python
import base64

# Encode the image the same way as the PDF example (the filename is illustrative)
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's the chart showing?"
                }
            ]
        }
    ]
)
```
Putting It All Together: A Practical Workflow
Here's a real-world example combining multiple features: a customer support agent that reads a PDF, searches the web, and responds with citations.
```python
# pdf_data: the base64-encoded PDF from the earlier example
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": "You are a helpful support agent. Always cite your sources.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {
            "name": "get_order_status",
            "description": "Get the status of a customer order",
            "input_schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"}
                },
                "required": ["order_id"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "My order #12345 is delayed. Can you check the status and find the latest shipping policy?"
                }
            ]
        }
    ]
)
```
Feature Availability and Lifecycle
Not all features are available everywhere. Claude uses a classification system:
| Classification | Description |
|---|---|
| Beta | Preview features for feedback. May have limited availability. Not for production. |
| Generally Available (GA) | Stable, fully supported, recommended for production. |
| Deprecated | Still functional but not recommended. Migration path provided. |
| Retired | No longer available. |
Key Takeaways
- Start with model capabilities and tools – They cover 80% of common use cases. Add context management and file handling as your application grows.
- Use Adaptive Thinking for complex reasoning – Let Claude decide how much to think using the `effort` parameter. Start with `"medium"` and adjust based on results.
- Leverage prompt caching for production apps – Cache system prompts and few-shot examples to reduce latency and cost significantly.
- Combine tools for powerful agents – Use parallel tool calls and built-in tools (web search, code execution) to build autonomous agents.
- Always check feature availability – Features in beta may change. Use GA features for production workloads.