Mastering Claude’s API: A Practical Guide to Model Capabilities, Tools, and Context Management
This guide walks you through Claude’s five API surface areas—model capabilities, tools, tool infrastructure, context management, and files—with practical code examples and best practices for building reliable, scalable AI applications.
Claude’s API is designed to give developers fine-grained control over how the model reasons, interacts with external systems, and manages long-running conversations. Whether you’re building a customer support bot, a code assistant, or a document analysis tool, understanding the five core areas of the API surface will help you build faster, cheaper, and more reliably.
This guide covers each area with practical code examples and best practices. By the end, you’ll know how to choose the right features for your use case and how to combine them effectively.
The Five Pillars of the Claude API
Claude’s API surface is organized into five areas:
- Model capabilities – Control how Claude reasons and formats responses.
- Tools – Let Claude take actions on the web or in your environment.
- Tool infrastructure – Handle discovery and orchestration at scale.
- Context management – Keep long-running sessions efficient.
- Files and assets – Manage documents and data you provide to Claude.
1. Model Capabilities: Steering Claude’s Reasoning and Output
Model capabilities control how Claude thinks and what it produces. The most important capabilities for most developers are:
- Extended thinking – Enables step-by-step reasoning for complex tasks.
- Adaptive thinking – Lets Claude dynamically decide when and how much to think (recommended on models that support it).
- Structured outputs – Enforces a specific JSON schema for responses.
- Citations – Grounds responses in source documents with exact references.
Example: Using Structured Outputs
Structured outputs are critical when you need Claude to return data in a machine-readable format. Here’s how to enforce a JSON schema:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=1024,
    system="You are a data extraction assistant. Always respond with valid JSON.",
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: Invoice #1234, dated 2025-03-15, for $450.00."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["invoice_number", "date", "amount"]
            }
        }
    }
)

print(response.content[0].text)
```
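Even with a schema enforced server-side, it is worth validating the parsed payload before it enters downstream systems. A minimal sketch — `parse_invoice` is a hypothetical helper, and the field names mirror the schema above:

```python
import json

REQUIRED = {"invoice_number", "date", "amount"}

def parse_invoice(raw: str) -> dict:
    """Parse the model's JSON output and sanity-check it against the schema above."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["amount"], (int, float)):
        raise TypeError("amount must be numeric")
    return data

invoice = parse_invoice('{"invoice_number": "1234", "date": "2025-03-15", "amount": 450.0}')
```

Failing loudly here is cheaper than discovering a malformed record later in your pipeline.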
Adaptive Thinking with the effort Parameter
On models that support adaptive thinking, this is the recommended thinking mode. You control the depth using the effort parameter:
```python
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024, "effort": "high"},
    messages=[
        {"role": "user", "content": "Design a distributed caching system that handles cache invalidation across 100 nodes."}
    ]
)
```
Best practice: Use effort: "medium" for most tasks and increase to "high" only for complex multi-step reasoning.
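In practice it helps to centralize that choice rather than hard-code effort strings at every call site. One possible heuristic — the thresholds below are invented for illustration:

```python
def choose_effort(reasoning_steps: int, needs_synthesis: bool) -> str:
    """Pick an effort level; escalate only for genuinely multi-step work."""
    if reasoning_steps >= 5 or needs_synthesis:
        return "high"
    return "medium"

# A two-step extraction task stays at the cheaper default.
effort = choose_effort(reasoning_steps=2, needs_synthesis=False)
```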
2. Tools: Let Claude Act on Your Behalf
Tools extend Claude’s capabilities beyond text generation. The API supports several built-in tools:
- Web search tool – Fetch real-time information from the web.
- Web fetch tool – Retrieve content from a specific URL.
- Code execution tool – Run Python or JavaScript code in a sandbox.
- Memory tool – Store and retrieve information across sessions.
- Bash tool – Execute shell commands (use with caution).
- Computer use tool – Control a virtual desktop environment (beta).
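Alongside the built-ins, you can define your own client-side tools: each needs a name, a description, and a JSON Schema for its input, and your own code executes the call when Claude requests it. A sketch with a hypothetical get_weather tool and a stubbed local dispatcher:

```python
# Client-side tool definition: Claude sees the schema, your code runs the call.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(name: str, tool_input: dict) -> str:
    """Route a tool_use request from the model to a local handler (stubbed here)."""
    handlers = {"get_weather": lambda i: f"Sunny in {i['city']}"}
    return handlers[name](tool_input)

result = dispatch("get_weather", {"city": "Tokyo"})
```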
Example: Using the Web Search Tool
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            # Server tool type strings are versioned.
            "type": "web_search_20250305",
            "name": "web_search"
        }
    ],
    messages=[
        {"role": "user", "content": "What are the latest AI research papers from May 2025?"}
    ]
)
```
Programmatic Tool Calling
For advanced workflows, you can override Claude’s automatic tool selection and require a specific tool with the tool_choice parameter:

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[...],  # the tool definitions from above
    # Force Claude to call web_search instead of choosing on its own.
    tool_choice={"type": "tool", "name": "web_search"},
    messages=[
        {"role": "user", "content": "Find information about batch processing."}
    ]
)
```
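However the tool call is triggered, it only completes once you execute the tool yourself and send back a tool_result block referencing the tool_use id. A sketch of that round trip, with an illustrative id and a canned result:

```python
# The tool_use block Claude emitted (the id here is illustrative).
tool_use = {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "web_search",
    "input": {"query": "Claude API batch processing 2025"},
}

# Run the tool yourself, then hand the result back as a user-role tool_result.
followup_messages = [
    {"role": "user", "content": "Find information about batch processing."},
    {"role": "assistant", "content": [tool_use]},
    {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "content": "Top result: overview of the Claude batch API.",
        }],
    },
]
```

Sending followup_messages through another messages.create call lets Claude continue with the tool output in context.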
3. Tool Infrastructure: Discovery and Orchestration at Scale
When you have many tools, you need a way to manage them efficiently. Claude’s tool infrastructure includes:
- Tool search – Dynamically find the right tool for a given task.
- Fine-grained tool streaming – Stream tool calls and results incrementally.
- MCP (Model Context Protocol) connector – Connect to remote MCP servers for standardized tool access.
Example: Using Tool Search
```python
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    },
    {
        "name": "get_stock_price",
        "description": "Get current stock price for a ticker.",
        "input_schema": {"type": "object", "properties": {"ticker": {"type": "string"}}, "required": ["ticker"]}
    },
    {
        "name": "send_email",
        "description": "Send an email to a recipient.",
        "input_schema": {"type": "object", "properties": {"to": {"type": "string"}, "body": {"type": "string"}}, "required": ["to"]}
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "auto"},
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)

# Claude automatically selects get_weather without searching all tools.
```
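When the catalog grows past what you want to ship with every request, you can also pre-filter on the client before calling the API. A deliberately naive keyword matcher as a stand-in for server-side tool search:

```python
def search_tools(catalog: list, query: str, limit: int = 3) -> list:
    """Rank tools by keyword overlap between the query and each description."""
    words = set(query.lower().split())
    scored = [(len(words & set(t["description"].lower().split())), t) for t in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:limit] if score > 0]

catalog = [
    {"name": "get_weather", "description": "Get current weather for a city."},
    {"name": "get_stock_price", "description": "Get current stock price for a ticker."},
    {"name": "send_email", "description": "Send an email to a recipient."},
]

matches = search_tools(catalog, "current weather in Tokyo")
```

A real implementation would use embeddings or the API’s own tool search rather than word overlap, but the shape is the same: narrow the catalog, then send only the survivors.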
4. Context Management: Keeping Long Sessions Efficient
Long conversations can become expensive and slow. Claude provides four mechanisms to manage context:
- Context windows – Up to 1M tokens for processing large documents.
- Compaction – Summarize and compress older conversation turns.
- Context editing – Manually remove or rewrite parts of the conversation history.
- Prompt caching – Cache repeated system prompts or large documents to reduce latency and cost.
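Before reaching for the API-level features, it helps to see the shape of compaction. A client-side sketch that folds older turns into a placeholder summary once a rough size budget is exceeded — in production you would have the model write the summary rather than a fixed string:

```python
def compact(messages: list, max_chars: int = 4000, keep_recent: int = 4) -> list:
    """Fold the oldest turns into one summary message when the transcript grows."""
    total = sum(len(str(m["content"])) for m in messages)
    if total <= max_chars or len(messages) <= keep_recent:
        return messages
    dropped = messages[:-keep_recent]
    summary = {
        "role": "user",
        "content": f"[Summary of {len(dropped)} earlier turns elided]",
    }
    return [summary] + messages[-keep_recent:]
```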
Example: Using Prompt Caching
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with knowledge of our product documentation.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "How do I reset my password?"}
    ]
)
```
Best practice: Cache any content that is reused across multiple requests, such as system prompts, knowledge base excerpts, or conversation templates.
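A quick way to see when caching pays off is break-even arithmetic: a cache write typically costs a premium over a normal input token, while a cache read costs a small fraction of one. The multipliers below are illustrative — check current pricing before relying on them:

```python
def caching_savings(tokens: int, reads: int, base_per_mtok: float = 3.00,
                    write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Dollar savings of one cached write + N cheap reads vs. N+1 full-price sends."""
    uncached = (reads + 1) * tokens * base_per_mtok / 1e6
    cached = (tokens * base_per_mtok * write_mult
              + reads * tokens * base_per_mtok * read_mult) / 1e6
    return uncached - cached
```

Even a single re-read of a large prompt usually puts the cached path ahead; with zero reads you pay only the write premium.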
5. Files and Assets: Working with Documents and Data
Claude can process a variety of file types:
- PDF support – Extract text and layout from PDFs.
- Images and vision – Analyze images for content, charts, or diagrams.
- Files API – Upload and reference files in conversations.
Example: Analyzing a PDF with Citations
```python
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    },
                    # Citations are enabled per document, not per request.
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings from this report and cite the relevant sections."
                }
            ]
        }
    ]
)

# Citations are attached to the response's text blocks.
for block in response.content:
    for citation in getattr(block, "citations", None) or []:
        print(f"Cited: {citation.cited_text!r}")
```
Feature Availability and Lifecycle
Not all features are available on every platform. Claude uses the following classifications:
| Classification | Description |
|---|---|
| Beta | Preview features for feedback. May change significantly. Not for production. |
| Generally Available (GA) | Stable, fully supported, recommended for production. |
| Deprecated | Still functional but no longer recommended. Migration path provided. |
| Retired | No longer available. |
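One way to act on these classifications is a startup check that fails fast when a production build depends on anything that isn’t GA. A small sketch — the classification strings mirror the table above:

```python
PRODUCTION_SAFE = {"Generally Available (GA)"}

def require_ga(feature: str, classification: str) -> None:
    """Raise at startup if a production dependency isn't generally available."""
    if classification not in PRODUCTION_SAFE:
        raise RuntimeError(f"{feature} is {classification}; not production-safe")

require_ga("prompt caching", "Generally Available (GA)")  # passes silently
```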
Putting It All Together: A Production-Ready Example
Here’s a complete example that combines structured outputs, tool use, and prompt caching:
```python
import anthropic

client = anthropic.Anthropic()

# Step 1: Define tools
tools = [
    {
        "type": "web_search_20250305",
        "name": "web_search"
    }
]

# Step 2: Use prompt caching for the system prompt
system_prompt = [
    {
        "type": "text",
        "text": "You are a research assistant. Always cite sources and return structured JSON.",
        "cache_control": {"type": "ephemeral"}
    }
]

# Step 3: Send a request with structured output
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=system_prompt,
    tools=tools,
    messages=[
        {"role": "user", "content": "Find the latest news about AI regulation in the EU and summarize it in JSON with keys: date, source, summary."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "news_summary",
            "schema": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "source": {"type": "string"},
                    "summary": {"type": "string"}
                },
                "required": ["date", "source", "summary"]
            }
        }
    }
)

print(response.content[0].text)
```
Key Takeaways
- Start with model capabilities and tools – They cover 80% of common use cases. Add context management and file handling as your needs grow.
- Use structured outputs for production – Enforcing a JSON schema reduces parsing errors and makes your integration more reliable.
- Cache aggressively – Prompt caching can reduce latency by 50% or more for repeated content. Cache system prompts, knowledge bases, and conversation templates.
- Choose the right thinking mode – Adaptive thinking with the effort parameter gives you fine-grained control over reasoning depth without wasting tokens.
- Check feature availability per platform – Not all features are GA everywhere. Always verify before building a production dependency.