Navigating the Claude API: A Practical Guide to Features, Tools, and Context Management
Explore the five core areas of the Claude API: model capabilities, tools, context management, files, and infrastructure. Learn how to steer reasoning, use tools, and optimize costs with practical examples.
This guide breaks down the Claude API into five actionable areas: model capabilities (thinking, structured outputs), tools (web fetch, code execution), context management (prompt caching, compaction), files (PDF support), and infrastructure (tool runner, MCP). You'll learn how to combine them for production-ready apps.
The Claude API offers a rich surface area that goes far beyond simple text generation. Whether you're building a customer support agent, a code assistant, or a document analysis tool, understanding how the API's five core areas work together is essential for creating efficient, scalable applications.
This guide walks you through each area—model capabilities, tools, context management, files and assets, and tool infrastructure—with practical code examples and best practices. By the end, you'll know how to steer Claude's reasoning, let it interact with external systems, manage long conversations, and optimize costs.
---
1. Model Capabilities: Steering Claude’s Reasoning and Output
Model capabilities control how Claude thinks and what it returns. The API exposes several powerful features:
Extended Thinking with Adaptive Thinking
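Claude can decide dynamically when to "think" more deeply. With Adaptive Thinking, you set an effort parameter (low, medium, high) and Claude allocates reasoning tokens accordingly. This is ideal for complex math, logic puzzles, or multi-step planning. The example below uses the fixed-budget form of extended thinking instead, where budget_tokens caps how many tokens Claude may spend reasoning before it answers.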
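import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    # budget_tokens caps reasoning tokens and must be smaller than max_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Solve this: A bat and a ball cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost?"}
    ],
)

# The response includes a 'thinking' block ahead of the final text block
for block in response.content:
    if block.type == "thinking":
        print(block.thinking)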
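Tip: Use effort instead of budget_tokens when you want Claude to decide how much to think. The exact request shape depends on the model and API version. Recent versions appear to select adaptive thinking with thinking={"type": "adaptive"} and to set effort separately under output_config rather than inside the thinking object, so confirm the field names against the current API reference.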
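The fixed-budget form has constraints that are easy to violate, so it can help to validate them before a request goes out. The following is a minimal sketch, not part of the SDK: build_thinking_params is a hypothetical helper, and the 1,024-token minimum and the rule that the budget stays below max_tokens are assumptions taken from the documented limits for fixed-budget extended thinking. Verify both for your model.

```python
MIN_THINKING_BUDGET = 1024  # assumed documented minimum for budget_tokens


def build_thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return a thinking config after checking the budget against max_tokens.

    Hypothetical helper for illustration; it is not part of the anthropic SDK.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Same values as the request above
params = build_thinking_params(max_tokens=4096, budget_tokens=2048)
print(params)
```

The returned dictionary can be passed as the thinking argument of client.messages.create.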
Structured Outputs
When you need JSON, code, or a specific schema, use structured outputs to enforce the format. This eliminates parsing errors.
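response = client.messages.create(
    # Assumption: structured outputs need a model that supports them; check the docs
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "List three planets and their distance from the sun."}],
    # The Messages API has no response_format parameter. Recent API versions take the
    # schema under output_config.format (earlier betas used a top-level output_format),
    # so confirm the name against the current API reference.
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "planets": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "distance_au": {"type": "number"},
                            },
                            "required": ["name", "distance_au"],
                            "additionalProperties": False,
                        },
                    }
                },
                "required": ["planets"],
                "additionalProperties": False,
            },
        }
    },
)

# The JSON document arrives as text in the first content block
print(response.content[0].text)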
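The text that comes back still has to be decoded before your application can use it. Here is a minimal sketch of that step, assuming the planets schema above. parse_planets is a hypothetical helper, and sample_text is a hand-written stand-in for response.content[0].text, not real model output.

```python
import json


def parse_planets(raw_text: str) -> list[dict]:
    """Decode the JSON text and check the fields the schema marks as required."""
    data = json.loads(raw_text)
    planets = data["planets"]
    for planet in planets:
        if not isinstance(planet["name"], str):
            raise TypeError("name must be a string")
        distance = planet["distance_au"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(distance, bool) or not isinstance(distance, (int, float)):
            raise TypeError("distance_au must be a number")
    return planets


# Hand-written stand-in for response.content[0].text (not real model output)
sample_text = (
    '{"planets": ['
    '{"name": "Mercury", "distance_au": 0.39}, '
    '{"name": "Venus", "distance_au": 0.72}, '
    '{"name": "Earth", "distance_au": 1.0}]}'
)

for planet in parse_planets(sample_text):
    print(f'{planet["name"]}: {planet["distance_au"]} AU')
```

Keeping a light check like this in place is a design choice: it catches schema drift in your own code even when the API enforces the format.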
Citations for Grounding
If your app needs to cite sources (e.g., legal documents, research papers), use the Citations feature. Claude will return inline citations pointing to specific sentences in your source documents.
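As a rough illustration, a request with citations might be assembled as follows. The document block with "citations": {"enabled": True} follows the Citations documentation as far as it is known here. build_cited_document, collect_cited_text, the sample text, and the title are all hypothetical. No request is sent, and example_content is a hand-written stand-in for a response.

```python
def build_cited_document(text: str, title: str) -> dict:
    """Build a plain-text document block with citations switched on."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def collect_cited_text(content_blocks: list[dict]) -> list[str]:
    """Gather the quoted passages from text blocks that carry citations."""
    quotes = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        for citation in block.get("citations") or []:
            quotes.append(citation["cited_text"])
    return quotes


document = build_cited_document(
    "The lease term is 24 months. Rent is due on the first of each month.",
    title="Sample lease",  # placeholder title
)
messages = [
    {
        "role": "user",
        "content": [document, {"type": "text", "text": "How long is the lease?"}],
    }
]

# Hand-written stand-in for response content, shown in dict form
example_content = [
    {
        "type": "text",
        "text": "The lease runs for 24 months.",
        "citations": [
            {"type": "char_location", "cited_text": "The lease term is 24 months.", "document_index": 0}
        ],
    }
]
print(collect_cited_text(example_content))
```

The SDK returns objects rather than dictionaries, so in real code attribute access (block.citations) would replace the dictionary lookups.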
---
2. Tools: Let Claude Take Action
Tools extend Claude's capabilities beyond text. You can give it access to web search, code execution, file operations, and more.
Built-in Tools
Anthropic provides several first-party tools:
| Tool | Use Case |
|---|---|
| Web fetch tool | Retrieve content from URLs |
| Code execution tool | Run Python/JavaScript in a sandbox |
| Bash tool | Execute shell commands |
| Computer use tool | Control a virtual desktop (beta) |
| Memory tool | Store and recall information across sessions |
Example: Using the Web Fetch Tool
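# Web fetch is a server-side tool: Anthropic runs it, so there is no description or
# input schema to write. The versioned type string and beta flag below are the ones
# known at the time of writing; check the tool reference for newer versions.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["web-fetch-2025-09-10"],
    tools=[
        {
            "type": "web_fetch_20250910",
            "name": "web_fetch",
            "max_uses": 3,  # cap the number of fetches per request
        }
    ],
    messages=[
        {"role": "user", "content": "What's the latest news from the Claude API docs? Fetch https://docs.anthropic.com/en/docs"}
    ],
)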
Parallel Tool Use
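Claude can call multiple tools in a single turn. For example, it might fetch two web pages simultaneously to compare data. Parallel tool use is on by default whenever tools are supplied, so there is no flag to enable. To force one call at a time, set disable_parallel_tool_use inside tool_choice, for example tool_choice={"type": "auto", "disable_parallel_tool_use": True}.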
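With client-defined tools, a turn that requests several tools is answered with a single user message holding one tool_result block per tool_use id. The sketch below shows that bookkeeping offline. run_tool_calls and both handlers are hypothetical, and assistant_content is a hand-written stand-in for an assistant turn in dict form.

```python
from typing import Callable


def run_tool_calls(content_blocks: list[dict], handlers: dict[str, Callable[..., str]]) -> dict:
    """Execute every tool_use block and return one user message with all results."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue
        output = handlers[block["name"]](**block["input"])
        results.append(
            {"type": "tool_result", "tool_use_id": block["id"], "content": output}
        )
    return {"role": "user", "content": results}


# Hypothetical local tools
handlers = {
    "get_weather": lambda city: f"Sunny in {city}",
    "get_time": lambda city: f"12:00 in {city}",
}

# Hand-written stand-in for an assistant turn that requests two tools at once
assistant_content = [
    {"type": "text", "text": "I'll check both."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_time", "input": {"city": "Paris"}},
]

follow_up = run_tool_calls(assistant_content, handlers)
print(follow_up)
```

Returning both results in one message, rather than one message per tool, keeps each tool_result paired with the turn that requested it.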
---
3. Context Management: Keeping Long Sessions Efficient
When you have long conversations or large documents, context management becomes critical. The standard context window is 200K tokens, and some models support up to 1M tokens, but at either size you need to manage costs and latency.
Prompt Caching
Cache repeated system prompts or large reference documents. Cached content is reused across requests, reducing cost and latency.
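response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # Short placeholder: a real prompt must meet the model's minimum cacheable
            # length (on the order of a thousand tokens) before anything is cached
            "text": "You are a helpful assistant with knowledge of the entire Python standard library.",
            # Marks the end of the prefix to cache
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "Explain the os.path module."}
    ],
)

# Non-zero values here confirm that the cache was written or read
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)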
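Cost tip: Cache reads are billed at a fraction of the base input price (Anthropic cites savings of up to 90% on cached input tokens), while cache writes cost somewhat more than uncached input. Caching pays off as soon as the same prefix is reused within the cache lifetime.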
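The economics can be checked with simple arithmetic. This sketch assumes the multipliers published for the default five-minute cache (writes at 1.25 times the base input price, reads at 0.1 times) and a hypothetical base price of $3 per million input tokens. prefix_cost is an illustrative helper, and every request is assumed to arrive while the cache is still warm.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: cache writes cost 25% more than base input
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache reads cost 10% of base input


def prefix_cost(prefix_tokens: int, requests: int, price_per_mtok: float, cached: bool) -> float:
    """Cost in dollars of sending the same prefix `requests` times (requests >= 1)."""
    base = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return base * requests
    # The first request writes the cache; the remaining requests read from it
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (requests - 1))


# Hypothetical workload: a 100,000-token prefix sent ten times
uncached = prefix_cost(100_000, requests=10, price_per_mtok=3.0, cached=False)
cached = prefix_cost(100_000, requests=10, price_per_mtok=3.0, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under these assumptions the ten cached requests cost roughly a fifth of the uncached total. Only the prefix is counted here: the uncached part of each prompt and all output tokens are billed as usual.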
Context Compaction
When a conversation grows too long, use context compaction to summarize earlier turns while preserving key information. This keeps the context window manageable.
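One client-side way to approximate this is to replace everything except the most recent turns with a single summary message. The sketch below illustrates that idea and is not the API's own compaction mechanism. compact_history and naive_summary are hypothetical, and a real summarizer would typically be another model call.

```python
from typing import Callable


def compact_history(
    messages: list[dict],
    keep_recent: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Replace older turns with one summary message, keeping the latest turns verbatim."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "user",
        "content": f"Summary of the earlier conversation: {summarize(older)}",
    }
    return [summary] + recent


def naive_summary(turns: list[dict]) -> str:
    """Placeholder summarizer that just joins the text of the older turns."""
    return " ".join(turn["content"] for turn in turns)


history = [
    {"role": "user", "content": "My order number is 1234."},
    {"role": "assistant", "content": "Thanks, I found order 1234."},
    {"role": "user", "content": "When will it ship?"},
    {"role": "assistant", "content": "It ships tomorrow."},
    {"role": "user", "content": "Can I change the address?"},
]

compacted = compact_history(history, keep_recent=2, summarize=naive_summary)
for message in compacted:
    print(message["role"], "|", message["content"])
```

When choosing the cut point, avoid splitting a tool call from its result, and prefer a value of keep_recent that leaves the retained turns starting with an assistant message so roles still alternate after the summary.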
Token Counting
Always count tokens before sending a request to avoid hitting limits unexpectedly.
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
# Input tokens the request would consume, including message formatting overhead
print(token_count.input_tokens)
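The count is only useful if something acts on it. A minimal guard might look like the following. fits_context is a hypothetical helper, and the 200,000-token default is an assumption that should be replaced with the window of the model you actually call.

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int = 200_000) -> bool:
    """Return True when the prompt plus the requested output fits in the context window."""
    return input_tokens + max_output_tokens <= context_window


# In a real application, input_tokens would come from count_tokens(...).input_tokens
print(fits_context(input_tokens=150_000, max_output_tokens=4_096))
print(fits_context(input_tokens=198_000, max_output_tokens=4_096))
```

A request that fails the check is a natural trigger for compaction or for trimming the documents attached to the prompt.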
---
4. Files and Assets: Working with Documents
Claude can process a variety of file types, including PDFs, images, and text files.
PDF Support
Upload PDFs directly and Claude will extract text, tables, and even layout information.
import base64

# Read the PDF and base64-encode it for the request body
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                # The document block comes first, followed by the question about it
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarize this report.",
                },
            ],
        }
    ],
)
print(response.content[0].text)
Images and Vision
Claude can analyze images (JPEG, PNG, GIF, WebP) for tasks like object detection, OCR, or chart reading.
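Images are sent as content blocks in much the same way as PDFs. Here is a minimal sketch of building one, assuming the base64 image block shape from the vision documentation. build_image_block and the file name are hypothetical, and the bytes are a placeholder, not a real image.

```python
import base64
import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def build_image_block(filename: str, image_bytes: bytes) -> dict:
    """Build a base64 image content block, inferring the media type from the file name."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }


# Placeholder bytes; a real call would pass the contents of an image file
block = build_image_block("chart.png", b"\x89PNG placeholder")
content = [block, {"type": "text", "text": "What trend does this chart show?"}]
print(block["source"]["media_type"])
```

The content list can be used as the content of a user message, just like the document example above.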
---
5. Tool Infrastructure: Orchestration at Scale
When you have many tools or complex workflows, the tool infrastructure layer helps with discovery, routing, and orchestration.
Tool Runner (SDK)
The Tool Runner is an SDK component that automatically handles tool call loops. Instead of manually parsing tool calls and sending results back, you define tools and let the runner manage the cycle.
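To see what the runner takes off your hands, it helps to write the loop out by hand once. The sketch below uses a stub in place of the real client so that it runs offline. run_tool_loop, fake_create, and the add tool are hypothetical, and the Tool Runner's actual interface is not shown here.

```python
from types import SimpleNamespace


def run_tool_loop(create, tools: dict, messages: list[dict], max_turns: int = 5) -> str:
    """Call the model, run any requested tools, and repeat until it stops asking."""
    for _ in range(max_turns):
        response = create(messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": str(tools[b.name](**b.input))}
            for b in response.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_create(messages):
    """Stub standing in for client.messages.create: asks for one tool, then answers."""
    if len(messages) == 1:
        call = SimpleNamespace(type="tool_use", id="toolu_01", name="add", input={"a": 2, "b": 3})
        return SimpleNamespace(stop_reason="tool_use", content=[call])
    result = messages[-1]["content"][0]["content"]
    text = SimpleNamespace(type="text", text=f"The sum is {result}.")
    return SimpleNamespace(stop_reason="end_turn", content=[text])


answer = run_tool_loop(
    fake_create,
    {"add": lambda a, b: a + b},
    [{"role": "user", "content": "What is 2 + 3?"}],
)
print(answer)  # The sum is 5.
```

The max_turns cap is a design choice worth keeping in any hand-written loop, since a model that keeps requesting tools would otherwise run indefinitely.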
MCP (Model Context Protocol)
MCP allows you to connect Claude to remote servers, databases, or APIs. You can define MCP servers that expose tools, and Claude can discover and call them dynamically.
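# Example: Connecting to a remote MCP server
# (illustrative sketch: the server URL and name are placeholders, and the beta flag and
# field names follow one version of the MCP connector, so check the current docs)
from anthropic import Anthropic

client = Anthropic()

# The MCP connector handles authentication and routing
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required on every Messages request
    betas=["mcp-client-2025-04-04"],
    mcp_servers=[
        {"type": "url", "url": "https://my-mcp-server.com/tools", "name": "orders-db"}
    ],
    messages=[{"role": "user", "content": "Query my database for recent orders."}],
)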
Batch Processing for Cost Savings
If you have large volumes of requests (e.g., processing thousands of support tickets), use Batch Processing. Batch API calls cost 50% less than standard API calls.
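# Create a batch of messages (batches live under client.messages in the Python SDK)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize: ..."}],
            },
        },
        # ... more requests
    ]
)
# Batches run asynchronously: poll until processing_status is "ended", then read results
print(batch.id, batch.processing_status)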
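For real workloads the requests list is usually generated rather than typed out. The following sketch shows one way to do that. build_batch_requests, the custom_id format, and the sample tickets are all hypothetical.

```python
def build_batch_requests(tickets: list[str], model: str, max_tokens: int = 1024) -> list[dict]:
    """Turn a list of ticket texts into batch request entries with stable custom_ids."""
    return [
        {
            "custom_id": f"ticket-{index:04d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": f"Summarize: {text}"}],
            },
        }
        for index, text in enumerate(tickets, start=1)
    ]


# Placeholder tickets for illustration
tickets = ["Printer will not connect to Wi-Fi.", "Refund has not arrived after 10 days."]
requests = build_batch_requests(tickets, model="claude-sonnet-4-20250514")
print([request["custom_id"] for request in requests])
```

Stable custom_id values matter because batch results are not guaranteed to come back in submission order, so the id is how each result is matched to its ticket.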
---
Putting It All Together: A Practical Workflow
Here's a realistic example combining multiple features:
- User uploads a PDF (Files API)
- Claude reads the PDF and extracts key data
- Claude uses the web fetch tool to verify facts against a live source
- Claude cites the source using the Citations feature
- The shared system prompt is cached via Prompt Caching, so later queries reuse the same prefix instead of paying full price for it again
- The entire conversation is compacted after 50 turns to stay within context limits
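# Simplified workflow
# Assumes pdf_data holds the base64-encoded PDF from the PDF Support example.
# Tool type strings and beta flags are versioned; check the tool reference for current values.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["web-fetch-2025-09-10", "code-execution-2025-08-25"],
    system=[
        {
            "type": "text",
            "text": "You are a research assistant. Always cite sources.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch"},
        {"type": "code_execution_20250825", "name": "code_execution"},
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                    # Lets Claude attach citations that point into the PDF
                    "citations": {"enabled": True},
                },
                {
                    "type": "text",
                    "text": "Analyze this PDF and fetch the latest stock price for the mentioned company.",
                },
            ],
        }
    ],
)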
---
Key Takeaways
- Model capabilities (thinking, structured outputs, citations) let you control Claude's reasoning and output format precisely.
- Tools extend Claude into the real world: web fetch, code execution, and memory are among the built-in options.
- Context management (prompt caching, compaction, token counting) is essential for keeping long-running sessions cost-effective and responsive.
- Batch processing offers a 50% cost reduction for non-real-time workloads.
- Tool infrastructure (MCP, Tool Runner) helps you scale from a single tool to a complex ecosystem of services.