Mastering Claude’s API: A Practical Guide to Model Capabilities, Tools, and Context Management
Learn how to build with Claude’s API using model capabilities, tools, context management, and files. Includes code examples, feature availability, and best practices.
This guide walks you through Claude’s five core API areas: model capabilities, tools, tool infrastructure, context management, and files. You’ll learn how to control reasoning depth, use tools, manage long sessions, and handle documents—with practical code examples.
Introduction
Claude’s API is designed to give developers fine-grained control over how the model reasons, formats responses, interacts with external systems, and manages long-running conversations. Whether you’re building a customer support bot, a code assistant, or a document analysis tool, understanding the five core areas of the API will help you build faster, cheaper, and more reliably.
This guide covers:
- Model capabilities – steering Claude’s reasoning and output format
- Tools – letting Claude take actions on the web or in your environment
- Tool infrastructure – discovery and orchestration at scale
- Context management – keeping long sessions efficient
- Files and assets – managing documents and data
1. Model Capabilities
Model capabilities control how Claude reasons and what it outputs. These are the most fundamental building blocks.
Extended Thinking & Adaptive Thinking
Claude can “think” before responding, which improves reasoning on complex tasks. With extended thinking you set a budget_tokens cap on how much reasoning Claude performs before answering, and newer models can adapt how much of that budget they actually use. (Some models also expose an effort-style setting for reasoning depth; parameter names and support vary by model, so check the API docs before relying on it.)

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ],
)
print(response.content)
Structured Outputs
Claude can return JSON or other structured formats, making it easy to parse responses programmatically.
Example: Request JSON output

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Always respond in JSON format with keys: name, age, city",
    messages=[
        {"role": "user", "content": "Tell me about John, a 30-year-old from New York"}
    ],
)
print(response.content[0].text)
Streaming & Batch Processing
- Streaming: Get tokens as they’re generated for real-time UX.
- Batch processing: Send large volumes of requests asynchronously at 50% lower cost.
stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    stream=True,
)
for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="")
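For batch processing, requests are bundled into a single submission rather than sent one at a time. A minimal sketch of the payload shape, built offline (the custom_id values and topics are made up for illustration):

```python
# Each batch entry pairs a custom_id (for matching results later) with
# the same params you would normally pass to client.messages.create.
batch_requests = [
    {
        "custom_id": f"poem-{i}",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": f"Write a short poem about {topic}"}
            ],
        },
    }
    for i, topic in enumerate(["AI", "the ocean", "autumn"])
]

# To submit (requires an API key; results are retrieved asynchronously):
# batch = client.messages.batches.create(requests=batch_requests)
# print(batch.id)
```

Because results come back out of order, the custom_id is how you reconnect each result to its original request.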
Feature Availability
| Feature | Availability |
|---|---|
| Context windows (up to 1M tokens) | GA on Claude API, Bedrock, Vertex AI |
| Adaptive thinking | GA on Claude API, Bedrock, Vertex AI |
| Batch processing | GA on Claude API, Bedrock, Vertex AI |
| Citations | GA on Claude API, Bedrock, Vertex AI |
| Structured outputs | GA on Claude API |
Note: Features marked as Beta may change or be discontinued. Always check the Claude API docs for the latest status.
2. Tools
Tools let Claude interact with external systems—web search, code execution, file operations, and more.
How Tool Use Works
1. Define a tool with a name, description, and input schema.
2. Claude decides whether to call the tool based on the conversation.
3. Your application executes the tool and returns the result.
4. Claude incorporates the result into its response.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
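The loop continues after this point: when Claude's reply contains tool_use blocks, your code runs each tool and sends the results back as tool_result blocks in a new user message. A sketch using plain dicts in place of a live response (the mock weather lookup is hypothetical; in real use you would read the blocks from response.content):

```python
# Mock of an assistant response containing a tool call.
assistant_content = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Tokyo"}}
]

def get_weather(city):
    # Hypothetical local lookup standing in for a real weather service.
    return f"Sunny in {city}, 22°C"

# One tool_result per tool_use block, matched by id. Looping over every
# block also handles the case where Claude calls several tools at once.
tool_results = [
    {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": get_weather(**block["input"]),
    }
    for block in assistant_content
    if block["type"] == "tool_use"
]

# Send the results back so Claude can finish its answer:
# client.messages.create(..., tools=tools, messages=[
#     {"role": "user", "content": "What's the weather in Tokyo?"},
#     {"role": "assistant", "content": assistant_content},
#     {"role": "user", "content": tool_results},
# ])
```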
Parallel Tool Calls
Claude can call multiple tools at once, reducing latency.
Strict Tool Use
Force Claude to always use a specific tool by setting tool_choice to {"type": "tool", "name": "your_tool"}.
3. Tool Infrastructure
When you have many tools, you need discovery and orchestration. Claude’s API supports:
- Tool Runner (SDK): Automates tool execution loops.
- Server Tools: Tools hosted on remote MCP servers.
- Programmatic Tool Calling: Call tools without Claude deciding—useful for deterministic workflows.
- Fine-grained Tool Streaming: Stream tool input parameters as they are generated, instead of waiting for the complete call.
MCP (Model Context Protocol)
MCP lets you connect Claude to remote tools and data sources. You can use the MCP Connector to integrate with any MCP-compatible server.
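As a sketch, the connector takes a list of server descriptors alongside the normal request. The URL and server name below are placeholders, and the connector is a beta feature, so check the docs for the current request shape and any required beta header:

```python
# Illustrative MCP server descriptor for the connector
# (the example.com URL and server name are placeholders).
mcp_servers = [
    {
        "type": "url",
        "url": "https://example.com/mcp",
        "name": "example-tools",
    }
]

# Passed alongside the usual arguments, e.g.:
# client.beta.messages.create(..., mcp_servers=mcp_servers)
```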
4. Context Management
Long conversations can become expensive and slow. Claude provides several features to manage context efficiently.
Context Windows
Claude supports up to 1 million tokens of context. That’s enough to process entire codebases or lengthy documents.
Prompt Caching
Cache frequently used system prompts or context to reduce latency and cost. Cached content is reused across multiple requests.
Example: Enable prompt caching

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Explain Python decorators"}]
)

Note that cached prefixes must meet a model-specific minimum length (on the order of 1,024 tokens), so a system prompt as short as this one would not actually be cached in practice.
Context Editing & Compaction
- Context editing: Remove or modify parts of the conversation history.
- Compaction: Summarize older messages to save tokens.
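Compaction can also be done client-side. A minimal sketch that replaces everything but the last few turns with a one-line summary stub (in a real system, the summary text would itself typically come from a model call over the dropped messages):

```python
def compact(messages, keep_last=4, summary="(earlier conversation summarized)"):
    """Replace all but the last `keep_last` messages with a summary stub."""
    if len(messages) <= keep_last:
        return messages
    head = {"role": "user", "content": summary}
    return [head] + messages[-keep_last:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary stub plus the last four messages
```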
Token Counting
Estimate token usage before sending a request to avoid hitting limits.
In recent SDK versions, token counting goes through the Messages API rather than the older client.count_tokens(text) helper:

count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(count.input_tokens)
5. Files and Assets
Claude can process various file types, including PDFs, images, and code files.
PDF Support
Claude can extract text and structure from PDFs. You can pass a document inline as base64 (shown below), or upload it once via the Files API and reference it by file ID in later requests.

Example: Send a PDF inline as base64

import base64

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {"type": "text", "text": "Summarize this document"}
            ]
        }
    ]
)
Images and Vision
Claude can analyze images. Pass them as base64-encoded data or URLs.
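Image blocks follow the same content-block shape as PDF documents. A sketch of both source types (the URL is a placeholder, and the base64 variant uses stand-in bytes; in real use you would read an actual image file):

```python
import base64

# URL source: Claude fetches the image itself.
url_block = {
    "type": "image",
    "source": {"type": "url", "url": "https://example.com/photo.jpg"},
}

# Base64 source: read and encode the bytes yourself, e.g.
#   with open("photo.jpg", "rb") as f:
#       data = base64.b64encode(f.read()).decode()
data = base64.b64encode(b"raw image bytes would go here").decode()
base64_block = {
    "type": "image",
    "source": {"type": "base64", "media_type": "image/jpeg", "data": data},
}

# Either block goes in a user message's content list, usually followed
# by a text block such as {"type": "text", "text": "Describe this image"}.
```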
Best Practices
- Start with model capabilities and tools – these give you the most value quickly.
- Use prompt caching for system prompts and static context to reduce costs.
- Stream responses for better user experience.
- Use batch processing for large, non-urgent workloads to save 50%.
- Monitor feature availability – Beta features may change; GA features are safe for production.
Key Takeaways
- Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
- Use extended thinking and its token budget to control reasoning depth.
- Prompt caching and batch processing can significantly reduce costs.
- Structured outputs and streaming improve developer experience and user experience.
- Always check feature availability (Beta vs. GA) before relying on a feature in production.