Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
Explore the full Claude API surface—model capabilities, tools, context management, and files. Learn how to build powerful AI applications with practical code examples.
This guide walks you through the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and file handling. You'll learn how to use extended thinking, structured outputs, citations, tool calling, prompt caching, and batch processing with practical Python examples.
Introduction
The Claude API offers a rich surface area for building intelligent, production-ready applications. Whether you're creating a chatbot, an agent that browses the web, or a system that processes millions of documents, understanding the five core areas of the API is essential.
This guide covers:
- Model capabilities – reasoning, structured outputs, citations
- Tools – letting Claude act on the web or in your environment
- Tool infrastructure – discovery and orchestration at scale
- Context management – keeping long-running sessions efficient
- Files and assets – managing documents and data
1. Model Capabilities: Steering Claude's Output
Claude's model capabilities let you control how it reasons and formats responses. These are the building blocks for any application.
Extended Thinking and Adaptive Thinking
For complex reasoning tasks, Claude can "think" before responding. With Extended Thinking, you set a fixed thinking budget. With Adaptive Thinking (recommended for Opus 4.7), Claude decides how much to think dynamically.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ]
)

# Access the thinking block
for block in response.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
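In the basic case the thinking budget is spent out of max_tokens, so the two values need to be chosen together. The following is a minimal sketch of a pre-flight check. `thinking_params` is a hypothetical helper name, and the 1,024-token floor is an assumption based on the documented minimum at the time of writing, so verify it for your model.

```python
MIN_THINKING_BUDGET = 1024  # assumed documented minimum; verify for your model


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` argument, or raise if the budget cannot fit."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # The budget is part of max_tokens, so leave room for the visible answer.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_params(max_tokens=4096, budget_tokens=2048))
```

The returned dict can be passed directly as the `thinking` argument of the request.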
Structured Outputs
Need JSON that conforms to a specific schema? Use structured outputs to constrain the response to a JSON schema you supply.
response = client.messages.create(
    model="claude-sonnet-4-5",  # structured outputs need a model that supports them
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "List three planets and their moons as JSON"}
    ],
    # The schema goes under output_config.format. Check the current API
    # reference: older SDK versions exposed this as a beta parameter.
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "planets": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "moons": {"type": "array", "items": {"type": "string"}}
                            },
                            "required": ["name", "moons"],
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["planets"],
                "additionalProperties": False
            }
        }
    }
)
print(response.content[0].text)
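The response text still arrives as a string, so the application has to parse it before use. A minimal sketch of that step, using only the standard library. `parse_planets` is a hypothetical helper, and `sample` is hand-written text standing in for `response.content[0].text`.

```python
import json


def parse_planets(text: str) -> dict[str, list[str]]:
    """Parse the model's JSON text and return {planet name: [moons]}."""
    data = json.loads(text)
    result = {}
    for planet in data["planets"]:
        name, moons = planet["name"], planet["moons"]
        if not isinstance(name, str) or not isinstance(moons, list):
            raise ValueError(f"unexpected planet entry: {planet!r}")
        result[name] = moons
    return result


# Hypothetical response text, not real API output.
sample = '{"planets": [{"name": "Mars", "moons": ["Phobos", "Deimos"]}]}'
print(parse_planets(sample))
```

Keeping a light check like this on the client side is a defensive choice: it catches a truncated response (for example, one cut off by max_tokens) before the data reaches the rest of the application.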
Citations
Ground Claude's responses in source documents. With Citations, Claude provides exact references to the source material.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are passed as content blocks inside the user message
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Q3 revenue grew 15% year-over-year to $2.1B. Operating margin improved to 22%."
                    },
                    "title": "Q3 Earnings Report",
                    "context": "This is the company's quarterly earnings report.",
                    "citations": {"enabled": True}
                },
                {"type": "text", "text": "Summarize the key findings from the attached report."}
            ]
        }
    ]
)

# Each text block may carry a list of citations pointing into the source
for block in response.content:
    if block.type == "text":
        print(block.text)
        for citation in block.citations or []:
            print(f'  cited: "{citation.cited_text}"')
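Citation data can be turned into numbered footnotes on the client side. A minimal sketch, assuming each text block carries an optional `citations` list whose entries have `cited_text` and `document_title` fields. The dicts below are hand-written stand-ins, not real API output, and `render_with_footnotes` is a hypothetical helper.

```python
def render_with_footnotes(blocks: list[dict]) -> str:
    """Join text blocks, adding [n] markers and a trailing footnote list."""
    body, notes = [], []
    for block in blocks:
        if block.get("type") != "text":
            continue
        markers = ""
        for cite in block.get("citations") or []:
            notes.append(f'{cite["document_title"]}: "{cite["cited_text"]}"')
            markers += f"[{len(notes)}]"
        body.append(block["text"] + markers)
    text = "".join(body)
    footnotes = "\n".join(f"[{i}] {note}" for i, note in enumerate(notes, start=1))
    return f"{text}\n\n{footnotes}" if notes else text


# Hypothetical blocks shaped like a cited response.
sample_blocks = [
    {
        "type": "text",
        "text": "Revenue grew 15%",
        "citations": [
            {
                "cited_text": "Q3 revenue grew 15% year-over-year to $2.1B.",
                "document_title": "Q3 Earnings Report",
            }
        ],
    },
    {"type": "text", "text": " and margins improved.", "citations": None},
]
print(render_with_footnotes(sample_blocks))
```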
2. Tools: Letting Claude Take Action
Tools extend Claude's capabilities beyond text generation. Claude can call functions, browse the web, execute code, and more.
Defining Tools
You define each tool with a name, a description, and a JSON Schema for its input. Claude decides when to call it.
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # In production, call a real weather API
    return f"The weather in {location} is sunny, 72°F."

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)

# Check if Claude wants to use a tool
for block in response.content:
    if block.type == "tool_use":
        print(f"Calling tool: {block.name}")
        print(f"Arguments: {block.input}")
        result = get_weather(block.input["location"])
        # Send result back to Claude...
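Sending the result back means appending two messages to the conversation: the assistant turn that contained the tool call, and a user turn holding a `tool_result` block with the matching `tool_use_id`. A minimal sketch of that bookkeeping with plain dicts. `with_tool_result` is a hypothetical helper and the ID `toolu_demo` is made up for illustration.

```python
def with_tool_result(messages, assistant_content, tool_use_id, result):
    """Return a new message list that adds the tool call and its result."""
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": result}
            ],
        },
    ]


history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
# Hypothetical assistant content, shaped like a tool_use response.
assistant_content = [
    {"type": "tool_use", "id": "toolu_demo", "name": "get_weather",
     "input": {"location": "Tokyo"}}
]
follow_up = with_tool_result(
    history, assistant_content, "toolu_demo", "The weather in Tokyo is sunny, 72°F."
)
print(follow_up[-1])
```

The resulting list would be passed as `messages` in the next request, along with the same `tools` definition.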
Built-in Tools
Claude comes with several built-in tools:
- Web search tool – search the internet
- Web fetch tool – fetch content from URLs
- Code execution tool – run Python code in a sandbox
- Computer use tool – control a virtual desktop
- Bash tool – run shell commands
- Memory tool – store and retrieve information across sessions
Parallel Tool Use
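Claude can request several tool calls in a single response. This behavior is on by default, so no extra request parameter is needed.
# weather_tool, stock_tool, and news_tool are tool definitions like the get_weather one above
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool, stock_tool, news_tool],
    # To force one call at a time instead, pass:
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "What's the weather in London, the stock price of AAPL, and today's top news?"}
    ]
)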
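When a response contains several `tool_use` blocks, the results are expected back together in a single user message, one `tool_result` per call. A minimal sketch of that dispatch step, using plain dicts in place of SDK objects. `run_tool_calls`, the handler functions, and the IDs are hypothetical.

```python
def run_tool_calls(content_blocks, handlers):
    """Execute every tool_use block and return one user message with all results."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue
        handler = handlers.get(block["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            results.append({"type": "tool_result", "tool_use_id": block["id"],
                            "content": f"Unknown tool: {block['name']}",
                            "is_error": True})
            continue
        results.append({"type": "tool_result", "tool_use_id": block["id"],
                        "content": handler(**block["input"])})
    return {"role": "user", "content": results}


# Hypothetical response content with two parallel tool calls.
content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_a", "name": "get_weather",
     "input": {"location": "London"}},
    {"type": "tool_use", "id": "toolu_b", "name": "get_stock_price",
     "input": {"ticker": "AAPL"}},
]
handlers = {
    "get_weather": lambda location: f"Sunny in {location}",
    "get_stock_price": lambda ticker: f"{ticker}: price unavailable in this demo",
}
message = run_tool_calls(content, handlers)
print(message)
```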
3. Tool Infrastructure: Discovery and Orchestration
When building complex agents, you need more than just tool definitions. The Claude API provides infrastructure for:
- Tool Runner (SDK) – automatically handles tool call loops
- Strict tool use – guarantee that tool inputs conform to your schema
- Tool search – let Claude discover tools dynamically
- Fine-grained tool streaming – stream tool input parameters as they are generated
- Tool combinations – define workflows that chain tools together
Tool Runner Example
from anthropic import Anthropic

client = Anthropic()

# Define a simple tool
weather_tool = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}

# A plain request with the tool attached. The Tool Runner wraps calls
# like this one (conceptual - actual implementation may vary).
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[weather_tool],
    tool_choice={"type": "auto"},
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)

# The SDK can automatically handle the tool call loop.
# See the Tool Runner documentation for details.
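To make the "tool call loop" concrete, here is a hand-rolled version of the loop that a runner automates: call the model, execute any requested tools, append the results, and repeat until the model stops asking for tools. This is a sketch of the general pattern, not the SDK's Tool Runner. `run_agent` is hypothetical, and `fake_create` is a stub standing in for `client.messages.create` so the example runs without network access.

```python
def run_agent(create, messages, handlers, max_turns=5):
    """Call `create` until the model stops requesting tools; return the final reply."""
    messages = list(messages)
    for _ in range(max_turns):
        reply = create(messages=messages)
        if reply["stop_reason"] != "tool_use":
            return reply
        # Record the assistant turn, then answer every tool call in one user turn.
        messages.append({"role": "assistant", "content": reply["content"]})
        results = [
            {"type": "tool_result", "tool_use_id": block["id"],
             "content": handlers[block["name"]](**block["input"])}
            for block in reply["content"] if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_create(messages):
    """Stub model: requests the tool once, then answers with the tool output."""
    if len(messages) == 1:
        return {"stop_reason": "tool_use", "content": [
            {"type": "tool_use", "id": "toolu_demo", "name": "get_weather",
             "input": {"location": "Paris"}}]}
    tool_output = messages[-1]["content"][0]["content"]
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": f"Tool said: {tool_output}"}]}


final = run_agent(
    fake_create,
    [{"role": "user", "content": "What's the weather in Paris?"}],
    {"get_weather": lambda location: f"Sunny in {location}"},
)
print(final["content"][0]["text"])
```

The `max_turns` cap is a design choice worth keeping in any real loop, since it bounds cost if the model keeps requesting tools.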
4. Context Management: Keeping Sessions Efficient
Long conversations or large documents require careful context management. Claude provides several features:
Context Windows
Claude models offer large context windows: 200K tokens as standard, with up to 1 million tokens on some models. That is enough to process entire codebases, lengthy books, or hours of conversation.
Prompt Caching
Cache frequently used context (system prompts, documents) to reduce latency and cost.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # A real cached prefix must be much longer than this: prompts below the
            # model's minimum cacheable length are processed without caching.
            "text": "You are a helpful assistant that answers questions about our company policy.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is our vacation policy?"}
    ]
)

# Check if cache was used
usage = response.usage
print(f"Cache created: {usage.cache_creation_input_tokens or 0}")
print(f"Cache read: {usage.cache_read_input_tokens or 0}")
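The usage fields can be combined into a simple hit ratio to monitor whether caching is paying off. A minimal sketch, assuming the three input counters are disjoint (uncached input, tokens written to cache, tokens read from cache). `cache_hit_ratio` is a hypothetical helper and the sample numbers are invented.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of a request's input tokens that were served from cache."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    fresh = usage.get("input_tokens") or 0
    total = read + written + fresh
    return read / total if total else 0.0


# Invented numbers: a 1,950-token cached prefix plus 50 fresh tokens.
sample_usage = {
    "input_tokens": 50,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 1950,
}
print(f"{cache_hit_ratio(sample_usage):.1%}")
```

With the SDK, a dict of this shape can be obtained from `response.usage.model_dump()`.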
Context Compaction and Editing
For very long sessions, you can compact or edit the context to remove irrelevant information while preserving key facts.
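As a rough client-side illustration of the idea (not the API's own compaction or context-editing feature), older turns can be dropped and replaced by a short summary. `trim_history` is a hypothetical helper. It assumes plain-text messages, and a real implementation would also need to keep `tool_use` and `tool_result` pairs together.

```python
def trim_history(messages, keep_last=4, summary=None):
    """Keep the most recent messages, optionally prefixed by a summary of the rest."""
    if len(messages) <= keep_last:
        return list(messages)
    recent = list(messages[-keep_last:])
    # A conversation should start with a user turn, so drop leading assistant turns.
    while recent and recent[0]["role"] != "user":
        recent.pop(0)
    if summary:
        recent.insert(0, {
            "role": "user",
            "content": f"Summary of earlier conversation: {summary}",
        })
    return recent


history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"}
    for i in range(10)
]
trimmed = trim_history(history, keep_last=3, summary="User asked about vacation policy.")
print([m["content"] for m in trimmed])
```

How the summary is produced is left open here. One option is to ask the model itself to summarize the dropped turns before discarding them.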
5. Files and Assets: Working with Documents
Claude can process various file types:
PDF Support
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this PDF."
                }
            ]
        }
    ]
)
print(response.content[0].text)
Images and Vision
Claude can analyze images for visual understanding.
with open("diagram.png", "rb") as f:
    img_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": img_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram."
                }
            ]
        }
    ]
)
print(response.content[0].text)
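The PDF and image requests differ only in the block type and media type, so the encoding step can be shared. A minimal sketch of one helper that builds either block from raw bytes. `file_block` and the extension table are hypothetical conveniences. The table lists the formats assumed to be accepted (JPEG, PNG, GIF, WebP, PDF).

```python
import base64
from pathlib import Path

# Extension -> (content block type, media type). Assumed supported formats.
MEDIA_TYPES = {
    ".jpg": ("image", "image/jpeg"),
    ".jpeg": ("image", "image/jpeg"),
    ".png": ("image", "image/png"),
    ".gif": ("image", "image/gif"),
    ".webp": ("image", "image/webp"),
    ".pdf": ("document", "application/pdf"),
}


def file_block(filename: str, data: bytes) -> dict:
    """Build an image or document content block from raw file bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    kind, media_type = MEDIA_TYPES[suffix]
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": kind,
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


block = file_block("diagram.png", b"abc")  # placeholder bytes, not a real image
print(block["type"], block["source"]["media_type"])
```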
6. Batch Processing: Cost-Effective Scale
For large volumes of requests, use batch processing. Batch API calls cost 50% less than standard API calls.
# Create a batch of messages (batches live under client.messages)
batch_response = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to French: Hello, world!"}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to Spanish: Hello, world!"}]
            }
        }
    ]
)
print(f"Batch ID: {batch_response.id}")
print(f"Batch status: {batch_response.processing_status}")
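For larger workloads the request list is usually generated rather than written by hand, and results are matched back to inputs by `custom_id`, since result order should not be relied on. A minimal sketch of both steps with plain dicts. `build_batch_requests` and `index_by_custom_id` are hypothetical helpers, and `fake_results` is invented data, not real batch output.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Turn a list of prompts into batch request entries with unique custom_ids."""
    return [
        {
            "custom_id": f"req-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


def index_by_custom_id(results):
    """Map custom_id -> result so outputs can be joined back to their inputs."""
    return {result["custom_id"]: result for result in results}


requests = build_batch_requests([
    "Translate to French: Hello, world!",
    "Translate to Spanish: Hello, world!",
])
# Invented results, deliberately out of order.
fake_results = [
    {"custom_id": "req-002", "text": "¡Hola, mundo!"},
    {"custom_id": "req-001", "text": "Bonjour, le monde !"},
]
by_id = index_by_custom_id(fake_results)
for request in requests:
    print(request["custom_id"], "->", by_id[request["custom_id"]]["text"])
```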
Feature Availability by Platform
Not all features are available everywhere. Here's a quick reference:
| Feature | Claude API | AWS Bedrock | Vertex AI |
|---|---|---|---|
| Extended Thinking | GA | GA | GA |
| Structured Outputs | GA | GA | Beta |
| Citations | GA | GA | GA |
| Prompt Caching | GA | GA | GA |
| Batch Processing | GA | GA | GA |
| Computer Use | Beta | Beta | N/A |
| Web Search | GA | GA | GA |
Key Takeaways
- Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files. Start with model capabilities and tools, then optimize with context management and batch processing.
- Use Extended Thinking for complex reasoning and Structured Outputs for reliable JSON responses. Citations ground responses in source documents.
- Leverage built-in tools (web search, code execution, computer use) to build powerful agents. Use parallel tool calls for efficiency.
- Prompt caching reduces latency and cost for repeated context. Batch processing cuts costs by 50% for large workloads.
- Check feature availability per platform before building. Some features are in beta or not available on all cloud platforms.