Mastering Claude's API: A Complete Guide to Features, Tools, and Context Management
Claude's API is more than just a text generation endpoint. It's a comprehensive platform designed to give developers fine-grained control over how Claude reasons, interacts with external systems, manages long conversations, and processes files. Whether you're building a simple chatbot or a complex agentic system, understanding the full API surface is key to unlocking Claude's potential.
This guide covers the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and files/assets. You'll learn what each area offers, when to use it, and how to implement it with practical code examples.
---
Understanding the Five API Areas
Claude's API surface is organized into five logical areas. Each addresses a different aspect of building with AI:
| Area | Purpose |
|---|---|
| Model Capabilities | Control how Claude reasons, formats responses, and processes inputs |
| Tools | Let Claude take actions on the web or in your environment |
| Tool Infrastructure | Handle discovery and orchestration at scale |
| Context Management | Keep long-running sessions efficient |
| Files and Assets | Manage documents and data you provide to Claude |
---
1. Model Capabilities: Steering Claude's Output
Model capabilities are the foundational controls for how Claude behaves. They include reasoning depth, response format, and input modalities.
Extended Thinking with Adaptive Thinking
Claude can reason step-by-step before responding. With Adaptive Thinking, Claude dynamically decides when and how much to think. This is the recommended mode for Claude Opus 4.7. You can also control thinking depth using the effort parameter.
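import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2048,  # upper bound on tokens spent thinking
        # effort is not a field of the thinking object, so it is omitted here
    },
    messages=[
        {"role": "user", "content": "Solve this complex math problem: integrate x^2 * sin(x) dx"}
    ],
)

# With thinking enabled, the response starts with thinking blocks,
# so pick out the text blocks instead of assuming content[0] is text.
for block in response.content:
    if block.type == "text":
        print(block.text)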
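Two details of a thinking request are easy to get wrong: the thinking budget has to be smaller than max_tokens, and the response begins with thinking blocks rather than text. The following is a minimal sketch of helpers for both. The names `thinking_kwargs` and `final_text` are hypothetical and not part of the SDK, and the sample content is a stand-in so the code runs without an API call.

```python
from types import SimpleNamespace


def thinking_kwargs(max_tokens: int, budget_tokens: int) -> dict:
    """Build keyword arguments for a request with a manual thinking budget.

    Hypothetical helper: it only checks that the budget leaves room for the answer.
    """
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


def final_text(content) -> str:
    """Join the text blocks of a response, skipping thinking blocks."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-in for response.content (made-up values, not real model output).
sample_content = [
    SimpleNamespace(type="thinking", thinking="Integrate by parts twice..."),
    SimpleNamespace(type="text", text="The antiderivative is ..."),
]
print(final_text(sample_content))
```

The returned dictionary could be unpacked into the request, for example `client.messages.create(model=..., messages=..., **thinking_kwargs(4096, 2048))`.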
Structured Outputs
For production systems, you often need structured data. Use Structured Outputs to constrain Claude's response to a JSON schema.
response = client.messages.create(
    model="claude-sonnet-4-5",  # structured outputs need a model that supports them
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: ..."}
    ],
    # The Messages API has no response_format parameter; the schema goes in
    # output_config.format (requires a recent SDK version).
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["name", "date", "amount"],
                "additionalProperties": False,
            },
        }
    },
)
print(response.content[0].text)  # a JSON string that matches the schema
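The structured response still arrives as a JSON string inside a text block, so application code usually parses it into a typed object before use. This sketch assumes the invoice fields from the example above. `Invoice` and `parse_invoice` are hypothetical names, and the sample string is a made-up stand-in for `response.content[0].text`.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    name: str
    date: str
    amount: float


def parse_invoice(raw: str) -> Invoice:
    """Parse the model's JSON text and check the fields the schema requires."""
    data = json.loads(raw)
    missing = {"name", "date", "amount"} - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Invoice(
        name=str(data["name"]),
        date=str(data["date"]),
        amount=float(data["amount"]),
    )


# Made-up stand-in for response.content[0].text
sample = '{"name": "Acme Corp", "date": "2025-01-15", "amount": 1250.5}'
invoice = parse_invoice(sample)
print(invoice)
```

Keeping this check in place is cheap insurance: it also catches truncated output when a response stops at max_tokens before the JSON is complete.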
Batch Processing for Cost Savings
If you have large volumes of non-real-time requests, use Batch Processing. Batch API calls cost 50% less than standard API calls.
# Create a batch of messages
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Summarize this article: ..."}],
            },
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Translate this to French: ..."}],
            },
        },
    ]
)

# Later, check whether processing has ended, then stream the results
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            print(entry.custom_id, entry.result.message.content[0].text)
        else:
            print(entry.custom_id, "did not succeed:", entry.result.type)
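Batch results are matched to requests through custom_id rather than by position, so each ID should be unique and derivable from your own data. This is a sketch of one way to build the request list and index the results. `build_batch_requests` and `index_by_custom_id` are hypothetical helpers, and the fake entries only imitate the shape of batch result entries so the code runs offline.

```python
from types import SimpleNamespace


def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=256):
    """Turn a list of prompts into batch entries with unique custom_ids."""
    return [
        {
            "custom_id": f"req-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


def index_by_custom_id(entries):
    """Map custom_id -> reply text for succeeded entries (order is not assumed)."""
    return {
        e.custom_id: e.result.message.content[0].text
        for e in entries
        if e.result.type == "succeeded"
    }


requests = build_batch_requests(["Summarize this article: ...", "Translate this to French: ..."])
print([r["custom_id"] for r in requests])

# Made-up stand-ins shaped like batch result entries.
fake_entries = [
    SimpleNamespace(custom_id="req-002", result=SimpleNamespace(
        type="succeeded",
        message=SimpleNamespace(content=[SimpleNamespace(text="Bonjour")]))),
    SimpleNamespace(custom_id="req-001", result=SimpleNamespace(type="errored")),
]
print(index_by_custom_id(fake_entries))
```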
Citations for Grounded Responses
When Claude needs to reference source documents, use Citations. Claude will provide detailed references to exact sentences in your source material.
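response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are content blocks inside the message,
                # not a top-level request parameter.
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": "<base64_encoded_pdf>",
                    },
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "Based on the attached PDF, what is the main finding?"},
            ],
        }
    ],
)

# The answer arrives as several text blocks; cited blocks carry a citations list.
for block in response.content:
    if block.type == "text":
        print(block.text, end="")
print()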
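With citations enabled, the answer is split into several text blocks, and blocks that quote the source carry a citations list whose entries include the quoted passage in cited_text. This sketch shows one way to turn that into footnoted text. `render_with_footnotes` is a hypothetical helper and the sample blocks are made-up stand-ins, not real model output.

```python
from types import SimpleNamespace


def render_with_footnotes(content):
    """Return (text, footnotes); cited spans get [n] markers that index footnotes."""
    parts, footnotes = [], []
    for block in content:
        if block.type != "text":
            continue
        parts.append(block.text)
        # citations is absent or None on blocks that quote nothing
        for citation in getattr(block, "citations", None) or []:
            footnotes.append(citation.cited_text)
            parts.append(f"[{len(footnotes)}]")
    return "".join(parts), footnotes


# Made-up stand-in for response.content
sample_content = [
    SimpleNamespace(type="text", text="The report's main finding is ", citations=None),
    SimpleNamespace(
        type="text",
        text="a drop in latency",
        citations=[SimpleNamespace(cited_text="Latency dropped after the change.")],
    ),
    SimpleNamespace(type="text", text=".", citations=None),
]
text, notes = render_with_footnotes(sample_content)
print(text)
for number, note in enumerate(notes, start=1):
    print(f"[{number}] {note}")
```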
---
2. Tools: Let Claude Take Action
Tools extend Claude's capabilities beyond text generation. Claude can call external functions, search the web, execute code, and even control a computer.
How Tool Use Works
You define tools with a name, description, and input schema. Claude decides when to call them based on the conversation context.
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # Your weather API logic here
    return f"Sunny, 72°F in {location}"

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"],
        },
    }
]
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    # The tool_use block is not guaranteed to be last, so search for it
    tool_call = next(b for b in response.content if b.type == "tool_use")
    if tool_call.name == "get_weather":
        result = get_weather(tool_call.input["location"])
        # Send the result back to Claude, passing the same tool definitions again
        final_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages + [
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": tool_call.id, "content": result}
                ]},
            ],
        )
        print(final_response.content[0].text)
Built-in Tools
Anthropic provides several built-in tools with predefined schemas. Web search and code execution run on Anthropic's servers; computer use, memory, and bash are executed by your own application:
- Web Search Tool: Let Claude search the internet
- Code Execution Tool: Run Python code in a sandbox
- Computer Use Tool: Let Claude control a desktop environment that you host
- Memory Tool: Persist information across conversations
- Bash Tool: Execute shell commands
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    betas=["code-execution-2025-05-22"],  # code execution is a beta feature
    tools=[
        # Built-in tools use versioned type strings; check the docs for current versions
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250522", "name": "code_execution"},
    ],
    messages=[
        {"role": "user", "content": "Search for the latest AI news and summarize it"}
    ],
)
Parallel Tool Use
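Claude can request several tool calls in a single response. Parallel tool use is enabled by default, so no extra parameter is needed; to turn it off, pass tool_choice={"type": "auto", "disable_parallel_tool_use": True}.
# weather_tool, stock_tool, and news_tool are tool definitions (name, description,
# input_schema) written like the get_weather definition shown earlier.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool, stock_tool, news_tool],
    messages=[
        {"role": "user", "content": "Get the weather in London, Apple's stock price, and today's top tech news"}
    ],
)
# Each requested call is a separate tool_use block; return all results in one user message
tool_calls = [block for block in response.content if block.type == "tool_use"]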
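When a response contains several tool_use blocks, every one of them needs a matching tool_result, and all results go back in a single user message. This sketch runs the calls concurrently with a thread pool. `run_tool_calls`, the handler functions, and the sample blocks are hypothetical stand-ins, so the code runs without an API call.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_tool_calls(content, handlers):
    """Execute every tool_use block and build one user message with all results."""
    calls = [block for block in content if block.type == "tool_use"]

    def run(call):
        try:
            output = handlers[call.name](**call.input)
            return {"type": "tool_result", "tool_use_id": call.id, "content": str(output)}
        except Exception as exc:  # report the failure to Claude instead of crashing
            return {"type": "tool_result", "tool_use_id": call.id,
                    "content": f"Error: {exc}", "is_error": True}

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, calls))  # map keeps the original call order
    return {"role": "user", "content": results}


# Hypothetical handlers and a made-up stand-in for response.content
handlers = {
    "get_weather": lambda location: f"Sunny in {location}",
    "get_stock_price": lambda symbol: f"{symbol}: price lookup not implemented in this sketch",
}
sample_content = [
    SimpleNamespace(type="text", text="Let me look those up."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather", input={"location": "London"}),
    SimpleNamespace(type="tool_use", id="toolu_02", name="get_stock_price", input={"symbol": "AAPL"}),
    SimpleNamespace(type="tool_use", id="toolu_03", name="get_news", input={}),
]
message = run_tool_calls(sample_content, handlers)
for result in message["content"]:
    print(result)
```

The third call has no registered handler, so it comes back flagged with is_error, which lets Claude see the failure and adjust instead of the whole turn aborting.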
---
3. Tool Infrastructure: Orchestration at Scale
When you have many tools, you need infrastructure for discovery and orchestration. Claude's API provides:
- Tool Runner (SDK): Automates the tool-use loop
- Strict Tool Use: Guarantee that tool inputs conform to the tool's input schema
- Tool Search: Let Claude find the right tool from a large catalog
- Fine-grained Tool Streaming: Stream tool input parameters as they are generated, without waiting for the complete JSON
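To make the idea of automating the tool-use loop concrete, here is a hand-rolled sketch of such a loop. It is not the SDK's Tool Runner. `run_until_done` is a hypothetical function, and `create` stands in for something like `client.messages.create` with the model, tools, and max_tokens already bound, injected so the loop can be exercised offline with scripted responses.

```python
from types import SimpleNamespace


def run_until_done(create, messages, handlers, max_turns=5):
    """Call the model repeatedly, answering tool calls, until it stops asking for tools."""
    for _ in range(max_turns):
        response = create(messages=messages)
        if response.stop_reason != "tool_use":
            return response
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": str(handlers[block.name](**block.input))}
            for block in response.content if block.type == "tool_use"
        ]
        # Append the assistant turn and the tool results, then ask again
        messages = messages + [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": results},
        ]
    raise RuntimeError("tool loop did not finish within max_turns")


# Scripted fake responses: one tool call, then a final answer (made-up values).
scripted = iter([
    SimpleNamespace(stop_reason="tool_use", content=[
        SimpleNamespace(type="tool_use", id="toolu_1", name="add", input={"a": 2, "b": 3}),
    ]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="2 + 3 = 5"),
    ]),
])
seen = []


def fake_create(messages):
    seen.append(messages)
    return next(scripted)


final = run_until_done(
    fake_create,
    [{"role": "user", "content": "Add 2 and 3"}],
    {"add": lambda a, b: a + b},
)
print(final.content[0].text)
```

The max_turns cap is a deliberate design choice: it bounds cost if the model keeps requesting tools.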
---
4. Context Management: Keeping Sessions Efficient
Long conversations consume tokens. Claude offers several features to manage context efficiently.
Context Windows
Context windows of up to 1 million tokens are available on supported models, enough to process entire codebases or lengthy documents.
Prompt Caching
Cache repeated system prompts or document chunks to reduce latency and cost.
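response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # In practice this is a long, stable prefix; prompts below the model's
            # minimum cacheable length are processed without caching.
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"},  # marks the end of the cached prefix
        }
    ],
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
# usage reports cache writes and cache reads separately
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)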
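Whether caching is paying off can be read from the usage object on each response, which reports tokens written to the cache, tokens read from it, and uncached input tokens separately. This is a sketch of a small summary function. `cache_summary` is a hypothetical helper and the two usage objects are made-up stand-ins for `response.usage` on a first and a second call.

```python
from types import SimpleNamespace


def cache_summary(usage) -> dict:
    """Summarize how much of the prompt was read from or written to the cache."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    uncached = usage.input_tokens  # tokens after the last cache breakpoint
    total = read + written + uncached
    return {
        "total_input_tokens": total,
        "cache_written": written,
        "cache_hit_ratio": read / total if total else 0.0,
    }


# Made-up stand-ins for response.usage on two consecutive calls
first_call = SimpleNamespace(input_tokens=12, cache_creation_input_tokens=2048,
                             cache_read_input_tokens=0)
second_call = SimpleNamespace(input_tokens=12, cache_creation_input_tokens=0,
                              cache_read_input_tokens=2048)
print(cache_summary(first_call))
print(cache_summary(second_call))
```

A hit ratio that stays at zero across repeated calls usually means the prefix changes between requests or is too short to be cached.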
Context Compaction
Reduce token usage by summarizing or pruning older conversation turns.
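Pruning can also be done on the client before each request. The sketch below is a minimal version that assumes plain string message content and a crude estimate of four characters per token. `rough_tokens` and `prune_history` are hypothetical helpers, not part of the SDK and not the API's own compaction feature.

```python
def rough_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def prune_history(messages, budget, count=rough_tokens):
    """Keep the most recent messages whose combined estimated size fits the budget."""
    kept, used = [], 0
    for message in reversed(messages):
        cost = count(message["content"])
        if kept and used + cost > budget:
            break  # older turns no longer fit
        kept.append(message)  # the newest message is always kept
        used += cost
    kept.reverse()
    # A conversation sent to the API should start with a user turn
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


# Five alternating turns of 40 characters each (about 10 estimated tokens apiece)
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}: " + "x" * 32}
    for i in range(5)
]
pruned = prune_history(history, budget=35)
print([m["content"][:6] for m in pruned])
```

For real budgets, the `count` argument can be swapped for a function backed by the token counting endpoint. Summarizing the dropped turns instead of discarding them is the other common variant.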
Token Counting
Estimate token usage before making API calls.
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
# count_tokens returns an object; the count is in its input_tokens field
print(f"Token count: {token_count.input_tokens}")
---
5. Files and Assets: Working with Documents
Claude can process various file types, including PDFs, images, and code files.
PDF Support
Upload PDFs and ask Claude to extract information, summarize, or answer questions.
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarize this document in 3 bullet points.",
                },
            ],
        }
    ],
)
print(response.content[0].text)
Images and Vision
Claude can analyze images for tasks like object detection, OCR, and visual question answering.
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "What does this chart show?"},
            ],
        }
    ],
)
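The PDF and image examples differ only in the block type and media type, so the encoding step can be shared. This sketch picks both from the file extension. `file_to_block` is a hypothetical helper, and the demo writes a few placeholder bytes to a temporary file rather than using a real image.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path

IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def file_to_block(path) -> dict:
    """Encode a local PDF or image as a base64 content block."""
    media_type, _ = mimetypes.guess_type(str(path))
    if media_type == "application/pdf":
        block_type = "document"
    elif media_type in IMAGE_TYPES:
        block_type = "image"
    else:
        raise ValueError(f"unsupported file type: {media_type}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": block_type,
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


# Demo with placeholder bytes (not a valid image) in a temporary file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "chart.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    block = file_to_block(sample)
print(block["type"], block["source"]["media_type"])
```

The returned block can be placed directly in a user message's content list, ahead of the text block that asks the question.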
---
Feature Availability Across Platforms
Not all features are available on every platform. Here's a quick reference:
| Feature | Claude API | AWS Bedrock | Vertex AI | Microsoft Foundry |
|---|---|---|---|---|
| Context Windows (1M tokens) | GA | GA | GA | Beta |
| Adaptive Thinking | GA | GA | GA | Beta |
| Batch Processing | GA | GA | GA | GA |
| Citations | GA | GA | GA | Beta |
| Prompt Caching | GA | GA | GA | Beta |
| Web Search Tool | Beta | Beta | Beta | Beta |
| Computer Use Tool | Beta | Beta | N/A | N/A |
---
Best Practices for Building with Claude
- Start simple: Begin with model capabilities and one or two tools. Add complexity gradually.
- Use structured outputs for production systems to ensure parseable responses.
- Leverage batch processing for non-real-time workloads to save 50% on costs.
- Cache prompts that are reused across many conversations.
- Monitor token usage with the token counting endpoint to avoid surprises.
- Handle tool calls properly: Always check stop_reason and respond to tool calls before asking for the final answer.
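The last practice can be made explicit by branching on stop_reason in one place. The sketch below maps stop_reason values to a next action. `next_step` and the action labels are hypothetical names chosen for illustration; only the stop_reason strings come from the API.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason to what the calling code should do next."""
    actions = {
        "end_turn": "done",                # Claude finished its answer
        "stop_sequence": "done",           # a custom stop sequence was reached
        "tool_use": "run_tools",           # execute tools, send tool_result blocks
        "max_tokens": "handle_truncation",  # output was cut off at the token limit
        "pause_turn": "resend_to_continue",  # a long server-tool turn was paused
        "refusal": "handle_refusal",
    }
    # Unknown values fall through so new stop reasons are noticed, not ignored
    return actions.get(stop_reason, "inspect_response")


for reason in ["tool_use", "end_turn", "max_tokens"]:
    print(reason, "->", next_step(reason))
```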
Key Takeaways
- Claude's API has five core areas: model capabilities, tools, tool infrastructure, context management, and files/assets. Each serves a distinct purpose in building AI applications.
- Use Adaptive Thinking and Structured Outputs to control reasoning depth and response format for reliable, production-ready outputs.
- Batch processing cuts costs by 50%—ideal for large-scale, non-real-time workloads like data extraction or content summarization.
- Built-in tools (web search, code execution, computer use) let Claude take real-world actions using predefined tool definitions; some run on Anthropic's servers, while others are executed by your application.
- Context management features like prompt caching and token counting help optimize both cost and performance in long-running sessions.