Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
Explore the full Claude API surface: model capabilities, tools, context management, and file handling. Learn practical implementation with code examples and best practices.
This guide walks you through the five core areas of the Claude API: model capabilities (thinking, structured outputs, streaming, batch processing), tools (web search, code execution), tool infrastructure (strict tool use), context management (prompt caching, compaction), and file handling (PDFs, images, citations). You'll learn how to combine these features in production-ready applications.
Claude's API is more than just a text-in, text-out interface. It's a rich ecosystem of capabilities designed to handle complex reasoning, tool orchestration, long-running conversations, and multimodal inputs. Whether you're building a customer support agent, a code assistant, or a document analysis pipeline, understanding the full API surface is key to unlocking Claude's potential.
This guide covers the five core areas of the Claude API:
- Model capabilities – reasoning depth, structured outputs, streaming
- Tools – letting Claude act on the web or in your environment
- Tool infrastructure – discovery and orchestration at scale
- Context management – keeping long sessions efficient
- Files and assets – managing documents and data
---
1. Model Capabilities: Steering Claude's Output
Claude's model capabilities let you control how it reasons and what it returns. These are the building blocks for any application.
Extended Thinking & Adaptive Thinking
For complex tasks like math proofs, code generation, or multi-step reasoning, Claude can "think" before responding. The Extended Thinking feature allocates internal tokens for reasoning, improving accuracy on hard problems.
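Adaptive Thinking (recommended for Opus 4.5+) lets Claude decide dynamically how much to think, with an effort setting to steer depth. With a fixed budget, you enable thinking and cap the reasoning tokens with budget_tokens:
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    # budget_tokens caps the internal reasoning and must be smaller than max_tokens
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)

# Content blocks are typed, so select them by type rather than by position
for block in response.content:
    if block.type == "thinking":
        print(block.thinking)  # The reasoning chain
    elif block.type == "text":
        print(block.text)  # The final answer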
Tip: On Opus 4.5+, use Adaptive Thinking to let Claude decide the thinking depth automatically. This saves tokens on simple queries and allocates more for complex ones.
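When you set a fixed thinking budget, it helps to validate it before sending the request. The sketch below assumes two constraints: a minimum budget of 1,024 tokens and a budget strictly smaller than max_tokens. The helper name thinking_config is hypothetical and not part of the SDK.

```python
def thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Build a `thinking` parameter after checking the budget against max_tokens."""
    min_budget = 1024  # assumed minimum thinking budget
    if budget_tokens < min_budget:
        raise ValueError(f"budget_tokens must be at least {min_budget}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


config = thinking_config(max_tokens=4096, budget_tokens=2048)
print(config)
```

The returned dictionary can be passed as the thinking argument of a messages request.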
Structured Outputs
Claude can return responses in a structured format (JSON, XML, or custom schemas). This is critical for programmatic consumption:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a data extraction assistant. Always respond in valid JSON.",
    messages=[{
        "role": "user",
        "content": "Extract the name, date, and total amount from this invoice: Invoice #1234, dated 2025-03-15, amount $2,450.00"
    }]
)
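A system prompt alone does not guarantee well-formed JSON, so it is worth validating the reply before passing it downstream. The following is a minimal sketch that parses the text and checks for expected keys. The key names in REQUIRED_KEYS and the sample string are assumptions made up for illustration, not output from a real request.

```python
import json

REQUIRED_KEYS = {"name", "date", "total_amount"}  # hypothetical schema for this example


def parse_invoice(text: str) -> dict:
    """Parse model output as JSON and check that the expected keys are present."""
    # Tolerate surrounding prose by slicing from the first "{" to the last "}"
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Stand-in for response.content[0].text
sample = 'Here is the result: {"name": "Invoice #1234", "date": "2025-03-15", "total_amount": 2450.0}'
invoice = parse_invoice(sample)
print(invoice["total_amount"])
```

On a validation failure, a common design choice is to retry the request with the error message appended to the prompt.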
Streaming & Refusals
Streaming lets you receive tokens as they're generated, reducing perceived latency. Streaming refusals let you detect a content policy refusal mid-stream, because the stream then ends with a stop reason of refusal.
stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    stream=True,
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
)
for event in stream:
    # Text arrives as text_delta payloads on content_block_delta events
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)
    # The stop reason is reported on the message_delta event
    elif event.type == "message_delta" and event.delta.stop_reason == "refusal":
        print("\n[Refusal detected]")
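If you want the full text plus a refusal flag instead of printing as you go, the loop can be wrapped in a small accumulator. This sketch assumes that text arrives in text_delta payloads and that a refusal surfaces as a stop_reason of "refusal" on a message_delta event. The function name collect_stream and the fake events are illustrative stand-ins, not SDK objects.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate streamed text and report whether the stream ended in a refusal."""
    chunks, refused = [], False
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            chunks.append(event.delta.text)
        elif event.type == "message_delta" and event.delta.stop_reason == "refusal":
            refused = True
    return "".join(chunks), refused


# Stand-in events shaped like stream events (for illustration only)
fake_events = [
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Roses are ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="red")),
    SimpleNamespace(type="message_delta",
                    delta=SimpleNamespace(stop_reason="end_turn")),
]
text, refused = collect_stream(fake_events)
print(text, refused)
```

When refused comes back true, the partial text is usually best discarded rather than shown to the user.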
Batch Processing
For high-volume, non-real-time tasks, the Batch API offers 50% cost savings. Send up to 100,000 requests per batch:
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize: ..."}],
            },
        },
        # ... more requests
    ]
)

# Poll for completion
result = client.messages.batches.retrieve(batch.id)
print(result.processing_status)  # "in_progress" until the batch has ended
Note: Batch processing is not ZDR eligible – data may be retained for processing. Use standard API for sensitive data.
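Large jobs usually need to be split so that no single batch exceeds the per-batch limit. The sketch below builds request entries in the shape shown above and chunks them. The helper name build_batches is hypothetical, and batch_size is left as an explicit argument so you can set it to whatever limit applies to your account.

```python
def build_batches(prompts, model, batch_size, max_tokens=1024):
    """Turn prompts into batch request entries, split into lists of at most batch_size."""
    requests = [
        {
            "custom_id": f"req-{i:03d}",  # unique ID used to match results to requests
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


batches = build_batches(
    ["Summarize A", "Summarize B", "Summarize C"],
    model="claude-sonnet-4-20250514",
    batch_size=2,
)
print([len(b) for b in batches])  # → [2, 1]
```

Each inner list can then be submitted as the requests argument of one batch.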
---
2. Tools: Letting Claude Act in the World
Tools extend Claude's capabilities beyond text. Claude can call functions, fetch web pages, execute code, and even control a computer.
Web Search & Web Fetch
Claude can search the web or fetch specific URLs to ground responses in real-time information:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[{
        "type": "web_search_20250305",  # server tools use a versioned type string
        "name": "web_search",
    }],
    messages=[{"role": "user", "content": "What's the latest news about Claude 4?"}]
)
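The web search tool definition also accepts optional limits. The sketch below builds one with a cap on searches and a domain filter. It assumes the versioned type web_search_20250305 and the optional fields max_uses, allowed_domains, and blocked_domains; the builder function itself is hypothetical, so check the field names against the current API reference.

```python
def web_search_tool(max_uses=None, allowed_domains=None, blocked_domains=None):
    """Build a web search tool definition with optional usage and domain limits."""
    if allowed_domains and blocked_domains:
        raise ValueError("use allowed_domains or blocked_domains, not both")
    tool = {"type": "web_search_20250305", "name": "web_search"}  # assumed tool version
    if max_uses is not None:
        tool["max_uses"] = max_uses  # cap on searches per request
    if allowed_domains:
        tool["allowed_domains"] = list(allowed_domains)
    if blocked_domains:
        tool["blocked_domains"] = list(blocked_domains)
    return tool


tool = web_search_tool(max_uses=3, allowed_domains=["example.com"])
print(tool)
```

Capping max_uses keeps search costs predictable when a question could trigger many lookups.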
Code Execution Tool
Claude can write and execute Python code in a sandboxed environment. Perfect for data analysis, calculations, or generating visualizations:
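response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    # The tool version and beta flag below may have changed; check the current docs
    betas=["code-execution-2025-05-22"],
    tools=[{
        "type": "code_execution_20250522",
        "name": "code_execution",
    }],
    messages=[{"role": "user", "content": "Calculate the Fibonacci sequence up to 100 and plot it."}]
)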
Computer Use (Beta)
For advanced automation, Claude can control a virtual desktop environment – clicking buttons, typing text, and navigating UIs. This is ideal for testing or legacy system integration.
Parallel Tool Use
Claude can call multiple tools simultaneously to speed up workflows:
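web_search_tool = {"type": "web_search_20250305", "name": "web_search"}
code_execution_tool = {"type": "code_execution_20250522", "name": "code_execution"}

response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["code-execution-2025-05-22"],  # may be required for the code execution tool
    tools=[web_search_tool, code_execution_tool],
    # Parallel tool use is on by default; this flag makes the choice explicit
    tool_choice={"type": "auto", "disable_parallel_tool_use": False},
    messages=[{"role": "user", "content": "Find today's stock prices for AAPL and TSLA, then calculate their P/E ratios."}]
)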
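When the tools are your own functions rather than server-side tools, a parallel turn contains several tool_use blocks, and the results go back together in one user message. The sketch below shows that round trip on plain dictionaries. The handler get_price and the block contents are hypothetical, and the SDK returns objects rather than dicts, so attribute access would replace the key lookups.

```python
def run_tool_calls(content_blocks, handlers):
    """Run every tool_use block from one assistant turn and build the reply message."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # skip text and other block types
        handler = handlers.get(block["name"])
        if handler is None:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
        else:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # ties the result to its request
                "content": str(handler(**block["input"])),
            })
    # All results are returned in a single user message
    return {"role": "user", "content": results}


# Stand-in assistant turn with two parallel calls (for illustration only)
blocks = [
    {"type": "text", "text": "Checking both tickers."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_price", "input": {"ticker": "AAPL"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_price", "input": {"ticker": "TSLA"}},
]
handlers = {"get_price": lambda ticker: f"{ticker}: price unavailable in this sketch"}
reply = run_tool_calls(blocks, handlers)
print(len(reply["content"]))  # → 2
```

The handlers run sequentially here for simplicity; independent calls could be dispatched concurrently with a thread pool.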
---
3. Tool Infrastructure: Discovery & Orchestration
When you have many tools, managing them becomes a challenge. Claude's tool infrastructure handles:
- Tool search – automatically find the right tool for a task
- Tool combinations – chain multiple tools together
- Fine-grained tool streaming – stream results from each tool independently
- Programmatic tool calling – let Claude call your tools from code it writes in the code execution environment
Strict Tool Use
For safety-critical applications, enable strict tool use so that Claude's tool calls always match the schemas of the tools you define:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[...],  # set "strict": True inside each tool definition, not on the request
    tool_choice={"type": "any"},  # Claude must use a tool
    messages=[...]
)
---
4. Context Management: Keeping Sessions Efficient
Long conversations consume tokens. Claude provides several mechanisms to manage context windows efficiently.
Context Windows
Claude supports up to 1M tokens of context on some models (200K is the standard window), enough to process entire codebases or lengthy documents. But bigger contexts cost more. Use context compaction to summarize older turns:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Compress the conversation history into a concise summary, preserving key facts and decisions.",
    messages=[
        {"role": "user", "content": "Here is the full conversation log..."}
    ]
)
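A summarization call like the one above is typically wrapped in client-side logic that decides which turns to compress. The sketch below keeps the most recent turns verbatim and hands the older ones to a summarizer. The function compact_history and the summarize callable are hypothetical; in practice summarize would make the API call shown above, and the returned summary could be added to the system prompt.

```python
def compact_history(messages, summarize, keep_last=4):
    """Summarize older turns and keep roughly the last keep_last messages verbatim."""
    if len(messages) <= keep_last:
        return None, list(messages)  # nothing to compact
    cut = len(messages) - keep_last
    # Move the cut forward so the kept tail starts with a user turn
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    older, recent = messages[:cut], messages[cut:]
    return summarize(older), recent


# Stand-in conversation and summarizer (for illustration only)
history = [
    {"role": "user", "content": "u1"}, {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"}, {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"}, {"role": "assistant", "content": "a3"},
]
summary, recent = compact_history(
    history, summarize=lambda msgs: f"{len(msgs)} earlier messages", keep_last=3
)
print(summary, [m["content"] for m in recent])
```

Starting the kept tail on a user turn means it can be sent as-is in the next request.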
Prompt Caching
Prompt caching reduces latency and cost by reusing common prefixes (system prompts, few-shot examples) across multiple requests:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant specialized in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Explain decorators."}]
)
Tip: Cache the system prompt and the first few user messages for maximum savings. Cache hits reduce latency by up to 80%.
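Because a cache breakpoint covers the prefix up to and including the marked block, a long system prompt split into several blocks only needs the marker on the last one. The sketch below builds such a block list; the helper name cached_system is hypothetical, and it assumes the prefix is long enough to meet the model's minimum cacheable length.

```python
def cached_system(*texts):
    """Build system blocks, marking the last one as the end of the cacheable prefix."""
    if not texts:
        raise ValueError("at least one system text is required")
    blocks = [{"type": "text", "text": text} for text in texts]
    # The breakpoint on the final block covers every block before it as well
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks


system_blocks = cached_system(
    "You are a helpful assistant specialized in Python programming.",
    "Reference material: ...",
)
print(len(system_blocks))  # → 2
```

Keep the cached texts byte-for-byte identical across requests, since any change to the prefix results in a cache miss.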
Context Editing
For interactive applications, you can edit the context window – insert, delete, or replace messages without rebuilding the entire history.
---
5. Files and Assets: Working with Documents
Claude supports multiple input modalities:
- PDF support – extract text, tables, and layout
- Images – analyze diagrams, screenshots, or photos
- Files API – upload and reference documents
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data
                }
            },
            {
                "type": "text",
                "text": "Summarize the key findings from this report."
            }
        ]
    }]
)
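Images follow the same pattern as the PDF document block, with type "image" and an image media type. The sketch below builds an image block from raw bytes and infers the media type from the filename. The helper name image_block is hypothetical, and the set of supported media types is an assumption to verify against the current documentation.

```python
import base64
import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}  # assumed list


def image_block(data: bytes, filename: str) -> dict:
    """Build a base64 image content block, inferring the media type from the filename."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode(),
        },
    }


block = image_block(b"\x89PNG placeholder bytes", "chart.png")
print(block["source"]["media_type"])  # → image/png
```

The resulting block goes into the content list of a user message, alongside a text block containing the question.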
Citations
Claude can cite exact sentences from source documents, making it ideal for legal, academic, or compliance use cases:
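response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                # pdf_data is a base64-encoded PDF, loaded as in the previous example
                "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},
                "citations": {"enabled": True},  # citations are enabled per document
            },
            {"type": "text", "text": "What does the contract say about termination clauses?"},
        ],
    }],
)
# Citations are attached to the individual text blocks of the response
for block in response.content:
    if block.type == "text" and block.citations:
        for citation in block.citations:
            print(f"Source: {citation.document_title}, Page {citation.start_page_number}")
            print(f"Quote: {citation.cited_text}")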
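For audit trails it is often handy to gather every cited passage into one list. The sketch below does that over a list of content blocks, assuming citations hang off text blocks and carry document_title and cited_text fields. The function collect_citations and the stand-in blocks are hypothetical and only mimic the shape of a response.

```python
from types import SimpleNamespace


def collect_citations(content_blocks):
    """Return (document_title, cited_text) pairs from all text blocks."""
    found = []
    for block in content_blocks:
        if getattr(block, "type", None) != "text":
            continue
        # citations may be missing or None on blocks without sources
        for citation in getattr(block, "citations", None) or []:
            found.append((citation.document_title, citation.cited_text))
    return found


# Stand-in response content (for illustration only)
content = [
    SimpleNamespace(
        type="text",
        text="Either party may end the agreement with notice.",
        citations=[SimpleNamespace(document_title="contract.pdf",
                                   cited_text="Either party may terminate...")],
    ),
    SimpleNamespace(type="text", text="No source for this sentence.", citations=None),
]
pairs = collect_citations(content)
print(pairs)
```

Storing the pairs next to the generated answer lets a reviewer check each claim against its source.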
---
Feature Availability at a Glance
Not all features are available everywhere. Here's a quick reference:
| Feature | Claude API | AWS | Bedrock | Vertex AI |
|---|---|---|---|---|
| Extended Thinking | GA | GA | GA | GA |
| Batch Processing | GA | GA | GA | GA |
| Prompt Caching | GA | GA | GA | GA |
| Computer Use | Beta | Beta | Beta | Beta |
| Citations | GA | GA | GA | GA |
| Code Execution | GA | GA | GA | GA |
Beta features may change significantly. Use GA features for production workloads.
---
Putting It All Together: A Practical Example
Let's build a research assistant that searches the web, reads a PDF, and generates a structured report:
import anthropic
import base64

client = anthropic.Anthropic()

# Load PDF
with open("research_paper.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    # The tool versions and beta flag below may have changed; check the current docs
    betas=["code-execution-2025-05-22"],
    system="You are a research assistant. Search the web for recent developments, then analyze the provided PDF.",
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250522", "name": "code_execution"},
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_b64
                    },
                    "citations": {"enabled": True}  # citations are enabled per document
                },
                {
                    "type": "text",
                    "text": "Search for the latest research on this topic, then summarize the PDF and the search results in a structured JSON report."
                }
            ]
        }
    ]
)

# With tools enabled, the first block is not always text, so print every text block
for block in response.content:
    if block.type == "text":
        print(block.text)
---
Key Takeaways
- Claude's API is organized into five pillars: model capabilities, tools, tool infrastructure, context management, and file handling. Master each to build sophisticated applications.
- Use Extended Thinking for complex reasoning and Adaptive Thinking for Opus 4.5+ to save tokens on simple tasks.
- Leverage tools like web search and code execution to give Claude real-world agency, but use strict tool mode for safety.
- Prompt caching and context compaction are essential for cost-effective long-running sessions – cache system prompts and frequent prefixes.
- Check feature availability before building – GA features are production-ready, while Beta features may change. Batch processing saves 50% but isn't ZDR eligible.