Mastering the Claude API: A Complete Guide to Features, Tools, and Context Management
Explore Claude's API surface including model capabilities, tools, context management, and file handling. Learn how to build powerful AI applications with practical code examples.
This guide covers Claude's five API areas: model capabilities (thinking, citations), tools (web search, code execution), tool infrastructure (tool runner, parallel tool use), context management (prompt caching, compaction), and file handling. You'll learn how to use each with practical Python examples.
Introduction
Claude's API is more than just a text-in, text-out interface. It's a comprehensive platform designed to give developers fine-grained control over how Claude reasons, interacts with external systems, and manages long-running conversations. Whether you're building a customer support bot, a code assistant, or an agent that browses the web, understanding the full API surface is essential.
This guide breaks down the five core areas of the Claude API: model capabilities, tools, tool infrastructure, context management, and files/assets. You'll learn what each area offers, when to use it, and see practical code examples you can adapt immediately.
1. Model Capabilities: Steering Claude's Reasoning
Model capabilities control how Claude processes input and what it outputs. These are the foundational building blocks for any application.
Extended Thinking and Adaptive Thinking
Claude can "think" before responding, which improves reasoning on complex tasks. With Extended Thinking, you set a fixed thinking budget. With Adaptive Thinking (recommended for Opus 4.7), Claude decides dynamically how much to think based on the task.
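Python example: Extended thinking with a fixed budget
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    # Fixed budget: at least 1,024 tokens and below max_tokens.
    # Adaptive thinking is configured differently and only on models that support it.
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Analyze the pros and cons of quantum computing for cryptography."}
    ],
)

# The response starts with thinking blocks, so print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)
```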
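A fixed thinking budget has two constraints worth checking before a standard (non-interleaved) request is sent: it must be at least 1,024 tokens, and it must be smaller than `max_tokens`. The sketch below shows one way to enforce that; `thinking_params` is a hypothetical helper name, not part of the SDK.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget_tokens accepted for extended thinking


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` argument for messages.create, or raise ValueError.

    Hypothetical helper: it only validates the two constraints on a fixed
    thinking budget before a request is sent.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_params(max_tokens=4096, budget_tokens=2048))
```

Failing early in your own code gives a clearer error than a rejected API request.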
Citations
When Claude needs to reference source documents, use the Citations feature. Claude will return exact quotes together with their location in your provided documents (character ranges for plain text, page numbers for PDFs).
Python example: Citations
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # Documents are passed as content blocks with citations enabled
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "All refunds must be requested within 30 days of purchase...",
                    },
                    "title": "Company Policy",
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "What is the refund policy?"},
            ],
        }
    ],
)

# Citations are attached to the text blocks in response.content
for block in response.content:
    if block.type == "text" and block.citations:
        for citation in block.citations:
            print(f"Source: {citation.document_title}, quote: {citation.cited_text!r}")
```
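To show cited answers to end users, the citations can be turned into numbered footnotes. The sketch below assumes the response content has been converted to plain dicts shaped like the API's JSON; `format_citations` and the sample data are illustrative, not SDK features.

```python
def format_citations(content: list[dict]) -> str:
    """Render text blocks with numbered footnotes for their citations.

    `content` is assumed to be the response's content list as plain dicts.
    """
    body, notes = [], []
    for block in content:
        if block.get("type") != "text":
            continue
        text = block["text"]
        for citation in block.get("citations") or []:
            notes.append(
                f'[{len(notes) + 1}] {citation["document_title"]}: "{citation["cited_text"]}"'
            )
            text += f"[{len(notes)}]"
        body.append(text)
    return "".join(body) + "\n\n" + "\n".join(notes)


# Hand-written sample in the shape of a cited response (not real API output)
sample = [
    {"type": "text", "text": "Refunds are available "},
    {
        "type": "text",
        "text": "within 30 days of purchase",
        "citations": [
            {
                "type": "char_location",
                "cited_text": "All refunds must be requested within 30 days of purchase",
                "document_index": 0,
                "document_title": "Company Policy",
                "start_char_index": 0,
                "end_char_index": 56,
            }
        ],
    },
    {"type": "text", "text": "."},
]
print(format_citations(sample))
```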
Structured Outputs
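For programmatic consumption, you can have Claude return JSON that follows a schema. The Messages API has no `response_format` parameter, so this example uses a widely supported pattern instead: define a tool whose `input_schema` is the target schema and force Claude to call it. Newer models also offer a dedicated structured-outputs feature; check the API reference for its current parameters and supported models.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "record_invoice",
            "description": "Record the fields extracted from an invoice",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["name", "date", "amount"],
            },
        }
    ],
    # Force a call to record_invoice so the reply carries schema-shaped JSON
    tool_choice={"type": "tool", "name": "record_invoice"},
    messages=[{"role": "user", "content": "Extract the name, date, and amount from this invoice."}],
)

# The structured data is the input of the forced tool call
invoice = next(block.input for block in response.content if block.type == "tool_use")
print(invoice)
```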
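Even when the output is constrained by a schema, it is worth validating at the application boundary, for example because a response cut off at `max_tokens` may be incomplete. The following is a minimal standard-library sketch for the invoice fields used above; `parse_invoice` and the sample values are hypothetical.

```python
import json

# Expected fields and their Python types, mirroring the invoice schema
INVOICE_FIELDS = {"name": str, "date": str, "amount": (int, float)}


def parse_invoice(raw: str) -> dict:
    """Parse a JSON string and check it against the invoice fields above."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in INVOICE_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


print(parse_invoice('{"name": "Acme Corp", "date": "2025-01-15", "amount": 1250.5}'))
```

For larger schemas, a dedicated validation library would be a more natural fit than hand-written checks.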
2. Tools: Letting Claude Act in the World
Tools extend Claude's capabilities beyond text generation. Claude can call functions, browse the web, execute code, and more.
Built-in Tools
Claude offers several first-party tools:
| Tool | Purpose |
|---|---|
| Web search tool | Fetch real-time information from the internet |
| Code execution tool | Run Python code in a sandbox |
| Computer use tool | Control a virtual desktop (beta) |
| Memory tool | Persist information across conversations |
| Bash tool | Execute shell commands |
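```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    tools=[
        {
            "type": "web_search_20250305",  # server tools use a versioned type
            "name": "web_search",
        }
    ],
    messages=[
        {"role": "user", "content": "What is the current population of Tokyo?"}
    ],
)

# Claude will use the web_search tool if needed. The response then mixes text,
# tool-use, and search-result blocks, so join the text blocks for the answer.
print("".join(block.text for block in response.content if block.type == "text"))
```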
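When the web search tool runs, the response also carries the search results themselves, which is useful for showing sources next to an answer. The sketch below assumes the content list is available as plain dicts shaped like the API's JSON; `search_sources`, the sample data, and the example.com URL are illustrative.

```python
def search_sources(content: list[dict]) -> list[tuple[str, str]]:
    """Collect (title, url) pairs from web search result blocks."""
    sources = []
    for block in content:
        if block.get("type") != "web_search_tool_result":
            continue
        results = block.get("content")
        # On failure, `content` is an error object rather than a list of results
        if not isinstance(results, list):
            continue
        for result in results:
            if result.get("type") == "web_search_result":
                sources.append((result["title"], result["url"]))
    return sources


# Hand-written sample in the shape of a web search response (not real output)
sample = [
    {"type": "text", "text": "Let me look that up."},
    {
        "type": "server_tool_use",
        "id": "srvtoolu_01",
        "name": "web_search",
        "input": {"query": "Tokyo population"},
    },
    {
        "type": "web_search_tool_result",
        "tool_use_id": "srvtoolu_01",
        "content": [
            {
                "type": "web_search_result",
                "title": "Tokyo population",
                "url": "https://example.com/tokyo",
            }
        ],
    },
]
print(search_sources(sample))
```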
Custom Tools (Function Calling)
You can define your own tools using a JSON schema. Claude will request tool calls, and you execute them.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in London?"}
    ],
)

# Check for tool use
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        # Execute the tool and send result back
```
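The "execute the tool and send the result back" step can be sketched as follows. Each `tool_use` block is answered by a `tool_result` block that repeats its `id`, and all results go into the next user message. `get_weather`, `TOOL_HANDLERS`, and `tool_result_message` are hypothetical names, and the weather data is a stand-in.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real one would call a weather service."""
    return {"location": location, "temperature": 18, "unit": unit}


# Maps tool names from the request to local Python functions
TOOL_HANDLERS = {"get_weather": get_weather}


def tool_result_message(content: list[dict]) -> dict:
    """Run every tool_use block and build the follow-up user message."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue
        handler = TOOL_HANDLERS.get(block["name"])
        try:
            if handler is None:
                raise ValueError(f"unknown tool: {block['name']}")
            output, is_error = json.dumps(handler(**block["input"])), False
        except Exception as exc:  # report failures to Claude instead of crashing
            output, is_error = str(exc), True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
                "is_error": is_error,
            }
        )
    return {"role": "user", "content": results}


calls = [{"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"location": "London"}}]
print(tool_result_message(calls))
```

To continue the conversation, append the assistant's response content and then this message to `messages`, and call `client.messages.create` again with the same `tools`.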
3. Tool Infrastructure: Orchestration at Scale
When you have many tools, you need a way to manage discovery, routing, and execution. Claude's tool infrastructure includes:
- Tool Runner (SDK): Automates the tool-calling loop
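- Strict tool use: Guarantees that the inputs Claude generates for a tool call conform to the tool's JSON schema
- Parallel tool use: Lets Claude request several tool calls in a single response, which you can then execute concurrently
- Fine-grained tool streaming: Streams tool input parameters as they are generated rather than buffering them for JSON validation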
Tool Runner Example
```python
from anthropic import Anthropic, beta_tool

client = Anthropic()

# Define tools as Python functions; the decorator derives the JSON schema
# from the signature and docstring
@beta_tool
def get_weather(location: str) -> str:
    """Get weather.

    Args:
        location: City name, for example "Paris".
    """
    return f"Sunny in {location}"  # placeholder; call a real weather service here

# Tool Runner (a beta SDK helper) handles the loop automatically: it executes
# the requested tools and sends the results back until Claude finishes
runner = client.beta.messages.tool_runner(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[get_weather],
    messages=[{"role": "user", "content": "What's the weather in Paris and Tokyo?"}],
)

# With parallel tool use, Claude may call both at once
for message in runner:
    print(message.content)
```
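If you run the loop yourself, parallel tool calls arrive as several `tool_use` blocks in one response, and every result should go back in a single user message. Below is a minimal sketch of executing them concurrently with the standard library; `lookup_weather` and `run_parallel` are hypothetical, and the tool body is a stand-in for a slow network call.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_weather(location: str) -> str:
    """Stand-in for a slow network call."""
    return f"Sunny in {location}"


def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Execute tool_use blocks concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as its inputs
        outputs = list(pool.map(lambda call: lookup_weather(**call["input"]), tool_calls))
    return [
        {"type": "tool_result", "tool_use_id": call["id"], "content": output}
        for call, output in zip(tool_calls, outputs)
    ]


calls = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"location": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather", "input": {"location": "Tokyo"}},
]
# All results are returned together in one user message
follow_up = {"role": "user", "content": run_parallel(calls)}
print(follow_up)
```

Threads suit I/O-bound tools such as HTTP lookups; CPU-bound tools would need a different executor.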
4. Context Management: Keeping Conversations Efficient
Long conversations consume tokens and increase latency. Claude provides several features to manage context.
Context Windows
Most Claude models provide a 200K-token context window, and some models support up to 1 million tokens (on some of them only as a beta feature). That's enough to process entire codebases or hundreds of pages of documents.
Prompt Caching
Cache frequently used system prompts or document chunks to reduce latency and cost.
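```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with knowledge of our company policy.",
            # Marks the end of the cacheable prefix. Prompts shorter than the
            # model's minimum cacheable length (1,024 tokens on many models)
            # are processed without caching, so real prompts must be longer.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "What is the vacation policy?"}
    ],
)
```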
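Whether caching pays off depends on how often the cached prefix is reused. The arithmetic below is a rough sketch under stated assumptions: cache writes cost 1.25 times the base input price, cache reads cost 0.1 times, and every call after the first hits the cache. These multipliers reflect the default short-lived cache at the time of writing and should be checked against current pricing; `prefix_cost` is a hypothetical helper.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: cache writes cost 25% more than base input
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache reads cost 10% of base input


def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Cost of the shared prefix over `calls` requests, in base-token units."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(prefix_tokens * calls)
    # The first call writes the cache; later calls are assumed to hit it
    return prefix_tokens * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))


for calls in (1, 2, 10):
    without = prefix_cost(2000, calls, cached=False)
    with_cache = prefix_cost(2000, calls, cached=True)
    print(f"{calls} call(s): uncached={without:.0f}, cached={with_cache:.0f}")
```

Under these assumptions a single call is slightly more expensive with caching, and the savings start with the second call that reuses the prefix.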
Context Compaction
When conversations grow too long, you can compact the context—summarizing earlier turns while preserving essential information.
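The idea can be illustrated on the client side with a few lines of Python. This is a sketch of the general technique, not the API's built-in compaction: `compact` is a hypothetical helper, the `summarize` callable is supplied by you (for instance, another Claude call), and the sketch assumes plain text turns with no tool calls that would be split apart.

```python
from typing import Callable


def compact(
    messages: list[dict],
    keep_last: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Replace all but the last `keep_last` messages with one summary turn."""
    if keep_last < 1 or len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {
        "role": "user",
        "content": f"Summary of the earlier conversation: {summarize(older)}",
    }
    return [summary, *recent]


def stub_summarize(turns: list[dict]) -> str:
    """Placeholder summarizer; a real one would condense the turns' content."""
    return f"{len(turns)} earlier turns omitted"


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Tell me about the refund policy."},
    {"role": "assistant", "content": "Refunds are available within 30 days."},
    {"role": "user", "content": "And exchanges?"},
]
for message in compact(history, keep_last=2, summarize=stub_summarize):
    print(message)
```

Keeping the most recent turns verbatim preserves the immediate context, while the summary stands in for everything older.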
Token Counting
Always check token usage before sending large payloads.
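```python
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
)
# count_tokens returns an object; the count is in its input_tokens field
print(f"Token count: {token_count.input_tokens}")
```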
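Once the input token count is known, a simple check is whether the prompt plus the reserved output still fits in the context window. The sketch below assumes a 200,000-token window as the default; `output_headroom` and `safe_max_tokens` are hypothetical helpers, and the window size should be set to match the model you use.

```python
def output_headroom(input_tokens: int, context_window: int = 200_000) -> int:
    """Tokens left for the response after the prompt is counted."""
    return max(context_window - input_tokens, 0)


def safe_max_tokens(input_tokens: int, requested: int, context_window: int = 200_000) -> int:
    """Clamp the requested max_tokens to what still fits in the window."""
    return min(requested, output_headroom(input_tokens, context_window))


# input_tokens would normally come from the count_tokens result
for input_tokens in (150_000, 198_000, 250_000):
    print(input_tokens, safe_max_tokens(input_tokens, requested=4096))
```

A result of zero means the prompt alone exceeds the window and needs to be shortened or compacted first.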
5. Files and Assets: Working with Documents
Claude can process various file types natively.
PDF Support
Claude reads both the extracted text and the visual content of each PDF page, such as charts and tables, so you can ask questions about a document and quote from it.
```python
import base64

# Read the PDF and encode it as base64 for the request body
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarize this report.",
                },
            ],
        }
    ],
)
```
Images and Vision
Claude can analyze images for tasks like OCR, object detection, and visual question answering.
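Images are sent the same way as PDFs: as a base64 content block placed next to the text prompt. A minimal sketch of building such a block follows; `image_block` is a hypothetical helper, and the media type is guessed from the file name rather than from the image bytes.

```python
import base64
import mimetypes

# Image formats accepted by the Messages API
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block for the Messages API."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


# In practice `data` would come from open("chart.png", "rb").read()
content = [
    image_block("chart.png", b"\x89PNG placeholder bytes"),
    {"type": "text", "text": "What does this chart show?"},
]
print(content[0]["source"]["media_type"])
```

The resulting `content` list can be used as the `content` of a user message in `client.messages.create`.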
Feature Availability by Platform
Not all features are available everywhere. Here's a quick reference, which is a snapshot and should be confirmed against each platform's current documentation:
| Feature | Claude API | AWS Bedrock | Vertex AI |
|---|---|---|---|
| Extended Thinking | GA | GA | GA |
| Citations | GA | GA | Beta |
| Web Search | GA | GA | GA |
| Prompt Caching | GA | GA | GA |
| Batch Processing | GA | GA | GA |
Best Practices
- Start with model capabilities, then add tools as needed. Don't over-engineer.
- Use adaptive thinking for Opus 4.7—it saves tokens on simple tasks and spends more on complex ones.
- Cache your system prompts to reduce costs on repeated calls.
- Use structured outputs when consuming responses programmatically—avoid parsing free text.
- Monitor token usage with the count_tokens endpoint before sending large payloads.
Key Takeaways
- Claude's API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files/assets.
- Use adaptive thinking to let Claude decide how much to reason, which saves tokens on simple tasks.
- Tools extend Claude into the real world: web search, code execution, custom functions, and more.
- Prompt caching and context compaction keep long-running conversations efficient and cost-effective.
- Always check feature availability per platform—some features are still in beta on certain providers.