Mastering Claude’s API: A Practical Guide to Features, Tools, and Context Management
Learn how to build with Claude’s API using model capabilities, tools, context management, and files. Includes code examples and best practices for production use.
This guide walks you through Claude’s five core API areas: model capabilities, tools, tool infrastructure, context management, and files. You’ll learn how to use extended thinking, structured outputs, tool calling, prompt caching, and batch processing with practical Python examples.
Claude’s API is designed to give developers fine-grained control over how the model reasons, interacts with external systems, and manages long-running conversations. Whether you’re building a simple chatbot or a complex agent that browses the web, executes code, and manages memory, understanding the five core areas of the API surface is essential.
This guide covers everything you need to get started and to scale up. We’ll walk through each area with practical code examples, best practices, and tips for optimizing cost and latency.
The Five Pillars of Claude’s API
Claude’s API surface is organized into five areas:
- Model capabilities – Control how Claude reasons and formats responses.
- Tools – Let Claude take actions on the web or in your environment.
- Tool infrastructure – Handle discovery and orchestration at scale.
- Context management – Keep long-running sessions efficient.
- Files and assets – Manage documents and data you provide to Claude.
Model Capabilities: Steering Claude’s Output
Claude offers several ways to control its reasoning depth, response format, and input modalities. Here are the most important ones for production use.
Extended Thinking and Adaptive Thinking
Extended thinking lets Claude “think” before responding, improving performance on complex math, coding, and analysis tasks. Adaptive thinking (recommended on newer Opus models) lets Claude decide dynamically when and how much to think.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    },
    messages=[
        {"role": "user", "content": "Solve this equation: 3x^2 + 5x - 2 = 0"}
    ]
)

# With thinking enabled, the first content block is the thinking block,
# so print only the text blocks rather than indexing content[0].
for block in response.content:
    if block.type == "text":
        print(block.text)
```
For adaptive thinking, use the effort parameter instead:
```python
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "effort": "high"  # Options: low, medium, high
    },
    messages=[...]
)
```
Structured Outputs
You can force Claude to return responses in a specific JSON schema using the structured_outputs parameter. This is ideal for extracting data, generating forms, or building API endpoints.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the name, date, and amount from this invoice: Invoice #1234, dated 2025-03-15, for $450.00."}
    ],
    structured_outputs={
        "json_schema": {
            "name": "invoice",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["invoice_number", "date", "amount"]
            }
        }
    }
)

print(response.content[0].text)
```
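Because strict mode guarantees the response conforms to the schema, you can parse it directly with the standard json module. A minimal sketch, where the sample string stands in for the real `response.content[0].text`:

```python
import json

def parse_invoice(raw: str) -> dict:
    """Parse and sanity-check a structured-output invoice payload."""
    data = json.loads(raw)
    # Strict mode should guarantee the required keys, but a cheap
    # assertion catches integration mistakes early.
    for key in ("invoice_number", "date", "amount"):
        assert key in data, f"missing field: {key}"
    return data

# In practice `raw` would come from response.content[0].text
invoice = parse_invoice('{"invoice_number": "1234", "date": "2025-03-15", "amount": 450.0}')
print(invoice["amount"])  # 450.0
```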
Citations
Citations let Claude ground its responses in source documents, providing exact references to sentences and passages. This makes responses far easier to trust and verify.
Documents are passed as content blocks inside the user message, with citations enabled per document:

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "..."
                    },
                    "title": "Research Paper",
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings from the attached research paper."
                }
            ]
        }
    ]
)
```
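Cited answers come back as text blocks that carry a list of citation objects. A sketch of flattening those into readable footnotes; the plain dicts below stand in for SDK response objects, so check the field names against your actual responses:

```python
def format_citations(content_blocks: list[dict]) -> str:
    """Render text blocks followed by numbered footnotes for their citations."""
    text_parts, footnotes = [], []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        text_parts.append(block["text"])
        for citation in block.get("citations") or []:
            footnotes.append(
                f"[{len(footnotes) + 1}] {citation['document_title']}: "
                f"\"{citation['cited_text']}\""
            )
    return "\n".join(text_parts + footnotes)

blocks = [
    {"type": "text", "text": "The study found a 12% improvement.",
     "citations": [{"document_title": "Research Paper",
                    "cited_text": "a 12% improvement was observed"}]}
]
print(format_citations(blocks))
```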
Tools: Letting Claude Take Action
Claude can use tools to interact with the outside world—search the web, execute code, read files, and more.
Defining a Custom Tool
You define tools using a JSON schema. Here’s a simple weather lookup tool:
```python
def get_weather(location: str) -> str:
    # In production, call a real weather API
    return f"The weather in {location} is sunny, 72°F."

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]
)
```
```python
# Handle the tool call
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    if tool_call.name == "get_weather":
        result = get_weather(tool_call.input["location"])
        # Send the result back to Claude
        final_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_call.id,
                        "content": result
                    }
                ]}
            ],
            tools=[...]
        )
        print(final_response.content[0].text)
```
Built-in Tools
Claude provides several server-side tools you can enable with a single flag:
- Web search tool – Let Claude search the web in real time.
- Code execution tool – Run Python code in a sandboxed environment.
- File reading tool – Read local files during a session.
- Computer use tool – Let Claude control a virtual desktop.
Built-in tools use versioned type identifiers, and some (like code execution) require a beta header:

```python
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["code-execution-2025-05-22"],
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250522", "name": "code_execution"}
    ],
    messages=[
        {"role": "user", "content": "Search for the latest AI news and write a Python script to summarize it."}
    ]
)
```
Parallel Tool Use
Claude can call multiple tools in a single response, which is great for efficiency. Just define multiple tools and Claude will decide which to invoke.
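When Claude does return several tool_use blocks in one turn, run them all and send back one tool_result per call in a single user message. A minimal dispatch sketch; plain dicts stand in for SDK content blocks, and the handler reuses the toy weather idea from above:

```python
def dispatch_tool_calls(content_blocks: list[dict], handlers: dict) -> list[dict]:
    """Run every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue
        handler = handlers[block["name"]]
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],   # must match the originating call
            "content": handler(**block["input"]),
        })
    return results

handlers = {"get_weather": lambda location: f"Sunny in {location}"}
blocks = [
    {"type": "tool_use", "id": "tu_1", "name": "get_weather",
     "input": {"location": "Tokyo"}},
    {"type": "tool_use", "id": "tu_2", "name": "get_weather",
     "input": {"location": "Osaka"}},
]
# The returned list goes back to Claude as one user message's content.
print(dispatch_tool_calls(blocks, handlers))
```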
Context Management: Keeping Conversations Efficient
Long-running sessions can become expensive and slow. Claude offers several features to manage context effectively.
Prompt Caching
Prompt caching reduces cost and latency by reusing cached prefixes. This is ideal for system prompts, few-shot examples, or large context documents.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant with expertise in Python programming.",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Write a function to reverse a linked list."}
    ]
)
```
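On later calls the usage object reports cached prefix tokens separately, which lets you check that caching is actually kicking in. A small sketch; the dicts stand in for the SDK usage object, and the exact accounting of the cache_read_input_tokens and cache_creation_input_tokens fields is an assumption worth verifying against your own responses:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens that were served from the cache."""
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    total = usage["input_tokens"] + read + written
    return read / total if total else 0.0

# First call: the prefix gets written to the cache, nothing is read yet.
print(cache_hit_rate({"input_tokens": 20, "cache_creation_input_tokens": 1500}))  # 0.0
# Later calls: the prefix is read back, so most of the prompt is cached.
print(cache_hit_rate({"input_tokens": 20, "cache_read_input_tokens": 1500}))
```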
Context Compaction
When a conversation grows too long, you can compact it by summarizing earlier turns while preserving key information. This is available as a server-side tool.
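Whether you rely on a built-in tool or roll your own, the idea is the same: replace old turns with a summary and keep the recent tail verbatim. A client-side sketch, where the summarize callable is an assumption standing in for another messages.create call:

```python
def compact_messages(messages: list[dict], summarize, keep_recent: int = 4) -> list[dict]:
    """Replace all but the last `keep_recent` turns with a single summary turn."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # `summarize` would be a cheap model call in practice, e.g.
    # "Summarize this conversation, preserving decisions and open questions."
    summary = summarize(old)
    # Keep `keep_recent` even so user/assistant alternation is preserved.
    return [
        {"role": "user", "content": f"[Summary of earlier conversation]\n{summary}"}
    ] + recent

fake_summarize = lambda turns: f"{len(turns)} earlier turns about travel plans"
history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact_messages(history, fake_summarize)
print(len(compacted))  # 5: one summary turn plus the last four
```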
Token Counting
Always check token usage to avoid hitting limits unexpectedly:
```python
usage = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
).usage

print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
```
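You can also count tokens before sending a request with the count_tokens endpoint and gate long prompts with a simple budget check. A sketch; the 200k context window below is an assumption, so check your model's actual limit:

```python
def within_budget(input_tokens: int, max_tokens: int, context_window: int = 200_000) -> bool:
    """True if the prompt plus the requested output fits in the context window."""
    return input_tokens + max_tokens <= context_window

# Pre-flight count, no generation cost:
# count = client.messages.count_tokens(
#     model="claude-sonnet-4-20250514",
#     messages=[{"role": "user", "content": "Hello"}],
# )
# if within_budget(count.input_tokens, max_tokens=1024):
#     ...send the real request...

print(within_budget(150_000, 4_096))  # True
print(within_budget(199_000, 4_096))  # False
```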
Files and Assets: Working with Documents
Claude can process PDFs, images, and text files directly. Use the Files API to upload and reference documents.
```python
# Upload a PDF via the Files API (beta: "files-api-2025-04-14")
with open("report.pdf", "rb") as f:
    file = client.beta.files.upload(
        file=("report.pdf", f, "application/pdf")
    )
```
```python
# Use the file in a conversation
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["files-api-2025-04-14"],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": file.id
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize this report."
                }
            ]
        }
    ]
)
```
Batch Processing for Cost Savings
If you have large volumes of requests, use the Batch API to process them asynchronously at 50% lower cost.
```python
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to French: Hello, world!"}]
            }
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate to Spanish: Hello, world!"}]
            }
        }
    ]
)
```
```python
# Later, poll for completion
batch = client.messages.batches.retrieve(batch.id)
print(batch.processing_status)  # e.g. "in_progress" or "ended"
```
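Once processing has ended, each result entry carries a custom_id and a result whose type is succeeded, errored, canceled, or expired. A routing sketch; the plain dicts stand in for the SDK objects yielded by the batches results iterator:

```python
def split_batch_results(entries: list[dict]) -> tuple[dict, list[str]]:
    """Map custom_id -> message for successes; collect ids of failures."""
    succeeded, failed = {}, []
    for entry in entries:
        if entry["result"]["type"] == "succeeded":
            succeeded[entry["custom_id"]] = entry["result"]["message"]
        else:
            # errored, canceled, and expired requests can be retried individually
            failed.append(entry["custom_id"])
    return succeeded, failed

entries = [
    {"custom_id": "req-001", "result": {"type": "succeeded", "message": "Bonjour, le monde!"}},
    {"custom_id": "req-002", "result": {"type": "errored", "error": "overloaded"}},
]
ok, bad = split_batch_results(entries)
print(ok)   # {'req-001': 'Bonjour, le monde!'}
print(bad)  # ['req-002']
```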
Best Practices for Production
- Start simple – Begin with model capabilities and one or two tools. Add complexity only when needed.
- Use prompt caching – Cache system prompts and few-shot examples to reduce latency and cost.
- Monitor token usage – Always track input and output tokens to stay within budget.
- Handle tool calls gracefully – Always check stop_reason and handle tool_use responses.
- Leverage batch processing – For non-real-time workloads, batch requests can cut costs in half.
Key Takeaways
- Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
- Extended thinking and structured outputs give you fine-grained control over Claude’s reasoning and response format.
- Tools (both custom and built-in) let Claude interact with external systems—search, code execution, file reading, and more.
- Prompt caching and context compaction are essential for managing long-running conversations efficiently.
- Batch processing offers 50% cost savings for asynchronous, high-volume workloads.