Mastering Claude’s API: A Practical Guide to Features, Tools, and Context Management
Explore Claude's API surface: model capabilities, tools, context management, and files. Learn how to steer reasoning, use tools, and optimize costs with practical code examples.
This guide walks you through the main areas of Claude’s API: model capabilities (thinking, citations), tools (web search, code execution), context management (prompt caching, compaction), and file handling. You’ll learn how to use each one, with code examples and best practices for production.
Introduction
Claude’s API is more than just a text-in, text-out interface. It’s a rich ecosystem of features designed to give you fine-grained control over how Claude reasons, what actions it can take, and how you manage long-running conversations. Whether you’re building a customer support bot, a code assistant, or a document analysis tool, understanding the full API surface will help you build faster, cheaper, and more reliably.
The Claude API is organized into five main areas:
- Model capabilities – controlling reasoning depth, response format, and input modalities
- Tools – letting Claude interact with the web, files, and your environment
- Tool infrastructure – discovery and orchestration at scale
- Context management – keeping long sessions efficient
- Files and assets – managing documents and data
Model Capabilities
Model capabilities are the core levers you pull to shape Claude’s output. Here are the most impactful ones.
Extended Thinking & Adaptive Thinking
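Claude can reason step by step before it responds. This helps most on complex tasks such as math, code generation, and multi-step planning.

With extended thinking, you give that reasoning a fixed token budget (`budget_tokens`). Adaptive thinking (recommended for Opus 4.7) instead lets Claude decide when and how much to think, and you steer the depth with the `effort` parameter. Check the API reference for which thinking modes your model supports. The example below uses a fixed budget with Claude Opus 4.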
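```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    # Extended thinking with a fixed budget: at least 1024 tokens and less than max_tokens
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[
        {"role": "user", "content": "Solve this: A train leaves New York at 3 PM traveling 60 mph. Another train leaves Boston at 4 PM traveling 70 mph. The distance is 200 miles. When do they meet?"}
    ],
)

# With thinking enabled, the response starts with a thinking block; print only the text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)
```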
Tip: With adaptive thinking, use `effort: "low"` for simple Q&A to save tokens and `effort: "high"` for math, logic, or code generation. With a fixed budget, scale `budget_tokens` the same way.
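Two details of fixed-budget thinking are easy to get wrong:

- The budget must be at least 1,024 tokens and smaller than `max_tokens`.
- The response contains thinking blocks ahead of the text.

The sketch below handles both. It works on plain dicts shaped like the API's JSON. `thinking_config` and `final_text` are hypothetical helper names, not SDK functions, and the sample response is made up.

```python
def thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Build a fixed-budget `thinking` parameter, checking the documented limits."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    # Without interleaved thinking, the budget must leave room for the final answer.
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def final_text(content: list[dict]) -> str:
    """Join the text blocks of a response, skipping thinking blocks."""
    return "".join(block["text"] for block in content if block["type"] == "text")


# A response body shaped like the API's JSON (values are made up for illustration).
content = [
    {"type": "thinking", "thinking": "The first train has a one-hour head start...", "signature": "..."},
    {"type": "text", "text": "The trains meet at about 5:05 PM."},
]
print(thinking_config(2000, 4096))
print(final_text(content))
```

Checking the budget locally fails fast, before a request that the API would reject anyway.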
Citations
Citations ground Claude’s responses in source documents. When you provide a document, Claude can return exact references to the relevant passages.
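```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Documents are content blocks; citations are enabled per document
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Refunds are available within 30 days of purchase...",
                    },
                    "title": "Refund Policy",
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "What is the refund policy?"},
            ],
        }
    ],
)

# Each text block in the answer carries its own list of citations (or None)
for block in response.content:
    if block.type == "text" and block.citations:
        for citation in block.citations:
            print(citation.cited_text)
```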
Citations are GA on the Claude API and work with PDFs and text files.
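A cited answer arrives as several text blocks, each with its own `citations` list. For plain-text documents, each citation includes the quoted passage (`cited_text`) and the document's title, among other fields.

The sketch below renders such an answer with numbered footnotes. It works on dicts shaped like the API's JSON. `format_with_footnotes` is a hypothetical helper, and the sample response is made up.

```python
def format_with_footnotes(content: list[dict]) -> str:
    """Render text blocks inline and append their citations as numbered footnotes."""
    body, notes = [], []
    for block in content:
        if block["type"] != "text":
            continue
        body.append(block["text"])
        # citations may be missing or None on uncited blocks
        for citation in block.get("citations") or []:
            notes.append(citation)
            body.append(f"[{len(notes)}]")
    lines = ["".join(body), ""]
    for i, c in enumerate(notes, start=1):
        lines.append(f'[{i}] {c["document_title"]}: "{c["cited_text"]}"')
    return "\n".join(lines)


# A response body shaped like the API's JSON (values are made up for illustration).
content = [
    {"type": "text", "text": "According to the policy, ", "citations": None},
    {
        "type": "text",
        "text": "refunds are available within 30 days of purchase",
        "citations": [
            {
                "type": "char_location",
                "cited_text": "Refunds are available within 30 days of purchase",
                "document_index": 0,
                "document_title": "Refund Policy",
                "start_char_index": 0,
                "end_char_index": 48,
            }
        ],
    },
    {"type": "text", "text": ".", "citations": None},
]
print(format_with_footnotes(content))
```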
Batch Processing
If you have thousands of requests (e.g., classifying support tickets, translating content), send them through the Message Batches API to cut API costs by 50%. Requests are processed asynchronously, so you collect results later rather than in the response.
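```python
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-001",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Classify this: 'My order is late'"}],
            },
        },
        {
            "custom_id": "req-002",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Classify this: 'I love the new feature'"}],
            },
        },
    ]
)

# Check results later: once processing has ended, stream the results
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            print(entry.custom_id, entry.result.message.content[0].text)
```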
Note: The Message Batches API is not eligible for Zero Data Retention (ZDR), so reserve it for workloads that don't require ZDR.
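Batch results are not guaranteed to come back in the order you submitted them, so match them by `custom_id`.

The sketch below uses two hypothetical helpers:

- `build_requests` builds the requests list from plain texts.
- `index_results` maps each ID to its answer.

It works on dicts shaped like the batch results; the sample records are made up.

```python
def build_requests(texts: list[str], model: str = "claude-sonnet-4-20250514") -> list[dict]:
    """Wrap each text in a batch request with a unique, stable custom_id."""
    return [
        {
            "custom_id": f"req-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Classify this: '{text}'"}],
            },
        }
        for i, text in enumerate(texts, start=1)
    ]


def index_results(results: list[dict]) -> dict[str, str]:
    """Map custom_id to the answer text, or to the result type if the request did not succeed."""
    indexed = {}
    for record in results:
        result = record["result"]
        if result["type"] == "succeeded":
            indexed[record["custom_id"]] = result["message"]["content"][0]["text"]
        else:  # errored, canceled, or expired
            indexed[record["custom_id"]] = result["type"]
    return indexed


batch_requests = build_requests(["My order is late", "I love the new feature"])
# Results may arrive in any order (these records are made up for illustration).
results = [
    {"custom_id": "req-002", "result": {"type": "succeeded", "message": {"content": [{"type": "text", "text": "praise"}]}}},
    {"custom_id": "req-001", "result": {"type": "expired"}},
]
print(index_results(results))
```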
Tools: Let Claude Take Action
Tools extend Claude’s capabilities beyond text generation. Claude can call functions, fetch web pages, execute code, and more.
Web Search Tool
Give Claude real-time web access for research, news, or fact-checking. Web search is a server tool: Anthropic runs the searches, and Claude's answer cites the pages it used.
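```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            # Server tools use a versioned type string
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 3,  # optional cap on searches per request
        }
    ],
    messages=[
        {"role": "user", "content": "What is the current population of Tokyo?"}
    ],
)

# The response mixes search-call, search-result, and text blocks; print the text
for block in response.content:
    if block.type == "text":
        print(block.text)
```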
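Besides the final text, the response includes two other block types:

- `server_tool_use` blocks hold the queries Claude ran.
- `web_search_tool_result` blocks hold the pages it found.

The sketch below lists those sources so you can show them to users. It works on dicts shaped like the API's JSON. `search_sources` is a hypothetical helper, and the sample URLs are placeholders.

```python
def search_sources(content: list[dict]) -> list[tuple[str, str]]:
    """Collect (title, url) pairs from web search result blocks, skipping duplicates and errors."""
    seen, sources = set(), []
    for block in content:
        if block["type"] != "web_search_tool_result":
            continue
        results = block["content"]
        if not isinstance(results, list):  # an error object instead of a result list
            continue
        for item in results:
            if item["url"] not in seen:
                seen.add(item["url"])
                sources.append((item["title"], item["url"]))
    return sources


# A response body shaped like the API's JSON (placeholder URLs and made-up values).
content = [
    {"type": "server_tool_use", "id": "srvtoolu_01", "name": "web_search", "input": {"query": "Tokyo population"}},
    {
        "type": "web_search_tool_result",
        "tool_use_id": "srvtoolu_01",
        "content": [
            {"type": "web_search_result", "url": "https://example.com/tokyo", "title": "Tokyo statistics"},
            {"type": "web_search_result", "url": "https://example.org/japan", "title": "Japan census"},
            {"type": "web_search_result", "url": "https://example.com/tokyo", "title": "Tokyo statistics"},
        ],
    },
    {
        "type": "web_search_tool_result",
        "tool_use_id": "srvtoolu_02",
        "content": {"type": "web_search_tool_result_error", "error_code": "max_uses_exceeded"},
    },
    {"type": "text", "text": "Here is what I found."},
]
for title, url in search_sources(content):
    print(f"{title}: {url}")
```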
Code Execution Tool
Let Claude write and run Python code in a sandboxed environment. Great for data analysis, calculations, or prototyping.
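```python
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    betas=["code-execution-2025-05-22"],  # code execution is a beta tool
    tools=[
        {"type": "code_execution_20250522", "name": "code_execution"}
    ],
    messages=[
        {"role": "user", "content": "Calculate the compound interest on $10,000 at 5% for 10 years."}
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text)
```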
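For a prompt like this one, it helps to know what the sandbox should produce. The prompt doesn't say how often interest compounds, so the sketch below assumes annual compounding by default. The balance is then A = P(1 + r)^n, and the interest is A - P. These few lines of local Python give a reference value to compare against Claude's answer. `compound_interest` is a hypothetical helper.

```python
def compound_interest(principal: float, rate: float, years: int, periods_per_year: int = 1) -> float:
    """Interest earned with compounding: P * (1 + r/m)^(m*t) - P."""
    amount = principal * (1 + rate / periods_per_year) ** (periods_per_year * years)
    return amount - principal


print(round(compound_interest(10_000, 0.05, 10), 2))  # annual compounding: 6288.95
print(round(compound_interest(10_000, 0.05, 10, 12), 2))  # monthly compounding, for comparison
```

If Claude's figure differs, check which compounding frequency it assumed before treating it as an error.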
Parallel Tool Use
Claude can request several tools in one response, which saves round trips: for example, checking the weather and your calendar events at the same time. Parallel tool use is on by default, so no flag is needed to enable it.
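```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {
            # Custom (client) tools are defined by name, description, and a JSON Schema
            "name": "get_calendar_events",
            "description": "Get the user's calendar events for today",
            "input_schema": {"type": "object", "properties": {}},
        },
    ],
    # Parallel tool use is the default; to turn it off, pass
    # tool_choice={"type": "auto", "disable_parallel_tool_use": True}
    messages=[
        {"role": "user", "content": "What's the weather and do I have any meetings today?"}
    ],
)
```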
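When Claude calls several of your tools in one turn, the response holds one `tool_use` block per call. Run them all and send every `tool_result` back in a single user message. Splitting results across separate messages tends to discourage further parallel calls.

The sketch below shows that loop. The dispatch table and the canned `get_calendar_events` are hypothetical. The `get_weather` call in the sample is deliberately unregistered to show error reporting, and the block IDs are made up.

```python
import json
from typing import Any, Callable


def get_calendar_events() -> list[dict]:
    """Hypothetical local tool; a real one would query your calendar service."""
    return [{"time": "10:00", "title": "Standup"}]


TOOLS: dict[str, Callable[..., Any]] = {"get_calendar_events": get_calendar_events}


def run_tool_calls(content: list[dict]) -> dict:
    """Execute every tool_use block and return one user message with all tool_result blocks."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue  # server tools (e.g. web_search) arrive as server_tool_use blocks, run by Anthropic
        tool = TOOLS.get(block["name"])
        try:
            if tool is None:
                raise ValueError(f"unknown tool: {block['name']}")
            output = tool(**block["input"])
            results.append({"type": "tool_result", "tool_use_id": block["id"], "content": json.dumps(output)})
        except Exception as exc:  # report failures to Claude instead of crashing
            results.append({"type": "tool_result", "tool_use_id": block["id"], "content": str(exc), "is_error": True})
    return {"role": "user", "content": results}


# Response content shaped like the API's JSON (IDs and calls are made up for illustration).
content = [
    {"type": "text", "text": "Let me check both."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_calendar_events", "input": {}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather", "input": {"city": "Paris"}},
]
print(json.dumps(run_tool_calls(content), indent=2))
```

Append the returned message to the conversation, after Claude's assistant turn, and call the API again.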
Context Management
Long conversations consume tokens and increase latency. The API offers several features to keep sessions efficient.
Prompt Caching
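Cache frequently reused context (system prompts, few-shot examples, large documents) to reduce cost and latency. A `cache_control` marker caches the prompt prefix up to and including that block. Later requests that begin with the identical prefix read it from the cache, which has a default lifetime of five minutes. Prefixes shorter than the model's minimum (1,024 tokens for Claude Sonnet 4) are not cached.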
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant for Acme Corp. Our products include...",
            # Everything up to and including this block becomes a cacheable prefix
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "What is your return policy?"}
    ],
)
```
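The response's `usage` object reports how many input tokens were written to the cache (`cache_creation_input_tokens`) and read from it (`cache_read_input_tokens`). The sketch below turns those counts into a savings estimate.

The multipliers are assumptions based on pricing at the time of writing: cache writes cost 1.25× and cache reads 0.1× the base input price for the default 5-minute cache. Check the current pricing page. `cache_savings` is a hypothetical helper, and the usage numbers are made up.

```python
def cache_savings(usage: dict, write_multiplier: float = 1.25, read_multiplier: float = 0.1) -> float:
    """Fraction of input cost saved versus sending every input token uncached.

    Multipliers are relative to the base input-token price; the defaults assume
    the 5-minute cache and should be checked against current pricing.
    """
    uncached = usage["input_tokens"]
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = uncached + written + read
    if total == 0:
        return 0.0
    actual = uncached + written * write_multiplier + read * read_multiplier
    return 1 - actual / total


# Made-up usage numbers: the first call writes the cache, a later call reads it.
first = {"input_tokens": 50, "cache_creation_input_tokens": 5000, "cache_read_input_tokens": 0}
later = {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 5000}
print(f"first call: {cache_savings(first):.1%}, later call: {cache_savings(later):.1%}")
```

The first request pays a premium to write the cache (negative savings). The benefit only appears once later requests read it within the cache lifetime.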
Context Compaction
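When a conversation grows too long, compact it: summarize the older messages, keep the key facts and decisions, and continue from the summary. The sketch below does this on the client side with an ordinary request. The API also offers server-side context management; check the documentation for the options your model supports.
```python
# Client-side compaction (a sketch): summarize older turns, then continue from the summary.
history = [
    {"role": "user", "content": "Help me plan our database migration."},
    {"role": "assistant", "content": "Sure. Let's start with an inventory of tables..."},
]  # in practice, many more turns
request = {"role": "user", "content": "Summarize our conversation so far, keeping decisions and open questions."}
summary = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024, messages=history + [request]
)
# Replace the long history with a short summary pair; new user turns follow as usual
history = [
    {"role": "user", "content": "Summary of our earlier conversation:\n" + summary.content[0].text},
    {"role": "assistant", "content": "Understood. Let's continue from there."},
]
```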
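Compacting everything also discards the most recent, most relevant turns. A common refinement is to summarize only the older part of the history and keep the last few messages verbatim.

The sketch below shows that split. `split_for_compaction` and its `keep_last` default are hypothetical choices. The sketch assumes the summary is inserted as a user/assistant pair, so the kept slice must start on a user turn.

```python
def split_for_compaction(messages: list[dict], keep_last: int = 4) -> tuple[list[dict], list[dict]]:
    """Split a history into (to_summarize, to_keep), keeping roughly the last `keep_last` messages.

    The cut point moves back until the kept slice starts on a user turn, so the
    rebuilt conversation still alternates user/assistant after the summary pair.
    """
    cut = max(len(messages) - keep_last, 0)
    while 0 < cut < len(messages) and messages[cut]["role"] != "user":
        cut -= 1
    return messages[:cut], messages[cut:]


# Six alternating turns (made-up content).
history = [{"role": role, "content": f"turn {i}"} for i, role in enumerate(["user", "assistant"] * 3)]
older, recent = split_for_compaction(history, keep_last=3)
print(len(older), len(recent), recent[0]["role"])
# Summarize `older` with a separate request, then rebuild:
# history = [summary_user_turn, acknowledgement_assistant_turn] + recent
```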
Token Counting
Check token usage before sending large payloads. The token counting endpoint returns the input token count for a request without running it, so you can estimate cost and stay within the context window.
```python
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Hello, can you help me with..."}
    ],
)
print(f"Input tokens: {count.input_tokens}")
```
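The count becomes more useful once you turn it into a cost ceiling and a context-window check. The sketch below assumes the full `max_tokens` allowance is used, which gives an upper bound.

The $3 and $15 per million input and output tokens in the example are Claude Sonnet 4's list prices at the time of writing. The 200,000-token default window is Sonnet 4's standard context. Treat both as assumptions and pass current values. The function names are hypothetical.

```python
def estimate_cost_usd(input_tokens: int, max_output_tokens: int,
                      input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Upper-bound cost of one request: counted input plus the full output allowance."""
    return (input_tokens * input_price_per_mtok + max_output_tokens * output_price_per_mtok) / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_window: int = 200_000) -> bool:
    """True if the input plus the requested output fits in the model's context window."""
    return input_tokens + max_output_tokens <= context_window


# e.g. with input_tokens taken from count.input_tokens
print(f"${estimate_cost_usd(12_000, 1024, 3.0, 15.0):.4f}")
print(fits_context(12_000, 1024))
```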
Files and Assets
Claude can process PDFs, images, and text files directly.
PDF Support
Send PDFs for analysis, summarization, or question answering. Claude reads both the extracted text and an image of each page, so it can also use charts, tables, and layout.
```python
import base64

with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarize this report.",
                },
            ],
        }
    ],
)
print(response.content[0].text)
```
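When you send many PDFs, a small builder keeps the message shape consistent. It places the document before the question, which is the ordering generally recommended for long documents.

The sketch below is a minimal version. `pdf_message` and its `cite` flag are hypothetical. The flag enables citations on the document block, as in the Citations section.

```python
import base64


def pdf_message(pdf_bytes: bytes, question: str, cite: bool = False) -> dict:
    """Build a user message with a base64 PDF document block first and the question after it."""
    document = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("utf-8"),
        },
    }
    if cite:
        document["citations"] = {"enabled": True}
    return {"role": "user", "content": [document, {"type": "text", "text": question}]}


# Usage (assumes report.pdf exists):
# with open("report.pdf", "rb") as f:
#     message = pdf_message(f.read(), "Summarize this report.", cite=True)
```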
Images and Vision
Claude can analyze images to describe their contents, read text in them (OCR), or reason about charts and diagrams.
```python
import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "What does this chart show?",
                },
            ],
        }
    ],
)
print(response.content[0].text)
```
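The `media_type` must match the actual image format (JPEG, PNG, GIF, or WebP). The sketch below infers it from the file extension instead of hard-coding `image/png`. `image_block` is a hypothetical helper.

Extension-based detection is a simplification: a file can be misnamed. Production code may want to check the file's header bytes instead.

```python
import base64
from pathlib import Path

# Image formats accepted by the Messages API, keyed by file extension
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block, inferring media_type from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or 'no extension'}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"type": "image", "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data}}


# Usage (assumes chart.png exists):
# message = {"role": "user", "content": [image_block("chart.png"), {"type": "text", "text": "What does this chart show?"}]}
```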
Best Practices for Production
- Start with model capabilities – master thinking, citations, and batch before adding tools.
- Use prompt caching for system prompts and large context; cache reads cost up to 90% less than uncached input tokens.
- Monitor token usage with the counting endpoint to avoid surprises.
- Keep parallel tool use on (it's the default) when Claude needs several independent pieces of information, and return all tool results in a single message.
- Use batch processing for non-urgent, high-volume tasks to save 50%.
Key Takeaways
- Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
- Adaptive thinking lets Claude dynamically allocate reasoning depth; use `effort` to control it.
- Batch processing cuts costs by 50% for asynchronous workloads.
- Prompt caching and context compaction keep long sessions efficient and affordable.
- Tools like web search and code execution let Claude take real-world actions, and parallel tool calls reduce latency.