Mastering Claude’s API: A Practical Guide to Features, Tools, and Context Management
Learn how to build with Claude's API using model capabilities, tools, context management, and files. Includes code examples, feature availability, and best practices for production.
This guide walks you through Claude’s API surface (model capabilities, tools, context management, and file handling) with practical code examples and feature-availability details to help you build production applications.
Introduction
Claude’s API is more than just a text generation endpoint. It’s a rich platform designed to give you fine-grained control over how Claude reasons, what actions it can take, and how you manage long-running conversations. Whether you’re building a customer support bot, a code assistant, or a research tool, understanding the five core areas of the API surface will help you get the most out of Claude.
This guide covers:
- Model capabilities – reasoning depth, response format, and input modalities
- Tools – letting Claude act on the web or in your environment
- Tool infrastructure – discovery and orchestration at scale
- Context management – keeping long sessions efficient
- Files and assets – managing documents and data
1. Model Capabilities: Steering Claude’s Output
Claude’s model capabilities let you control how it reasons and formats responses. The key features include:
Extended Thinking & Adaptive Thinking
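Claude can “think” before responding, which improves reasoning on complex tasks. With Adaptive Thinking (GA on Claude API, AWS, Bedrock, and Vertex AI), Claude dynamically decides when and how much to think. Alternatively, extended thinking lets you set a fixed thinking budget with the `budget_tokens` field; the separate effort parameter, on models that support it, adjusts how thoroughly Claude works overall rather than setting an exact token budget.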
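Example: Extended thinking with a fixed budget (Python)
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,  # must be larger than budget_tokens
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Solve this step by step: 23 * 47"}],
    thinking={"type": "enabled", "budget_tokens": 2000},  # thinking cap (minimum 1,024)
)

# With thinking enabled, the content starts with a thinking block; print only text blocks
for block in response.content:
    if block.type == "text":
        print(block.text)
```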
Structured Outputs
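Claude can return structured data (JSON), making it easy to integrate with your application logic. On models that support the Structured Outputs feature, you can constrain responses to a JSON schema you supply; otherwise, ask for JSON in the prompt and validate the reply when you parse it.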
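Example: Requesting JSON output
```python
import json

# Ask for JSON in the prompt and validate the reply by parsing it; the Messages API
# does not take an OpenAI-style `response_format` argument.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": 'List three fruits as JSON: {"fruits": [{"name": ..., "color": ...}]}. Return only the JSON.'}],
)
data = json.loads(response.content[0].text)  # raises an error if the reply is not valid JSON
print(data)
# Example output (will vary): {'fruits': [{'name': 'Apple', 'color': 'Red'}, ...]}
```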
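Without schema-constrained output, a model may wrap its JSON in a Markdown code fence or add a sentence around it. The helper below is a minimal, hypothetical sketch (`extract_json` is not part of the Anthropic SDK) that strips an optional fence before parsing; it assumes the reply contains a single JSON object.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep the source readable

# Hypothetical helper (not part of the Anthropic SDK): captures the body of an
# optional fence such as FENCE + "json ... " + FENCE
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text: str) -> dict:
    """Parse one JSON object from model text that may be fenced or wrapped in prose."""
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text
    # Keep only the outermost {...} span in case prose surrounds the object
    start, end = payload.find("{"), payload.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(payload[start : end + 1])


if __name__ == "__main__":
    reply = "Here you go:\n" + FENCE + 'json\n{"fruits": [{"name": "Apple", "color": "Red"}]}\n' + FENCE
    print(extract_json(reply)["fruits"][0]["name"])  # prints Apple
```

Parsing still fails loudly on malformed JSON, which is usually preferable to passing partial data downstream.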
Streaming & Batch Processing
- Streaming – Get tokens as they’re generated for real-time UX.
- Batch Processing – Send large volumes of requests asynchronously at 50% lower cost (not eligible for Zero Data Retention, ZDR).
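```python
stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for event in stream:
    # Generated text arrives in content_block_delta events carrying a text_delta
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="")
```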
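For batch processing, the Python SDK exposes the Message Batches API as `client.messages.batches`. The sketch below assumes that interface: each request pairs a `custom_id` of your choosing with the same `params` you would pass to `messages.create`. The `build_batch_requests` helper and the `prompt-{i}` ID scheme are illustrative, and the live submission only runs when `ANTHROPIC_API_KEY` is set.

```python
import os
import time


def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Hypothetical helper: one batch request per prompt, keyed by custom_id."""
    return [
        {
            "custom_id": f"prompt-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


def run_batch(prompts):
    import anthropic  # imported here so the helper above works without the SDK

    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=build_batch_requests(prompts))
    # Batches are processed asynchronously; poll until processing has ended
    while client.messages.batches.retrieve(batch.id).processing_status != "ended":
        time.sleep(30)
    for result in client.messages.batches.results(batch.id):
        if result.result.type == "succeeded":
            print(result.custom_id, result.result.message.content[0].text)
        else:
            print(result.custom_id, result.result.type)  # errored, canceled, or expired


if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    run_batch(["Tell me a short story.", "Summarize the plot of Hamlet in one sentence."])
```

Results are matched to inputs by `custom_id`, not by order, which is why each request carries its own ID.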
2. Tools: Letting Claude Take Action
Tools extend Claude’s capabilities beyond text. You can define custom tools (functions) or use built-in tools like web search, code execution, and computer use.
How Tool Use Works
- You define a tool with a name, description, and input schema.
- Claude decides whether to call the tool based on the conversation.
- You execute the tool and return the result to Claude.
```python
# Define a custom tool: name, description, and a JSON Schema for its input
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# Check if Claude wants to call a tool; the content may also include text blocks
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use":
            print(f"Tool called: {block.name}")
            print(f"Arguments: {block.input}")
```
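The last step of the loop, executing the tool and returning its result, can be sketched as follows. This assumes you convert response blocks to dicts (for example with `[b.model_dump() for b in response.content]`) and keep a hypothetical `HANDLERS` mapping from tool name to a Python function; `get_weather` here is a stand-in, not a real service. Each `tool_use` block's `id` is echoed back as `tool_use_id` in a `tool_result` block, and all results go in a single user message.

```python
import json


def get_weather(location: str) -> dict:
    # Hypothetical stand-in; a real handler would call a weather service
    return {"location": location, "forecast": "sunny", "temp_c": 22}


HANDLERS = {"get_weather": get_weather}


def build_tool_results(content: list, handlers=HANDLERS) -> dict:
    """Run every tool_use block and return one user message of tool_result blocks."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        try:
            output = handlers[block["name"]](**block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(output),
            })
        except Exception as exc:  # unknown tool or handler failure
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Error: {exc}",
                "is_error": True,
            })
    return {"role": "user", "content": results}


if __name__ == "__main__":
    fake_content = [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"location": "Tokyo"}},
    ]
    print(build_tool_results(fake_content))
```

To continue the conversation, append the assistant turn (`{"role": "assistant", "content": response.content}`) and this user message to `messages`, then call `client.messages.create` again with the same `tools`. Reporting failures with `is_error` lets Claude see what went wrong instead of the loop crashing.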
Built-in Tools (Beta)
- Web Search Tool – Claude can search the web for up-to-date information.
- Code Execution Tool – Run Python code in a sandboxed environment.
- Computer Use Tool – Claude can interact with a virtual desktop (beta: research preview).
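Built-in (server) tools are enabled by adding a typed entry to the same `tools` list as your custom tools; Anthropic runs them, so you do not execute anything yourself. The sketch below assumes the web search tool's `web_search_20250305` type and its optional `max_uses` cap; tool type strings are versioned, so check the current documentation for the version your model supports. `with_web_search` is a hypothetical helper, and the live call only runs when `ANTHROPIC_API_KEY` is set.

```python
import os


def with_web_search(custom_tools, max_uses=3):
    """Hypothetical helper: append the server-side web search tool to custom tools."""
    web_search = {
        "type": "web_search_20250305",  # versioned server-tool type (assumed current)
        "name": "web_search",
        "max_uses": max_uses,  # caps how many searches Claude may run in one request
    }
    return list(custom_tools) + [web_search]


if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "What happened in AI news this week?"}],
        tools=with_web_search([]),
    )
    # The reply mixes search-related blocks with text blocks; print only the text
    for block in response.content:
        if block.type == "text":
            print(block.text)
```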
3. Tool Infrastructure: Discovery & Orchestration
When you have many tools, you need a way to manage them. Claude’s tool infrastructure includes:
- Tool Runner (SDK) – Automates tool execution and result injection.
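- Strict Tool Use – Guarantees that Claude’s tool inputs conform to your tool’s input schema. (To force a call to a specific tool, set `tool_choice` instead.)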
- Parallel Tool Use – Claude can call multiple tools at once.
- Tool Search – Dynamically find the right tool for a task.
- Fine-grained Tool Streaming – Stream tool-call input parameters as they are generated, without waiting for the full input to be buffered and validated.
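```python
# Claude can request several tool calls in one response by default. To allow at most
# one, pass tool_choice={"type": "auto", "disable_parallel_tool_use": True}.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Get the weather in Paris and London."}],
    tools=tools,  # the get_weather definition from the previous example
)
tool_calls = [b for b in response.content if b.type == "tool_use"]
print(f"{len(tool_calls)} tool call(s) requested")
```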
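To require a tool call, or one specific tool, set the `tool_choice` request parameter. The sketch below builds the `tool_choice` values the Messages API accepts for the common cases (`auto`, `any`, or one named tool) and shows where `disable_parallel_tool_use` goes; `make_tool_choice` is a hypothetical helper.

```python
def make_tool_choice(mode="auto", name=None, parallel=True):
    """Hypothetical helper returning a tool_choice dict for messages.create().

    mode: "auto" (Claude decides), "any" (must call some tool),
          or "tool" (must call the tool given by `name`).
    """
    if mode not in ("auto", "any", "tool"):
        raise ValueError(f"unsupported mode: {mode}")
    choice = {"type": mode}
    if mode == "tool":
        if not name:
            raise ValueError("mode='tool' requires a tool name")
        choice["name"] = name
    if not parallel:
        # Limit Claude to a single tool call per response
        choice["disable_parallel_tool_use"] = True
    return choice


# Usage (assumes `client` and `tools` from the earlier examples):
# response = client.messages.create(..., tools=tools,
#                                   tool_choice=make_tool_choice("tool", name="get_weather"))
```

Note that forcing tool use (`any` or a named tool) is not compatible with extended thinking; with thinking enabled, leave `tool_choice` on `auto`.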
4. Context Management: Keeping Sessions Efficient
Long conversations can consume many tokens. Claude provides several features to manage context:
Context Windows
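Depending on the model, Claude supports context windows of up to 1 million tokens; the standard window is 200K tokens, and larger windows may be limited to specific models or platforms. This allows processing entire books, large codebases, or long chat histories.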
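Before sending a very large prompt, you can check that it fits. The SDK's token counting endpoint (`client.messages.count_tokens`) returns an `input_tokens` count; the sketch below assumes that call and compares the count plus the requested `max_tokens` against a context window size you supply. The 200,000-token default and the `fits_in_context` helper are illustrative assumptions; use the limit documented for your model. The live call only runs when `ANTHROPIC_API_KEY` is set.

```python
import os

# Illustrative default; substitute the documented context window for your model
DEFAULT_CONTEXT_WINDOW = 200_000


def fits_in_context(input_tokens: int, max_tokens: int,
                    context_window: int = DEFAULT_CONTEXT_WINDOW) -> bool:
    """Hypothetical conservative check: prompt tokens plus the output budget must fit."""
    return input_tokens + max_tokens <= context_window


if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": "Summarize the history of the printing press."}]
    count = client.messages.count_tokens(model="claude-sonnet-4-20250514", messages=messages)
    print(count.input_tokens, fits_in_context(count.input_tokens, max_tokens=4096))
```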
Prompt Caching
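Cache repeated system prompts or large context blocks to reduce latency and cost. On later requests, Claude reuses the cached prompt prefix instead of reprocessing it: cache reads are billed at a fraction of the normal input-token price, while writing to the cache costs somewhat more than a regular input token. A prefix must also meet a model-specific minimum length to be cached.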
Example: Using prompt caching
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a legal assistant...",
            # Marks the prompt prefix up to and including this block as cacheable
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize this contract."}],
)
```
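To confirm caching is working, inspect `response.usage`: `cache_creation_input_tokens` counts tokens written to the cache, `cache_read_input_tokens` counts tokens served from it, and `input_tokens` covers the uncached remainder. The `cache_report` helper below is a hypothetical sketch that summarizes those fields from a dict (for example `response.usage.model_dump()`); the numbers in the demo are illustrative, not measurements.

```python
def cache_report(usage: dict) -> dict:
    """Hypothetical helper: summarize prompt-cache activity from a usage dict."""
    written = usage.get("cache_creation_input_tokens") or 0  # may be None or absent
    read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total_input = written + read + uncached
    return {
        "status": "hit" if read else ("write" if written else "none"),
        "cached_share": read / total_input if total_input else 0.0,
        "total_input_tokens": total_input,
    }


if __name__ == "__main__":
    # Illustrative numbers: a first request writes the cache, a repeat reads it
    print(cache_report({"input_tokens": 12, "cache_creation_input_tokens": 3000}))
    print(cache_report({"input_tokens": 12, "cache_read_input_tokens": 3000}))
```

A `status` of `none` on every request usually means the prefix is below the minimum cacheable length or changes between calls.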
Context Editing & Compaction
- Context Editing – Automatically clear stale content, such as old tool calls and their results, from the conversation history once it grows past a threshold you configure.
- Compaction – Automatically summarize older parts of the conversation to save tokens.
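If you manage conversation history yourself, a simple client-side alternative is to keep only the most recent turns and fold a summary of older ones into the first kept message. The sketch below is a hypothetical illustration of that idea, not the API's compaction feature. It assumes plain-text (string) message content with no tool calls, since `tool_use`/`tool_result` pairs must not be split, and that you obtain `summary` separately (for example by asking Claude to summarize the dropped turns).

```python
def compact_history(messages: list, keep_last: int, summary: str) -> list:
    """Hypothetical client-side compaction for plain-text conversations.

    Keeps roughly the last `keep_last` messages and prepends `summary` to the
    first kept user message so the history still starts with a user turn.
    """
    if len(messages) <= keep_last:
        return list(messages)
    start = len(messages) - keep_last
    # The Messages API expects the first message to come from the user
    while start < len(messages) and messages[start]["role"] != "user":
        start += 1
    prefix = f"Summary of earlier conversation: {summary}"
    if start == len(messages):
        return [{"role": "user", "content": prefix}]
    kept = [dict(m) for m in messages[start:]]  # shallow copies; input is not mutated
    kept[0]["content"] = f"{prefix}\n\n{kept[0]['content']}"
    return kept


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Hi, I need help with a contract."},
        {"role": "assistant", "content": "Sure, what kind of contract?"},
        {"role": "user", "content": "A software license."},
        {"role": "assistant", "content": "Got it. What would you like to know?"},
        {"role": "user", "content": "Can I terminate early?"},
    ]
    for m in compact_history(history, keep_last=3, summary="User needs help with a software license."):
        print(m["role"], "->", m["content"])
```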
5. Files and Assets: Working with Documents
Claude can process files directly, including PDFs, images, and text documents.
PDF Support
You can send PDFs to Claude for analysis. Claude reads both the text extracted from each page and an image of the page, so it can interpret charts, tables, and layout as well as prose.
Example: Sending a PDF
```python
import base64

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {"type": "text", "text": "Summarize this document."},
            ],
        }
    ],
)
```
Images & Vision
Claude can analyze images (photos, diagrams, screenshots) and answer questions about them.
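Images are sent as `image` content blocks, analogous to the PDF `document` block above, with a base64 `source` and a `media_type`. The sketch below assumes JPEG, PNG, GIF, and WebP as the accepted formats and infers the media type from the file extension with Python's `mimetypes` module; `image_block` is a hypothetical helper and `diagram.png` a placeholder path.

```python
import base64
import mimetypes
from pathlib import Path

# Formats assumed to be accepted by the Messages API for image blocks
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(path: str) -> dict:
    """Hypothetical helper: build a base64 image content block from a local file."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for {path!r}: {media_type}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"type": "image", "source": {"type": "base64", "media_type": media_type, "data": data}}


# Usage (assumes `client` from earlier; "diagram.png" is a placeholder path):
# response = client.messages.create(
#     model="claude-sonnet-4-20250514",
#     max_tokens=1024,
#     messages=[{"role": "user", "content": [
#         image_block("diagram.png"),
#         {"type": "text", "text": "Explain this diagram."},
#     ]}],
# )
```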
Feature Availability Quick Reference
Not all features are available on every platform. Here’s a summary:
| Feature | Claude API | AWS | Bedrock | Vertex AI |
|---|---|---|---|---|
| Extended Thinking | GA | GA | GA | GA |
| Batch Processing | GA | GA | GA | GA |
| Prompt Caching | GA | GA | GA | GA |
| Web Search Tool | Beta | Beta | Beta | Beta |
| Computer Use | Beta | Beta | Beta | Beta |
| Structured Outputs | GA | GA | GA | GA |
Best Practices for Production
- Start with model capabilities – Get your core logic working before adding tools.
- Use prompt caching for system prompts and large context blocks to reduce costs.
- Monitor token usage with the token counting API to avoid surprises.
- Handle stop reasons – Check `stop_reason` in responses to detect tool calls (`tool_use`), truncated output (`max_tokens`), or the end of a turn (`end_turn`).
- Test with streaming for better user experience, but keep a non-streaming fallback for reliability.
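A small dispatcher keeps stop-reason handling in one place. The mapping below is a sketch: the stop reasons listed (`end_turn`, `stop_sequence`, `tool_use`, `max_tokens`, `pause_turn`, `refusal`) are values the Messages API documents, but the action labels are hypothetical placeholders for your application's own logic, and unknown values fall through to a default so new stop reasons are noticed rather than silently ignored.

```python
from typing import Optional

# Hypothetical next-step labels for each documented stop_reason value
NEXT_ACTION = {
    "end_turn": "done",                       # Claude finished its reply
    "stop_sequence": "done",                  # a custom stop sequence was hit
    "tool_use": "run_tools",                  # execute tools, then send tool_result blocks
    "max_tokens": "continue_or_raise_limit",  # output was truncated
    "pause_turn": "resend_to_continue",       # a long server-tool turn was paused
    "refusal": "handle_refusal",              # Claude declined to respond
}


def next_action(stop_reason: Optional[str]) -> str:
    """Map a response's stop_reason to the next step; unknown values are flagged."""
    return NEXT_ACTION.get(stop_reason, "unexpected")


# Usage (assumes `response` from any earlier example):
# action = next_action(response.stop_reason)
```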
Key Takeaways
- Claude’s API is organized into five areas: model capabilities, tools, tool infrastructure, context management, and files.
- Use Extended Thinking for complex reasoning tasks and Structured Outputs for JSON integration.
- Tools let Claude interact with external systems; use Parallel Tool Use for efficiency.
- Prompt Caching and Context Windows help manage long sessions cost-effectively.
- Check feature availability per platform, since some features are still in beta on certain providers.