Getting Started with Claude API: From First Call to Production
Learn how to integrate Claude into your applications using the Messages API. This guide covers environment setup, your first call in Python, multi-turn conversations, streaming, tool use, and best practices for latency, cost, and safety in production.
Introduction
Claude is Anthropic's powerful AI assistant, accessible via a robust API that lets you integrate its capabilities into your own applications. Whether you're building a chatbot, a code assistant, or an autonomous agent, the Claude API provides the flexibility and performance you need.
This guide will take you from your first API call to a production-ready integration. You'll learn how to set up your environment, use the Messages API, handle streaming, leverage tools, and apply best practices for safety and efficiency.
Prerequisites
Before you start, you'll need:
- An Anthropic account and API key (get one from the Anthropic Console)
- Python 3.8 or later installed on your machine
- Basic familiarity with REST APIs and JSON
Step 1: Setting Up Your Environment
Install the Anthropic Python SDK:

```shell
pip install anthropic
```

Set your API key as an environment variable (recommended for security):

```shell
export ANTHROPIC_API_KEY="your-api-key-here"
```
Step 2: Making Your First API Call
Here's the simplest way to send a message to Claude using the Messages API:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)

print(message.content[0].text)
```
What's happening?
- We create an `Anthropic` client using your API key.
- We call `messages.create()` with the model name, max tokens, and a list of messages.
- The response contains the assistant's reply in `content[0].text`.
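For reference, the raw response body has roughly the following shape; the field values below are made up for illustration:

```python
# Illustrative shape of a Messages API response body (values are made up):
response_body = {
    "id": "msg_01ABC",
    "role": "assistant",
    "stop_reason": "end_turn",
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "usage": {"input_tokens": 10, "output_tokens": 9},
}

# The SDK exposes these as attributes (message.content[0].text); with raw
# JSON you would pull the first text block out like this:
reply = next(b["text"] for b in response_body["content"] if b["type"] == "text")
```

The `stop_reason` and `usage` fields become important later for tool use and cost tracking.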
Choosing a Model
Claude comes in three tiers:
| Model | ID | Best For |
|---|---|---|
| Opus 4.7 | claude-opus-4-20250514 | Complex analysis, deep reasoning, creative tasks |
| Sonnet 4.6 | claude-sonnet-4-20250506 | Balanced intelligence and speed for production |
| Haiku 4.5 | claude-haiku-4-20250507 | High-volume, latency-sensitive applications |
Step 3: Building a Multi-Turn Conversation
To maintain context, send the entire conversation history with each request:
```python
import anthropic

client = anthropic.Anthropic()

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

response = client.messages.create(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    messages=messages
)

print(response.content[0].text)
```
Important: Always include the full message history. Claude does not maintain state between calls.
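Since Claude is stateless, your application owns the conversation history. A tiny helper keeps the bookkeeping in one place (a minimal sketch, not part of the SDK):

```python
def add_turn(history, role, content):
    """Return a new history list with one turn appended."""
    return history + [{"role": role, "content": content}]

history = []
history = add_turn(history, "user", "What is the capital of France?")
history = add_turn(history, "assistant", "The capital of France is Paris.")
history = add_turn(history, "user", "What is its population?")
# Pass `history` as the messages parameter on the next create() call.
```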
Step 4: Streaming Responses for Better UX
For real-time applications, stream the response token by token:
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Streaming reduces perceived latency and allows you to display partial results as they arrive.
Step 5: Using Tools (Function Calling)
Tools let Claude interact with external systems. Here's how to define and use a simple calculator tool:
```python
import anthropic

client = anthropic.Anthropic()

# Define a tool
calculator_tool = {
    "name": "calculator",
    "description": "Perform arithmetic operations",
    "input_schema": {
        "type": "object",
        "properties": {
            "operation": {
                "type": "string",
                "enum": ["add", "subtract", "multiply", "divide"]
            },
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["operation", "a", "b"]
    }
}

response = client.messages.create(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 25 * 4?"}],
    tools=[calculator_tool]
)

# Check if Claude wants to use a tool. The tool_use block's position in
# content can vary, so search by type rather than assuming a fixed index.
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(f"Tool called: {tool_call.name}")
    print(f"Arguments: {tool_call.input}")
```
How it works:
- You define tools with a name, description, and input schema.
- Claude decides when to call a tool based on the user's request.
- You execute the tool logic on your side and return the result.
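To complete the loop for the calculator example, execute the requested operation yourself and send the result back as a `tool_result` block in a follow-up user message. The tool-call values below are hard-coded stand-ins for what `response.content` would contain:

```python
def run_calculator(operation, a, b):
    """Execute the calculator tool on our side."""
    ops = {
        "add": lambda: a + b,
        "subtract": lambda: a - b,
        "multiply": lambda: a * b,
        "divide": lambda: a / b,
    }
    return ops[operation]()

# Stand-in for a tool_use block returned by Claude (id and input made up):
tool_use = {"id": "toolu_123", "name": "calculator",
            "input": {"operation": "multiply", "a": 25, "b": 4}}

result = run_calculator(**tool_use["input"])

# Append this to the conversation and call messages.create() again so
# Claude can phrase the final answer:
tool_result_message = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": str(result),
    }],
}
```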
Parallel Tool Use
Claude can request several tool calls in a single response. Define one tool that takes the varying input as a parameter (rather than one tool per case), and Claude may emit multiple `tool_use` blocks at once:

```python
# Assumes weather_tool is defined like calculator_tool above, with a
# "location" string parameter in its input_schema.
response = client.messages.create(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Get weather for New York and London."}],
    tools=[weather_tool]
)
# response.content may now contain two tool_use blocks, one per city.
```
Step 6: Advanced Features
Extended Thinking
For complex reasoning tasks, enable extended thinking:
```python
response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Solve this complex math problem..."}]
)
```
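With thinking enabled, `response.content` includes `thinking` blocks ahead of the final text. A sketch of separating the two (block contents below are made up):

```python
# Stand-ins for content blocks from a thinking-enabled response:
content = [
    {"type": "thinking", "thinking": "Break the problem into smaller steps..."},
    {"type": "text", "text": "The answer is 42."},
]

answer = "".join(b["text"] for b in content if b["type"] == "text")
reasoning = "".join(b["thinking"] for b in content if b["type"] == "thinking")
```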
Structured Outputs
Get responses in a structured format like JSON. The Messages API does not take an OpenAI-style `response_format` parameter; instead, instruct Claude to emit only JSON (or define a tool whose `input_schema` matches the shape you want and force it with `tool_choice`):

```python
response = client.messages.create(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    system="Respond with valid JSON only, with no surrounding prose.",
    messages=[{"role": "user", "content": "List three fruits as a JSON array."}]
)
```
Prompt Caching
Reduce costs and latency by caching repeated system prompts:
```python
response = client.messages.create(
    model="claude-sonnet-4-20250506",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Hello"}]
)
```
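You can verify caching is working from the `usage` object on each response: `cache_creation_input_tokens` is nonzero on the request that writes the cache, and `cache_read_input_tokens` is nonzero on hits. The numbers below are made up for illustration:

```python
# Usage from a request that hit the cache (illustrative numbers):
usage = {
    "input_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 2048,
    "output_tokens": 40,
}

# On a cache hit, the cached tokens are billed at a reduced rate.
cache_hit = usage["cache_read_input_tokens"] > 0
```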
Step 7: Production Best Practices
Error Handling
Always handle API errors gracefully:
```python
import anthropic

try:
    response = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
except anthropic.APIError as e:
    print(f"API error: {e}")
```

Order matters here: `RateLimitError` and `APIConnectionError` are subclasses of `APIError`, so catching `APIError` first would swallow the more specific handlers.
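For the backoff itself, a minimal sketch (the retry policy and limits here are assumptions, not SDK features):

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential delay with jitter: ~1s, ~2s, ~4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def call_with_retries(make_request, max_attempts=5, base=1.0):
    """Retry make_request() on failure; in practice, catch RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, base=base))
```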
Safety and Guardrails
- Use the `system` parameter to set behavioral constraints.
- Implement content filtering on user inputs.
- Monitor for prompt injection attacks.
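A very rough pre-filter for user input, as one example of the second point (the length limit and character rules here are arbitrary; real filtering should be tailored to your application):

```python
MAX_INPUT_CHARS = 4000

def sanitize_input(text):
    """Trim oversized inputs and drop non-printable control characters."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned[:MAX_INPUT_CHARS]
```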
Cost Optimization
- Use Haiku for simple tasks, Sonnet for most work, Opus only when needed.
- Enable prompt caching for repeated system prompts.
- Set appropriate `max_tokens` limits.
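These rules of thumb can be encoded as a simple routing table, using the model IDs from the table earlier in this guide:

```python
# Route each request tier to a model ID (IDs taken from the model table above):
MODEL_BY_TIER = {
    "simple": "claude-haiku-4-20250507",
    "default": "claude-sonnet-4-20250506",
    "complex": "claude-opus-4-20250514",
}

def pick_model(tier):
    """Fall back to the balanced default for unknown tiers."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["default"])
```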
Conclusion
You now have a solid foundation for building with the Claude API. Start with simple calls, add streaming for better UX, integrate tools for external actions, and follow best practices for production deployment.
For more advanced patterns, explore the Claude Cookbook for code samples and the Anthropic Console for testing and monitoring.
Key Takeaways
- The Messages API is the core interface for all Claude interactions, supporting multi-turn conversations, streaming, and tool use.
- Choose the right model: Sonnet for balance, Opus for complex reasoning, Haiku for speed and cost efficiency.
- Tools enable Claude to interact with external systems; define them with a clear schema and handle tool calls in your application logic.
- Streaming reduces perceived latency and improves user experience for real-time applications.
- Always implement error handling, rate limiting, and safety guardrails before moving to production.