Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision. Includes code examples in Python and TypeScript.
Introduction
Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a custom chatbot, an AI assistant, or integrating Claude into your application, understanding the Messages API is essential. This guide covers the most common patterns—from simple requests to advanced techniques like prefill and vision—so you can get the most out of Claude.
Basic Request and Response
At its core, the Messages API is straightforward: you send a list of messages and receive a response. Here's a minimal example in Python:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
The response includes the model's reply, metadata, and token usage:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- `stop_reason`: Indicates why the response ended (`end_turn` means Claude finished naturally).
- `usage`: Tracks input and output tokens for billing and optimization.
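In the Python SDK these fields are attributes on the returned response object. A minimal sketch (the helper name `summarize_response` is illustrative, not part of the SDK; attribute names mirror the JSON fields shown above):

```python
def summarize_response(message) -> str:
    """One-line summary of why generation stopped and what it cost,
    given a response object from client.messages.create(...)."""
    used = message.usage
    return (f"stop_reason={message.stop_reason}, "
            f"tokens={used.input_tokens}+{used.output_tokens}")
```

This is handy for logging token usage per request when tuning prompts.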
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over the conversation context.
```python
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
```
Important: The assistant messages don't have to come from Claude—you can inject synthetic assistant responses to guide the conversation. This is useful for:
- Providing example responses
- Correcting or redirecting Claude
- Simulating multi-turn interactions
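Because the API is stateless, your application owns the transcript. A minimal sketch of the history bookkeeping this implies (the helper name `extend_history` is illustrative, not part of the SDK):

```python
def extend_history(history: list, user_text: str, assistant_text: str) -> list:
    """Return a new history with one more user/assistant exchange appended.

    The Messages API is stateless, so the full list must be passed as
    `messages=` on every subsequent client.messages.create(...) call.
    """
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]

# Seeding the conversation with a synthetic assistant turn to set the tone:
history = extend_history([], "Hello, Claude", "Hello! I answer in one sentence.")
```

The same helper works whether the assistant text came from Claude or was injected by you.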
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for controlling output format, enforcing structure, or getting concise answers.
Example: Multiple Choice Answer
```python
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
```
Response:
```json
{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
```
Prefill Limitations
Prefill is not supported on these models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
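If your application switches between models, a small guard can prevent sending a prefill to a model that rejects it. A sketch only: the ID strings below are guesses derived from the display names in the list above, so map them to the actual model IDs your deployment uses.

```python
# Hypothetical IDs derived from the unsupported-model list above.
PREFILL_UNSUPPORTED = {
    "claude-mythos-preview",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
}

def supports_prefill(model: str) -> bool:
    """Check before appending a trailing assistant turn as prefill."""
    return model.lower() not in PREFILL_UNSUPPORTED
```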
When to Use Prefill
- Format control: Force JSON, XML, or specific output structures
- Constrained generation: Get single-token answers (yes/no, multiple choice)
- Role-playing: Set the tone or persona from the first word
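For format control, a common pattern is to prefill with the opening of a JSON object. Since the API returns only the continuation, you re-attach the prefill yourself. A minimal sketch (helper names are illustrative):

```python
PREFILL = '{"'  # assistant turn sent as prefill to force JSON output

def build_prefill_request(question: str) -> list:
    """Messages list whose final assistant turn seeds Claude's reply with '{\"'."""
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": PREFILL},
    ]

def assemble_json(completion_text: str) -> str:
    """The response contains only the continuation, so prepend the prefill."""
    return PREFILL + completion_text
```

Pair this with a `stop_sequences` entry or a JSON parser to validate the assembled output.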
Vision: Sending Images to Claude
Claude can analyze images sent via the Messages API. This enables use cases like document analysis, image description, and visual Q&A.
Base64 Image Example (Python)
```python
import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported Image Formats
- JPEG, PNG, GIF, WebP
- Maximum size: 100MB per image
- Claude processes images at varying resolutions; larger images use more tokens
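To avoid hardcoding the `media_type` for each file, a small helper can derive it from the extension and build the image content block. A sketch only (the `image_block` helper and `MEDIA_TYPES` table are illustrative, not part of the SDK):

```python
import base64
from pathlib import Path

# Media types for the formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".png": "image/png", ".gif": "image/gif", ".webp": "image/webp",
}

def image_block(path: str) -> dict:
    """Build the base64 image content block expected by the Messages API."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }
```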
Vision Best Practices
- Combine with text: Always include a text prompt alongside images for best results
- Use appropriate resolution: High-resolution images provide more detail but cost more tokens
- One image per message: For complex analysis, send images one at a time
Handling Stop Reasons
The stop_reason field tells you why Claude stopped generating. Common values:
| Stop Reason | Meaning |
|---|---|
| `end_turn` | Claude finished naturally |
| `max_tokens` | Response hit the token limit |
| `stop_sequence` | Claude encountered a stop sequence |
| `tool_use` | Claude wants to use a tool |
If the response ends with a `stop_reason` of `max_tokens`, consider increasing `max_tokens` or breaking your request into smaller chunks.
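Another recovery pattern for a truncated reply is to resend the conversation with the partial text as a trailing assistant turn, so the next request continues where the previous one stopped. A sketch (the helper name is illustrative; on models where prefill is unsupported, fall back to raising `max_tokens` instead):

```python
def continuation_messages(messages: list, partial_reply: str) -> list:
    """Resend history with the truncated reply as an assistant prefill.

    Trailing whitespace is stripped because the API rejects a final
    assistant turn that ends in whitespace.
    """
    return messages + [{"role": "assistant", "content": partial_reply.rstrip()}]
```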
Streaming Responses
For real-time applications, use streaming to get Claude's response incrementally:
```python
stream = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="", flush=True)
```
Streaming is ideal for:
- Chat interfaces with real-time display
- Long responses where users expect immediate feedback
- Reducing perceived latency
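If you also need the complete reply after streaming (for logging or to append to the conversation history), you can accumulate deltas as they arrive. A sketch that mirrors the event loop above (the helper name is illustrative):

```python
def accumulate_text(events) -> str:
    """Collect text deltas from a stream of events into the full reply."""
    parts = []
    for event in events:
        if event.type == "content_block_delta":
            parts.append(event.delta.text)
    return "".join(parts)
```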
Error Handling
Common API errors and how to handle them:
| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Invalid parameters | Check model name, message format |
| 401 Unauthorized | Invalid API key | Verify your API key |
| 429 Rate Limit | Too many requests | Implement exponential backoff |
| 529 Overloaded | Server overload | Retry with delay |
```python
import time
from anthropic import Anthropic, APIError, RateLimitError

client = Anthropic()

for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except RateLimitError:
        time.sleep(2 ** attempt)  # exponential backoff: 1s, then 2s, then 4s
    except APIError as e:
        print(f"API error: {e}")
        break
```
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated AI applications. Remember:
- Always send the full conversation history (stateless API)
- Use prefill for output control (but check model compatibility)
- Stream responses for better user experience
- Handle errors gracefully with retries
Key Takeaways
- Stateless design: You must send the full conversation history with each request—this gives you complete control over context.
- Prefill for precision: Use prefill to control output format and get concise answers, but avoid it on newer models (Opus 4.7, Sonnet 4.6) where it's unsupported.
- Vision integration: Claude can analyze images via base64 encoding; always pair images with text prompts for best results.
- Stream for speed: Streaming reduces perceived latency and is essential for real-time chat interfaces.
- Error handling matters: Implement retry logic with exponential backoff to handle rate limits and server overloads gracefully.