Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision
Learn how to use the Claude Messages API for single and multi-turn conversations, prefill techniques, vision capabilities, and streaming. Includes Python and TypeScript code examples.
This guide teaches you how to send requests, manage multi-turn conversations, prefill Claude's responses, use vision with images, and stream outputs using the Messages API.
The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, an agent, or a content generation tool, understanding how to structure requests and handle responses is essential.
This guide walks you through the most common patterns: basic requests, multi-turn conversations, prefill techniques, vision capabilities, and streaming. By the end, you'll be able to build robust, production-ready applications with Claude.
Understanding the Messages API vs. Managed Agents
Anthropic offers two paths for building with Claude:
- Messages API: Direct access to the model. You control the conversation loop, manage state, and handle tool calls. Best for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Making Your First API Request
A basic request to the Messages API requires three things:
- model: The Claude model you want to use (e.g., claude-opus-4-7)
- max_tokens: The maximum number of tokens in the response
- messages: An array of message objects, each with a role and content
Python Example
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
Understanding the Response
The API returns a structured JSON object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields:
- content: An array of content blocks (text, tool_use, etc.)
- stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, tool_use)
- usage: Token counts for billing and monitoring
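As a rough illustration of reading these fields, the sketch below pulls the text, stop reason, and token totals out of a response shaped like the JSON above. It works on a plain dict (for example, a parsed HTTP response body); summarize_response is a hypothetical helper name, not part of the SDK.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Collect the text, stop reason, and token totals from a response dict."""
    # Only "text" blocks carry a "text" field; other block types
    # (e.g. tool_use) are skipped here.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# Sample data copied from the response shown above.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(summarize_response(sample))
```

With the Python SDK, the same fields are available as attributes on the returned object (message.content, message.stop_reason, message.usage) rather than as dict keys.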
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context.
Example: Two-Turn Conversation
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Important: The assistant messages in the history don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context.
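Because the API is stateless, the application has to own the history. One way to do that is a small wrapper that appends each turn and always sends the whole list. The sketch below is a minimal version under stated assumptions: Conversation and fake_send are hypothetical names, and fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the running message history that a stateless API needs on every call."""

    def __init__(self) -> None:
        self.messages: list[Message] = []

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role}")
        self.messages.append({"role": role, "content": content})

    def ask(self, user_text: str, send: Callable[[list[Message]], str]) -> str:
        """Append the user turn, send the FULL history, then record the reply."""
        self.add("user", user_text)
        reply = send(list(self.messages))
        self.add("assistant", reply)
        return reply


# Stand-in for a real API call (an assumption for this sketch).
def fake_send(history: list[Message]) -> str:
    return f"(reply to turn {len(history)})"


chat = Conversation()
chat.add("user", "Hello, Claude")
chat.add("assistant", "Hello!")  # synthetic assistant turn, not produced by the model
chat.ask("Can you describe LLMs to me?", fake_send)
print(chat.messages)
```

In real use, send would wrap client.messages.create(..., messages=history) and return message.content[0].text.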
Best Practices for Conversation History
- Keep the full history for coherent multi-turn interactions
- Truncate or summarize older turns to stay within context limits
- Use system prompts for persistent instructions
- Consider prompt caching for long conversations
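The truncation advice can be sketched as a small helper. This version counts messages rather than tokens, which is a simplification, and it assumes the trimmed history should still open with a user turn. trim_history is a hypothetical name.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most `max_messages` recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the trimmed history opens with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
print(trim_history(history, 4))
```

A token-based budget would be more accurate than a message count; the usage field in each response is one source for those numbers.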
Prefilling Claude's Response
Prefilling lets you start Claude's response for it. This is useful for:
- Forcing structured output formats
- Guiding the model toward a specific answer
- Reducing latency by constraining the first tokens
Example: Multiple Choice Answer
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"
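Note that the response contains only the continuation, not the prefilled text, which is why the example prints "C" rather than "The answer is (C". When prefilling a structured format such as JSON, the prefill has to be joined back on before parsing. A minimal sketch, where the continuation string is a made-up stand-in for message.content[0].text:

```python
import json


def join_prefill(prefill: str, continuation: str) -> str:
    """Rebuild the full assistant text: the API returns only what follows the prefill."""
    return prefill + continuation


# Hypothetical values: the prefill sent as the last assistant message, and a
# continuation standing in for the text the model would return.
prefill = "{"
continuation = '"answer": "C", "name": "Formicidae"}'

full_text = join_prefill(prefill, continuation)
parsed = json.loads(full_text)
print(parsed["answer"])
```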
Prefill Limitations
- Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6
- Using prefill with these models returns a 400 error
- Alternative: Use structured outputs or system prompt instructions
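One way to handle both cases in the same code path is to build the request differently depending on the model. The sketch below is an assumption-laden illustration: build_request and NO_PREFILL_MODELS are hypothetical names, only the claude-opus-4-7 ID comes from this guide, and the system prompt wording is just one possible fallback.

```python
# Model IDs that reject prefill. Only "claude-opus-4-7" appears in this guide;
# add the IDs of the other listed models once you have confirmed them.
NO_PREFILL_MODELS = {"claude-opus-4-7"}


def build_request(model: str, question: str, prefill: str) -> dict:
    """Use prefill where supported; otherwise fall back to a system prompt instruction."""
    request = {
        "model": model,
        "max_tokens": 16,
        "messages": [{"role": "user", "content": question}],
    }
    if model in NO_PREFILL_MODELS:
        # Fallback: ask for the opening text instead of forcing it.
        request["system"] = f"Begin your reply with exactly: {prefill}"
    else:
        request["messages"].append({"role": "assistant", "content": prefill})
    return request


print(build_request("claude-sonnet-4-5", "What is Latin for Ant?", "The answer is ("))
```

The resulting dict can be passed along as client.messages.create(**request). A system prompt instruction is a weaker guarantee than prefill, so validate the output either way.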
Vision: Working with Images
Claude can process images sent via the Messages API. Images can be provided as base64-encoded data or as URLs.
Example: Image Analysis
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Image Formats
- JPEG, PNG, GIF, WebP
- Maximum size: 100 MB per image
- Claude automatically resizes large images
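To avoid hard-coding the media type, a helper can pick it from the file extension and reject formats outside the list above. The sketch below also includes a URL variant; its source shape ({"type": "url", "url": ...}) follows Anthropic's vision documentation as I understand it, so check it against the current API reference. Both function names are hypothetical.

```python
import base64
from pathlib import PurePath

# Media types for the formats listed above, keyed by file extension.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_bytes(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that points at a URL instead of inline data."""
    return {"type": "image", "source": {"type": "url", "url": url}}


block = image_block_from_bytes("chart.PNG", b"abc")
print(block["source"]["media_type"])
```

Either block can be placed in the content array of a user message, next to a text block, as in the example above.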
Streaming Responses
For real-time applications, streaming reduces perceived latency. The API supports streaming via Server-Sent Events (SSE).
Python Streaming Example
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)
for event in stream:
    # Only text deltas carry a .text field; other delta types are skipped
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)
TypeScript Streaming Example
const stream = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Write a short poem about AI.' }
  ],
  stream: true
});
for await (const event of stream) {
  // Narrow to text deltas; other delta types have no text field
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}
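Applications often need the full text once streaming finishes, not just the printed output. A minimal sketch of accumulating it is shown below. collect_text is a hypothetical helper, and the fake events are stand-ins built with SimpleNamespace so the example runs without a network call; real events come from the stream object.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(events: Iterable) -> str:
    """Concatenate text deltas from a stream of events, ignoring everything else."""
    parts = []
    for event in events:
        if event.type != "content_block_delta":
            continue
        # Deltas can carry non-text payloads (e.g. tool input JSON), so check the type.
        if getattr(event.delta, "type", None) == "text_delta":
            parts.append(event.delta.text)
    return "".join(parts)


# Fake events standing in for a real stream (an assumption for this sketch).
fake_stream = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="Hello"),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text=", world"),
    ),
    SimpleNamespace(type="message_stop"),
]
print(collect_text(fake_stream))
```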
Handling Stop Reasons
Claude can stop generating for several reasons. Your code should handle each case:
| stop_reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Return response |
| max_tokens | Response was cut off | Continue with more tokens or truncate |
| stop_sequence | A custom stop sequence was hit | Handle as needed |
| tool_use | Claude wants to call a tool | Execute tool and continue |
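The table can be turned into a small dispatch function. The sketch below only maps each stop_reason to a label; the labels and the next_action name are hypothetical, and a real application would call its own handlers instead.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason to the action suggested in the table above."""
    actions = {
        "end_turn": "return_response",
        "max_tokens": "continue_or_truncate",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "execute_tool",
    }
    try:
        return actions[stop_reason]
    except KeyError:
        # Values outside the table are treated as errors here rather than guessed at.
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}") from None


print(next_action("tool_use"))
```

Raising on unknown values is a deliberate choice in this sketch: it surfaces new stop reasons during development instead of silently treating them as a finished response.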
Error Handling Best Practices
Always wrap API calls in try-except blocks:
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Retry the request
except anthropic.APIError as e:
    # Base class of the errors above: it must come last,
    # or it would catch them before their own handlers run
    print(f"API error: {e}")
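The exponential backoff mentioned in the comments can be sketched as a generic retry wrapper. This is one possible shape, not the SDK's own mechanism: with_backoff and its parameters are hypothetical, and the demo uses a simulated TimeoutError and a recorded sleep so it runs instantly and offline.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on the given exceptions, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential delay plus a little jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Demo with a simulated failure: the first two calls fail, the third succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


waits: list[float] = []  # record delays instead of actually sleeping
result = with_backoff(flaky, (TimeoutError,), sleep=waits.append)
print(result, len(waits))
```

With the SDK, the call would look like with_backoff(lambda: client.messages.create(...), (anthropic.RateLimitError, anthropic.APIConnectionError)).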
Key Takeaways
- The Messages API is stateless—always send the full conversation history with each request
- Prefill is powerful but limited: use it to force output formats on models that support it, and switch to structured outputs on newer models
- Vision support is built-in—send images as base64 or URLs for multimodal analysis
- Streaming reduces latency—use SSE for real-time applications like chat interfaces
- Always handle stop reasons, especially tool_use if you're building agents, and max_tokens for long responses