Mastering the Messages API: Build Conversational AI with Claude
This guide teaches you how to use Claude's Messages API to build conversational AI applications. You'll learn to make basic requests, manage multi-turn conversations, prefill Claude's responses, and handle images. It includes practical code examples in Python and TypeScript.
Introduction
The Messages API is the core interface for building with Claude. Whether you're creating a chatbot, a document analysis tool, or an AI-powered assistant, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.
Understanding the Messages API
Anthropic offers two primary ways to build with Claude:
- Messages API: Direct access to model prompting. Best for custom agent loops and fine-grained control.
- Claude Managed Agents: Pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Basic Request and Response
Let's start with the simplest possible interaction: sending a single message to Claude and getting a response.
Python Example
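import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# The reply text lives in the first content block
print(message.content[0].text)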
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const block = message.content[0];
if (block.type === 'text') {
  console.log(block.text);
}
Understanding the Response
The API returns a structured JSON object containing:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
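To make the response shape concrete, here is a minimal sketch that reads the reply text and token counts from a payload like the one above. It works on a plain dictionary rather than the SDK's response object, and the helper names `extract_text` and `total_tokens` are illustrative assumptions, not part of the SDK.

```python
from typing import Any

# Sample payload copied from the response shown above.
response: dict[str, Any] = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-sonnet-4-20250514",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}


def extract_text(resp: dict[str, Any]) -> str:
    """Join the text of every text block, skipping any other block types."""
    return "".join(
        block["text"] for block in resp["content"] if block.get("type") == "text"
    )


def total_tokens(resp: dict[str, Any]) -> int:
    """Add input and output token counts from the usage field."""
    usage = resp["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


print(extract_text(response))  # Hello!
print(total_tokens(response))  # 18
```

Joining all text blocks, instead of reading only the first one, keeps the helper working if a response ever contains more than one block.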
Key fields to note:
- content: An array of content blocks (usually text)
- stop_reason: Why the model stopped generating (e.g., "end_turn", "max_tokens", "stop_sequence")
- usage: Token counts for billing and optimization
Multi-Turn Conversations
The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context.
Building a Conversation
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Important Notes
- You don't need to use actual Claude responses for assistant messages. You can inject synthetic assistant messages to guide the conversation or provide context.
- Always alternate between user and assistant roles. The conversation must start with a user message.
- The entire history counts toward your input token usage, so be mindful of context length.
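Because the API is stateless, the application has to own the history. The sketch below shows one way to do that: a small `Conversation` class (a hypothetical helper, not part of the SDK) that stores the turns, enforces the ordering rules listed above, and resends the full history on every call. The `send` callable is an assumption that stands in for a real API call, so the example runs without network access.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the full message history and enforces user/assistant alternation."""

    def __init__(self) -> None:
        self.messages: list[Message] = []

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {role}")
        if not self.messages and role != "user":
            raise ValueError("the conversation must start with a user message")
        if self.messages and self.messages[-1]["role"] == role:
            raise ValueError("user and assistant turns must alternate")
        self.messages.append({"role": role, "content": content})

    def ask(self, text: str, send: Callable[[list[Message]], str]) -> str:
        """Append a user turn, send the whole history, and record the reply."""
        self.add("user", text)
        reply = send(list(self.messages))
        self.add("assistant", reply)
        return reply


def fake_send(messages: list[Message]) -> str:
    """Stand-in for the API call: reports how many messages it received."""
    return f"(reply to message {len(messages)})"


chat = Conversation()
chat.ask("Hello, Claude", fake_send)
chat.ask("Can you describe LLMs to me?", fake_send)
print([m["role"] for m in chat.messages])
```

In a real application, `send` would wrap `client.messages.create(...)` and return the text of the reply.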
Putting Words in Claude's Mouth (Prefill)
Prefilling allows you to start Claude's response for it. This is useful for:
- Forcing structured outputs (e.g., JSON, multiple choice answers)
- Guiding the tone or style of the response
- Reducing token usage by constraining the output
Example: Multiple Choice Answer
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1,  # One token is enough for a single letter
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            # The prefilled assistant turn; Claude continues from here
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
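With prefill, the returned text is only the continuation after the prefilled words, so the application usually validates it before use. A minimal sketch of that check follows; `parse_choice` is a hypothetical helper, and the set of valid letters is taken from the question above.

```python
def parse_choice(continuation: str, valid: tuple[str, ...] = ("A", "B", "C")) -> str:
    """Return the option letter from a prefilled multiple-choice reply.

    `continuation` is the text the model produced after the prefill
    "The answer is (", so it is expected to begin with the letter.
    """
    letter = continuation.strip().upper()[:1]
    if letter not in valid:
        raise ValueError(f"unexpected answer: {continuation!r}")
    return letter


print(parse_choice("C"))
```

Failing loudly on an unexpected value is a design choice; a caller could instead retry the request.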
Prefill Limitations
Important: Prefilling is not supported on Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
For models that don't support prefill, consider:
- Structured outputs: Define a JSON schema for Claude to follow
- System prompt instructions: Use the system parameter to specify output format
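One way to support both kinds of model is to build the request differently depending on whether prefill is available. The sketch below only assembles the keyword arguments for `client.messages.create`; the `build_request` helper, the `supports_prefill` flag, the system prompt wording, and the `max_tokens` value of 5 are all assumptions made for illustration.

```python
from typing import Any


def build_request(model: str, question: str, supports_prefill: bool) -> dict[str, Any]:
    """Build Messages API keyword arguments for a multiple-choice question."""
    request: dict[str, Any] = {
        "model": model,
        "max_tokens": 1,
        "messages": [{"role": "user", "content": question}],
    }
    if supports_prefill:
        # Start the assistant turn so the model only has to emit the letter.
        request["messages"].append(
            {"role": "assistant", "content": "The answer is ("}
        )
    else:
        # No prefill: state the format in the system prompt instead,
        # and leave a little headroom in the output budget (assumed value).
        request["system"] = "Reply with only the letter of the correct option."
        request["max_tokens"] = 5
    return request


question = "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
with_prefill = build_request("claude-sonnet-4-20250514", question, True)
print(len(with_prefill["messages"]))  # 2
```

The caller decides `supports_prefill` for the model in use, and the result can be passed as `client.messages.create(**request)`.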
Vision: Working with Images
The Messages API supports image inputs, enabling visual understanding and analysis.
Sending an Image
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Image Formats
- JPEG
- PNG
- GIF
- WebP
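The `media_type` field has to match the actual image format. A minimal sketch of a helper that picks it from the file extension and builds the image content block is shown here; `image_block` and the `MEDIA_TYPES` table are hypothetical names, and choosing the type by extension (not by inspecting the file contents) is a simplifying assumption.

```python
import base64
from pathlib import PurePath
from typing import Any

# Extensions for the four supported formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, raw: bytes) -> dict[str, Any]:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(raw).decode("utf-8"),
        },
    }


block = image_block("chart.PNG", b"\x89PNG")
print(block["source"]["media_type"])  # image/png
```

The returned dictionary can be placed in the `content` list of a user message, next to a text block.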
Best Practices for Vision
- Use appropriate resolution: Images up to 8,000x8,000 pixels are supported
- Combine with text: Always include a text prompt alongside images for best results
- Consider token cost: Images consume tokens proportional to their size
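To get a rough feel for image cost before sending a request, a rule of thumb can be sketched as below. The divisor of 750 pixels per token is an assumption based on the approximation given in Anthropic's vision documentation; treat the result as an estimate, since large images may be resized and exact counts come from the `usage` field of the response. `estimate_image_tokens` is a hypothetical helper.

```python
def estimate_image_tokens(width: int, height: int, pixels_per_token: int = 750) -> int:
    """Estimate the token cost of an image from its pixel dimensions.

    Assumes cost scales with area at roughly `pixels_per_token` pixels
    per token, rounded up. This is an approximation, not a billing figure.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    pixels = width * height
    # Integer ceiling division avoids floating-point rounding.
    return (pixels + pixels_per_token - 1) // pixels_per_token


print(estimate_image_tokens(1000, 1000))
```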
Handling Stop Reasons
Understanding why Claude stopped generating helps you handle different scenarios:
| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Output hit token limit | Increase max_tokens or truncate |
| stop_sequence | Custom stop sequence triggered | Handle as designed |
| tool_use | Claude wants to use a tool | Execute tool and return result |
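The table above maps naturally onto a small dispatch function. In this sketch the action labels (`"done"`, `"raise_limit_or_continue"`, and so on) are hypothetical names chosen for illustration; an application would replace them with its own handlers.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason value to an application-level action label."""
    actions = {
        "end_turn": "done",
        "max_tokens": "raise_limit_or_continue",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "run_tool",
    }
    # Unknown values are reported, not silently ignored, so that
    # stop reasons this code does not know about become visible.
    return actions.get(stop_reason, "unknown")


print(next_action("max_tokens"))  # raise_limit_or_continue
```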
Error Handling
Common errors and how to handle them:
import anthropic
from anthropic import APIConnectionError, APIError, RateLimitError

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limited. Implement exponential backoff.")
except APIConnectionError:
    print("Network issue. Retry with backoff.")
except APIError as e:
    # Base class for the SDK's API errors; keep it last so the handlers above run first
    print(f"API error: {e}")
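The messages above mention exponential backoff without showing it. A minimal, generic sketch follows. The `call_with_backoff` helper, its parameters, and the default delays are assumptions, and the demo uses a simulated `ConnectionError` so that it runs without network access. Note that the official SDK also retries some failures on its own (see its `max_retries` option), so check that behavior before adding another retry layer.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retryable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on retryable errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            # Up to 10% random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Demo: a function that fails twice, then succeeds.
attempts = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


# Record delays in a list so the demo does not actually sleep.
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))  # ok 2
```

With the SDK, one would pass `retryable=(RateLimitError, APIConnectionError)` and wrap the request in a lambda, for example `lambda: client.messages.create(...)`.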
Streaming Responses
For real-time applications, use streaming to receive tokens as they're generated:
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    # text_stream yields text chunks as they arrive
    for text in stream.text_stream:
        print(text, end="", flush=True)
Streaming is ideal for chatbots and any application where low latency matters.
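A chatbot usually needs the complete reply as well, for example to append it to the conversation history. The sketch below prints chunks as they arrive and returns the joined text. It accepts any iterable of strings, so a plain list stands in for `stream.text_stream` here; `print_and_collect` is a hypothetical helper.

```python
from typing import Iterable


def print_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full text at the end."""
    parts: list[str] = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # Finish the line once the stream ends
    return "".join(parts)


# A list of strings simulates the chunks a real stream would yield.
story = print_and_collect(["Once ", "upon ", "a time."])
```

Collecting the parts in a list and joining once at the end avoids repeated string concatenation on long replies.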
Key Takeaways
- The Messages API is stateless: always send the full conversation history with each request. This gives you complete control over context.
- Use prefill carefully: it's powerful for constraining outputs but not supported on all models. Consider structured outputs as an alternative.
- Vision capabilities allow you to send images alongside text prompts for multimodal understanding. Always pair images with descriptive text.
- Handle stop reasons to build robust applications: end_turn, max_tokens, and tool_use each require different responses.
- Stream for real-time applications: streaming reduces perceived latency and improves the user experience in interactive applications.