Mastering the Messages API: Building Conversational AI with Claude
Learn how to use Claude's Messages API for basic requests, multi-turn conversations, response prefilling, and vision capabilities with practical code examples.
Claude's Messages API is the primary interface for integrating Claude into your applications. Whether you're building a chatbot, a content generator, or a multimodal analysis tool, understanding how to work with messages effectively is essential. This guide walks you through everything from basic requests to advanced patterns like multi-turn conversations, response prefilling, and vision capabilities.
Understanding the Messages API vs. Managed Agents
Anthropic offers two primary ways to build with Claude:
- Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control over every request and response.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Making Your First API Request
Let's start with the simplest possible interaction: sending a single message and receiving a response.
Python Example
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
Understanding the Response
The API returns a structured response containing:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks (text, images, tool use, etc.)
- stop_reason: Indicates why the response ended ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
- usage: Token counts for billing and monitoring
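Because content is an array rather than a single string, code that reads a response usually has to walk the blocks. The following is a minimal sketch that assumes the response is available as a plain dict shaped like the JSON above; `extract_text` is a hypothetical helper name, not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block in a response's content array."""
    parts = []
    for block in response.get("content", []):
        # Non-text blocks (for example tool_use) carry no "text" field, so skip them.
        if block.get("type") == "text":
            parts.append(block["text"])
    return "".join(parts)


# Sample response shaped like the JSON shown above (trimmed to the key fields).
sample_response = {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample_response))  # prints: Hello!
```

Filtering on the block type, instead of indexing content[0], keeps the helper from failing when the first block is not text.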
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context but requires you to manage conversation state on your end.
Example: Two-Turn Conversation
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Key Patterns for Multi-Turn Conversations
- Maintain conversation history: Store all messages in a list or database, appending new user inputs and assistant responses.
- Include synthetic messages: Earlier turns don't need to originate from Claude—you can inject pre-written assistant messages to guide the conversation.
- Manage token limits: Longer histories consume more tokens. Use prompt caching or compaction for extended conversations.
# Example of managing conversation state
import anthropic

client = anthropic.Anthropic()

conversation_history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
]

# Add new user message
conversation_history.append({"role": "user", "content": "What's the weather like?"})

# Send full history
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation_history
)

# Append response to history
conversation_history.append({"role": "assistant", "content": response.content[0].text})
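Since the history grows with every turn, some form of trimming is usually needed before long. The sketch below is a deliberately naive stand-in for the prompt caching and compaction mentioned above: it caps the history by message count, not by tokens, and simply discards older turns. `trim_history` and its `max_messages` parameter are hypothetical names chosen for illustration.

```python
def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Keep at most max_messages of the newest turns, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    trimmed = history[-max_messages:]
    # Drop leading assistant turns so the list still begins with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
    {"role": "user", "content": "What's the weather like?"},
    {"role": "assistant", "content": "I can't check live weather."},
    {"role": "user", "content": "OK, tell me a joke instead."},
]

recent = trim_history(history, max_messages=4)
print([m["role"] for m in recent])  # prints: ['user', 'assistant', 'user']
```

The function returns a new list and leaves the stored history untouched, so the full transcript can still be kept in a database while only the recent window is sent.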
Prefilling Claude's Response
Prefilling lets you start Claude's response, guiding it toward a specific format or answer. This is powerful for:
- Forcing structured outputs (e.g., JSON, multiple choice)
- Setting the tone or style of the response
- Reducing latency by constraining the output
Example: Multiple Choice Answer
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            # The final assistant turn is the prefill; Claude continues from it.
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"
Important Notes on Prefilling
- Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. These models return a 400 error.
- Alternative for unsupported models: Use structured outputs or system prompt instructions instead.
- Use max_tokens wisely: Setting max_tokens=1 forces a single-token response, ideal for classification tasks.
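When prefilling for structured output, the response contains only the continuation, so the prefill has to be joined back on before parsing. The sketch below illustrates that step without calling the API: `with_prefill` and `join_prefill` are hypothetical helpers, the completion string is a stand-in for `message.content[0].text`, and the trailing-whitespace check reflects an assumption that the API rejects a prefill ending in whitespace.

```python
import json


def with_prefill(user_prompt: str, prefill: str) -> list[dict]:
    """Build a messages list whose last entry is a partial assistant turn."""
    if prefill != prefill.rstrip():
        # Assumption: prefill text ending in whitespace is rejected by the API.
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]


def join_prefill(prefill: str, completion: str) -> str:
    """The response holds only the continuation, so prepend the prefill."""
    return prefill + completion


messages = with_prefill("Return the user as JSON: name Ada, age 36.", "{")
# Stand-in for message.content[0].text; a real call would supply this.
completion = '"name": "Ada", "age": 36}'
user = json.loads(join_prefill("{", completion))
print(user["name"])  # prints: Ada
```

On models that do not support prefilling, the same parsing step applies to the whole response text, with no prefill to prepend.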
Working with Images (Vision)
The Messages API supports image inputs, enabling visual analysis and multimodal interactions.
Python Example: Image Analysis
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Image Formats
- PNG
- JPEG
- WebP
- GIF (static only)
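The media_type in an image block has to match the file being sent, so it can help to derive it from the filename and reject anything outside the list above. This is a minimal sketch under stated assumptions: `image_block` and `MEDIA_TYPES` are hypothetical names, and the check trusts the file extension without inspecting the bytes.

```python
import base64
from pathlib import Path

# Extension-to-media-type table for the formats listed above.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(data: bytes, filename: str) -> dict:
    """Return a base64 image content block, or raise for an unlisted format."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


block = image_block(b"abc", "chart.PNG")
print(block["source"]["media_type"])  # prints: image/png
```

The returned dict can be placed in a user message's content list next to a text block, in the same position as the hand-written image block in the example above.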
Handling Stop Reasons
Understanding why Claude stopped generating helps you build more robust applications:
| Stop Reason | Meaning | Typical Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue conversation or end |
| max_tokens | Output hit token limit | Increase max_tokens or truncate |
| stop_sequence | A custom stop sequence was hit | Handle based on sequence |
| tool_use | Claude wants to use a tool | Execute tool and continue |
response = client.messages.create(...)

# Branch on why generation stopped.
if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "tool_use":
    print("Claude requested a tool call. Handle accordingly.")
Best Practices
1. Manage Token Usage Efficiently
- Use prompt caching for repeated system prompts or large context
- Implement conversation compaction for long histories
- Monitor usage fields in responses to track costs
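Tracking cost across a session comes down to summing the usage object from each response. A minimal sketch, assuming each usage object is available as a plain dict with the input_tokens and output_tokens fields shown earlier; `UsageTotals` is a hypothetical class, not something the SDK provides.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals across several responses."""

    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: dict) -> None:
        # usage is the "usage" object from one response, as a plain dict.
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


totals = UsageTotals()
totals.add({"input_tokens": 12, "output_tokens": 6})
totals.add({"input_tokens": 30, "output_tokens": 45})
print(totals.total)  # prints: 93
```

Input and output counts are kept separate because they are typically priced differently.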
2. Handle Errors Gracefully
- Implement retry logic with exponential backoff
- Validate inputs before sending (e.g., image size, message format)
- Check for model-specific limitations (e.g., prefilling support)
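Retry logic with exponential backoff can be sketched independently of any SDK. The helper below is illustrative only: `retry_with_backoff` and all of its parameters are hypothetical, and the defaults are arbitrary. In real use, `retry_on` would be narrowed to transient errors (for example the SDK's rate-limit or connection exceptions) instead of every `Exception`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Demo with a function that fails twice, then succeeds. A list's append
# method stands in for time.sleep so the demo runs instantly.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list[float] = []
print(retry_with_backoff(flaky, sleep=delays.append))  # prints: ok
```

Wrapping a request would look like `retry_with_backoff(lambda: client.messages.create(...))`. Making `sleep` injectable is a design choice that keeps the helper testable without real waiting.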
3. Optimize for Latency
- Use streaming for real-time applications (see Streaming Messages docs)
- Prefill responses when output format is predictable
- Set appropriate max_tokens to avoid unnecessary generation
4. Security Considerations
- The Messages API is eligible for Zero Data Retention (ZDR): under a ZDR arrangement, data is not stored after the response is returned
- Never send sensitive information in prompts unless you have appropriate data-handling agreements in place
- Validate and sanitize user inputs before including them in messages
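What counts as adequate validation depends on the application, but a basic pass over user text might look like the sketch below. It is a hypothetical helper with an arbitrary length cap. It only removes control characters and bounds the input size, and it does not address prompt injection or content policy.

```python
import unicodedata


def sanitize_user_text(text: str, max_chars: int = 4000) -> str:
    """Strip control characters and surrounding whitespace, then cap the length."""
    cleaned = "".join(
        ch for ch in text
        # Keep newlines and tabs; drop other control characters (category Cc).
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    cleaned = cleaned.strip()
    if not cleaned:
        raise ValueError("message is empty after sanitizing")
    return cleaned[:max_chars]


print(sanitize_user_text("  hi\x00 there\n"))  # prints: hi there
```

The cleaned string can then be used as the content of a user message.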
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with each request, giving you complete control over context management.
- Prefilling lets you guide Claude's responses by starting its reply, but check model compatibility as some newer models don't support it.
- Multi-turn conversations require you to maintain and append to a conversation history list on your end.
- Vision capabilities are built-in—send images as content blocks alongside text for multimodal analysis.
- Monitor stop reasons to handle truncation, tool calls, and natural conversation endings appropriately.