Mastering the Messages API: Building Conversational AI with Claude
This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
Introduction
Claude's Messages API is the primary interface for building conversational AI applications. Whether you're creating a simple chatbot, a complex agent system, or a vision-enabled application, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and multi-turn conversations.
Understanding the Messages API
The Messages API is a stateless, RESTful API that accepts a list of messages and returns a model-generated response. The server keeps no conversation state between requests, so you must send the full conversation history with each request. This design gives you complete control over the conversation context.
Key Concepts
- Messages: An array of conversation turns, each with a `role` (user or assistant) and `content`.
- Roles: `user` for human messages, `assistant` for Claude's responses.
- Stateless: Each request is independent; you manage conversation state on your end.
- Stop Reasons: Indicate why Claude stopped generating (e.g., `end_turn`, `max_tokens`, `stop_sequence`).
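To make the stateless, RESTful shape concrete, here is a minimal sketch that builds (but does not send) the underlying HTTP request using only the standard library. The helper name `build_request` and the placeholder key are illustrative assumptions; in practice the official SDK shown in the next section handles this for you.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key, messages, model="claude-sonnet-4-5", max_tokens=1024):
    """Build an HTTP POST request for the Messages API without sending it.

    `build_request` is a hypothetical helper for illustration only.
    """
    payload = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


# "sk-placeholder" is not a real key; nothing is sent over the network here.
request = build_request("sk-placeholder", [{"role": "user", "content": "Hello, Claude"}])
print(request.get_method(), request.full_url)
```

Every request carries the whole `messages` array in its body, which is why the client, not the server, owns the conversation state.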
Basic Request and Response
Let's start with the simplest possible request: a single user message.
Python Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message.content[0].text);
Response Structure
The API returns a structured response:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-5",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields:
- `content`: An array of content blocks (text, tool_use, etc.)
- `stop_reason`: Why generation stopped (`end_turn`, `max_tokens`, `stop_sequence`, `tool_use`)
- `usage`: Token counts for billing and context management
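Because `content` is an array that can mix block types, indexing `content[0]` only works when the first block happens to be text. The sketch below shows one way to collect all text blocks from a raw JSON response. The helper name `extract_text` is an assumption for illustration, and it operates on a plain dictionary rather than an SDK object.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every text block in a Messages API response dict."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Trimmed copy of the sample response shown above.
sample = {
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello! How can I help you today?
```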
Building Multi-Turn Conversations
Since the Messages API is stateless, you must send the entire conversation history with each request. This pattern enables rich, context-aware interactions.
Python Example
import anthropic

client = anthropic.Anthropic()

# First turn
message1 = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Second turn - include previous messages
message2 = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": message1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message2.content[0].text)
Important Notes
- Synthetic Messages: You can inject synthetic assistant messages (e.g., from a database or previous session) to continue conversations seamlessly.
- Context Window: Be mindful of the context window limit. Each turn adds tokens to the input.
- Message Order: Start with a user message and alternate between user and assistant roles; the API merges consecutive turns that share a role into a single turn.
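Rebuilding the message list by hand gets tedious after a few turns. One possible approach, sketched below, is a small wrapper that owns the history and resends it on every call. The `Conversation` class and its `send` callable are hypothetical names, and the fake sender stands in for a real API call so the example runs offline.

```python
from typing import Callable, List


class Conversation:
    """Keeps the running message history for a stateless chat API."""

    def __init__(self, send: Callable[[List[dict]], str]):
        # `send` receives the full history and returns the assistant's reply text.
        self._send = send
        self.messages: List[dict] = []

    def ask(self, user_text: str) -> str:
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = self._send(pending)
        # Commit both turns only after the call succeeds, so a failed
        # request does not leave a dangling user message in the history.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


def fake_send(messages: List[dict]) -> str:
    """Offline stand-in for the API: reports how many messages it received."""
    return f"received {len(messages)} message(s)"


chat = Conversation(fake_send)
print(chat.ask("Hello, Claude"))                 # received 1 message(s)
print(chat.ask("Can you describe LLMs to me?"))  # received 3 message(s)
```

With the SDK, `send` could be something like `lambda msgs: client.messages.create(model="claude-sonnet-4-5", max_tokens=1024, messages=msgs).content[0].text`.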
Prefill Technique: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response by providing the beginning of its answer. This is useful for:
- Guiding response format (e.g., JSON, multiple choice)
- Enforcing specific phrasing
- Reducing token usage for constrained outputs
Important: Model Support
Prefill is not supported on:
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Mythos Preview
Python Example: Multiple Choice
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # Only need one token for the answer
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Expected output: C
How Prefill Works
- The assistant message in the last position contains your prefill text.
- Claude continues generating from that point.
- Combined with `max_tokens`, you can get very constrained outputs.
Best Practices for Prefill
- Use with `max_tokens`: Set a low `max_tokens` value to limit Claude's completion.
- Natural Continuation: Make the prefill text a natural lead-in to the desired response.
- Fallback Strategy: For unsupported models, use system prompts like "Always respond with a single letter A, B, or C."
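One detail worth keeping in mind: the response contains only the continuation, not the prefill itself, so the two must be rejoined before parsing structured output. The sketch below illustrates this with a JSON prefill. The helper names are hypothetical, the continuation string is a stand-in for `message.content[0].text`, and the trailing-whitespace check reflects the API's rejection of prefills that end in whitespace.

```python
import json


def prefill_messages(prompt: str, prefill: str) -> list:
    """Return a message list whose final assistant turn is the prefill text."""
    if prefill != prefill.rstrip():
        # A prefill ending in whitespace is rejected by the API, so fail early.
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def complete_json(prefill: str, continuation: str) -> dict:
    """Rejoin the prefill with the returned continuation and parse it as JSON."""
    return json.loads(prefill + continuation)


messages = prefill_messages("Return the ant's taxonomic family as JSON.", "{")

# Stand-in for message.content[0].text from a real response (assumed value).
continuation = '"family": "Formicidae"}'
print(complete_json("{", continuation))  # {'family': 'Formicidae'}
```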
Vision Capabilities
Claude can process images through the Messages API. This enables use cases like image analysis, document processing, and visual question answering.
Python Example
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Media Types
- `image/jpeg`
- `image/png`
- `image/gif` (first frame only)
- `image/webp`
Image Size Limits
- Maximum image size: 5 MB per image when sent through the API
- Claude automatically downscales images whose dimensions exceed its limits
- For best results, resize very large images yourself before uploading to reduce latency
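The media type and size checks can be done locally before any request is made. Here is a minimal sketch of a helper that builds an image content block from a file path. The name `image_block`, the extension map, and the `max_bytes` default of 5 MB are assumptions chosen for illustration; the demo writes only a PNG signature to a temporary file, not a real image.

```python
import base64
import tempfile
from pathlib import Path

# Extension-to-media-type map covering the supported types listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str, max_bytes: int = 5 * 1024 * 1024) -> dict:
    """Build a base64 image content block, validating type and size first."""
    file = Path(path)
    media_type = MEDIA_TYPES.get(file.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image extension: {file.suffix!r}")
    raw = file.read_bytes()
    if len(raw) > max_bytes:
        raise ValueError(f"{path} is {len(raw)} bytes; limit is {max_bytes}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("utf-8"),
        },
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "diagram.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")  # PNG signature only
    block = image_block(str(sample))

print(block["source"]["media_type"])  # image/png
```

The returned dictionary can be placed directly in the `content` array alongside a text block.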
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications.
| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Continue conversation |
| `max_tokens` | Output hit token limit | Increase `max_tokens` or request a continuation |
| `stop_sequence` | Hit a custom stop sequence | Handle as needed |
| `tool_use` | Claude wants to use a tool | Execute tool and continue |
Python Example: Handling Stop Reasons
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a 2000-word essay on AI"}
    ]
)

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
    print("Response completed successfully.")
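As the number of handled cases grows, a lookup table can be easier to maintain than a chain of `if`/`elif` branches. The sketch below maps each stop reason from the table to an action label. The function name and the label strings are hypothetical, and unknown values fall through to a default so that stop reasons not covered here do not crash the application.

```python
# Action labels are illustrative; an application would map them to real handlers.
ACTIONS = {
    "end_turn": "continue_conversation",
    "max_tokens": "increase_max_tokens",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool",
}


def next_action(stop_reason: str) -> str:
    """Return the action label for a stop reason, or a safe default if unknown."""
    return ACTIONS.get(stop_reason, "inspect_manually")


for reason in ("end_turn", "max_tokens", "something_new"):
    print(reason, "->", next_action(reason))
```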
Error Handling
Common API errors and how to handle them:
- 400 Bad Request: Invalid parameters (e.g., prefill on unsupported model)
- 401 Unauthorized: Invalid API key
- 429 Rate Limit: Too many requests; implement exponential backoff
- 500 Internal Server Error: Transient server issue; retry with backoff
Python Example: Retry with Backoff
import anthropic
import time

client = anthropic.Anthropic()
max_retries = 3

# Retry on rate limits (429) and transient server errors (5xx)
for attempt in range(max_retries):
    try:
        message = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except (anthropic.RateLimitError, anthropic.InternalServerError):
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, then 2s
        else:
            raise
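The same retry logic can be factored into a reusable helper. The sketch below adds random jitter so that many clients do not retry in lockstep, and accepts an injectable `sleep` function so it can be exercised without waiting. The name `with_backoff` and its parameters are assumptions for illustration, and a simulated `TimeoutError` stands in for a real API error.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retryable` errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except retryable:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles each attempt, plus up to base_delay of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_retries must be at least 1")


attempts = {"count": 0}


def flaky() -> str:
    """Fails twice with a simulated transient error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []
result = with_backoff(flaky, (TimeoutError,), sleep=delays.append)
print(result, len(delays))  # ok 2
```

With the SDK, `retryable` could be `(anthropic.RateLimitError, anthropic.InternalServerError)` and `call` a lambda wrapping `client.messages.create(...)`.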
Best Practices
- Manage Context Window: Track token usage and implement conversation summarization for long conversations.
- Use System Prompts: For consistent behavior, use the `system` parameter (not shown here but available).
- Handle Streaming: For real-time applications, use streaming to get tokens as they're generated.
- Cache Prompts: For repeated system prompts or large context, use prompt caching to reduce costs.
- Monitor Usage: Track `input_tokens` and `output_tokens` for billing and optimization.
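Usage monitoring and context management can share one small piece of bookkeeping. Since each request resends the full history, the `input_tokens` value of the most recent response approximates the current conversation size. The sketch below accumulates totals and flags when a summarization pass might be due. The `UsageTracker` class and its `context_budget` threshold are hypothetical, and the second usage record is made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token usage across the turns of one conversation."""

    context_budget: int  # Application-chosen threshold, not an API value
    total_input: int = 0
    total_output: int = 0
    last_input: int = 0

    def record(self, usage: dict) -> None:
        """Record the `usage` object from one response."""
        self.last_input = usage["input_tokens"]
        self.total_input += usage["input_tokens"]
        self.total_output += usage["output_tokens"]

    def should_summarize(self) -> bool:
        """True once the latest request's input size reaches the budget."""
        return self.last_input >= self.context_budget


tracker = UsageTracker(context_budget=800)
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 900, "output_tokens": 150})
print(tracker.total_input, tracker.total_output, tracker.should_summarize())  # 912 156 True
```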
Conclusion
The Messages API is the foundation for building conversational AI with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated applications that leverage Claude's full potential. Remember that the API is stateless, so you control the conversation context, which gives you maximum flexibility.
Key Takeaways
- The Messages API is stateless; you must send the full conversation history with each request for multi-turn conversations.
- Prefill allows you to guide Claude's responses by providing the beginning of its answer, but it's not supported on all models.
- Vision capabilities enable image analysis by sending base64-encoded images in the content array.
- Always handle stop reasons (`end_turn`, `max_tokens`, `tool_use`) to build robust applications.
- Implement proper error handling with exponential backoff for rate limits and transient errors.