Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
Learn how to use Claude's Messages API for multi-turn conversations, prefill techniques, and vision. Includes code examples in Python.
This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
Introduction
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or an AI assistant, understanding how to structure your requests and handle responses is essential. This guide walks you through the core patterns of the Messages API, from simple one-shot queries to complex multi-turn conversations and advanced techniques like prefill and vision.
Understanding the Messages API
The Messages API is stateless: each request must include the full conversation history. This design gives you complete control over the context Claude sees, which makes it well suited to custom agent loops.
Basic Request and Response
A minimal request requires three things: a model name, a max_tokens limit, and an array of messages. Here's how it looks in Python:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message.content[0].text)
```
The response includes useful metadata:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- stop_reason: Indicates why the response ended (end_turn, max_tokens, stop_sequence, or tool_use).
- usage: Token counts for billing and context management.
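Note that `message.content[0].text` assumes the first content block is text. Since `content` is a list of blocks (a response can also contain `tool_use` blocks, for example), a slightly more defensive approach is to join every text block. The sketch below works on a plain dict shaped like the JSON above; `response_text` and `usage_summary` are hypothetical helper names, not SDK functions.

```python
def response_text(response: dict) -> str:
    """Concatenate the text of every text block in a Messages API response.

    Hypothetical helper: assumes `response` is a dict shaped like the
    JSON example above (for instance, a parsed HTTP response body).
    """
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def usage_summary(response: dict) -> str:
    """Format the stop reason and token counts as a one-line log entry."""
    usage = response["usage"]
    return (
        f"stop_reason={response['stop_reason']} "
        f"in={usage['input_tokens']} out={usage['output_tokens']}"
    )


# The example response from above, as a Python dict.
example = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(response_text(example))  # Hello!
print(usage_summary(example))  # stop_reason=end_turn in=12 out=6
```

With the SDK's response objects, the same fields are available as attributes (`message.stop_reason`, `message.usage.input_tokens`, and so on).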
Building Multi-Turn Conversations
Since the API is stateless, you must send the entire conversation history with each request. This pattern lets you build up context over multiple turns.
```python
import anthropic

client = anthropic.Anthropic()

conversation = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
]

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
print(message.content[0].text)
```
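Rather than rebuilding the list by hand for each turn, you can keep the history in one place and append to it after every reply. The sketch below is a minimal, hypothetical `ChatSession` class (not part of the SDK). To keep it runnable without an API key, it takes a `send` function that receives the full message list and returns the reply text; the class itself only manages the history.

```python
from typing import Callable

# A "sender" receives the full message list and returns Claude's reply text.
# This interface is an assumption made for illustration, not an SDK type.
Sender = Callable[[list], str]


class ChatSession:
    """Keep the running history that a stateless API needs on every call."""

    def __init__(self, send: Sender) -> None:
        self._send = send
        self.history: list = []

    def ask(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        try:
            # Pass a copy so the sender cannot mutate the stored history.
            reply = self._send(list(self.history))
        except Exception:
            # Roll back the user turn so history keeps alternating roles.
            self.history.pop()
            raise
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

To connect it to the SDK, `send` could be something like `lambda msgs: client.messages.create(model="claude-opus-4-7", max_tokens=1024, messages=msgs).content[0].text`. Rolling back the user turn on failure means a retry won't send two user messages in a row.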
Synthetic Assistant Messages
You're not limited to real conversations. You can inject synthetic assistant messages to guide Claude's behavior. For example, you might pre-populate a conversation with a persona or context:
```python
conversation = [
    {"role": "user", "content": "You are a helpful math tutor. Start by asking me a question."},
    {"role": "assistant", "content": "Sure! Let's start with algebra. Solve for x: 2x + 3 = 7."},
    {"role": "user", "content": "x = 2"},
]
```
This is particularly useful for:
- Setting up role-playing scenarios
- Providing few-shot examples
- Maintaining character consistency
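For the few-shot case, the synthetic turns follow a regular shape: each example input becomes a user message and each ideal answer a synthetic assistant message, with the real query last. The sketch below uses a hypothetical `few_shot_messages` helper and made-up sentiment examples to show that structure.

```python
def few_shot_messages(examples: list, query: str) -> list:
    """Turn (input, ideal_output) pairs into alternating user/assistant turns.

    Hypothetical helper: each pair becomes one synthetic exchange, and the
    real query is appended last as a user message for Claude to answer.
    """
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


# Illustrative examples; pass `msgs` as the messages argument of a request.
msgs = few_shot_messages(
    [
        ("Sentiment: 'I loved it'", "positive"),
        ("Sentiment: 'Total waste of money'", "negative"),
    ],
    "Sentiment: 'Better than I expected'",
)
```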
Prefill: Putting Words in Claude's Mouth
Prefill lets you start Claude's response for it. You end your input with an assistant message containing partial content, and Claude continues from there. The response contains only the continuation, not the prefilled text. This is powerful for constraining outputs.
Example: Multiple Choice Answer
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {
            "role": "assistant",
            "content": "The answer is (",
        },
    ],
)
print(message.content[0].text)  # Outputs: "C"
```
Setting max_tokens=1 limits Claude to a single output token, which here is just the letter. The prefill "The answer is (" sets up the pattern that Claude completes.
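Because only the continuation comes back, you usually stitch the prefill back on before interpreting the answer. The sketch below uses two hypothetical helpers: `prefill_messages` builds the request's message list, and `parse_choice` recovers the option letter. The trailing-whitespace check reflects our understanding that the API rejects a final assistant message ending in whitespace; treat it as an assumption to confirm against the prefill documentation.

```python
import re
from typing import Optional


def prefill_messages(question: str, prefill: str) -> list:
    """Build a message list whose last turn is a partial assistant reply."""
    if prefill != prefill.rstrip():
        # Assumption: the API rejects a final assistant turn that ends in
        # whitespace, so fail early instead of waiting for an error.
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": prefill},
    ]


def parse_choice(prefill: str, completion: str, choices: str = "ABC") -> Optional[str]:
    """Rejoin prefill and completion, then extract the chosen option letter."""
    full = prefill + completion  # "The answer is (" + "C" -> "The answer is (C"
    match = re.search(r"\(([A-Z])", full)
    if match and match.group(1) in choices:
        return match.group(1)
    return None  # no valid option letter found


msgs = prefill_messages(
    "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
    "The answer is (",
)
# Send msgs with max_tokens=1; suppose the reply text is "C".
print(parse_choice("The answer is (", "C"))  # C
```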
Important Limitations
Prefill is not supported on these models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
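Since a prefilled request to one of these models fails (the error list later in this guide mentions a 400 for an unsupported prefill model), it can help to check before sending. This sketch is hypothetical: only `claude-opus-4-7` appears as a model ID in this guide, so the other IDs in `NO_PREFILL_MODELS` are assumed to follow the same naming pattern, and Claude Mythos Preview is omitted because its ID isn't given here. Verify the IDs against the official models list.

```python
# Assumed model IDs for the models listed above; only "claude-opus-4-7"
# appears elsewhere in this guide. Check these against the official list.
NO_PREFILL_MODELS = {"claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"}


def uses_prefill(messages: list) -> bool:
    """A request is prefilled when its final message is an assistant turn."""
    return bool(messages) and messages[-1]["role"] == "assistant"


def check_prefill_supported(model: str, messages: list) -> None:
    """Raise locally instead of sending a request the API would reject."""
    if uses_prefill(messages) and model in NO_PREFILL_MODELS:
        raise ValueError(
            f"{model} does not support prefill; remove the final assistant turn"
        )
```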
Use Cases for Prefill
- Constrained generation: Force JSON prefixes or specific formats
- Chain-of-thought: Start with "Let me think step by step:" to encourage reasoning
- Classification: Prefill with a category label
- Completion tasks: Provide the beginning of a sentence or code block
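For the JSON case, a common pattern is to prefill a single "{" so Claude starts directly inside an object. The returned text then lacks that opening brace, so it has to be added back before parsing. The sketch below is a hypothetical `parse_prefilled_json` helper; it uses `json.JSONDecoder.raw_decode`, which parses the first JSON value and ignores anything after it, in case Claude adds a remark after the closing brace.

```python
import json


def parse_prefilled_json(prefill: str, completion: str) -> dict:
    """Reattach a JSON prefill (such as "{") to Claude's continuation and parse it.

    raw_decode returns (value, end_index) and ignores any trailing text,
    so a sentence after the closing brace does not break parsing.
    """
    obj, _end = json.JSONDecoder().raw_decode(prefill + completion)
    return obj


# The request would end with {"role": "assistant", "content": "{"};
# suppose Claude then continued with the (illustrative) text below.
completion = '"species": "ant", "family": "Formicidae"}'
print(parse_prefilled_json("{", completion))
# {'species': 'ant', 'family': 'Formicidae'}
```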
Vision: Working with Images
Claude can analyze images sent through the Messages API. This enables use cases like image captioning, document analysis, and visual Q&A.
Sending an Image
Images can be sent as base64-encoded data in the content array, alongside a text block:
```python
import base64

import anthropic

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail.",
                },
            ],
        }
    ],
)
print(message.content[0].text)
```
Supported Image Formats
- PNG
- JPEG
- WebP
- GIF (static, first frame only)
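The image content block used above can be factored into a helper that also enforces this format list. This is a sketch under our own assumptions: `image_block` and the extension-to-media-type table are hypothetical, and the format is inferred from the file name only, so a mislabeled file would not be caught.

```python
import base64
from pathlib import Path

# Media types for the formats listed above (the extension mapping is our choice).
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(data: bytes, filename: str) -> dict:
    """Build a base64 image content block, rejecting unsupported formats.

    The media type comes from the file extension only; the bytes are not inspected.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

Usage might look like `image_block(Path("chart.png").read_bytes(), "chart.png")`, with the result placed before the text block in the user message's content list.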
Best Practices for Vision
- Use appropriate resolution: Claude works best with images between 200x200 and 2048x2048 pixels
- Compress when possible: Smaller file sizes reduce latency
- Combine with text: Always include a text prompt to guide Claude's analysis
- One image per message: For complex scenes, send one image at a time
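To apply the resolution guideline, you can compute a target size before encoding. This arithmetic-only sketch uses the 200 and 2048 pixel figures quoted above; `fit_within` and `below_minimum` are hypothetical helpers, and the actual resizing (with an imaging library of your choice) is not shown.

```python
def fit_within(width: int, height: int, max_side: int = 2048) -> tuple:
    """Scale (width, height) down so neither side exceeds max_side.

    Aspect ratio is preserved, and images already within bounds are
    returned unchanged (this never upscales).
    """
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)


def below_minimum(width: int, height: int, min_side: int = 200) -> bool:
    """True if either edge is smaller than the guide's 200 px lower bound."""
    return min(width, height) < min_side


print(fit_within(4096, 2048))  # (2048, 1024)
print(fit_within(800, 600))    # (800, 600)
```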
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| stop_reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Output hit the token limit | Increase max_tokens or truncate |
| stop_sequence | A custom stop sequence was hit | Handle as needed |
| tool_use | Claude wants to call a tool | Execute tool and send result back |
```python
if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
```
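A fuller version of this branching covers every row of the table and hands back what each action needs, such as the `tool_use` blocks to execute. The sketch below is a hypothetical `next_step` helper operating on a response dict; the tool name in the usage example is made up, and any stop reasons beyond the four in the table fall through to an "unhandled" label.

```python
def next_step(response: dict) -> tuple:
    """Map a Messages API response (as a dict) to an action from the table.

    Hypothetical helper: returns a (label, payload) pair for the caller.
    """
    reason = response["stop_reason"]
    blocks = response.get("content", [])
    text = "".join(b["text"] for b in blocks if b.get("type") == "text")
    if reason == "tool_use":
        # Each tool_use block carries an id, a tool name, and its input.
        return "run_tools", [b for b in blocks if b.get("type") == "tool_use"]
    if reason == "max_tokens":
        return "truncated", text
    if reason == "stop_sequence":
        return "stop_sequence", text
    if reason == "end_turn":
        return "done", text
    return "unhandled", reason  # stop reasons not covered by the table


tool_response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    ],
}
label, calls = next_step(tool_response)
print(label, calls[0]["name"])  # run_tools get_weather
```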
Error Handling and Best Practices
Common Errors
- 400 Bad Request: Invalid parameters or unsupported prefill model
- 401 Unauthorized: Invalid API key
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Temporary server issue
Retry Strategy
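```python
import time

from anthropic import Anthropic, APIConnectionError, InternalServerError, RateLimitError

# The SDK client also retries some failures on its own (see Anthropic(max_retries=...)).
client = Anthropic()

# Retry only transient failures; a 400 or 401 will fail the same way again.
RETRYABLE = (RateLimitError, InternalServerError, APIConnectionError)


def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=messages,
            )
        except RETRYABLE:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
```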
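A fixed 1, 2, 4 second schedule means many clients that fail together also retry together. A common refinement is "full jitter": sleep a random amount between zero and the exponential cap. The sketch below is a generic, hypothetical `retry_call` wrapper (not part of the Anthropic SDK). `sleep` and `rand` are injectable so the logic can be tested without waiting, and `is_retryable_status` follows the error list above: 429 and 5xx are retried, other 4xx codes are not.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_retryable_status(status: int) -> bool:
    """429 (rate limit) and 5xx (server) errors are transient; other 4xx are not."""
    return status == 429 or 500 <= status < 600


def retry_call(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 3,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying transient failures with full-jitter exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            # Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)).
            sleep(rand() * min(cap, base * 2 ** attempt))
    raise RuntimeError("max_retries must be at least 1")
```

With the SDK, a predicate such as `lambda e: isinstance(e, anthropic.APIStatusError) and is_retryable_status(e.status_code)` could be passed as `is_retryable`; treat that wiring as an untested assumption.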
Token Management
- Monitor usage.input_tokens and usage.output_tokens to stay within limits
- Use prompt caching for repeated system prompts
- Consider compaction for long conversations
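Because every request resends the full history, input tokens grow with each turn, so it helps to accumulate usage across a conversation. The sketch below is a hypothetical `UsageTracker` fed with each response's `usage` field. If you use prompt caching, the usage object also reports cache writes and reads separately (`cache_creation_input_tokens`, `cache_read_input_tokens`); this sketch ignores them.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulate token counts from the `usage` field of each response.

    Hypothetical helper; pass it the usage dict, or read the same two
    fields from the SDK's response object.
    """

    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def add(self, usage: dict) -> None:
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.requests += 1

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


tracker = UsageTracker()
tracker.add({"input_tokens": 12, "output_tokens": 6})    # the response shown earlier
tracker.add({"input_tokens": 30, "output_tokens": 120})  # illustrative second turn
print(tracker.total)  # 168
```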
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated AI applications. Remember that the API is stateless, so you control the context. Use prefill wisely (and not on models that don't support it), handle stop reasons appropriately, and always monitor token usage.
Key Takeaways
- Stateless design: Always send the full conversation history; you control the context Claude sees.
- Prefill is powerful but limited: Use it to constrain outputs, but avoid models that don't support it (Opus 4.7, Sonnet 4.6, etc.).
- Vision is straightforward: Send base64-encoded images with a text prompt for analysis.
- Handle stop reasons: end_turn, max_tokens, and tool_use each require different responses.
- Monitor token usage: Track input and output tokens to manage costs and stay within limits.