Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
This guide teaches you how to use Claude's Messages API to build conversational AI applications. You'll learn to make basic requests, manage multi-turn conversations, prefill Claude's responses, and use vision capabilities—all with practical code examples.
Claude's Messages API is the direct, programmatic way to interact with Anthropic's most powerful language models. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding the Messages API is essential. This guide walks you through everything you need to know—from basic requests to advanced techniques like prefill and vision.
What is the Messages API?
The Messages API gives you direct access to Claude's prompting capabilities. Unlike the managed agent approach (which handles long-running tasks in pre-built infrastructure), the Messages API is ideal for custom agent loops, fine-grained control, and real-time interactions.
Key characteristics:
- Stateless: You must send the full conversation history with each request.
- Flexible: Supports text, images, and structured outputs.
- Efficient: Eligible for Zero Data Retention (ZDR) arrangements.
Making Your First Request
Let's start with the simplest possible interaction: sending a single message to Claude and receiving a response.
Basic Request (Python)
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
Basic Request (TypeScript)
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
```
Understanding the Response
The API returns a structured JSON object. Here's what you'll see:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- content: An array of content blocks (usually text).
- stop_reason: Why the model stopped ("end_turn", "max_tokens", "stop_sequence", or "tool_use").
- usage: Token counts for billing and optimization.
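The Python SDK exposes these fields as attributes on the returned object (message.content, message.usage, and so on); if you handle the raw JSON yourself, you can read them from the parsed dict. A minimal sketch using the example response above:

```python
import json

# The raw JSON body from the example response above.
raw = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""

response = json.loads(raw)

# Concatenate all text blocks (usually there is exactly one).
text = "".join(
    block["text"] for block in response["content"] if block["type"] == "text"
)
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(text)                     # Hello!
print(response["stop_reason"])  # end_turn
print(total_tokens)             # 18
```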
Building Multi-Turn Conversations
Because the Messages API is stateless, you must maintain conversation history yourself. Each request should include the entire message history.
Example: Two-Turn Conversation
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Important: The assistant's previous response ("Hello!") doesn't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or simulate context.
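For example, here is a sketch of a message array that seeds a synthetic assistant turn to steer style (the assistant wording is illustrative, not a real API response):

```python
messages = [
    {"role": "user", "content": "Explain recursion."},
    # Synthetic turn: Claude never produced this text, but the API treats it
    # as prior context, so the next reply tends to match its terse style.
    {"role": "assistant", "content": "Recursion: a function that calls itself until a base case stops it."},
    {"role": "user", "content": "Now explain iteration the same way."},
]

# Messages generally alternate between user and assistant, starting with user.
roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'user']
```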
Managing Conversation State
For production applications, store the message array in a database or session store. Append each new user message and assistant response to the array before making the next API call.
```python
conversation = [
    {"role": "user", "content": "What is the capital of France?"}
]

# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation
)
conversation.append({"role": "assistant", "content": response.content[0].text})

# Second turn
conversation.append({"role": "user", "content": "And what is its population?"})
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation
)
```
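This append-and-call pattern can be wrapped in a small helper. Note that send_turn is a hypothetical convenience function, not part of the SDK; it appends the user message, calls the API, and records the reply:

```python
def send_turn(client, conversation, user_text, model="claude-opus-4-7"):
    """Append a user turn, call the API, store the reply, and return its text."""
    conversation.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=conversation,
    )
    reply = response.content[0].text
    conversation.append({"role": "assistant", "content": reply})
    return reply
```

Each call grows the conversation by two entries, leaving the history ready for the next turn.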
Prefilling Claude's Response
One of the most powerful features of the Messages API is prefilling—you can start Claude's response by including an assistant message with partial content. This is useful for:
- Guiding the format of the response
- Forcing multiple-choice answers
- Providing a starting template
Example: Forcing a Multiple-Choice Answer
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
```
By setting max_tokens=1 and prefilling with "The answer is (", Claude only generates a single token—the letter of the correct answer.
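Prefilling also helps with format guidance: open the assistant turn with a brace and Claude will continue in JSON. One caveat: the response contains only the continuation, not the prefill text, so you must prepend it yourself before parsing. A sketch (the completion string below is a hypothetical stand-in for a real API response):

```python
import json

# Prefill sent as the final assistant message in the request:
# {"role": "assistant", "content": prefill}
prefill = '{"country": "'

# Hypothetical continuation returned by the model.
completion = 'France", "capital": "Paris"}'

# Reassemble before parsing, since the prefill is not echoed back.
record = json.loads(prefill + completion)
print(record["capital"])  # Paris
```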
Prefill Limitations
Important: Prefilling is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
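On these models, a system prompt can approximate the same control, though it steers rather than hard-constrains the output, so validate the result. A sketch of the request payload (the instruction wording is illustrative):

```python
# Keyword arguments for client.messages.create(**request).
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 5,
    # Persistent instruction, carried outside the conversation turns.
    "system": "Answer multiple-choice questions with the single letter only, e.g. C",
    "messages": [
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        }
    ],
}
```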
Vision Capabilities
Claude can analyze images sent through the Messages API. This opens up use cases like:
- Image captioning and description
- Document analysis (receipts, forms, charts)
- Visual question answering
Sending an Image (Python)
```python
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported image formats: JPEG, PNG, GIF, WebP. Images are resized and compressed automatically to fit within Claude's context window.
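When loading local files, the media_type you send should match the actual format. A small helper sketch (encode_image is illustrative, not part of the SDK) that infers the media type from the file extension:

```python
import base64
import mimetypes

SUPPORTED = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def encode_image(path):
    """Return (media_type, base64 string) for an image file, or raise."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED:
        raise ValueError(f"Unsupported image type: {media_type}")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return media_type, data
```

The returned pair slots directly into the image source block shown above.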
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Display the response |
| max_tokens | Output length limit reached | Increase max_tokens or continue the conversation |
| stop_sequence | A custom stop sequence was hit | Handle based on your application logic |
| tool_use | Claude wants to use a tool | Execute the tool and return results |
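One way to centralize this logic is a small dispatch helper (describe_stop is an illustrative name, not part of the SDK):

```python
def describe_stop(stop_reason):
    """Map a stop_reason to the handling suggested in the table above."""
    actions = {
        "end_turn": "display the response",
        "max_tokens": "increase max_tokens or continue the conversation",
        "stop_sequence": "handle based on application logic",
        "tool_use": "execute the requested tool and return results",
    }
    return actions.get(stop_reason, "unknown stop reason")

print(describe_stop("max_tokens"))  # increase max_tokens or continue the conversation
```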
Best Practices
- Optimize token usage: Monitor usage.input_tokens and usage.output_tokens to control costs.
- Handle errors gracefully: Implement retry logic for transient failures (rate limits, network issues).
- Use system prompts: For persistent instructions, use the system parameter instead of repeating instructions in every user message.
- Stream responses: For better user experience, enable streaming to show tokens as they're generated.
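Streaming in the Python SDK goes through the messages.stream context manager, which yields incremental text via text_stream. A sketch (stream_reply is an illustrative helper; pass it a configured anthropic.Anthropic() client):

```python
def stream_reply(client, messages, model="claude-opus-4-7"):
    """Print text deltas as they arrive and return the assembled reply."""
    chunks = []
    # The SDK's messages.stream context manager yields incremental text
    # through its text_stream iterator.
    with client.messages.stream(
        model=model,
        max_tokens=1024,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            chunks.append(text)
    return "".join(chunks)
```

Showing tokens as they arrive reduces perceived latency even though total generation time is unchanged.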
Conclusion
The Messages API is the foundation for building custom AI applications with Claude. By mastering basic requests, multi-turn conversations, prefilling, and vision, you can create sophisticated conversational experiences tailored to your specific use case.
Key Takeaways
- Stateless design: Always send the full conversation history with each API request. Store and manage conversation state on your end.
- Prefill for control: Use prefilling to guide Claude's responses, especially for structured outputs or multiple-choice scenarios—but check model compatibility first.
- Vision is powerful: Claude can analyze images alongside text, enabling document analysis, visual Q&A, and more.
- Watch stop reasons: The stop_reason field tells you why Claude stopped, helping you handle edge cases like token limits or tool calls.
- Optimize costs: Monitor token usage and use streaming for real-time applications to improve user experience and reduce perceived latency.