Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a document analyzer, or a creative writing assistant, understanding how to structure requests and handle responses is essential. This guide walks you through the core patterns—from a simple hello to multi-turn conversations, prefill techniques, and vision capabilities.
Understanding the Basics
At its heart, the Messages API is a stateless REST endpoint. You send a list of messages (the conversation history), and Claude returns a new message. Each request is independent, meaning you must include the full context every time.
Basic Request and Response
Here's the simplest possible call—a single user message:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
Response:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- `role`: Always `"assistant"` in the response.
- `content`: An array of content blocks (text, tool_use, etc.).
- `stop_reason`: Why the model stopped; `"end_turn"` means it finished naturally.
- `usage`: Token counts for billing and context management.
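In the Python SDK these fields are attributes on the returned `Message` object (e.g. `message.content[0].text`, `message.usage.input_tokens`). As a sketch, here is the same extraction done against the raw JSON payload shown above:

```python
# Sample payload, copied from the response shown above.
response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

# Join all text blocks; content can hold multiple blocks (text, tool_use, ...).
text = "".join(
    block["text"] for block in response["content"] if block["type"] == "text"
)
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(text)          # Hello!
print(total_tokens)  # 18
```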
Building Multi-Turn Conversations
Since the API is stateless, you must send the entire conversation history with each request. This gives you full control over context.
Example: A Two-Turn Conversation
```python
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
```
Important: The assistant's previous response ("Hello!") is included verbatim. You can even inject synthetic assistant messages—they don't have to come from Claude. This is useful for:
- Prompting with examples (few-shot learning)
- Guiding the conversation with pre-written assistant turns
- Simulating role-play scenarios
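As a sketch of the few-shot pattern, here is a sentiment prompt built entirely from synthetic turns; the example texts and labels are hypothetical, not Claude output:

```python
def build_few_shot(examples, query):
    """Interleave (input, label) example pairs as synthetic user/assistant turns."""
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

messages = build_few_shot(
    [("I loved this movie!", "positive"), ("Total waste of time.", "negative")],
    "The acting was superb.",
)
# Roles must strictly alternate and the list must end on a user turn.
assert [m["role"] for m in messages] == [
    "user", "assistant", "user", "assistant", "user"
]
```

Pass the resulting list as `messages=` and Claude will tend to continue the pattern, answering with a bare label.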
Managing Conversation State
In a real application, you'll store the message history in a database or session. Each time the user sends a new message, you append it to the history and send the entire array. Claude's response is then appended for the next turn.
```python
conversation = [
    {"role": "user", "content": "What's the capital of France?"}
]

# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
conversation.append({"role": "assistant", "content": response.content[0].text})

# Second turn
conversation.append({"role": "user", "content": "And what is its population?"})
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
```
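The append pattern can be wrapped in a small helper. `Conversation` below is a hypothetical convenience class, not part of the SDK; in production you would back it with a database or session store:

```python
class Conversation:
    """Minimal in-memory history store; swap for persistent storage in production."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})
        return self.messages  # full history, ready to pass as `messages=`

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

convo = Conversation()
history = convo.add_user("What's the capital of France?")
# response = client.messages.create(model=..., max_tokens=1024, messages=history)
convo.add_assistant("The capital of France is Paris.")  # would come from response
convo.add_user("And what is its population?")
print(len(convo.messages))  # 3
```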
Prefilling Claude's Response
One of the most powerful techniques is prefilling—you start Claude's response by including an assistant message with partial content. This shapes the model's output.
Use Case: Forcing a Multiple Choice Answer
```python
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
```
Response:
```json
{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
```
By setting max_tokens=1 and prefilling "The answer is (", Claude only needs to output the letter. This is perfect for classification tasks, quizzes, or structured outputs.
When Prefill Is Not Supported
Prefilling is not supported on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
Other Prefill Patterns
- JSON completion: Prefill with `{"response":` to get valid JSON.
- Code generation: Prefill with `def calculate_total():` to start a function.
- Creative writing: Prefill with `"The story begins"` to set the tone.
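One subtlety with JSON prefill: the API returns only the continuation, not the prefill itself, so you must prepend your prefill before parsing. A sketch with a hard-coded stand-in for Claude's output:

```python
import json

prefill = '{"response":'
# Stand-in for response.content[0].text; a real call returns only the continuation.
continuation = ' {"city": "Paris", "confidence": 0.97}}'

parsed = json.loads(prefill + continuation)
print(parsed["response"]["city"])  # Paris
```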
Vision: Sending Images to Claude
Claude can analyze images sent via the Messages API. This unlocks use cases like document scanning, image description, and visual Q&A.
Sending a Base64-Encoded Image
```python
import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
```
Key points:
- The `content` field is an array of content blocks.
- Supported media types: `image/png`, `image/jpeg`, `image/gif`, `image/webp`.
- Image size is limited (check the latest docs for limits).
- Claude can extract text, analyze trends, and describe visual elements.
Vision Use Cases
- Document analysis: Extract data from scanned PDFs or screenshots.
- UI testing: Describe what a webpage looks like.
- Medical imaging: Identify features in X-rays or diagrams.
- E-commerce: Generate product descriptions from photos.
Handling Stop Reasons
The stop_reason field tells you why Claude stopped generating. Understanding this helps you handle edge cases:
| `stop_reason` | Meaning | Action |
|---|---|---|
| `"end_turn"` | Claude finished naturally | Continue conversation |
| `"max_tokens"` | Output hit the token limit | Increase `max_tokens` or truncate |
| `"stop_sequence"` | A custom stop sequence was hit | Handle as needed |
| `"tool_use"` | Claude wants to call a tool | Execute the tool and return the result |
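The table above can be turned into a simple dispatch; the action names here are illustrative, not SDK constants:

```python
def handle_stop_reason(stop_reason):
    """Map a stop_reason to a next action; names are illustrative."""
    if stop_reason == "end_turn":
        return "continue_conversation"
    if stop_reason == "max_tokens":
        return "retry_with_higher_limit"
    if stop_reason == "stop_sequence":
        return "handle_stop_sequence"
    if stop_reason == "tool_use":
        return "execute_tool"
    raise ValueError(f"Unexpected stop_reason: {stop_reason!r}")

print(handle_stop_reason("max_tokens"))  # retry_with_higher_limit
```

Raising on unknown values is deliberate: new stop reasons may be added over time, and failing loudly beats silently dropping a turn.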
Best Practices
- Always include `max_tokens`: Prevents runaway responses and unexpected costs.
- Use the `system` parameter for instructions: Instead of putting instructions in the user message, use the dedicated `system` field for better performance.
- Monitor token usage: The `usage` field helps you track costs and optimize context length.
- Handle errors gracefully: Network issues, rate limits, and invalid requests should be caught and retried.
- Cache frequent prompts: Use prompt caching for repeated system instructions to reduce latency and cost.
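For the retry point specifically, here is a generic exponential-backoff sketch; note that the official Python SDK retries certain transient errors on its own, so in practice you would tune the client's retry settings rather than hand-roll this:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stand-in that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)  # ok
```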
Conclusion
The Messages API is the foundation of any Claude-powered application. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated AI experiences. Remember that the API is stateless—you control the context. Use prefill to guide responses, and leverage vision to unlock multimodal capabilities.
Key Takeaways
- Stateless design: You must send the full conversation history with every request; store and append messages manually.
- Prefill shapes output: Starting Claude's response with partial text forces structured answers—great for classification and JSON generation.
- Vision is powerful: Send images as base64 content blocks for document analysis, UI testing, and more.
- Watch stop reasons: `end_turn` means natural completion, `max_tokens` means you need more capacity, and `tool_use` triggers function calling.
- Check model compatibility: Prefill is not supported on Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview; use structured outputs instead.