Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples in Python and TypeScript.
This guide teaches you how to use Claude's Messages API to build conversational AI applications. You'll learn basic requests, multi-turn conversations, prefill techniques to shape responses, and how to handle images with vision capabilities.
Introduction
Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent, understanding how to structure and send messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.
Anthropic offers two paths for building with Claude: the Messages API for direct model access and fine-grained control, and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, which is ideal for custom agent loops and applications requiring precise control over the conversation flow.
Basic Request and Response
At its core, the Messages API is straightforward. You send a list of messages, and Claude responds with a completion. Here's a minimal example using the Python SDK:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
The response includes the model's reply, metadata, and token usage:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- `stop_reason`: Indicates why the response ended (`"end_turn"` means Claude finished naturally).
- `usage`: Tracks input and output tokens for billing and context management.
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with each request. This design gives you complete control over context but requires careful management of message arrays.
To build a multi-turn conversation, simply append new messages to the history:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
```
Important Notes
- The conversation history doesn't need to originate from Claude. You can inject synthetic assistant messages to guide behavior.
- Always alternate between `"user"` and `"assistant"` roles. Two consecutive messages with the same role will cause an error.
- For long conversations, consider using prompt caching to reduce costs and latency.
Prefill Technique: Putting Words in Claude's Mouth
One of the most powerful features of the Messages API is prefilling—providing the beginning of Claude's response in the last position of the input messages list. This shapes the model's output without using system prompts.
Use Case: Multiple Choice Answers
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
```
By setting max_tokens=1 and prefilling with "The answer is (", Claude is forced to complete the response with a single character—the correct letter.
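The same technique can nudge Claude toward raw JSON: prefill the assistant turn with an opening brace, then prepend that brace to the returned text before parsing. A minimal sketch (`with_json_prefill` is a hypothetical helper, not part of the SDK):

```python
def with_json_prefill(messages: list[dict]) -> list[dict]:
    """Append an assistant prefill of '{' so the model continues a JSON object."""
    return messages + [{"role": "assistant", "content": "{"}]

messages = with_json_prefill(
    [{"role": "user", "content": "List three ant genera as JSON under a 'genera' key."}]
)

# After calling client.messages.create(..., messages=messages), reconstruct
# the full JSON by prepending the prefill to the returned text:
#   raw = "{" + response.content[0].text
```

Because the prefill is part of the input, not the output, remember that the response text alone is not valid JSON until you reattach the opening brace.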
Limitations
Prefilling is not supported on the following models:

- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
Alternative: Structured Outputs
For models that don't support prefill, you can use structured outputs to enforce response formats:

```python
import anthropic
from pydantic import BaseModel

class QuizAnswer(BaseModel):
    answer: str
    explanation: str

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is Latin for Ant? Options: A) Apoidea, B) Rhopalocera, C) Formicidae"}
    ],
    tools=[
        {
            "name": "quiz_answer",
            "description": "Provide the answer and explanation",
            "input_schema": QuizAnswer.model_json_schema()
        }
    ],
    tool_choice={"type": "tool", "name": "quiz_answer"}
)
print(message.content[0].input)  # Output: {"answer": "C", "explanation": "..."}
```
Vision: Working with Images
Claude can analyze images sent via the Messages API. This is useful for applications like document analysis, visual Q&A, and content moderation.
Sending an Image
```python
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported Image Formats
- JPEG
- PNG
- GIF
- WebP
Best Practices for Vision
- Use high-resolution images when details matter (Claude supports up to 8K resolution).
- Combine images with text for best results. A descriptive prompt helps Claude focus on relevant details.
- Be mindful of token costs: Images consume tokens based on their resolution. Larger images cost more.
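For budgeting, a rough estimate of an image's token cost can be computed from its dimensions. A sketch, assuming the commonly cited approximation of tokens ≈ (width × height) / 750 and a 1568 px long-edge limit above which images are scaled down (treat both numbers as approximate):

```python
def estimate_image_tokens(width_px: int, height_px: int,
                          long_edge_limit: int = 1568) -> int:
    """Approximate an image's token cost, scaling it to the long-edge limit first."""
    long_edge = max(width_px, height_px)
    if long_edge > long_edge_limit:
        scale = long_edge_limit / long_edge
        width_px = round(width_px * scale)
        height_px = round(height_px * scale)
    return (width_px * height_px) // 750

print(estimate_image_tokens(1000, 1000))  # within limits → ~1333 tokens
print(estimate_image_tokens(4000, 3000))  # scaled down before estimating
```

Downscaling large images client-side before upload keeps costs predictable and avoids paying for resolution the model will discard anyway.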
Handling Stop Reasons
Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field in the response can have these values:
| `stop_reason` | Meaning |
|---|---|
| `"end_turn"` | Claude finished naturally |
| `"max_tokens"` | Response reached the token limit |
| `"stop_sequence"` | A custom stop sequence was encountered |
| `"tool_use"` | Claude wants to call a tool |
Example: Handling max_tokens
```python
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=50,
    messages=[
        {"role": "user", "content": "Write a 1000-word essay on AI safety."}
    ]
)

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens or continuing the conversation.")
```
Streaming Responses
For real-time applications, you can stream responses token by token:
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a story about a brave robot."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Streaming is ideal for chat interfaces where you want to show responses as they're generated.
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create powerful AI applications tailored to your specific needs.
Remember these key points:
- Always send the full conversation history (stateless design)
- Use prefill to shape responses, but check model compatibility
- Handle stop reasons to manage conversation flow
- Stream responses for better user experience
- Leverage vision for multimodal applications
Key Takeaways
- The Messages API is stateless: You must send the full conversation history with each request, giving you complete control over context.
- Prefill shapes responses: By providing the beginning of Claude's response, you can enforce formats like multiple-choice answers, but this technique isn't supported on all models.
- Vision enables multimodal AI: Claude can analyze images (JPEG, PNG, GIF, WebP) sent via the API, opening up document analysis and visual Q&A use cases.
- Streaming improves UX: Use the streaming API for real-time token-by-token responses in chat interfaces.
- Handle stop reasons: Always check `stop_reason` to determine if a response was truncated or if Claude needs to use a tool.