Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision
This guide teaches you how to use the Claude Messages API to build conversational applications, including sending basic requests, managing multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities with images.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, an AI assistant, or an automated content generator, understanding how to structure requests and handle responses is essential. This guide covers the most common patterns you'll use when working with the Messages API, from simple queries to advanced techniques like prefill and vision.
Understanding the Basics
The Messages API is stateless—each request must include the full conversation history. This design gives you complete control over the context and allows for flexible conversation management. Every request requires three key components:
- model: The Claude model you want to use (e.g., claude-opus-4-7, claude-sonnet-4-5)
- max_tokens: The maximum number of tokens Claude can generate in the response
- messages: An array of message objects representing the conversation history
Basic Request and Response
Here's the simplest possible request—a single user message asking for a greeting:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
The response includes the model's reply along with metadata:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks (usually text, but can include tool use or thinking blocks)
- stop_reason: Why Claude stopped generating (e.g., "end_turn", "max_tokens", "stop_sequence")
- usage: Token counts for billing and context management
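Because content is a list of blocks rather than a single string, it can help to collect the text blocks explicitly instead of assuming the first block is text. The sketch below works on a plain dict shaped like the JSON above (for example, a parsed HTTP response body). The helper names extract_text and total_tokens are hypothetical and not part of the SDK.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every "text" block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict) -> int:
    """Sum the input and output token counts reported in the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# Sample copied from the response shown above, trimmed to the fields used here.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))   # Hello!
print(total_tokens(sample))   # 18
```

With the Python SDK the same fields are available as attributes (message.content, message.usage), so the same idea applies with attribute access in place of dict lookups.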
Building Multi-Turn Conversations
Since the API is stateless, you must send the entire conversation history with each request. This allows you to build up a conversation over multiple turns:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
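Resending the history by hand gets repetitive, so a thin wrapper can hold the list and append each turn. The sketch below is one possible shape, not an SDK feature: Conversation is a hypothetical class, and it assumes the first content block of each reply is text. A small fake client stands in for anthropic.Anthropic() so the example runs offline.

```python
from types import SimpleNamespace


class Conversation:
    """Keeps the running history and resends all of it on every turn."""

    def __init__(self, client, model, max_tokens=1024):
        self.client = client
        self.model = model
        self.max_tokens = max_tokens
        self.messages = []

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=list(self.messages),  # full history, copied per request
        )
        reply = response.content[0].text  # assumes the first block is text
        self.messages.append({"role": "assistant", "content": reply})
        return reply


class FakeMessages:
    """Offline stand-in for client.messages: reports how many messages it got."""

    def create(self, *, model, max_tokens, messages):
        text = f"(reply to {len(messages)} message(s))"
        return SimpleNamespace(content=[SimpleNamespace(type="text", text=text)])


fake_client = SimpleNamespace(messages=FakeMessages())
chat = Conversation(fake_client, model="claude-opus-4-7")
first = chat.send("Hello, Claude")
second = chat.send("Can you describe LLMs to me?")
print(first)   # (reply to 1 message(s))
print(second)  # (reply to 3 message(s))
```

The second call receives three messages (user, assistant, user), which is the stateless pattern described above. Passing a real anthropic.Anthropic() client in place of fake_client should work the same way.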
Important: The assistant messages don't have to come from Claude—you can inject synthetic assistant messages to guide the conversation or provide context. This is useful for:
- Setting up scenarios
- Providing examples of desired behavior
- Simulating previous interactions
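As a rough illustration of the "examples of desired behavior" case, the sketch below turns input/output pairs into alternating user and assistant messages, then appends the real query. The helper few_shot_messages and the sentiment examples are hypothetical and only show the message shape.

```python
def few_shot_messages(examples, query):
    """Build alternating user/assistant turns from (input, output) pairs,
    followed by the real user query."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by us, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Classify: 'I love this product'", "positive"),
    ("Classify: 'It broke after a day'", "negative"),
]
messages = few_shot_messages(examples, "Classify: 'Works as described'")

for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list can be passed as the messages argument of client.messages.create.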
Prefill: Putting Words in Claude's Mouth
Prefill is a powerful technique where you start Claude's response by including an assistant message with partial content at the end of your messages array. Claude will continue from where you left off.
Use Case: Multiple Choice Questions
A classic use case is getting a single-letter answer from a multiple choice question:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
By setting max_tokens=1, you force Claude to output only the next token, which in this case is the letter "C". This pattern is excellent for classification tasks, quizzes, or any scenario requiring constrained output.
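Two details are easy to miss with prefill: the response contains only the continuation, not the prefilled text, and a prefill that ends in whitespace is expected to be rejected by the API. The sketch below covers both. The helpers with_prefill and join_prefill are hypothetical, and the continuation "C" is a stand-in for message.content[0].text from a real call.

```python
def with_prefill(messages, prefill):
    """Return a new message list ending in a partial assistant turn."""
    if prefill != prefill.rstrip():
        # Guard against trailing whitespace before sending the request.
        raise ValueError("prefill should not end with whitespace")
    return [*messages, {"role": "assistant", "content": prefill}]


def join_prefill(prefill, continuation):
    """Rebuild the full assistant turn from the prefill and the returned text."""
    return prefill + continuation


question = {
    "role": "user",
    "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
}
prefill = "The answer is ("
request_messages = with_prefill([question], prefill)

# "C" stands in for the text returned by the API.
full_answer = join_prefill(prefill, "C")
print(full_answer)  # The answer is (C
```

Keeping the prefill in a variable makes it simple to store the complete assistant turn in the history if the conversation continues.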
Prefill Limitations
Note that prefill is not supported on certain models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
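One way to respect this list is to gate the prefill turn on the model ID. This is only a sketch under stated assumptions: the ID prefixes below extend the naming pattern used in this guide's examples (claude-opus-4-7) and should be checked against the current model list, the ID for Claude Mythos Preview is not given here and is left out, and supports_prefill and build_messages are hypothetical helpers.

```python
# ASSUMPTION: ID prefixes inferred from the naming used in this guide's examples.
# Verify them (and add any missing models) before relying on this check.
NO_PREFILL_PREFIXES = ("claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6")


def supports_prefill(model, blocked_prefixes=NO_PREFILL_PREFIXES):
    """True unless the model ID starts with one of the blocked prefixes."""
    return not model.startswith(tuple(blocked_prefixes))


def build_messages(model, messages, prefill=None):
    """Append the prefill turn only when the model is expected to accept it."""
    if prefill and supports_prefill(model):
        return [*messages, {"role": "assistant", "content": prefill}]
    return list(messages)


question = [{"role": "user", "content": "Answer with a single letter: A, B, or C?"}]
print(len(build_messages("claude-sonnet-4-5", question, "The answer is (")))  # 2
print(len(build_messages("claude-opus-4-7", question, "The answer is (")))    # 1
```

When the prefill is dropped, the output format has to be requested some other way, for example by stating it in the user message or the system prompt.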
Vision: Working with Images
The Messages API supports image inputs, enabling Claude to analyze and describe visual content. In the example below, the image is sent as base64-encoded data in the content array:
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode an image file
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
The media_type should match the image format—supported types include image/png, image/jpeg, image/gif, and image/webp. You can mix images and text in the same message, allowing for rich interactions like "What's in this photo?" or "Read the text from this document."
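To keep media_type in sync with the file, a small helper can derive it from the file extension and reject anything outside the four supported types. This is a minimal sketch: image_block and image_message are hypothetical helpers, and the extension check does not inspect the file's actual contents.

```python
import base64
from pathlib import Path

# The supported media types listed above, keyed by file extension.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path):
    """Build a base64 image content block, inferring media_type from the extension."""
    path = Path(path)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image extension: {path.suffix!r}")
    data = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


def image_message(path, prompt):
    """A user message with one image followed by a text prompt."""
    return {
        "role": "user",
        "content": [image_block(path), {"type": "text", "text": prompt}],
    }
```

A call such as image_message("chart.png", "Describe this chart in detail.") would produce the same user message as the example above, ready to place in the messages array.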
Handling Stop Reasons
Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field in the response tells you:
| Stop Reason | Meaning |
|---|---|
| "end_turn" | Claude finished naturally |
| "max_tokens" | Response hit the token limit |
| "stop_sequence" | Claude encountered a custom stop sequence |
| "tool_use" | Claude wants to call a tool |

If the stop reason is "max_tokens", you may need to increase max_tokens or continue the conversation with a follow-up request.
Best Practices
- Manage context windows carefully: Since you send the full history, keep track of token usage to avoid hitting limits. Use the usage field in responses to monitor consumption.
- Use system prompts for instructions: For general behavior guidance, use the system parameter rather than injecting instructions into user messages.
- Leverage streaming for real-time applications: The API supports streaming responses, which is ideal for chat interfaces where you want to show tokens as they're generated.
- Handle errors gracefully: The API may return errors for invalid requests, rate limits, or server issues. Retry rate-limit and server errors with exponential backoff, and fix invalid requests rather than retrying them.
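The retry advice can be sketched as a generic wrapper that is independent of any client library. Here call_with_backoff is a hypothetical helper, and the caller chooses which exception types count as retryable. With the Python SDK that might be rate-limit, connection, and server error classes, which is an assumption to check against the SDK documentation. The SDK also appears to retry some failures on its own via its max_retries setting, so a wrapper like this may only be needed for extra control.

```python
import random
import time


def call_with_backoff(fn, *, retry_on, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying the exception types in retry_on with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # 1x, 2x, 4x, ... the base delay, capped, plus up to 50% random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Offline demo: a stand-in call that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_call, retry_on=(TimeoutError,), sleep=waits.append)
print(result, attempts["count"], len(waits))  # ok 3 2
```

Exceptions outside retry_on propagate immediately, which matches the idea that an invalid request should be fixed rather than retried.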
Key Takeaways
- The Messages API is stateless—always send the full conversation history with each request
- Prefill allows you to start Claude's response, enabling constrained outputs like multiple choice answers (but check model compatibility)
- Vision capabilities let you send images alongside text for multimodal analysis
- Monitor stop_reason to understand why Claude stopped and handle edge cases like hitting token limits
- Use synthetic assistant messages to guide conversations or provide context without requiring real Claude responses