Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision
This guide covers how to send basic requests, build multi-turn conversations, prefill Claude's responses, and use vision capabilities with the Claude Messages API, with Python code examples.
Introduction
The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to work with messages is essential. This guide walks you through the core patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.
Basic Request and Response
At its simplest, the Messages API accepts a list of messages and returns Claude's response. Here's a minimal example in Python:
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
The response includes:
- id: Unique message identifier
- role: Always "assistant"
- content: Array of content blocks (usually text)
- model: The model used
- stop_reason: Why generation stopped (end_turn, max_tokens, stop_sequence, or tool_use)
- usage: Token counts for input and output
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
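Because content is an array of blocks rather than a single string, it can help to collect the text blocks explicitly. The following is a minimal sketch that treats the response as a plain dictionary shaped like the JSON above; the helper name extract_text is illustrative and not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block in a Messages API response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Sample response copied from the JSON example above.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
```

Filtering on the block type keeps the helper safe when a response also contains non-text blocks such as tool_use.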
Multi-Turn Conversations
The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context.
Building a Conversation
To continue a conversation, append both the assistant's previous response and the new user message:
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
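In practice the history is usually kept in a list that grows by two entries per exchange. The sketch below shows one way to do that, assuming a caller-supplied send function that takes the message list and returns the assistant's text; ask and fake_send are hypothetical names, and fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


def ask(history: list[Message], user_text: str,
        send: Callable[[list[Message]], str]) -> str:
    """Append a user turn, call the model, and record the assistant's reply."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)  # the full history goes out on every call
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages: list[Message]) -> str:
    """Stand-in for a real API call (assumption: returns plain text)."""
    return f"(reply to: {messages[-1]['content']})"


history: list[Message] = []
ask(history, "Hello, Claude", fake_send)
ask(history, "Can you describe LLMs to me?", fake_send)
print([m["role"] for m in history])
# ['user', 'assistant', 'user', 'assistant']
```

With the real client, send could wrap client.messages.create(...) and return message.content[0].text.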
Synthetic Assistant Messages
You can inject synthetic assistant messages — they don't need to have come from Claude. This is useful for:
- Few-shot prompting: Show Claude examples of desired behavior
- Guiding tone: Set the style of responses
- Context injection: Provide information as if Claude already said it
messages = [
    {"role": "user", "content": "Explain quantum computing"},
    {"role": "assistant", "content": "Quantum computing uses qubits..."},  # synthetic
    {"role": "user", "content": "Give me a simple analogy"}
]
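For few-shot prompting, the same idea extends to several synthetic exchanges in a row. Here is a small sketch that turns example pairs into a message list; the function name few_shot_messages and the sample pairs are made up for illustration.

```python
def few_shot_messages(examples: list[tuple[str, str]],
                      query: str) -> list[dict[str, str]]:
    """Interleave (user, assistant) example pairs, then add the real query."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic turn: written by us, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Classify: 'I love this'", "positive"),
    ("Classify: 'This is awful'", "negative"),
]
messages = few_shot_messages(examples, "Classify: 'Not bad at all'")
print(len(messages))  # 5
```

The list always ends with a user turn, so the request asks for a fresh response rather than a continuation of an assistant message.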
Managing Context Windows
Be mindful of the context window. Each turn adds tokens. For long conversations:
- Use prompt caching to reduce costs on repeated system messages
- Implement context compaction to summarize earlier turns
- Consider sliding window approaches for very long histories
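As a rough sketch of the sliding window idea, the function below keeps only the most recent messages and then drops any leading assistant turns so the trimmed history still starts with a user message. The name sliding_window and the message-count limit are assumptions; a production version would more likely budget by tokens and would need to keep tool-use and tool-result pairs together.

```python
def sliding_window(messages: list[dict[str, str]],
                   max_messages: int) -> list[dict[str, str]]:
    """Keep at most max_messages recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    window = messages[-max_messages:]
    # Drop leading assistant turns so the first message has the user role.
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
trimmed = sliding_window(history, max_messages=4)
print([m["content"] for m in trimmed])  # ['u2', 'a2', 'u3']
```

The original list is left untouched, so the full history remains available for summarization or logging.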
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for:
- Constraining output format (e.g., JSON, multiple choice)
- Guiding reasoning (e.g., "Let me think step by step")
- Ensuring specific phrasing
Basic Prefill Example
Here's how to get a single letter answer from a multiple-choice question:
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # e.g. "C"
Prefill for Structured Output
You can use prefill to force JSON output:
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: 'John is 30 years old'"
        },
        {
            "role": "assistant",
            "content": "Here is the JSON: {\"name\": \""
        }
    ]
)
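The response contains only the continuation, not the prefill itself, so the two have to be joined before parsing. A minimal sketch of that step follows; parse_prefilled_json is a hypothetical helper, and the completion string is an invented example of what the model might return.

```python
import json


def parse_prefilled_json(prefill: str, completion: str) -> dict:
    """Join prefill and completion, then decode the first JSON object found."""
    full_text = prefill + completion
    start = full_text.index("{")  # raises ValueError if no object is present
    # raw_decode tolerates trailing text after the closing brace.
    obj, _end = json.JSONDecoder().raw_decode(full_text, start)
    return obj


prefill = 'Here is the JSON: {"name": "'
completion = 'John", "age": 30}'  # hypothetical model output
print(parse_prefilled_json(prefill, completion))
# {'name': 'John', 'age': 30}
```

Starting the decode at the first brace skips the "Here is the JSON: " preamble that this particular prefill includes.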
Important Limitations
- Not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6
- These models return a 400 error for prefill requests
- Use structured outputs or system prompt instructions instead
- See the migration guide for alternatives
Vision Capabilities
The Messages API supports images. You can send images as base64-encoded data or via URL.
Sending an Image
import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail"
                }
            ]
        }
    ]
)
print(message.content[0].text)
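The URL variant swaps the base64 source for a URL source. The sketch below only builds the message structure and does not call the API; it assumes the source takes a type of "url" with a url field, which is worth checking against the current vision documentation. The helper names and the example.com address are placeholders.

```python
def image_url_block(url: str) -> dict:
    """Build an image content block that points at a hosted image."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_question(url: str, question: str) -> list[dict]:
    """Build a messages list pairing one image URL with a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                image_url_block(url),
                {"type": "text", "text": question},
            ],
        }
    ]


messages = image_question(
    "https://example.com/chart.png",  # placeholder URL
    "Describe this chart in detail",
)
print(messages[0]["content"][0]["source"]["type"])  # url
```

The resulting list can be passed as the messages argument in place of the base64 version.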
Supported Image Types
- JPEG, PNG, GIF, WebP
- Maximum size: 100 MB (but larger images are resized)
- Optimal resolution: 1568x1568 pixels or less
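Two small checks can be done before an image is sent: picking the media type and working out target dimensions. The sketch below maps file extensions to the four listed formats and computes a size that fits within 1568 pixels on the longest edge while preserving aspect ratio. Both function names are hypothetical, and the actual resizing would need an imaging library, which is not shown.

```python
from pathlib import Path

# Extension-to-media-type map for the formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(filename: str) -> str:
    """Return the media type for a supported image file, else raise."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or filename}")
    return MEDIA_TYPES[suffix]


def fit_within(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Scale dimensions down so the longest edge is at most max_edge."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already small enough
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(media_type_for("chart.PNG"))  # image/png
print(fit_within(3136, 1568))       # (1568, 784)
```

Images that already fit are returned unchanged, so the function never upscales.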
Handling Stop Reasons
Understanding why Claude stopped helps you build robust applications:
| stop_reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue conversation |
| max_tokens | Hit token limit | Increase max_tokens or truncate |
| stop_sequence | Custom stop sequence triggered | Handle as designed |
| tool_use | Claude wants to use a tool | Execute tool and return result |
if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    # Execute each requested tool call
    for block in message.content:
        if block.type == "tool_use":
            # execute_tool is a placeholder for your own dispatch function
            result = execute_tool(block.name, block.input)
            # Add result to conversation as a tool_result block in the next user message
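The "add result to conversation" step can be sketched as follows. This version works on content blocks as plain dictionaries and builds a single user message carrying one tool_result block per tool_use block. The function name, the get_weather tool, and the toolu_example id are all invented for illustration.

```python
from typing import Any, Callable


def tool_results_message(content: list[dict[str, Any]],
                         execute_tool: Callable[[str, dict], str]) -> dict[str, Any]:
    """Run every tool_use block and package the outputs as one user message."""
    results = []
    for block in content:
        if block.get("type") == "tool_use":
            output = execute_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # links the result to its request
                "content": output,
            })
    return {"role": "user", "content": results}


def fake_execute(name: str, tool_input: dict) -> str:
    """Hypothetical tool dispatcher used only for this demo."""
    return f"{name} called with {tool_input}"


assistant_content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "Paris"}},
]
reply = tool_results_message(assistant_content, fake_execute)
print(reply["content"][0]["tool_use_id"])  # toolu_example
```

When continuing the conversation, the assistant message containing the tool_use blocks goes into the history first, followed by this user message.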
Best Practices
1. Manage Token Usage
- Use max_tokens to control response length
- Monitor usage.input_tokens and usage.output_tokens for cost tracking
- Implement prompt caching for repeated system messages
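One simple way to monitor usage is to accumulate the counts from each response. The sketch below assumes the usage data is available as a dictionary with input_tokens and output_tokens keys, as in the response JSON shown earlier; UsageTotals is a hypothetical class name, and the second set of numbers is made up.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals of token counts across several responses."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: dict) -> None:
        """Add one response's usage counts to the totals."""
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


totals = UsageTotals()
totals.add({"input_tokens": 12, "output_tokens": 6})
totals.add({"input_tokens": 40, "output_tokens": 120})  # invented numbers
print(totals.input_tokens, totals.output_tokens, totals.total)  # 52 126 178
```

Keeping input and output counts separate matters because the two are typically priced differently.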
2. Handle Errors Gracefully
- Rate limits: Implement exponential backoff
- 400 errors: Check model compatibility (especially with prefill)
- Timeouts: Set appropriate timeouts for long generations
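Exponential backoff for rate limits can be sketched independently of any SDK. In the example below, RetryableError is a hypothetical stand-in for whatever rate-limit exception your client raises, and the sleep function is injectable so the demo runs without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a rate-limit error."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call on RetryableError, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay grows as base * 2^attempt, plus jitter to spread out retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed, recording delays instead of sleeping.
attempts = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("rate limited")
    return "ok"


result = with_backoff(flaky, sleep=delays.append)
print(result, len(delays))  # ok 2
```

With the Python SDK, the except clause would likely target anthropic.RateLimitError instead of the stand-in class.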
3. Optimize for Your Use Case
- Chatbots: Use multi-turn patterns with history management
- Content generation: Use prefill for consistent formatting
- Data extraction: Combine prefill with low max_tokens
- Vision tasks: Resize images to optimal resolution before sending
4. Security Considerations
- The Messages API is eligible for Zero Data Retention (ZDR)
- When ZDR is enabled, data is not stored after the API response
- Never send sensitive information in prompts unless you have appropriate agreements
Conclusion
The Claude Messages API provides a flexible foundation for building AI-powered applications. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated interactions that leverage Claude's full potential.
Key Takeaways
- The Messages API is stateless — always send the full conversation history with each request
- Prefill gives you control over Claude's response format and content, but check model compatibility
- Vision capabilities allow you to send images alongside text for multimodal analysis
- Handle stop reasons appropriately to build robust applications that respond to truncation, tool use, and natural endings
- Monitor token usage to manage costs and optimize context window utilization