Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision
Learn how to build with Claude using the Messages API. Covers basic requests, multi-turn conversations, prefill techniques, and vision capabilities with code examples.
This guide teaches you how to use the Claude Messages API for basic requests, multi-turn conversations, prefill to shape responses, and vision capabilities to analyze images.
Introduction
The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a vision-powered application, understanding the Messages API is essential. This guide walks you through the most common patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.
Basic Request and Response
At its simplest, a Messages API call sends a user message and receives Claude's response. Here's a minimal example in Python:
```python
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message)
```
The response includes:
- id: A unique message identifier
- content: An array of content blocks (usually text)
- stop_reason: Why generation stopped (end_turn, max_tokens, stop_sequence, etc.)
- usage: Token counts for input and output

```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
```
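If you are working with the raw JSON body (for example, from a log or a plain HTTP client rather than the SDK), a small helper can pull out the fields you usually need. This is a minimal sketch: summarize_response is a hypothetical name, and it assumes the response shape shown in the example, joining only the text blocks.

```python
import json

# The example response, as raw JSON text
RAW_RESPONSE = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""


def summarize_response(resp: dict) -> dict:
    """Hypothetical helper: extract reply text, stop reason, and total token count."""
    # content is a list of blocks; keep only the text ones
    text = "".join(block["text"] for block in resp["content"] if block["type"] == "text")
    usage = resp["usage"]
    return {
        "text": text,
        "stop_reason": resp["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


print(summarize_response(json.loads(RAW_RESPONSE)))
```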
Multi-Turn Conversations
The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.
Building a Conversation
To continue a conversation, append new messages to the messages array:
```python
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)
```
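Because the API is stateless, it helps to keep the history in one place. The class below is a hypothetical helper, not part of the SDK: it appends each user turn, sends the whole list, and records Claude's reply so the next call includes it. It assumes `client` is an anthropic.Anthropic instance and reuses the model ID from the examples in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical helper that keeps the full history the stateless API needs."""
    messages: list = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def send(self, client, text: str, model: str = "claude-opus-4-7", max_tokens: int = 1024) -> str:
        """Append a user turn, send the whole history, and record Claude's reply."""
        self.add_user(text)
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            messages=list(self.messages),  # send a copy of the full history
        )
        # Join the text blocks of the reply and store it as the next assistant turn
        reply = "".join(block.text for block in response.content if block.type == "text")
        self.add_assistant(reply)
        return reply
```

Usage: create `conv = Conversation()`, then call `conv.send(client, "Hello, Claude")` followed by `conv.send(client, "Can you describe LLMs to me?")`; the second request automatically carries the first exchange.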
Synthetic Assistant Messages
You can inject synthetic assistant messages — they don't need to be actual Claude responses. This is useful for:
- Setting up a scenario or persona
- Providing example interactions (few-shot prompting)
- Guiding Claude's behavior without system prompts
```python
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Italy?"},
]
```
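When you have several examples, building the synthetic turns from data keeps them consistent. A minimal sketch: few_shot_messages is a hypothetical helper that produces the same user/assistant structure as the list above from (question, answer) pairs.

```python
def few_shot_messages(examples: list[tuple[str, str]], question: str) -> list[dict]:
    """Turn (question, answer) pairs into synthetic turns, ending on the real question."""
    messages = []
    for q, a in examples:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})  # synthetic, not a real reply
    messages.append({"role": "user", "content": question})
    return messages


print(few_shot_messages([("What's the capital of France?", "The capital of France is Paris.")], "What about Italy?"))
```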
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response by including an assistant message at the end of your input. This shapes the output — Claude will continue from where you left off.
Use Case: Multiple Choice
A common pattern is using prefill with max_tokens=1 to get a single-token answer:
```python
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        # The prefill: Claude continues from this partial assistant turn
        {"role": "assistant", "content": "The answer is ("},
    ],
)
```
Claude should complete the response with C. The returned content contains only the continuation, not the prefill, so message.content[0].text gives you a clean, parseable answer.
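Since the reply holds only the continuation, a little parsing turns it into a validated answer. A minimal sketch with hypothetical helper names; it strips trailing whitespace from the prefill because, to our understanding, the API rejects a final assistant turn that ends in whitespace.

```python
from typing import Optional


def build_prefill_messages(question: str, prefill: str) -> list:
    """Return a messages list that ends with the prefilled assistant turn."""
    return [
        {"role": "user", "content": question},
        # Assumption: a final assistant turn ending in whitespace is rejected, so strip it
        {"role": "assistant", "content": prefill.rstrip()},
    ]


def parse_choice(completion: str, valid: str = "ABC") -> Optional[str]:
    """Return the answer letter from Claude's continuation, or None if it isn't a valid choice."""
    letter = completion.strip()[:1].upper()
    return letter if letter and letter in valid else None
```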
Important Notes
- Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Using it with these models returns a 400 error.
- For unsupported models, use structured outputs or system prompt instructions instead.
- Prefill works best for short, constrained outputs like classifications or single tokens.
Vision: Analyzing Images
Claude can analyze images sent via the Messages API. This enables use cases like document analysis, image description, and visual Q&A.
Sending an Image
One way to send an image is as base64-encoded data in an image content block:
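```python
import base64

import anthropic

client = anthropic.Anthropic()

# Read the image file and base64-encode it as a UTF-8 string
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Image block first, then the question about it
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)
print(message.content[0].text)
```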
Supported Image Formats
- JPEG, PNG, GIF, WebP
- Maximum size: 5 MB per image for API requests
- Claude processes images at varying resolutions; larger images may be downscaled
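If your application accepts several formats, the media_type has to match the actual file. A minimal sketch that builds the image content block from raw bytes and picks media_type from the file extension; image_block and the extension table are hypothetical, and the function trusts the extension rather than inspecting the bytes.

```python
import base64
from pathlib import Path

# Media types for the supported formats (JPEG, PNG, GIF, WebP)
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring media_type from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

For example, `image_block("chart.png", Path("chart.png").read_bytes())` yields a block you can place before the text block in the user message.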
Best Practices for Vision
- Combine with text: Always include a text prompt alongside the image to guide Claude's analysis.
- Use high-quality images: Blurry or low-resolution images reduce accuracy.
- Be specific: Instead of "What's in this image?", ask "What are the quarterly sales trends shown in this bar chart?"
- Consider token cost: Images consume input tokens based on their size. Anthropic's documented approximation is tokens ≈ (width × height) / 750, so a 1024x1024 image uses roughly 1,400 tokens.
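For budgeting, the approximation tokens ≈ (width × height) / 750 from Anthropic's vision documentation is easy to compute ahead of time. The function below is a sketch of that rule of thumb only (rounded up here); it does not model the downscaling applied to large images, so treat its result as an estimate.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate input tokens for an image as width * height / 750, rounded up."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    # Ceiling division in integers avoids floating-point rounding surprises
    return (width_px * height_px + 749) // 750


for w, h in [(200, 200), (1024, 1024), (1092, 1092)]:
    print(f"{w}x{h}: ~{estimate_image_tokens(w, h)} tokens")
```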
Handling Stop Reasons
Claude's response includes a stop_reason field that tells you why generation stopped:
| Stop Reason | Meaning |
|---|---|
| end_turn | Claude finished naturally |
| max_tokens | Output hit the max_tokens limit |
| stop_sequence | Claude generated one of your custom stop sequences |
| tool_use | Claude wants to call a tool (if tools are enabled) |
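If stop_reason is max_tokens, you can continue the conversation by appending Claude's partial response and asking it to continue:
```python
# If stop_reason is "max_tokens", record the partial reply and ask Claude to continue.
# Assumes `messages` is the list you sent in the previous request.
if message.stop_reason == "max_tokens":
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages,
    )
```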
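Long outputs may need more than one continuation. A minimal sketch that repeats the append-and-continue pattern until Claude stops for a reason other than max_tokens, with a cap on rounds. complete_with_continuation is a hypothetical helper; extra keyword arguments (model, max_tokens, ...) go straight to messages.create, and simply concatenating the parts assumes each continuation resumes where the last one stopped, which is not guaranteed.

```python
def complete_with_continuation(client, messages, max_rounds=3, **kwargs):
    """Keep asking Claude to continue while replies stop at max_tokens.

    Returns the concatenated text of every assistant turn. `messages` is
    copied, so the caller's list is not modified.
    """
    history = list(messages)
    parts = []
    for _ in range(max_rounds):
        response = client.messages.create(messages=history, **kwargs)
        text = "".join(block.text for block in response.content if block.type == "text")
        parts.append(text)
        if response.stop_reason != "max_tokens":
            break  # finished for another reason (end_turn, stop_sequence, ...)
        # Record the partial reply, then ask for more
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue."})
    return "".join(parts)
```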
Streaming Responses
For real-time applications, use streaming to receive tokens as they're generated:
```python
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for event in stream:
    # Text arrives in content_block_delta events; skip other delta types (e.g. tool input)
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)
```
Streaming is ideal for chatbots and any UI that shows incremental progress.
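The Python SDK also offers a higher-level helper, client.messages.stream(), used as a context manager whose text_stream yields only the text pieces, so you don't have to filter event types yourself. A minimal sketch: stream_reply is a hypothetical wrapper that prints text as it arrives and returns the full reply, and `client` is assumed to be an anthropic.Anthropic instance.

```python
def stream_reply(client, prompt: str, model: str = "claude-opus-4-7", max_tokens: int = 1024) -> str:
    """Print Claude's reply as it streams in and return the complete text."""
    chunks = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # text_stream yields only text deltas; other event types are skipped
        for text in stream.text_stream:
            print(text, end="", flush=True)
            chunks.append(text)
    print()  # finish the line after streaming
    return "".join(chunks)
```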
Error Handling
Common API errors and how to handle them:
- 400 Bad Request: Invalid parameters (e.g., unsupported model with prefill)
- 401 Unauthorized: Invalid API key
- 429 Too Many Requests: Rate limit exceeded — implement exponential backoff
- 529 Overloaded: Temporary server overload — retry with backoff
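```python
import random
import time

import anthropic

RETRYABLE_STATUS_CODES = {429, 529}  # rate limited, overloaded


def call_with_retry(client, max_retries=5, **kwargs):
    """Call messages.create, retrying 429 and 529 errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.APIStatusError as e:
            # Give up on non-retryable errors, or when this was the last attempt
            if e.status_code not in RETRYABLE_STATUS_CODES or attempt == max_retries - 1:
                raise
            # Wait ~1s, 2s, 4s, ... plus up to 1s of random jitter
            time.sleep(2 ** attempt + random.uniform(0, 1))
    raise ValueError("max_retries must be at least 1")
```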
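Two refinements are worth considering. First, the official Python SDK already retries some failures (including 429 and 5xx errors) twice by default, configurable with the max_retries option on the client, so a custom loop adds to those retries. Second, capping the delay keeps late attempts from sleeping for minutes. A minimal sketch of a capped delay; backoff_delay and its defaults are hypothetical, and the rng parameter exists only so the jitter can be fixed in tests.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Doubles each attempt, adds up to 1 s of jitter, and never exceeds `cap`.
    """
    return min(cap, base * 2 ** attempt + rng())
```

In call_with_retry, the sleep could then use time.sleep(backoff_delay(attempt)).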
Conclusion
The Messages API is the foundation for all Claude integrations. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated applications that leverage Claude's full capabilities. Remember that the API is stateless — manage conversation history on your end — and always handle stop reasons and errors gracefully.
Key Takeaways
- The Messages API is stateless — you must send the full conversation history with every request, giving you complete control over context.
- Prefill lets you shape responses by starting Claude's reply, but it's not supported on all models (use structured outputs as an alternative).
- Vision capabilities allow Claude to analyze images sent as base64 data; always pair images with specific text prompts for best results.
- Streaming provides real-time token delivery, ideal for interactive applications.
- Handle stop reasons like max_tokens to gracefully continue interrupted responses.