Mastering the Messages API: A Practical Guide to Building with Claude
This guide teaches you how to use Claude's Messages API for basic requests, multi-turn conversations, response prefilling, and vision capabilities, with practical code examples in Python.
Anthropic offers two primary ways to build with Claude: the Messages API for direct model access and Claude Managed Agents for pre-built agent harnesses. This guide focuses on the Messages API—the foundation for custom agent loops, fine-grained control, and integrating Claude into your applications.
Whether you're building a chatbot, content generator, or vision-powered tool, understanding the Messages API is essential. Let's dive into the patterns that will help you get the most out of Claude.
Understanding the Messages API
The Messages API is a stateless, RESTful interface that lets you send conversational turns to Claude and receive responses. Unlike some chat APIs that maintain session state, you must send the full conversation history with every request. This design gives you complete control over context management.
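Because the API is a plain RESTful endpoint, every SDK call shown below reduces to a single HTTP POST. As a sketch of the raw request shape (the endpoint and required headers are the publicly documented ones; the API key is a placeholder):

```python
import json

# Raw request shape behind the SDK (no client library).
url = "https://api.anthropic.com/v1/messages"
headers = {
    "x-api-key": "YOUR_API_KEY",        # placeholder; use your real key
    "anthropic-version": "2023-06-01",  # required API version header
    "content-type": "application/json",
}
payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}
body = json.dumps(payload)
# e.g. requests.post(url, headers=headers, data=body)
```

The SDKs wrap exactly this request, adding retries, typing, and streaming helpers on top.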
Basic Request Structure
Every request to the Messages API requires three core parameters:
- `model`: The Claude model identifier (e.g., `claude-opus-4-7`, `claude-sonnet-4-5`)
- `max_tokens`: The maximum number of tokens to generate in the response
- `messages`: An array of message objects, each with a `role` and `content`
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
The response includes the model's reply, usage statistics, and a `stop_reason` indicating why generation ended:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Building Multi-Turn Conversations
Since the Messages API is stateless, you build conversations by appending each turn to the `messages` array. This pattern allows you to maintain context across multiple exchanges.
The Conversation Loop Pattern
```python
import anthropic

client = anthropic.Anthropic()

# Start with the initial user message
messages = [
    {"role": "user", "content": "What are the three primary colors?"}
]

# First API call
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages
)

# Append Claude's response to history
messages.append({"role": "assistant", "content": response.content[0].text})

# Add the next user turn
messages.append({"role": "user", "content": "Can you mix them to make secondary colors?"})

# Second API call with full history
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages
)

print(response.content[0].text)
```
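The append-and-resend pattern above is easy to wrap in a small helper. A minimal sketch (the `Conversation` class and method names are illustrative, not part of the SDK; it assumes the API's requirement that user and assistant turns alternate):

```python
class Conversation:
    """Accumulates alternating user/assistant turns for the Messages API."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        # Guard against two user turns in a row, which the API rejects.
        if self.messages and self.messages[-1]["role"] == "user":
            raise ValueError("Consecutive user turns are not allowed")
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        if not self.messages or self.messages[-1]["role"] != "user":
            raise ValueError("An assistant turn must follow a user turn")
        self.messages.append({"role": "assistant", "content": text})


convo = Conversation()
convo.add_user("What are the three primary colors?")
convo.add_assistant("Red, yellow, and blue.")
convo.add_user("Can you mix them to make secondary colors?")
# convo.messages is now ready to pass as messages= in client.messages.create(...)
```

Centralizing history in one place like this also gives you a single point to add truncation or summarization once conversations grow long.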
Synthetic Assistant Messages
A powerful feature: earlier assistant turns don't need to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context that Claude didn't generate:
```python
messages = [
    {"role": "user", "content": "Summarize our previous discussion about project timelines."},
    {"role": "assistant", "content": "Based on our discussion, the project has three phases: research (weeks 1-2), development (weeks 3-6), and testing (weeks 7-8)."},
    {"role": "user", "content": "What are the key milestones for the development phase?"}
]
```
This is particularly useful for:
- Injecting system-generated context
- Simulating conversation history from other sources
- Providing structured data summaries
Putting Words in Claude's Mouth: Prefill Technique
The prefill technique lets you start Claude's response by including assistant content in the input messages. This shapes the model's output by providing a starting point.
Use Case: Multiple Choice Questions
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # Only need one token for the answer
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?\n(A) London\n(B) Paris\n(C) Berlin\n(D) Madrid"
        },
        {
            "role": "assistant",
            "content": "The answer is ("  # Prefill: the next token must be the letter
        }
    ]
)

print(message.content[0].text)  # Output: "B"
```

Note that the response contains only the continuation of the prefill, not the prefilled text itself, so the single generated token here is the answer letter.
Important Prefill Limitations
Prefilling is not supported on these models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
When to Use Prefill
- Constrained outputs: Force Claude to start with a specific format (JSON, YAML, etc.)
- Multiple choice: Get single-token answers for classification tasks
- Controlled generation: Guide the tone or direction of the response
- Chain-of-thought prompting: Start Claude's reasoning process
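For the constrained-output case, a common trick is to prefill a single `{` so the model must continue a JSON object rather than adding preamble. A sketch (the completion below is simulated, not a real API response):

```python
import json

def build_json_prefill_request(user_prompt: str) -> list[dict]:
    """Prefill the assistant turn with '{' so the model continues
    a JSON object from its very first token."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": "{"},  # prefill
    ]

def reassemble(prefill: str, completion: str) -> dict:
    """The API returns only the continuation, so join the prefill
    back on before parsing."""
    return json.loads(prefill + completion)

# Simulated completion: what the model might return after the '{' prefill.
completion = '"city": "Paris", "country": "France"}'
result = reassemble("{", completion)
```

Pairing this with a stop sequence such as `"}"` can further bound the output if you only need a flat object.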
Vision Capabilities: Sending Images to Claude
Claude can analyze images alongside text. You can supply images using three source types:
- `base64`: Base64-encoded image data
- `url`: Publicly accessible image URL
- `file`: Image uploaded via the Files API
Supported Image Formats
| Format | MIME Type |
|---|---|
| JPEG | image/jpeg |
| PNG | image/png |
| GIF | image/gif |
| WebP | image/webp |
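The format table above maps directly to the `media_type` field of a base64 image block. A small helper can pick the MIME type from the file extension and build the block (the function name is illustrative, not part of the SDK):

```python
import base64
from pathlib import Path

# Supported format -> MIME type, per the table above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def base64_image_block(path: str) -> dict:
    """Builds a base64 image content block for the Messages API."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

Failing fast on unsupported extensions is cheaper than discovering the problem via an API error after uploading the data.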
Example: Analyzing an Image from URL
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/ant-photo.jpg"
                    }
                }
            ]
        }
    ]
)

print(message.content[0].text)
# Output: "This image shows an ant, specifically a close-up view..."
```
Example: Using Base64 Images
```python
import anthropic
import base64

client = anthropic.Anthropic()

with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in detail."
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                }
            ]
        }
    ]
)
```
Handling Stop Reasons
Every response includes a `stop_reason` field that tells you why Claude stopped generating. Understanding these helps you build robust applications:

| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Continue or end the conversation |
| `max_tokens` | Hit the token limit | Increase `max_tokens` or truncate the response |
| `stop_sequence` | Found a custom stop sequence | Handle based on your logic |
| `tool_use` | Claude wants to use a tool | Process the tool call and continue |
```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a 500-word essay"}]
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "end_turn":
    print("Response completed successfully.")
```
Best Practices for the Messages API
1. Manage Token Usage
Monitor the `usage` field in responses to track costs and optimize your prompts:

```python
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
```
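To track spend across a whole session, you can accumulate those per-response counts into a running total. A sketch with hypothetical per-million-token prices (substitute the real pricing for your model):

```python
# Hypothetical prices in USD per million tokens -- NOT real pricing.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

class UsageTracker:
    """Accumulates token usage across Messages API calls."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage) -> None:
        """Record the `usage` object from a Messages API response."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    @property
    def estimated_cost(self) -> float:
        return (
            self.input_tokens / 1_000_000 * PRICE_PER_MTOK["input"]
            + self.output_tokens / 1_000_000 * PRICE_PER_MTOK["output"]
        )
```

Call `tracker.record(response.usage)` after each request, then read `tracker.estimated_cost` when you need a running estimate.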
2. Use Prompt Caching for Long Histories
For conversations with extensive context, enable prompt caching to reduce costs and latency on repeated prefixes.
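A sketch of how a cacheable prefix is marked, using a `cache_control` block on the system prompt (the long system text is a placeholder; check the prompt caching docs for minimum cacheable lengths and pricing):

```python
# Placeholder for a long, stable system prompt that rarely changes.
LONG_SYSTEM_PROMPT = "You are a helpful assistant. [long, stable instructions here]"

request_kwargs = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    # Marking the system block with cache_control lets subsequent requests
    # reuse the cached prefix instead of reprocessing it from scratch.
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the rules above."}],
}
# response = client.messages.create(**request_kwargs)
```

Caching pays off most when the same long prefix (system prompt, documents, tool definitions) is resent across many turns.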
3. Handle Errors Gracefully
Always implement retry logic with exponential backoff for API errors:
```python
import time

from anthropic import Anthropic, APIError

client = Anthropic()

max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.messages.create(...)
        break
    except APIError:
        if attempt == max_retries - 1:
            raise
        time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s
```
4. Stream Responses for Better UX
For long responses, use streaming to show tokens as they're generated:
```python
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Key Takeaways
- Stateless by design: Always send the full conversation history with each request; this gives you complete control over context management.
- Prefill shapes responses: Use synthetic assistant messages to guide Claude's output, but be aware of model limitations: prefill is not supported on Opus 4.7, Sonnet 4.6, and others.
- Vision is straightforward: Send images via base64, URL, or file reference using the `image` content block type. Supported formats are JPEG, PNG, GIF, and WebP.
- Monitor stop reasons: The `stop_reason` field tells you why generation ended; use it to handle truncation, tool calls, or natural completion.
- Stream for better UX: Use streaming for real-time token display, especially for long responses or chat applications.