Mastering the Messages API: A Practical Guide to Building Conversations with Claude
This guide teaches you how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques to shape responses, and vision capabilities for image analysis—all with practical Python and TypeScript code examples.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to structure messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefill and vision.
Understanding the Messages API vs. Claude Managed Agents
Before diving in, it's important to know that Anthropic offers two approaches for building with Claude:
- Messages API: Direct model prompting access—you control every aspect of the conversation loop. Best for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Making Your First API Request
The simplest interaction with Claude involves sending a single user message and receiving a response. Here's how it looks in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message)
The response includes several important fields:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key Response Fields
- id: Unique identifier for the message
- content: Array of content blocks (text, tool_use, etc.)
- stop_reason: Why Claude stopped generating (end_turn, max_tokens, stop_sequence, or tool_use)
- usage: Token counts for billing and monitoring
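As a rough illustration of how these fields might be read, the sketch below works on a plain dict shaped like the JSON above rather than on a live response. The helper name summarize_response is hypothetical; with the Python SDK you would read the same fields as attributes (for example message.stop_reason).

```python
def summarize_response(response: dict) -> dict:
    """Pull the commonly used fields out of a Messages API response dict."""
    # Join the text of every text block; other block types (e.g. tool_use) are skipped.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "id": response["id"],
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# The sample response shown above, as a Python dict.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(summarize_response(sample))
```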
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires careful management.
Basic Multi-Turn Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)
print(message.content[0].text)
Managing Conversation State
Since you control the history, you can:
- Trim old messages to stay within context windows
- Set or change the system prompt (passed through the top-level system parameter rather than as a message)
- Add synthetic assistant messages to guide behavior
- Persist conversations across sessions by storing message arrays
Pro Tip: Earlier turns don't need to originate from Claude. You can inject synthetic assistant messages to provide context or correct behavior without waiting for the API.
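Persisting and seeding a history could look something like the sketch below. The storage layout (a JSON array of message dicts), the file name, and the helper names save_history and load_history are assumptions made for illustration; none of them come from the SDK.

```python
import json
import tempfile
from pathlib import Path


def save_history(path: Path, messages: list[dict]) -> None:
    """Write the message array to disk as JSON."""
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")


def load_history(path: Path) -> list[dict]:
    """Load a stored message array, or start fresh if the file is missing."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


# Seed a conversation with a synthetic assistant turn that Claude never produced.
history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello! I answer in one short sentence."},
]
history.append({"role": "user", "content": "Can you describe LLMs to me?"})

# Hypothetical location; a real application would likely key this by session or user.
store = Path(tempfile.gettempdir()) / "conversation.json"
save_history(store, history)
restored = load_history(store)  # pass this list as messages= on the next request
```

Because the API is stateless, the restored list is all that is needed to resume the conversation in a later session.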
Putting Words in Claude's Mouth: The Prefill Technique
Prefilling allows you to start Claude's response for it. This is incredibly useful for:
- Constraining output format (e.g., getting a single letter answer)
- Guiding response structure (e.g., starting with "The answer is")
- Reducing token usage for predictable responses
Example: Multiple Choice Answer
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {
            "role": "assistant",
            "content": "The answer is (",
        },
    ],
)
print(message.content[0].text)  # Outputs: "C"
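One detail worth keeping in mind: the response contains only the continuation, not the prefilled text, so the two need to be joined to get the complete string. A minimal sketch follows, using hard-coded continuations in place of live responses; the helper name join_prefill and the JSON content are made up for illustration.

```python
import json


def join_prefill(prefill: str, continuation: str) -> str:
    """Rebuild the full assistant text from the prefill and the returned continuation."""
    return prefill + continuation


# Multiple-choice example: prefill "The answer is (" plus a one-token reply.
print(join_prefill("The answer is (", "C"))

# Same idea for JSON: prefill "{" so the reply starts inside an object.
# The continuation below is a made-up stand-in for message.content[0].text.
continuation = '"species": "ant", "family": "Formicidae"}'
data = json.loads(join_prefill("{", continuation))
print(data["family"])
```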
Important Prefill Limitations
- Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6
- These models return a 400 error for prefill requests
- Alternative: Use structured outputs or system prompt instructions instead
- Check the migration guide for patterns
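One way to cope with mixed model support is to decide per request whether to prefill or to fall back to a system prompt instruction. The sketch below is only an assumption about how that might be organized: build_request and NO_PREFILL_MODELS are hypothetical names, and the set contains only the one affected model ID that appears in this guide, so the IDs of the other listed models would need to be added from the official model list.

```python
# Model IDs that reject prefill. Only "claude-opus-4-7" appears in this guide;
# the IDs for the other listed models are not given here and must be added.
NO_PREFILL_MODELS = {"claude-opus-4-7"}


def build_request(model: str, question: str, prefill: str) -> dict:
    """Return keyword arguments for client.messages.create()."""
    request = {
        "model": model,
        "max_tokens": 16,
        "messages": [{"role": "user", "content": question}],
    }
    if model in NO_PREFILL_MODELS:
        # Fallback: ask for the same answer shape through the system prompt.
        request["system"] = f"Begin your reply with exactly: {prefill}"
    else:
        # Prefill: the final message is a partial assistant turn.
        request["messages"].append({"role": "assistant", "content": prefill})
    return request


question = "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
print(build_request("claude-sonnet-4-5", question, "The answer is ("))
print(build_request("claude-opus-4-7", question, "The answer is ("))
```

The returned dict can be passed as client.messages.create(**request). A system prompt instruction is a softer constraint than prefill, so the output should still be validated.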
When to Use Prefill vs. System Prompts
| Technique | Best For |
|---|---|
| Prefill | Short, constrained outputs (single tokens, specific formats) |
| System Prompt | General behavior guidance, long instructions |
| Structured Outputs | JSON schemas, complex structured data |
Working with Vision Capabilities
Claude can analyze images sent through the Messages API. This opens up use cases like:
- Document analysis
- Image description
- UI/UX review
- Visual data extraction
Sending an Image
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail",
                },
            ],
        }
    ],
)
print(message.content[0].text)
Supported Image Formats
- JPEG, PNG, GIF, WebP
- Maximum size: ~100MB (base64 encoded)
- Best results with clear, high-contrast images
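Since media_type has to match the file, a small helper can pick it from the extension and reject anything outside the formats listed above. This is a sketch under the assumption that the extension reflects the real file type; image_block and MEDIA_TYPES are hypothetical names, not SDK features.

```python
import base64
from pathlib import Path

# Media types for the supported formats, keyed by file extension.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or path}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }
```

The user message content then becomes [image_block("chart.png"), {"type": "text", "text": "Describe this chart in detail"}].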
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Hit token limit | Increase max_tokens or continue |
| stop_sequence | Found a stop sequence | Process the response |
| tool_use | Claude wants to use a tool | Execute tool and return result |
# `messages` is the history list that was sent with the request
if message.stop_reason == "max_tokens":
    # Continue the conversation to get more output
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue"})
elif message.stop_reason == "tool_use":
    # Handle tool calls
    pass
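The max_tokens branch can be wrapped in a bounded loop so a long answer is collected over several requests. The sketch below assumes text-only responses and uses a stub in place of the API so it runs offline; generate_until_done, make_stub, and the max_rounds cap are hypothetical names chosen for this example.

```python
from types import SimpleNamespace


def generate_until_done(create, messages, max_rounds=3):
    """Call create(messages) until stop_reason is no longer "max_tokens".

    `create` is any callable that returns an object with .stop_reason and
    .content[0].text, for example a lambda wrapping client.messages.create.
    Returns the concatenated text and the final stop reason.
    """
    messages = list(messages)  # work on a copy so the caller's list is untouched
    parts = []
    stop_reason = None
    for _ in range(max_rounds):
        response = create(messages)
        text = response.content[0].text
        parts.append(text)
        stop_reason = response.stop_reason
        if stop_reason != "max_tokens":
            break
        # Record the partial answer, then ask for more.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Please continue"})
    return "".join(parts), stop_reason


def make_stub(replies):
    """Return a fake `create` that yields the given (text, stop_reason) pairs in order."""
    pending = iter(replies)

    def create(messages):
        text, reason = next(pending)
        return SimpleNamespace(content=[SimpleNamespace(text=text)], stop_reason=reason)

    return create


stub = make_stub([("Once upon ", "max_tokens"), ("a time.", "end_turn")])
print(generate_until_done(stub, [{"role": "user", "content": "Tell me a story"}]))
```

Joining the pieces with no separator is a simplification; a real continuation may repeat or rephrase a few words at the seam, so the result is worth checking.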
Best Practices for Production
1. Manage Context Windows
Claude's context window is large but finite. Implement strategies like:
- Sliding window (keep last N messages)
- Summarization of old conversations
- Prompt caching for frequently used context
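The sliding-window strategy might be sketched as follows. It counts messages rather than tokens, which is a simplification, and trim_history is a hypothetical helper name.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most the last `max_messages` entries, starting on a user turn."""
    if max_messages <= 0:
        return []
    trimmed = messages[-max_messages:]
    # Conversations conventionally open with a user turn, so drop any
    # leading assistant messages left over after slicing.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
    {"role": "assistant", "content": "LLMs are large language models."},
    {"role": "user", "content": "How are they trained?"},
]

# A window of 4 would start on an assistant turn, so only 3 messages remain.
print(trim_history(history, 4))
```

A token-based budget would be more precise, at the cost of counting tokens for each message.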
2. Handle Errors Gracefully
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Queue for retry
except anthropic.APIError as e:
    # Base class of the errors above, so it must be caught last
    print(f"API error: {e}")
    # Implement retry logic
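The exponential backoff mentioned in the comments could be sketched as a generic wrapper. It is written against plain callables so it runs without the SDK; call_with_backoff and its parameters are hypothetical, and the delay values are arbitrary starting points rather than recommendations.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x ... the base delay, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the SDK this might be called as call_with_backoff(lambda: client.messages.create(...), retry_on=(anthropic.RateLimitError, anthropic.APIConnectionError)). The Python SDK client also accepts a max_retries setting, which may be enough on its own for simple cases.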
3. Monitor Token Usage
Track usage.input_tokens and usage.output_tokens for cost management and optimization.
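A running tally is one simple way to do this. In the sketch below, UsageTracker is a hypothetical class, usage is read from a dict shaped like the response JSON (with the SDK you would read message.usage.input_tokens and message.usage.output_tokens instead), and the prices are placeholders rather than actual rates.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of input and output tokens across requests."""
    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage block to the totals."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.requests += 1

    def estimated_cost(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Estimate spend from per-million-token prices supplied by the caller."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 30, "output_tokens": 120})

# Placeholder prices for illustration only; look up current pricing for your model.
print(tracker, tracker.estimated_cost(3.0, 15.0))
```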
4. Use Streaming for Real-Time UX
For chat applications, enable streaming to show responses as they're generated:
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
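Because the API is stateless, a streamed reply still has to be added to the history once it completes. A minimal sketch, using a list of strings in place of stream.text_stream so it runs offline; stream_to_history is a hypothetical helper name.

```python
def stream_to_history(chunks, messages):
    """Print chunks as they arrive, then append the full reply to `messages`."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    reply = "".join(parts)
    messages.append({"role": "assistant", "content": reply})
    return reply


history = [{"role": "user", "content": "Tell me a story"}]
# Stand-in for stream.text_stream, which yields text fragments in order.
reply = stream_to_history(["Once ", "upon ", "a time"], history)
```

In real use, the first argument would be stream.text_stream inside the with block.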
Key Takeaways
- The Messages API is stateless—you must send full conversation history with every request, giving you complete control over context management
- Prefill technique lets you start Claude's response, enabling constrained outputs and format control (but check model compatibility)
- Multi-turn conversations require careful history management; you can inject synthetic assistant messages to guide behavior
- Vision capabilities allow image analysis by sending base64-encoded images alongside text prompts
- Always handle stop reasons (end_turn, max_tokens, tool_use) to build robust, production-ready applications