Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
This guide explains how to use Claude's Messages API to build conversational applications, including sending basic requests, managing multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities.
Claude's Messages API is the primary interface for integrating Claude into your applications. Whether you're building a chatbot, a content generation tool, or an AI assistant, understanding how to work with messages is essential. This guide walks you through the core patterns—from basic requests to advanced techniques like prefilling and vision—with practical code examples.
Understanding the Messages API vs. Managed Agents
Anthropic offers two paths for building with Claude:
- Messages API: Direct model access with fine-grained control over every request and response. Best for custom agent loops, tool use, and when you need to manage conversation state yourself.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Ideal for long-running tasks and asynchronous work.
Making Your First API Request
Let's start with the simplest possible interaction: sending a single message and receiving a response.
Python Example
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"}
]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
model: 'claude-opus-4-7',
max_tokens: 1024,
messages: [
{ role: 'user', content: 'Hello, Claude' }
]
});
console.log(message);
Understanding the Response
The API returns a structured JSON object:
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello!"
}
],
"model": "claude-opus-4-7",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 12,
"output_tokens": 6
}
}
Key fields to note:
- content: An array of content blocks (text, tool_use, etc.)
- stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, or tool_use)
- usage: Token counts for billing and monitoring
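Because content is an array, production code shouldn't assume a single text block. Here's a minimal sketch of collecting all text from a response, using the message object from the Python example above:

text = "".join(block.text for block in message.content if block.type == "text")
print(text)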
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.
Conversation Flow
import anthropic
client = anthropic.Anthropic()
# First turn
message1 = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"}
]
)
# Second turn: include previous history
message2 = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"},
{"role": "assistant", "content": "Hello!"},
{"role": "user", "content": "Can you describe LLMs to me?"}
]
)
print(message2.content[0].text)
Important Notes
- Earlier turns don't need to originate from Claude. You can inject synthetic assistant messages to guide the conversation.
- Always maintain the correct alternating pattern: user → assistant → user → assistant.
- The API validates message order and will reject malformed sequences.
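In practice, you'll typically keep the history in a list and append each exchange as it happens. Here's a minimal sketch of that pattern (the send_message helper is our own illustration, not part of the SDK):

import anthropic

client = anthropic.Anthropic()
history = []

def send_message(user_text):
    # Record the user turn, send the full history, then record
    # Claude's reply so the next call has complete context.
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=history,
    )
    history.append({"role": "assistant", "content": response.content[0].text})
    return response.content[0].text

print(send_message("Hello, Claude"))
print(send_message("Can you describe LLMs to me?"))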
Putting Words in Claude's Mouth: The Prefill Technique
Prefilling allows you to start Claude's response by including an assistant message with partial content at the end of your input. This is incredibly useful for:
- Constraining outputs (e.g., forcing a multiple-choice answer format)
- Guiding tone or style (e.g., starting with "I'd be happy to explain...")
- Ensuring structured responses (e.g., JSON or XML)
Example: Multiple Choice Answer
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1,
messages=[
{
"role": "user",
"content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
},
{
"role": "assistant",
"content": "The answer is ("
}
]
)
print(message.content[0].text) # Output: "C"
By setting max_tokens=1 and prefilling with "The answer is (", we force Claude to complete only the letter, giving us a clean, parseable response.
Use Cases for Prefilling
| Scenario | Prefill Example | Benefit |
|---|---|---|
| JSON output | {"response": | Guarantees valid JSON start |
| Code generation | Here's the Python function:\n\ndef | Forces code block format |
| Sentiment analysis | Sentiment: | Ensures consistent labeling |
| Translation | French translation: | Locks output language |
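The JSON row above is worth expanding. Here's a minimal sketch that combines a prefill with the API's stop_sequences parameter, assuming a flat JSON object (a nested object would trip the stop sequence early):

import json
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    # Stop before the closing brace; combined with the "{" prefill,
    # Claude emits only the object's interior.
    stop_sequences=["}"],
    messages=[
        {"role": "user", "content": "Give me a JSON object with keys 'common_name' and 'latin_name' for the ant."},
        {"role": "assistant", "content": "{"},
    ],
)

# Re-attach the prefilled "{" and the "}" removed by the stop sequence.
data = json.loads("{" + message.content[0].text + "}")
print(data["latin_name"])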
Handling Streaming Responses
For real-time applications, streaming reduces perceived latency. The API supports Server-Sent Events (SSE) for streaming.
Python Streaming Example
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Write a short poem about AI"}
]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
TypeScript Streaming Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const stream = await client.messages.create({
model: 'claude-opus-4-7',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Write a short poem about AI' }],
stream: true
});
for await (const event of stream) {
if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
process.stdout.write(event.delta.text);
}
}
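If you also need the token usage after streaming, the Python SDK's stream helper can hand back the fully assembled message once iteration finishes:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Once the stream is exhausted, the accumulated Message
    # (including token usage) is available.
    final = stream.get_final_message()

print(f"\nOutput tokens: {final.usage.output_tokens}")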
Working with Vision and Images
The Messages API supports image inputs. You can send images as base64-encoded data or via URLs.
Image Analysis Example
import anthropic
import base64
client = anthropic.Anthropic()
with open("chart.png", "rb") as f:
image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": image_data
}
},
{
"type": "text",
"text": "Describe this chart in detail."
}
]
}
]
)
print(message.content[0].text)
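For the URL variant mentioned above, you swap the base64 source for a url source. A minimal sketch, with https://example.com/chart.png standing in for a publicly reachable image:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # With a URL source the API fetches the image itself,
                    # so no encoding step is needed.
                    "type": "image",
                    "source": {"type": "url", "url": "https://example.com/chart.png"}
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)
print(message.content[0].text)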
Best Practices for Production
- Manage token budgets: Always set max_tokens to control costs and response length.
- Handle stop reasons: Check stop_reason in responses. "max_tokens" means the response was cut off; you may need to continue the conversation.
- Implement retry logic: Network issues happen. Use exponential backoff for transient failures (see the sketch below).
- Cache frequent prefixes: For common system prompts, use prompt caching to reduce latency and costs.
- Monitor usage: Track input_tokens and output_tokens to stay within your budget.
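Putting the retry and stop-reason advice together: below is a minimal sketch where create_with_backoff is our own helper name (the SDK can also retry some failures itself if you pass max_retries when constructing the client):

import time
import anthropic

client = anthropic.Anthropic()

def create_with_backoff(max_attempts=5, **kwargs):
    # Retry transient failures with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)

message = create_with_backoff(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)

# "max_tokens" as a stop reason means the reply was truncated.
if message.stop_reason == "max_tokens":
    print("Response was cut off; raise max_tokens or ask Claude to continue.")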
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with each request, giving you complete control over context.
- Prefilling allows you to shape Claude's responses by providing the beginning of its reply, enabling constrained outputs and structured formats.
- Streaming responses via SSE reduce perceived latency for real-time applications.
- The API supports multi-modal inputs including text and images, making it suitable for vision tasks.
- Always handle stop_reason in your application logic to detect truncated responses and manage conversation flow properly.