Mastering the Messages API: Build Multi-Turn Conversations with Claude
This guide covers how to use Claude's Messages API to build conversational apps, including basic requests, multi-turn dialogues, prefill techniques to shape responses, and vision capabilities for image analysis.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or a vision-powered application, understanding how to structure your API calls is essential. This guide walks you through everything from a simple "Hello, Claude" to advanced techniques like prefill and multi-turn conversations.
What Is the Messages API?
The Messages API gives you direct access to Claude's intelligence. You send a list of messages (the conversation history), and Claude responds with a new message. It's stateless—meaning you must send the full conversation history with each request. This design gives you complete control over context and conversation flow.
Anthropic offers two paths for building with Claude:
- Messages API: Direct model access for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness for long-running tasks.
Basic Request and Response
Let's start with the simplest possible interaction: sending a single message and getting a response.
Python Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
Understanding the Response
The API returns a structured JSON object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields:
- content: An array of content blocks (usually text).
- stop_reason: Why the model stopped; "end_turn" means Claude finished naturally.
- usage: Token counts for billing and context management.
Building Multi-Turn Conversations
Because the Messages API is stateless, you must send the entire conversation history with each request. This makes it easy to build up a conversation over multiple turns.
Example: Two-Turn Conversation
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Notice that the second turn includes Claude's previous response ("Hello!") as part of the input. This maintains context.
Important Notes
- Synthetic assistant messages: The assistant messages don't have to come from Claude. You can insert pre-written assistant responses to guide the conversation.
- Context window: Be mindful of the total token count. Each turn adds to the context, and you may hit limits with long conversations.
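Because every request carries the whole history, most applications keep a small helper around the message list. A minimal sketch (the Conversation class here is illustrative, not part of the SDK):

```python
class Conversation:
    """Minimal history manager for the stateless Messages API."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")  # synthetic assistant turn, pre-written by you
convo.add_user("Can you describe LLMs to me?")

# convo.messages can be passed directly as the messages= argument
# to client.messages.create(...)
```

After each real API call, append Claude's reply with add_assistant so the next request sees the full context.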
Putting Words in Claude's Mouth (Prefill)
One powerful technique is prefilling—starting Claude's response for it. You include a partial assistant message at the end of your input, and Claude continues from there.
Use Case: Multiple Choice Questions
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
By setting max_tokens=1, you force Claude to output just a single token—the letter of the correct answer. The prefill "The answer is (" shapes the response format.
Other Prefill Applications
- JSON mode: Prefill with {"response": " to force structured output.
- Roleplay: Start Claude's response with a character's name or action.
- Code generation: Prefill with def or function to get a function definition.
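JSON-mode prefill pairs naturally with a small parser that re-joins the prefill and the model's completion before decoding. A sketch (the parse_prefilled helper and the sample completion string are illustrative):

```python
import json

PREFILL = '{"sentiment":'

messages = [
    {
        "role": "user",
        "content": 'Classify the sentiment of "I love this!" '
                   'Reply as JSON with a "sentiment" key.',
    },
    # Trailing assistant message: Claude continues from here,
    # so its output is forced into the JSON shape.
    {"role": "assistant", "content": PREFILL},
]


def parse_prefilled(prefill: str, completion: str) -> dict:
    """Claude's reply continues the prefill, so re-join before parsing."""
    return json.loads(prefill + completion)


# If the model's completion were ' "positive"}', re-joining yields valid JSON:
result = parse_prefilled(PREFILL, ' "positive"}')
print(result)  # {'sentiment': 'positive'}
```

Remember that the text Claude returns does not repeat the prefill, so you must concatenate the two halves yourself before parsing.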
Vision Capabilities
The Messages API also supports image inputs. You can send images as base64-encoded data or as URLs.
Example: Analyze an Image
import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Image Types
- JPEG, PNG, GIF, WebP
- Maximum size: 5 MB per image for API requests
- Claude can analyze images for descriptions, OCR, data extraction, and more.
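The base64 example above can also be written with a URL image source, which skips the encoding step. A sketch of the request payload (the image URL is a placeholder):

```python
# URL-based image source: the API fetches the image itself,
# so no base64 encoding is needed on the client.
image_block = {
    "type": "image",
    "source": {
        "type": "url",
        "url": "https://example.com/chart.png",  # placeholder URL
    },
}
text_block = {"type": "text", "text": "Describe this chart in detail."}

messages = [{"role": "user", "content": [image_block, text_block]}]

# messages can then be passed to client.messages.create(...) as before.
print(messages[0]["content"][0]["source"]["type"])  # url
```

This is convenient when images are already hosted somewhere publicly reachable; use base64 for local or private files.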
Streaming Responses
For real-time applications, you can stream Claude's response token by token. This is ideal for chatbots where you want to show the response as it's being generated.
Python Streaming Example
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a short story."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="")
Streaming reduces perceived latency and improves user experience.
Best Practices
1. Manage Context Window
Each conversation turn adds tokens. For long conversations, use prompt caching or context compaction to stay within limits.
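When caching or compaction is not an option, a cruder fallback is trimming old turns before each request. A minimal sketch (the 20-message cap is an arbitrary choice):

```python
def truncate_history(messages, max_messages=20):
    """Keep only the most recent turns, always starting on a user message."""
    recent = messages[-max_messages:]
    # The Messages API requires the first message to have the user role,
    # so drop any leading assistant turns left over from the cut.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [{"role": "assistant", "content": "orphaned turn"}] + [
    {"role": "user", "content": f"question {i}"} for i in range(30)
]
trimmed = truncate_history(history, max_messages=5)
```

Trimming loses information from early turns, so production systems often summarize dropped turns into a single message instead of discarding them outright.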
2. Use Prefill for Consistency
When you need structured output (JSON, specific formats), always use prefill to guide Claude's response.
3. Handle Stop Reasons
Check stop_reason in the response:
- "end_turn": Claude finished naturally.
- "max_tokens": Response was cut off; increase max_tokens or continue the conversation.
- "stop_sequence": Claude hit a custom stop sequence.
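One way to act on stop_reason is a loop that keeps requesting until the model finishes naturally, feeding the partial answer back as a trailing assistant message (the prefill technique from earlier). A sketch, with the API call injected as a function so it can be stubbed out:

```python
def collect_full_response(create_fn, messages, max_tokens=1024):
    """Stitch together an answer that keeps getting cut off at max_tokens.

    create_fn wraps client.messages.create; responses are expected to look
    like the SDK's Message objects (resp.stop_reason, resp.content[0].text).
    """
    parts = []
    while True:
        resp = create_fn(messages=messages, max_tokens=max_tokens)
        parts.append(resp.content[0].text)
        if resp.stop_reason != "max_tokens":
            break
        # Truncated: send the partial answer back as a trailing assistant
        # message so Claude picks up where it left off.
        messages = messages + [{"role": "assistant", "content": "".join(parts)}]
    return "".join(parts)


# Real usage would pass a thin wrapper such as:
# lambda **kw: client.messages.create(model="claude-sonnet-4-5", **kw)
```

In production you would also cap the number of iterations so a pathological request cannot loop forever.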
4. Batch Processing
For high-volume tasks, use the Batch API to send multiple requests asynchronously.
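A sketch of how batch requests are assembled for the Python SDK's batches endpoint (the prompts and the custom_id naming scheme are illustrative):

```python
def build_batch_requests(prompts, model="claude-sonnet-4-5", max_tokens=256):
    """Build the request list expected by client.messages.batches.create.

    Each entry pairs a custom_id (used to match results back to inputs)
    with the same params a normal messages.create call would take.
    """
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(["Summarize doc A", "Summarize doc B"])
# batch = client.messages.batches.create(requests=requests)
# Results arrive asynchronously; poll the batch and match on custom_id.
```

Because results can come back in any order, the custom_id field is what ties each result to its original prompt.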
Common Pitfalls
- Forgetting history: Always send the full conversation history, or Claude will lose context.
- Exceeding token limits: Monitor usage.input_tokens and usage.output_tokens to avoid surprises.
- Ignoring errors: Handle API errors (rate limits, authentication) gracefully in production.
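For the last point, a generic retry-with-backoff wrapper covers transient failures; pass in whichever SDK error classes you treat as retryable (the anthropic error names in the comment are from the Python SDK):

```python
import time


def with_retries(fn, retryable=(Exception,), max_attempts=3, base_delay=1.0):
    """Call fn, retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)


# Usage with the SDK (error classes from the anthropic package):
# reply = with_retries(
#     lambda: client.messages.create(...),
#     retryable=(anthropic.RateLimitError, anthropic.APIConnectionError),
# )
```

Authentication errors are deliberately not retried in this pattern: a bad API key will not fix itself, so those should fail fast.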
Conclusion
The Messages API is the foundation for building any application with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create powerful, interactive experiences. Start with simple calls, then layer in streaming and advanced techniques as your needs grow.
Key Takeaways
- The Messages API is stateless—always send the full conversation history with each request.
- Prefill allows you to shape Claude's response by starting its message for it.
- Vision capabilities let you send images for analysis alongside text.
- Streaming reduces latency and improves user experience for real-time apps.
- Always check stop_reason and token usage to manage conversations effectively.