Mastering the Messages API: Build Multi-Turn Conversations with Claude
This guide covers the Claude Messages API: how to send basic requests, build multi-turn conversations, prefill Claude's responses, and use vision capabilities, with practical Python code examples.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, an agent, or a content generation tool, understanding how to structure requests and handle responses is essential. This guide walks you through the core patterns—from simple requests to advanced techniques like prefilling and vision.
Understanding the Messages API vs. Managed Agents
Anthropic offers two paths for building with Claude:
- Messages API: Direct access to the model. You control every aspect of the conversation loop. Best for custom agents, fine-grained control, and real-time interactions.
- Claude Managed Agents: A pre-built, configurable agent harness that runs on managed infrastructure. Ideal for long-running, asynchronous tasks.
Basic Request and Response
At its simplest, you send a list of messages to Claude and receive a response. Here's a minimal example using the Python SDK:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
Response:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- `content`: An array of content blocks. Each block has a `type` (e.g., `text`) and the actual content.
- `stop_reason`: Why the response ended. Common values: `"end_turn"` (Claude finished naturally), `"max_tokens"` (hit the token limit), `"stop_sequence"` (encountered a custom stop sequence).
- `usage`: Token counts for billing and debugging.
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context.
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ]
)
print(message.content[0].text)
```
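Because the API is stateless, a chat loop simply appends each reply to the history before the next call. Here is a minimal sketch with the sending function injected as a parameter so the loop can be exercised without network access; `send` is a hypothetical stand-in for a thin wrapper around `client.messages.create`:

```python
def chat_turn(history, user_text, send):
    """Append a user turn, get the model's reply via `send`, and record it.

    `send` is any callable that takes the messages list and returns the
    assistant's text -- e.g. a small wrapper around client.messages.create.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# With a stub in place of the real API call:
history = []
chat_turn(history, "Hello, Claude", lambda msgs: "Hello!")
```

The same `chat_turn` works unchanged in production once `send` wraps a real API call, which keeps the conversation-management logic testable.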
Important: Earlier turns don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context. This is useful for:
- Simulating a persona: Pre-fill a character's backstory.
- Providing examples: Show Claude how you want it to respond.
- Correcting course: Insert a corrected assistant message to steer the conversation.
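For the example-providing case, a few-shot history with synthetic assistant turns might look like the sketch below (the example strings are purely illustrative). Note that turns must alternate between `user` and `assistant`, starting with `user`:

```python
# A hypothetical few-shot setup: the first assistant turn is synthetic --
# written by us as an example, not produced by the model.
few_shot_messages = [
    {"role": "user", "content": "Summarize: The meeting ran long."},
    {"role": "assistant", "content": "Summary: Meeting exceeded its scheduled time."},
    {"role": "user", "content": "Summarize: Sales rose 10% in Q2."},
]

# Sanity-check the alternation before sending.
roles = [m["role"] for m in few_shot_messages]
assert roles[0] == "user"
assert all(a != b for a, b in zip(roles, roles[1:]))
```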
Putting Words in Claude's Mouth (Prefilling)
One of the most powerful techniques is prefilling: you supply the beginning of Claude's response as the final assistant message in the messages array, and Claude continues from there. This shapes the output, enforces structure, or constrains answers.
Example: Multiple Choice Answer
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
```
By setting max_tokens=1 and prefilling "The answer is (", you force Claude to complete with a single character—perfect for classification tasks.
Use Cases for Prefilling
- Structured output: Start with `{"name": "` to get JSON-like responses.
- Roleplay: Begin with a character's dialogue to set tone.
- Code generation: Prefill with `def` to get a function definition.
- Chain-of-thought: Start with `"Let's think step by step:"` to encourage reasoning.
Working with Vision (Image Input)
Claude can analyze images when you include them in the content array. You provide images as base64-encoded data or via a URL.
```python
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported media types: image/jpeg, image/png, image/gif, image/webp.
Tips for vision:
- Keep images under 20MB.
- Use clear, high-resolution images for best results.
- Combine with text instructions for precise analysis.
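Since the `media_type` you send must match the actual file format, a small helper can derive it from the filename and reject unsupported types. A sketch using only the standard library (`media_type_for` is a name of our own, not part of the SDK):

```python
import mimetypes

# The four media types listed above.
SUPPORTED = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def media_type_for(path: str) -> str:
    """Guess the media type from a filename and check it is supported."""
    guessed, _ = mimetypes.guess_type(path)
    if guessed not in SUPPORTED:
        raise ValueError(f"Unsupported image type for {path}: {guessed}")
    return guessed

print(media_type_for("chart.png"))  # → image/png
```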
Handling Stop Reasons
Always check the stop_reason field to understand why Claude stopped:
| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Continue or end conversation |
| `max_tokens` | Hit token limit | Increase `max_tokens` or truncate history |
| `stop_sequence` | Encountered custom stop sequence | Handle as needed |
| `tool_use` | Claude wants to call a tool | Process tool call and continue |
if message.stop_reason == "max_tokens":
print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
print("Claude finished naturally.")
Streaming Responses
For real-time applications, use streaming to receive tokens as they're generated:
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Streaming is essential for chatbots and any UI where you want to show progress.
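One practical pattern: write the rendering loop against any iterable of text chunks rather than the stream object itself, so the UI layer can be tested without an API call. A minimal sketch (`render_stream` is our own helper name):

```python
from typing import Iterable

def render_stream(chunks: Iterable[str]) -> str:
    """Print chunks as they arrive and return the accumulated text."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)

# Works the same whether chunks is stream.text_stream or a test fixture:
full = render_stream(["Once ", "upon ", "a time."])
```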
Best Practices
- Manage context windows: Keep conversation history within Claude's context limit (200K tokens for most models). Use prompt caching for repetitive prefixes.
- Use system prompts: For persistent instructions, use the `system` parameter instead of repeating them in every user message.
- Handle errors gracefully: Implement retries with exponential backoff for rate limits and server errors.
- Monitor token usage: Track `usage.input_tokens` and `usage.output_tokens` to optimize costs.
- Prefill strategically: Use prefilling to enforce output format, but avoid over-constraining Claude's creativity.
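The retry advice above can be sketched as a generic helper. This is a simplified illustration, not SDK code: it catches bare `Exception` for brevity, whereas a real implementation would catch only retryable errors (e.g. rate-limit and server-error exceptions from your client library):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Usage sketch: with_retries(lambda: client.messages.create(...))
```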
Key Takeaways
- The Messages API is stateless—send full conversation history with each request for multi-turn interactions.
- Prefilling lets you shape Claude's response by starting its output, useful for structured data and constrained tasks.
- Vision support allows Claude to analyze images via base64 or URL, enabling multimodal applications.
- Always check `stop_reason` to handle truncation, tool calls, or natural endings appropriately.
- Streaming is crucial for responsive UIs; use the SDK's streaming methods for real-time token delivery.