Mastering the Messages API: Build Multi-Turn Conversations with Claude
This guide covers the Messages API for building conversational AI apps with Claude, including stateless multi-turn conversations, prefill techniques to shape responses, and vision support for image analysis.
Claude's Messages API is the primary interface for building conversational AI applications. Whether you're creating a chatbot, a code assistant, or a document analysis tool, understanding how to structure requests and manage conversation state is essential.
This guide walks you through the core patterns of the Messages API—from basic requests to advanced techniques like prefill and vision—with practical code examples you can use immediately.
Understanding the Messages API vs. Managed Agents
Anthropic offers two ways to build with Claude:
- Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Basic Request and Response
At its simplest, the Messages API takes a list of messages and returns Claude's response. Here's a minimal example in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
Understanding the Response
The API returns a structured JSON object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-1",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks (text, tool_use, etc.)
- stop_reason: Why Claude stopped generating (end_turn, max_tokens, stop_sequence, or tool_use)
- usage: Token counts for billing and monitoring
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.
Example: Two-Turn Conversation
message = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Notice that the assistant's previous response ("Hello!") is included in the messages array. This is how Claude maintains context across turns.
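Because you manage state yourself, a thin wrapper that accumulates turns keeps the request loop tidy. A minimal sketch (the Conversation class and its methods are illustrative, not part of the SDK):

```python
# Minimal conversation-state helper for the stateless Messages API.
# The class name and methods are illustrative, not part of the SDK.

class Conversation:
    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")  # Claude's reply from the first request
convo.add_user("Can you describe LLMs to me?")

# convo.messages now holds the full history to pass as `messages=`
# in the next client.messages.create(...) call.
```

After each API response, append its text back with add_assistant so the next request carries the complete history.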
Important: Synthetic Assistant Messages
Earlier conversational turns don't need to actually originate from Claude. You can inject synthetic assistant messages to guide the conversation or provide context. For example:
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Tell me more about its landmarks."}
]
This is useful for:
- Providing example interactions in few-shot prompting
- Correcting or steering the conversation history
- Building multi-step reasoning chains
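For few-shot prompting in particular, the synthetic turns can be generated from example pairs. A sketch (the helper name and the sentiment examples are ours):

```python
# Build a few-shot messages list from (input, label) example pairs,
# ending with the real user query. Helper name is illustrative.

def few_shot_messages(examples, query):
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages


msgs = few_shot_messages(
    [("I love this product!", "positive"),
     ("Terrible experience.", "negative")],
    "The shipping was fast but the box was damaged.",
)
# msgs alternates user/assistant turns and ends with the real user query,
# ready to pass as messages= to client.messages.create(...).
```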
Putting Words in Claude's Mouth: The Prefill Technique
One of the most powerful features of the Messages API is prefilling—you can start Claude's response by including an assistant message with partial content in the last position of the input messages list.
Use Case: Constrained Multiple Choice
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
Output:
{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
By setting max_tokens=1 and prefilling with "The answer is (", we force Claude to complete only the letter. This is perfect for:
- Classification tasks
- Multiple-choice questions
- Yes/no decisions
- Structured output extraction
Use Case: Shaping Response Style
You can also prefill to control tone or format:
messages = [
    {"role": "user", "content": "Explain quantum computing in one sentence."},
    # Note: a prefilled assistant message must not end with trailing
    # whitespace, or the API rejects the request.
    {"role": "assistant", "content": "Quantum computing is a revolutionary approach that"}
]
This ensures Claude continues your thought rather than starting from scratch.
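One detail worth handling in code: the response contains only the continuation, not the prefill itself, so reassemble the full text yourself. A sketch with a stubbed response (the helper name and example text are ours):

```python
# The API's response contains only the tokens generated after the
# prefill, so the full answer is prefill + completion.

def full_text(prefill, response):
    # Join the text blocks from the response and prepend the prefill.
    completion = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    return prefill + completion


# Stubbed response in the dict shape the API returns (illustrative text):
response = {"content": [{"type": "text",
                         "text": " uses qubits to explore many states at once."}]}
prefill = "Quantum computing is a revolutionary approach that"
answer = full_text(prefill, response)
```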
Vision Capabilities: Working with Images
The Messages API supports image inputs, enabling Claude to analyze visual content. Here's how to send an image:
import base64

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
Supported Image Formats
- JPEG, PNG, GIF, WebP
- Maximum size: 5 MB per image via the API
- Oversized images are scaled down before processing, so very high resolutions do not improve results
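A small helper can map file extensions to the media_type values the API accepts and build the image content block. A sketch (the MEDIA_TYPES table and image_block function are ours):

```python
import base64

# Extension -> media_type strings the API accepts (helper names are ours).
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def image_block(data: bytes, suffix: str) -> dict:
    """Build a base64 image content block from raw bytes and a file suffix."""
    media_type = MEDIA_TYPES[suffix.lower()]  # KeyError on unsupported formats
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }

# Usage: image_block(Path("diagram.png").read_bytes(), ".png") slots
# directly into a user message's content list.
```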
Handling Stop Reasons
Understanding stop_reason is crucial for building robust applications:
| stop_reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Return response to user |
| max_tokens | Output exceeded token limit | Increase max_tokens or split response |
| stop_sequence | A custom stop sequence was hit | Handle as needed |
| tool_use | Claude wants to call a tool | Execute tool and continue conversation |
Example: Handling max_tokens
response = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a long essay on AI."}]
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
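Beyond raising max_tokens, one recovery pattern is to feed the truncated text back as a trailing assistant turn, which the prefill behavior described above turns into a continuation request. A sketch of the message construction (pure helper, names ours):

```python
# Build the follow-up request for a truncated response: append the
# partial assistant text so Claude continues from where it stopped.

def continuation_messages(history, partial_text):
    return history + [{"role": "assistant", "content": partial_text}]


history = [{"role": "user", "content": "Write a long essay on AI."}]
partial = "Artificial intelligence has transformed"  # truncated output
next_messages = continuation_messages(history, partial)
# Pass next_messages to client.messages.create(...); because the last
# message is an assistant turn, Claude continues it rather than
# starting over. Remember to strip any trailing whitespace first.
```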
Streaming for Real-Time Responses
For a better user experience, use streaming to receive tokens as they're generated:
stream = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    if chunk.type == "content_block_delta":
        print(chunk.delta.text, end="", flush=True)
Streaming is essential for:
- Chat interfaces with real-time token display
- Long responses where users expect immediate feedback
- Reducing perceived latency
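In a real chat interface you usually both display each delta and accumulate the full reply for the conversation history. A sketch of that accumulation logic, with the streamed events stubbed out (real events come from the SDK stream):

```python
from types import SimpleNamespace

# Collect text deltas into the full response as they arrive.
def collect_text(events):
    parts = []
    for event in events:
        if event.type == "content_block_delta":
            parts.append(event.delta.text)  # display and accumulate here
    return "".join(parts)


# Stubbed events in the shape of streamed chunks (illustrative only):
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="Once upon ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="a time...")),
    SimpleNamespace(type="message_stop"),
]
story = collect_text(fake_events)
```

The accumulated text can then be appended to your messages history as the assistant turn for the next request.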
Best Practices
- Manage context windows carefully: Token limits apply per request. Use the usage field to monitor consumption.
- Use synthetic messages for few-shot prompting: Inject example assistant responses to demonstrate desired behavior.
- Prefill for structured outputs: When you need JSON, XML, or specific formats, prefill the opening tags.
- Handle errors gracefully: Always check stop_reason and implement retry logic for transient failures.
- Optimize with prompt caching: For repeated system prompts, use prompt caching to reduce costs and latency.
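Prompt caching is enabled by tagging a content block with cache_control. A sketch of the request shape (the system prompt text here is illustrative):

```python
# Mark a large, stable system prompt as cacheable so repeated requests
# reuse it instead of reprocessing it each time.
system_blocks = [
    {
        "type": "text",
        "text": "You are a support assistant. Detailed policy text goes here...",
        "cache_control": {"type": "ephemeral"},
    }
]

# Pass system=system_blocks to client.messages.create(...) alongside your
# messages; subsequent calls sharing the same prefix can hit the cache.
```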
Key Takeaways
- The Messages API is stateless—you must send full conversation history with each request, giving you complete control over context.
- Prefill techniques let you shape Claude's responses by starting its reply, enabling constrained outputs like multiple-choice answers or structured data.
- Vision support allows Claude to analyze images sent as base64-encoded data, opening up document analysis and visual reasoning use cases.
- Always check the stop_reason field to determine why Claude stopped, and handle truncation or tool calls appropriately.
- Streaming provides real-time token delivery for better user experiences in chat applications.