Mastering the Messages API: Build Multi-Turn Conversations with Claude
Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
This guide teaches you how to use Claude's Messages API to build conversational applications, including sending basic requests, managing multi-turn dialogues, pre-filling responses, and processing images.
Introduction
The Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a document analysis tool, or a creative writing assistant, understanding how to structure your API calls is essential. This guide covers the core patterns you'll use daily: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.
Basic Request and Response
At its simplest, a Messages API call requires three things:
- model: The Claude model you want to use (e.g., claude-opus-4-7)
- max_tokens: The maximum number of tokens in Claude's response
- messages: An array of message objects, each with a role and content
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message)
The response includes:
- id: Unique message identifier
- role: Always "assistant"
- content: Array of content blocks (typically text)
- model: The model used
- stop_reason: Why generation stopped ("end_turn", "max_tokens", etc.)
- usage: Token counts for input and output
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
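Because content is a list of blocks rather than a single string, it can help to collect the text blocks explicitly. The sketch below works on a plain dictionary shaped like the JSON above; `extract_text` is a hypothetical helper, not part of the SDK. With the Python SDK the response is an object, so the same fields are read as attributes (for example `message.content[0].text`).

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block in a Messages API response dict."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Sample payload copied from the response shown above.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))
```

Filtering on the block type keeps the helper from failing if a response ever contains blocks other than text.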
Multi-Turn Conversations
The Messages API is stateless: you must send the full conversation history with each request. This gives you complete control over context, but every earlier turn counts toward input tokens, so long histories need to be trimmed or summarized.
Building a Conversation
To continue a conversation, append both Claude's previous response and the user's new message to the messages array:
import anthropic

client = anthropic.Anthropic()

# First turn
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=messages,
)

# Add Claude's response to history
messages.append({"role": "assistant", "content": response.content[0].text})

# Add user's follow-up
messages.append({"role": "user", "content": "What about Italy?"})

# Second turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=messages,
)
print(response.content[0].text)
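The append-and-resend pattern above can be wrapped in a small helper so that history bookkeeping lives in one place. This is a minimal sketch, not an SDK feature: `Conversation`, its `send` parameter, and `fake_send` are hypothetical names, and `fake_send` stands in for the real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running history and resends it in full on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the full message
        # list and returns the assistant's reply text.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        # Record the user turn, send the whole history, then record the reply.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call so the sketch runs without network access.
def fake_send(messages: List[Message]) -> str:
    return f"(reply to: {messages[-1]['content']})"


chat = Conversation(fake_send)
chat.ask("What is the capital of France?")
chat.ask("What about Italy?")
print([m["role"] for m in chat.messages])
```

In a real application, `send` could wrap `client.messages.create(...)` and return `response.content[0].text`.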
Synthetic Assistant Messages
You can inject pre-written assistant messages into the history. This is useful for:
- Setting up a scenario or persona
- Providing example responses (few-shot prompting)
- Correcting or editing Claude's past responses
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."},
    {"role": "assistant", "content": "Quantum computing uses qubits that can be 0 and 1 simultaneously, unlike classical bits."},
    {"role": "user", "content": "Give me an analogy."}
]
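For few-shot prompting, the same idea extends to several example pairs placed before the real question. The sketch below shows one way to build such a history; `build_few_shot` is a hypothetical helper and the example texts are made up for illustration.

```python
from typing import Dict, List, Tuple


def build_few_shot(
    examples: List[Tuple[str, str]], question: str
) -> List[Dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question goes last so the next assistant turn is Claude's.
    messages.append({"role": "user", "content": question})
    return messages


few_shot = build_few_shot(
    [
        ("Classify: 'I love this!'", "positive"),
        ("Classify: 'This is awful.'", "negative"),
    ],
    "Classify: 'Not bad at all.'",
)
print(len(few_shot))
```

The resulting list can be passed as the `messages` argument in the same way as a real conversation history.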
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response yourself: end the messages array with a partial assistant message, and Claude continues from where that text leaves off. This is powerful for:
- Enforcing a specific format (e.g., JSON, multiple choice)
- Guiding the tone or direction
- Reducing token usage by constraining output
Example: Multiple Choice
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {"role": "user", "content": "What is the best programming language for beginners?\n(A) Python\n(B) Java\n(C) C++\n(D) Rust"},
        # Prefill: Claude continues from this text, so the next token is the letter
        {"role": "assistant", "content": "The answer is ("},
    ],
)
# The response holds only the continuation after the prefill (a single token here)
print(message.content[0].text)
Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. For these models, use structured outputs or system prompt instructions instead.
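When prefill is used to force a format such as JSON, keep in mind that the response contains only the continuation, so the prefill has to be prepended before parsing. The sketch below illustrates this on a model that supports prefill; `parse_prefilled_json` is a hypothetical helper, and the `continuation` string is a made-up stand-in for `message.content[0].text`.

```python
import json

PREFILL = "{"


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rejoin the prefill with the returned continuation, then parse as JSON."""
    return json.loads(prefill + continuation)


# Messages that would be sent: the final assistant turn is the prefill.
messages = [
    {
        "role": "user",
        "content": "Return the capital of France as JSON with a 'capital' key.",
    },
    {"role": "assistant", "content": PREFILL},
]

# Hypothetical continuation, standing in for message.content[0].text.
continuation = '"capital": "Paris"}'
print(parse_prefilled_json(PREFILL, continuation))
```

Real output may include trailing text after the closing brace, so production code would likely need error handling around `json.loads`.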
Vision: Working with Images
Claude can process images alongside text. You can supply images in three ways:
- base64: Inline base64-encoded image data
- url: Publicly accessible image URL
- file: Reference to a file uploaded via the Files API
Supported media types: image/jpeg, image/png, image/gif, image/webp
Example with Base64
import anthropic
import base64

client = anthropic.Anthropic()

# Read the image and encode it as a base64 string
with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
            ],
        }
    ],
)
print(message.content[0].text)
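The media_type must match the actual file, which is easy to get wrong when file names vary. The sketch below shows one way to build the image block from a path, inferring the type from the file extension and rejecting anything outside the supported list. `image_block_from_file` is a hypothetical helper; it trusts the extension rather than inspecting the file contents, and older Python versions may not recognize every extension (such as .webp).

```python
import base64
import mimetypes
import os
import tempfile

SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block_from_file(path: str) -> dict:
    """Build a base64 image content block, inferring the media type from the name."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported or unknown media type for {path!r}: {media_type}")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


# Demo with a throwaway file; the bytes are a placeholder, not a real image.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "photo.png")
    with open(demo_path, "wb") as f:
        f.write(b"placeholder bytes")
    block = image_block_from_file(demo_path)

print(block["source"]["media_type"])
```

The returned dictionary can be placed in a message's content list alongside a text block, as in the example above.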
Example with URL
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/photo.jpg",
                    },
                },
            ],
        }
    ],
)
print(message.content[0].text)
Best Practices
- Manage token usage: Monitor usage.input_tokens and usage.output_tokens to control costs. Use max_tokens to limit response length.
- Handle stop reasons: Check stop_reason in responses. "end_turn" means Claude finished naturally; "max_tokens" means the response was cut off.
- Use streaming for long responses: For real-time applications, enable streaming to get partial results as Claude generates them.
- Cache frequent prefixes: Use prompt caching for system prompts or long conversation histories to reduce latency and cost.
- Validate image sizes: Large images consume more tokens. Resize or compress images before sending to optimize performance.
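The first two practices in this list can be sketched as a small amount of bookkeeping around each response. The example below operates on plain dictionaries shaped like the API's JSON response; `UsageTotals` and `was_truncated` are hypothetical names, and the second sample response uses made-up numbers purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals across several responses."""

    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: dict) -> None:
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)


def was_truncated(response: dict) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return response.get("stop_reason") == "max_tokens"


totals = UsageTotals()
# Illustrative responses; the second one is invented to show truncation.
responses = [
    {"stop_reason": "end_turn", "usage": {"input_tokens": 12, "output_tokens": 6}},
    {"stop_reason": "max_tokens", "usage": {"input_tokens": 30, "output_tokens": 1024}},
]
for r in responses:
    totals.add(r["usage"])
    if was_truncated(r):
        print("Response was cut off; consider raising max_tokens.")

print(totals)
```

Keeping totals per conversation or per user makes it easier to see where token spend concentrates.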
Key Takeaways
- The Messages API is stateless: always send the full conversation history with each request.
- Prefill lets you control the beginning of Claude's response, useful for formatting and guidance.
- Claude supports vision with images in base64, URL, or file reference formats.
- Synthetic assistant messages allow you to inject example responses or correct past interactions.
- Always check the stop_reason and usage fields to monitor response completeness and costs.