Mastering the Messages API: A Practical Guide to Building Conversations with Claude
Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
This guide covers the core patterns for working with Claude's Messages API: making basic requests, building multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities. You'll get practical code examples in Python.
Introduction
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent, understanding the Messages API is essential. This guide walks you through the most common patterns—from a simple "Hello, Claude" to multi-turn conversations, prefill techniques, and vision capabilities.
Anthropic offers two paths for building with Claude: the Messages API for direct model access and fine-grained control, and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, which is ideal for custom agent loops and applications where you need full control over the conversation flow.
Basic Request and Response
At its simplest, a Messages API call requires three things: a model name, a max_tokens limit, and an array of messages. Here's a minimal example in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
The response includes the model's reply, metadata, and token usage:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to understand:
- content: An array of content blocks. For text responses, it contains a single block with type: "text".
- stop_reason: Indicates why the model stopped. "end_turn" means the model finished naturally.
- usage: Token counts for billing and monitoring.
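With the Python SDK, these fields are attributes on the returned message object. A minimal sketch, continuing from the example above:

reply_text = message.content[0].text  # first (and here, only) text block
print(message.stop_reason)  # e.g. "end_turn"
print(message.usage.input_tokens, message.usage.output_tokens)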
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but means you need to manage the conversation state on your end.
Here's how to build a multi-turn conversation by appending messages:
import anthropic

client = anthropic.Anthropic()

# First turn
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Extract Claude's response
claude_reply = message.content[0].text

# Second turn: include the full history
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": claude_reply},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Important: The assistant messages don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or simulate a specific persona. This is useful for:
- Creating few-shot examples
- Setting up role-playing scenarios
- Providing context that Claude didn't generate
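For instance, here's a minimal few-shot sketch in which both assistant replies are hand-written rather than generated (the classification task is just an illustration):

import anthropic

client = anthropic.Anthropic()

# The assistant turns below are synthetic: written by hand to show
# Claude the expected output format.
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=5,
    messages=[
        {"role": "user", "content": "Classify the sentiment: 'I love this!'"},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Classify the sentiment: 'This is terrible.'"},
        {"role": "assistant", "content": "negative"},
        {"role": "user", "content": "Classify the sentiment: 'Best purchase I've made.'"}
    ]
)
print(message.content[0].text)  # likely "positive"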
Putting Words in Claude's Mouth (Prefill)
Prefilling lets you start Claude's response by including part of the assistant's message in the input. This is incredibly powerful for shaping the output format or constraining responses.
Use Case: Multiple Choice Answers
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for 'ant'? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
By setting max_tokens=1 and prefilling with "The answer is (", we force Claude to complete with a single character—perfect for structured outputs.
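The same trick works for structured formats like JSON. A sketch, reusing the client from the example above: prefilling with an opening bracket forces Claude to begin its reply mid-array.

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "List three primary colors as a JSON array of strings."},
        {"role": "assistant", "content": "["}  # prefill: the reply continues the array
    ]
)
# The API returns only the continuation, so prepend the prefill yourself.
print("[" + message.content[0].text)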
Prefill Limitations
Important: Prefilling is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
For newer models, use the system parameter or structured outputs to achieve similar results.
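A minimal sketch of the system-prompt alternative, reworking the multiple-choice example for a model without prefill support:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=50,
    # Constrain the format through instructions instead of prefill.
    system="Answer multiple-choice questions with the single letter of the correct option and nothing else.",
    messages=[
        {"role": "user", "content": "What is Latin for 'ant'? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"}
    ]
)
print(message.content[0].text)  # expected: "C"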
Vision Capabilities
Claude can process images alongside text. To send an image, include it as a content block with type: "image" and provide the image data as a base64-encoded string or via a URL.
Example: Analyzing an Image
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported media types include image/jpeg, image/png, image/gif, and image/webp. Check the current API documentation for exact image size limits, and resize oversized images before encoding them.
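If the image is already hosted somewhere, you can pass a URL source instead of base64. A sketch reusing the client from the example above (the URL is a placeholder):

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/chart.png"  # placeholder
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)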
Handling Stop Reasons
Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field can be:
| Stop Reason | Meaning |
|---|---|
| "end_turn" | Claude finished its response naturally |
| "max_tokens" | The response was cut off because it hit the max_tokens limit |
| "stop_sequence" | Claude encountered a custom stop sequence you defined |
| "tool_use" | Claude wants to call a tool (used with tool use) |
"max_tokens", you may want to continue the conversation by sending the partial response back as context and asking Claude to continue.
Streaming Responses
For real-time applications, streaming is essential. Instead of waiting for the full response, you receive chunks as they're generated:
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Streaming is especially useful for chatbots, code completion, and any application where latency matters.
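If you also need the final metadata (stop reason, token usage), the Python SDK's stream helper can assemble the complete message once the stream ends. A short sketch, reusing the client from the block above:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Collect the assembled Message after the stream completes.
    final = stream.get_final_message()

print(final.stop_reason, final.usage.output_tokens)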
Best Practices
- Manage context windows carefully: The Messages API is stateless, so you're responsible for keeping the conversation within the model's context window. Use techniques like summarization or sliding windows for long conversations (see the sketch after this list).
- Use system prompts for instructions: For setting behavior, tone, or constraints, use the system parameter rather than injecting instructions into the user message.
- Monitor token usage: The usage field in responses helps you track costs. For high-volume applications, consider prompt caching to reduce costs.
- Handle errors gracefully: Network issues, rate limits, and invalid requests can occur. Implement retry logic with exponential backoff.
- Test with different models: Claude Opus 4.7 is powerful but slower and more expensive. Claude Sonnet 4.5 offers a good balance for most applications.
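A minimal sliding-window sketch for the first point; the 20-message cap is an arbitrary example value, and production code might summarize dropped turns instead of discarding them:

def trim_history(messages, max_messages=20):
    """Keep only the most recent messages, preserving a valid opening turn."""
    if len(messages) <= max_messages:
        return messages
    trimmed = messages[-max_messages:]
    # The Messages API expects the conversation to open with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

On the error-handling point, note that the official Python SDK already retries certain transient failures automatically and lets you tune this via the client's max_retries setting, so your own backoff logic only needs to cover what the SDK doesn't.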
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated applications that leverage Claude's full capabilities. Remember that the API is stateless—you control the context, so design your conversation management carefully.
Key Takeaways
- The Messages API is stateless: you must send the full conversation history with every request, giving you complete control over context.
- Prefill allows you to shape Claude's responses by starting its reply, but check model compatibility as newer models may not support it.
- Vision capabilities let you send images alongside text for analysis, supporting JPEG, PNG, GIF, and WebP formats.
- Streaming responses improve user experience for real-time applications by delivering content incrementally.
- Always monitor the stop_reason and usage fields to build robust, cost-effective applications.