Mastering the Messages API: Build Conversational AI with Claude
This guide covers everything you need to build with Claude's Messages API: making basic requests, managing multi-turn conversations, using prefill to shape responses, and working with images. You'll get practical code examples in Python and TypeScript.
Introduction
The Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to structure your API calls is essential. This guide walks you through the core patterns—from simple requests to advanced techniques like prefill and vision—so you can build robust conversational applications.
Understanding the Messages API vs. Managed Agents
Anthropic offers two paths for building with Claude:
- Messages API: Direct model access. You control the entire conversation loop, manage state, and handle tool calls yourself. Best for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Making Your First API Request
Let's start with the simplest possible request: sending a single message and getting a response.
Python Example
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
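Both SDK calls wrap a single HTTPS POST. To make that concrete, here is a rough sketch that builds (but does not send) the equivalent request with Python's standard library. The endpoint, the x-api-key and anthropic-version headers, and the 2023-06-01 version string are stated here as assumptions about the public API that you should check against the current reference; build_messages_request is a hypothetical helper name, not part of any SDK.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed public endpoint and version header; verify against the API reference.
API_URL = "https://api.anthropic.com/v1/messages"
API_VERSION = "2023-06-01"


def build_messages_request(
    prompt: str,
    model: str,
    max_tokens: int = 1024,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build, without sending, the POST request for a single-message call."""
    # Fall back to the same environment variable the SDKs read by default.
    key = api_key or os.environ.get("ANTHROPIC_API_KEY", "")
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": key,
            "anthropic-version": API_VERSION,
            "content-type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. In practice the SDK is the better choice, since it also handles errors and response parsing.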
Understanding the Response
The API returns a structured JSON object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks (text, tool_use, etc.)
- stop_reason: Why the response ended (end_turn, max_tokens, stop_sequence, tool_use)
- usage: Token counts for billing and context management
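Because content is an array that can mix block types, reading only the first element can miss text. The sketch below works on the parsed JSON shown above as a plain dictionary; collect_text and total_tokens are hypothetical helper names. The SDK returns objects that expose the same fields as attributes, so the attribute-based equivalent would be similar.

```python
from typing import Any, Dict


def collect_text(response: Dict[str, Any]) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )


def total_tokens(response: Dict[str, Any]) -> int:
    """Add input and output token counts from the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]
```

Applied to the sample response above, collect_text returns the greeting text and total_tokens sums the two usage counters.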
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context.
Python Example
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Important Notes
- You don't need to use only real assistant responses. You can synthesize assistant messages to guide the conversation.
- Always alternate between user and assistant roles. Consecutive messages with the same role may be rejected or merged into a single turn, so keep the history strictly alternating.
- The conversation history counts toward your input token limit, so be mindful of context window constraints.
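Since the API keeps no state, your application has to own the history. One way to do that, sketched under the assumption that each turn is a plain string, is a small class that stores every turn, refuses out-of-order roles, and resends the whole list on each call. Conversation and the send callback are hypothetical names; send stands in for whatever function actually calls the API.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Holds the full history and enforces user/assistant alternation."""

    def __init__(self) -> None:
        self.messages: List[Message] = []

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        if not self.messages and role != "user":
            raise ValueError("the first message must come from the user")
        if self.messages and self.messages[-1]["role"] == role:
            raise ValueError(f"two consecutive {role} messages")
        self.messages.append({"role": role, "content": content})

    def ask(self, text: str, send: Callable[[List[Message]], str]) -> str:
        """Append a user turn, send the whole history, record the reply."""
        self.add("user", text)
        try:
            reply = send(list(self.messages))
        except Exception:
            # Drop the unanswered user turn so the history stays alternating.
            self.messages.pop()
            raise
        self.add("assistant", reply)
        return reply
```

With the SDK, send could be a function that calls client.messages.create with messages set to the list it receives and returns message.content[0].text.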
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response for it. This is useful for:
- Forcing structured outputs (e.g., JSON, multiple choice)
- Steering the tone or format of the response
- Reducing token usage by constraining the output
Example: Multiple Choice Answer
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # one token is enough for a single letter
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            # The final assistant turn is the prefill; Claude continues from here.
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
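As the example shows, the response contains only the continuation, not the prefill itself. When you prefill structured output such as JSON, you therefore need to glue the two halves back together before parsing. A minimal sketch, with with_prefill and join_prefill as hypothetical helper names:

```python
import json
from typing import Any, Dict, List


def with_prefill(prompt: str, prefill: str) -> List[Dict[str, str]]:
    """Build a messages list whose last turn is a partial assistant reply."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def join_prefill(prefill: str, completion: str) -> Any:
    """Prepend the prefill to the returned continuation and parse as JSON."""
    return json.loads(prefill + completion)
```

For instance, with a prefill of an opening brace, the text returned by the API would be passed to join_prefill together with that same brace to recover the full object.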
Prefill Limitations
Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
For models that don't support prefill, consider:
- Using the system parameter with formatting instructions
- Implementing structured outputs (JSON mode)
- Post-processing the response
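For the post-processing route, a common situation is a reply that wraps the JSON you asked for in a sentence or two of prose. The sketch below scans for the first position where a valid JSON object or array starts, using only the standard library. extract_first_json is a hypothetical helper name, and this is one possible approach rather than a recommended standard.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one value starting at `start` and
            # ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from here; keep scanning
        return value
    return None
```

Returning None instead of raising lets the caller decide whether to retry the request or fall back to another strategy.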
Working with Images (Vision)
The Messages API supports image inputs for multimodal understanding. You can pass images as base64-encoded data or as URLs.
Python Example
import anthropic
import base64

client = anthropic.Anthropic()

# Read the image and encode it as a base64 string.
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
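The example above covers the base64 path. For the URL path, the image block carries a url source instead of encoded data. The sketch below builds such a message as plain dictionaries; image_url_block and image_question are hypothetical helper names, the example.com address is a placeholder, and the exact shape of the url source should be confirmed against the current vision documentation.

```python
from typing import Any, Dict, List


def image_url_block(url: str) -> Dict[str, Any]:
    """Image content block that points at a hosted image (assumed shape)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_question(url: str, question: str) -> List[Dict[str, Any]]:
    """One user turn containing an image followed by a text question."""
    return [
        {
            "role": "user",
            "content": [
                image_url_block(url),
                {"type": "text", "text": question},
            ],
        }
    ]
```

The returned list can be passed as the messages argument, for example image_question("https://example.com/chart.png", "Describe this chart in detail.").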
Supported Image Types
- JPEG, PNG, GIF, WebP
- Maximum size: 5 MB per image through the API
- Optimal resolution: 1568x1568 pixels (larger images are downscaled)
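Checking these constraints before sending saves a failed request. The following sketch picks the media type from the file extension and rejects unsupported or oversized files. It is an illustration only: the helper names are hypothetical, extension-based detection is an assumption (it does not inspect the file contents), and max_bytes is left for you to set from the current documented limit.

```python
import base64
from pathlib import Path
from typing import Any, Dict

# Extensions for the four supported formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_bytes(raw: bytes, filename: str, max_bytes: int) -> Dict[str, Any]:
    """Validate an image and wrap it in a base64 image content block."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or 'none'}")
    if len(raw) > max_bytes:
        raise ValueError(f"image is {len(raw)} bytes, limit is {max_bytes}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(raw).decode("ascii"),
        },
    }


def image_block_from_file(path: str, max_bytes: int) -> Dict[str, Any]:
    """Convenience wrapper that reads the file from disk first."""
    return image_block_from_bytes(Path(path).read_bytes(), path, max_bytes)
```

Splitting the byte-level check from the file read keeps the validation usable for uploads that never touch the disk.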
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Hit the token limit | Increase max_tokens or continue |
| stop_sequence | Found a custom stop sequence | Handle as needed |
| tool_use | Claude wants to call a tool | Execute tool and return result |
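In code, the table above can be expressed as a small dispatch function. This is a sketch: next_step and the action labels it returns are hypothetical names chosen for illustration. Unknown values fall through to a default instead of raising, on the assumption that the API may return stop reasons beyond the four listed here.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason to the action the application should take next."""
    actions = {
        "end_turn": "finish",                # Claude finished naturally
        "max_tokens": "raise_limit_or_continue",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "run_tools",             # execute tools, return results
    }
    # Anything unrecognized is surfaced for inspection rather than crashing.
    return actions.get(stop_reason, "inspect_manually")
```

A caller would branch on the returned label, for example running the tool loop only when it equals run_tools.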
Example: Handling Tool Calls
if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            tool_name = block.name
            tool_input = block.input
            # Execute your tool logic here
            print(f"Claude wants to call {tool_name} with {tool_input}")
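Executing the tool is only half of the loop: the result has to go back to Claude in the next user turn. The sketch below assumes the tool_result block shape (type, tool_use_id, content, and an optional is_error flag) and works on content blocks as plain dictionaries. run_tool_calls and the tools registry are hypothetical names, and handlers are assumed to accept the tool input as keyword arguments.

```python
from typing import Any, Callable, Dict, List


def run_tool_calls(
    content: List[Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Run every tool_use block and build the follow-up user message."""
    results: List[Dict[str, Any]] = []
    for block in content:
        if block["type"] != "tool_use":
            continue  # skip text and other block types
        result: Dict[str, Any] = {"type": "tool_result", "tool_use_id": block["id"]}
        handler = tools.get(block["name"])
        if handler is None:
            result.update(content=f"Unknown tool: {block['name']}", is_error=True)
        else:
            try:
                result["content"] = str(handler(**block["input"]))
            except Exception as exc:
                # Report the failure to Claude instead of aborting the loop.
                result.update(content=f"{type(exc).__name__}: {exc}", is_error=True)
        results.append(result)
    return {"role": "user", "content": results}
```

Before sending this message, append the assistant turn that contained the tool_use blocks to the history, so each result follows the call it answers.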
Best Practices
1. Manage Context Window
- Keep conversation history concise. Summarize or prune old messages when approaching token limits.
- Use prompt caching for repeated system instructions (see Prompt Caching docs).
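Pruning can be as simple as dropping the oldest turns until the history fits a budget. The sketch below assumes string content and uses a crude estimate of roughly four characters per token, which is a heuristic and not how the API counts. For real numbers, rely on the usage field of earlier responses. estimate_tokens and prune_history are hypothetical names.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def prune_history(messages: List[Message], budget: int) -> List[Message]:
    """Drop the oldest turns until the estimated total fits the budget."""
    pruned = list(messages)  # never mutate the caller's list

    def total(msgs: List[Message]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while len(pruned) > 1 and total(pruned) > budget:
        pruned.pop(0)
        # The history must still begin with a user turn after pruning.
        while len(pruned) > 1 and pruned[0]["role"] != "user":
            pruned.pop(0)
    return pruned
```

Summarizing the dropped turns into a single short message is a gentler alternative when early context still matters.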
2. Handle Errors Gracefully
- Always catch API errors (rate limits, authentication, invalid requests).
- Implement exponential backoff for retries.
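Exponential backoff can be sketched as a generic wrapper around any callable. Nothing here is specific to the SDK: call_with_backoff is a hypothetical helper, and the exception types to retry are passed in by the caller. The official SDKs may already retry some failures on their own, so check their settings before layering this on top.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time (1s, 2s, 4s, ...) up to max_delay,
            # plus random jitter so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Only transient failures such as rate limits or connection errors belong in retry_on. Authentication and invalid-request errors will fail the same way on every attempt.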
3. Use System Messages
For persistent instructions, use the system parameter instead of repeating instructions in user messages:
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a helpful assistant that always responds in JSON format.",
    messages=[
        {"role": "user", "content": "List three planets."}
    ]
)
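The system parameter is also where prompt caching for repeated instructions usually attaches. The sketch below assumes that system accepts a list of text blocks and that a cache_control entry of type ephemeral marks a block as cacheable. Confirm both against the Prompt Caching docs, which also describe minimum prompt lengths below which nothing is cached. cached_system and build_request are hypothetical helper names.

```python
from typing import Any, Dict, List


def cached_system(text: str) -> List[Dict[str, Any]]:
    """System prompt as a single text block marked cacheable (assumed shape)."""
    return [
        {
            "type": "text",
            "text": text,
            "cache_control": {"type": "ephemeral"},
        }
    ]


def build_request(
    model: str,
    system_text: str,
    messages: List[Dict[str, Any]],
    max_tokens: int = 1024,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a messages.create call."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": cached_system(system_text),
        "messages": list(messages),
    }
```

The resulting dictionary can be unpacked into the SDK call, as in client.messages.create(**build_request(...)).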
4. Streaming for Responsiveness
For real-time applications, use streaming to show partial responses as they're generated:
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    # Print each text chunk as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)
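Most applications need the complete reply as well as the live output, for example to append it to the conversation history. A small sketch that does both for any iterable of text chunks follows; consume_stream is a hypothetical helper name, and with the SDK the iterable would be stream.text_stream from the example above.

```python
from typing import Callable, Iterable, List


def _print_chunk(chunk: str) -> None:
    """Default handler: write each chunk to the terminal immediately."""
    print(chunk, end="", flush=True)


def consume_stream(
    chunks: Iterable[str],
    on_chunk: Callable[[str], None] = _print_chunk,
) -> str:
    """Forward each chunk to on_chunk and return the full text at the end."""
    parts: List[str] = []
    for chunk in chunks:
        on_chunk(chunk)
        parts.append(chunk)
    return "".join(parts)
```

Swapping on_chunk for a function that pushes to a websocket or UI widget keeps the accumulation logic unchanged.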
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create sophisticated AI applications. Remember that the API is stateless—you control the context—and always handle stop reasons appropriately for robust applications.
Key Takeaways
- Stateless design: You must send the full conversation history with every request, giving you complete control over context.
- Prefill shapes responses: Start Claude's response to enforce structure, but check model compatibility.
- Vision is built-in: Pass images as base64 or URLs for multimodal understanding.
- Handle stop reasons: end_turn, max_tokens, and tool_use each require different handling logic.
- Stream for UX: Use streaming to improve perceived responsiveness in user-facing applications.