Mastering the Messages API: Build Conversational AI with Claude
This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or a complex agentic system, understanding how to craft and manage messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefill and vision.
Understanding the Messages API
Anthropic offers two main ways to build with Claude:
- Messages API: Direct access for prompting the model, with fine-grained control over the conversation flow. Best for custom agent loops and real-time interactions.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Making Your First API Call
Let's start with a simple request. The following example sends a single user message and prints Claude's response.
Python Example
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
TypeScript Example
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
```
Understanding the Response
The API returns a structured JSON object:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields:
- `id`: Unique identifier for the message
- `content`: Array of content blocks (usually text)
- `stop_reason`: Why the model stopped (`end_turn`, `max_tokens`, `stop_sequence`, or `tool_use`)
- `usage`: Token counts for input and output
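If you work with the raw JSON rather than the SDK's typed objects, the fields above are easy to read from a plain dictionary. The following is a minimal sketch with hypothetical helper names (`response_text`, `summarize_usage`): it joins every text block, since `content` is a list and may hold more than one block, and reports the stop reason and token usage. With the Python SDK you would more commonly read `message.content[0].text` directly.

```python
import json

def response_text(message: dict) -> str:
    """Concatenate the text of every text block in a Messages API response dict."""
    # content is a list of blocks; only blocks whose type is "text" carry a "text" field
    return "".join(
        block["text"]
        for block in message.get("content", [])
        if block.get("type") == "text"
    )

def summarize_usage(message: dict) -> str:
    """Return a one-line summary of why generation stopped and how many tokens it used."""
    usage = message.get("usage", {})
    return (
        f"stop_reason={message.get('stop_reason')} "
        f"input_tokens={usage.get('input_tokens', 0)} "
        f"output_tokens={usage.get('output_tokens', 0)}"
    )

# The sample response from this section, as raw JSON text
raw = '''{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}'''

message = json.loads(raw)
print(response_text(message))    # Hello!
print(summarize_usage(message))  # stop_reason=end_turn input_tokens=12 output_tokens=6
```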
Building Multi-Turn Conversations
Since the Messages API is stateless, you must send the entire conversation history with each request. This allows you to build up context over multiple turns.
Python Example
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
```
Important Notes
- Synthetic assistant messages: You can include messages that didn't actually come from Claude. This is useful for providing context or simulating previous interactions.
- Conversation history: Always include the full history to maintain context. Structure it as alternating user and assistant turns, starting with a user message.
- Token costs: Each request includes the entire history, so longer conversations cost more in input tokens.
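Because the API is stateless, your application owns the history. The following is a minimal sketch of keeping it in one place so every request carries the full, alternating conversation. It assumes a hypothetical `send` callable that receives the message list and returns Claude's reply text; in real code it might be `lambda msgs: client.messages.create(model="claude-opus-4-7", max_tokens=1024, messages=msgs).content[0].text`. The `Conversation` class is illustrative, not part of the SDK.

```python
from typing import Callable

Message = dict[str, str]

class Conversation:
    """Accumulates alternating user/assistant turns and resends them on every call."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # send is a hypothetical hook: it receives the full history and returns the reply text
        self._send = send
        self.history: list[Message] = []

    def ask(self, text: str) -> str:
        self.history.append({"role": "user", "content": text})
        try:
            # Pass a copy so the callee cannot mutate the stored history
            reply = self._send(list(self.history))
        except Exception:
            # Drop the unanswered user turn so the history stays alternating
            self.history.pop()
            raise
        self.history.append({"role": "assistant", "content": reply})
        return reply

# Offline stand-in for the API: reports how many messages it received
def fake_send(messages: list[Message]) -> str:
    return f"seen {len(messages)} message(s)"

chat = Conversation(fake_send)
print(chat.ask("Hello, Claude"))                 # seen 1 message(s)
print(chat.ask("Can you describe LLMs to me?"))  # seen 3 message(s)
```

The second call sends three messages (two user turns and one assistant turn), which is exactly why input token costs grow with conversation length.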
Prefill: Putting Words in Claude's Mouth
Prefill lets you write the beginning of Claude's response yourself; Claude then continues from where your text ends. This is useful for:
- Constraining output format: Force Claude to start with a specific structure
- Multiple choice questions: Get a single letter or number as the answer
- Guiding tone or style: Start Claude's response with the desired tone
Python Example
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Expected: C
```
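With prefill, the response contains only the continuation, so you have to join the prefill back on to get the full answer. The following minimal sketch uses hypothetical helper names: `with_prefill` appends the assistant turn after a user message, stripping trailing whitespace because the API rejects a final assistant turn that ends in whitespace. `complete_answer` then rejoins the prefill and the continuation.

```python
Message = dict[str, str]

def with_prefill(messages: list[Message], prefill: str) -> list[Message]:
    """Return a new message list whose last turn is the assistant prefill."""
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("prefill must follow a user message")
    # A final assistant turn ending in whitespace is rejected by the API, so strip it
    return messages + [{"role": "assistant", "content": prefill.rstrip()}]

def complete_answer(prefill: str, continuation: str) -> str:
    """Claude's reply continues the (stripped) prefill, so prepend it to recover the full text."""
    return prefill.rstrip() + continuation

question = [{
    "role": "user",
    "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
}]
request_messages = with_prefill(question, "The answer is (")
print(request_messages[-1])  # {'role': 'assistant', 'content': 'The answer is ('}

# Suppose the API returned "C" as message.content[0].text
print(complete_answer("The answer is (", "C"))  # The answer is (C
```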
Prefill Limitations
Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
For models that don't support prefill, consider:
- Structured outputs: Define a JSON schema for the response
- System prompt instructions: Tell Claude exactly how to format its response
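On models without prefill, the system-prompt route for the multiple-choice example can be sketched as follows. The `system` parameter is a real Messages API field. The helper names, the prompt wording, and the choice of a small JSON reply are illustrative assumptions. The request is built as keyword arguments for `client.messages.create(**kwargs)`, and the reply text is parsed and validated locally.

```python
import json

def build_choice_request(question: str, choices: str, model: str = "claude-opus-4-7") -> dict:
    """Build kwargs for client.messages.create() that ask for a JSON answer instead of using prefill."""
    return {
        "model": model,
        "max_tokens": 50,
        # Hypothetical instruction wording; adapt it to your own format requirements
        "system": 'Reply with only a JSON object of the form {"answer": "<letter>"} and nothing else.',
        "messages": [{"role": "user", "content": f"{question} {choices}"}],
    }

def parse_choice(reply_text: str, valid: tuple[str, ...] = ("A", "B", "C")) -> str:
    """Parse the JSON reply and confirm the answer is one of the allowed letters."""
    answer = json.loads(reply_text)["answer"].strip().upper()
    if answer not in valid:
        raise ValueError(f"unexpected answer: {answer!r}")
    return answer

kwargs = build_choice_request(
    "What is Latin for Ant?", "(A) Apoidea, (B) Rhopalocera, (C) Formicidae"
)
# Suppose the model replied with this text
print(parse_choice('{"answer": "C"}'))  # C
```

A system prompt is a request, not a guarantee, so the parse step should expect an occasional malformed reply. Structured outputs are the option intended to enforce a schema.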
Vision Capabilities
Claude can process images through the Messages API. This enables use cases like:
- Image analysis and description
- Document processing (PDFs, screenshots)
- Visual question answering
Python Example
```python
import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode image
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's in this image?"
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported Image Formats
- JPEG
- PNG
- GIF
- WebP
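A common mistake is a `media_type` that does not match the file. The following is a minimal sketch with hypothetical helper names (`media_type_for`, `image_block`). It assumes the file extension reliably identifies the format, maps the four supported formats to their MIME types, and builds the same kind of base64 image block as the example above.

```python
import base64
from pathlib import Path

# MIME types for the four supported formats, keyed by lowercase file extension
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def media_type_for(filename: str) -> str:
    """Infer the media_type from the file extension, rejecting unsupported formats."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename!r}")
    return MEDIA_TYPES[suffix]

def image_block(data: bytes, filename: str) -> dict:
    """Build a base64 image content block for a user message."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type_for(filename),
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }

block = image_block(b"\x89PNG fake bytes", "screenshot.PNG")
print(block["source"]["media_type"])  # image/png
```

In a request, you would place `image_block(Path("screenshot.png").read_bytes(), "screenshot.png")` before the text block in the user message's `content` list. If file names cannot be trusted, checking the file's leading magic bytes is a more robust alternative to the extension lookup.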
Handling Stop Reasons
Understanding why Claude stopped generating is crucial for building robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Continue the conversation or end it |
| `max_tokens` | Output hit the token limit | Increase `max_tokens` or handle the truncated output |
| `stop_sequence` | Claude generated one of your custom stop sequences | Handle based on your logic |
| `tool_use` | Claude wants to use a tool | Execute the tool and return the results |
Python Example: Handling Stop Reasons
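```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    # Replace with your own conversation history
    messages=[{"role": "user", "content": "Hello, Claude"}],
)

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
    print("Claude finished naturally.")
elif message.stop_reason == "stop_sequence":
    print(f"Hit stop sequence: {message.stop_sequence!r}")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
```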
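You can automate the table's suggested action for `max_tokens`. The following is a minimal sketch, assuming a hypothetical `create` callable that takes a `max_tokens` value and returns anything with a `stop_reason` attribute. The SDK's message object qualifies, for example via `lambda n: client.messages.create(model="claude-opus-4-7", max_tokens=n, messages=msgs)`. The function doubles the budget until the response finishes or hits a ceiling. Each retry regenerates the whole answer and pays for its output tokens again, so keep the ceiling modest.

```python
from dataclasses import dataclass
from typing import Any, Callable

def create_untruncated(
    create: Callable[[int], Any], start: int = 1024, ceiling: int = 8192
) -> Any:
    """Retry with a doubled max_tokens budget while the reply is cut off by the limit."""
    # ceiling is an assumed cap; the real maximum output length depends on the model
    budget = start
    while True:
        reply = create(budget)
        if reply.stop_reason != "max_tokens" or budget >= ceiling:
            # Finished for another reason, or out of budget: return what we have
            return reply
        budget = min(budget * 2, ceiling)

@dataclass
class Reply:
    """Offline stand-in for the two message fields this demo reads."""
    text: str
    stop_reason: str

# Fake API: pretend the full answer needs about 3000 output tokens
def fake_create(max_tokens: int) -> Reply:
    if max_tokens < 3000:
        return Reply(text="partial...", stop_reason="max_tokens")
    return Reply(text="complete answer", stop_reason="end_turn")

result = create_untruncated(fake_create)
print(result.stop_reason, result.text)  # end_turn complete answer
```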
Best Practices
1. Manage Token Usage
- Prompt caching: For repeated system prompts or large context, use prompt caching to reduce costs and latency.
- Token counting: Use the token counting endpoint to estimate costs before sending requests.
- Compaction: For very long conversations, consider summarizing earlier turns to save tokens.
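You can implement compaction client-side by replacing older turns with a summary and keeping recent turns verbatim. The following is a minimal sketch of that idea. `summarize` is a hypothetical callable, which could itself be a Messages API call. The summary is injected as a user turn followed by a synthetic assistant acknowledgment, so the history still alternates and starts with a user message. The sketch assumes you call it between turns, when the history ends with an assistant message.

```python
from typing import Callable

Message = dict[str, str]

def compact(
    history: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace all but the last keep_last messages with a summary exchange."""
    # Even lengths keep the retained tail starting on a user turn
    if keep_last % 2 or len(history) % 2:
        raise ValueError("call between turns, with an even keep_last")
    if len(history) <= keep_last:
        return list(history)
    split = len(history) - keep_last
    summary = summarize(history[:split])
    return [
        {"role": "user", "content": f"Summary of our earlier conversation: {summary}"},
        # Synthetic assistant turn keeps the roles alternating
        {"role": "assistant", "content": "Understood. I'll use that summary as context."},
    ] + history[split:]

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is an LLM?"},
    {"role": "assistant", "content": "A large language model..."},
    {"role": "user", "content": "How is it trained?"},
    {"role": "assistant", "content": "On large text corpora..."},
]

# Offline stand-in for a summarization call
def fake_summarize(messages: list[Message]) -> str:
    return f"{len(messages)} earlier messages about LLM basics"

compacted = compact(history, keep_last=2, summarize=fake_summarize)
print([m["role"] for m in compacted])  # ['user', 'assistant', 'user', 'assistant']
```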
2. Handle Errors Gracefully
- Implement retry logic with exponential backoff for rate limits.
- Validate inputs before sending to avoid 400 errors.
- Monitor `stop_reason` to detect truncation or tool requests.
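Retry with exponential backoff can be sketched generically. The official Python SDK already retries some failures on its own (the client accepts a `max_retries` option), so a hand-written loop like this one is mainly useful when you want custom behavior. The following sketch assumes a hypothetical zero-argument `call`, a tuple of exception types to treat as retryable (with the Python SDK that might be `anthropic.RateLimitError`), and an injectable `sleep` so it can run without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable errors with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 100% random jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    # Only reached when max_attempts < 1
    raise ValueError("max_attempts must be at least 1")

class FakeRateLimit(Exception):
    """Stand-in for a rate-limit error such as anthropic.RateLimitError."""

attempts: list[int] = []
delays: list[float] = []

def flaky() -> str:
    # Fails twice with a rate-limit error, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit("429")
    return "ok"

print(with_backoff(flaky, (FakeRateLimit,), sleep=delays.append))  # ok
```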
3. Use Streaming for Real-Time Applications
For chat interfaces, use streaming to show Claude's response as it's generated:
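```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    # Print each text delta as soon as it arrives
    for text in stream.text_stream:
        print(text, end="", flush=True)
```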
Conclusion
The Messages API is your gateway to building powerful conversational AI applications with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create everything from simple chatbots to complex agentic systems.
Remember these key points:
- The API is stateless — always send the full conversation history
- Use prefill carefully and check model compatibility
- Handle stop reasons to build robust applications
- Leverage streaming for better user experiences
Key Takeaways
- Messages API is stateless: You must send the full conversation history with each request to maintain context.
- Prefill is powerful but limited: It works on most models but not on Claude Opus 4.7, Opus 4.6, Sonnet 4.6, or Mythos Preview. Use structured outputs as an alternative.
- Handle stop reasons: Always check `stop_reason` to detect truncation (`max_tokens`) or tool requests (`tool_use`).
- Vision is built-in: You can send images as base64 or URL for analysis, enabling document processing and visual QA.
- Stream for real-time apps: Use the streaming API for chat interfaces to show responses as they're generated.