Mastering the Messages API: Building Conversational AI with Claude
This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, response prefilling, and vision capabilities with practical code examples.
Claude's Messages API is the core interface for integrating Claude into your applications. Whether you're building a chatbot, a content generator, or a vision-powered assistant, understanding how to work with messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefilling and vision.
Understanding the Messages API
The Messages API is a stateless, RESTful API that lets you send a sequence of messages to Claude and receive a generated response. Unlike some other AI APIs, you always send the full conversation history with each request—Claude doesn't remember previous interactions unless you provide them.
Anthropic offers two primary ways to build with Claude:
- Messages API: Direct model access for custom agent loops and fine-grained control
- Claude Managed Agents: Pre-built, configurable agent harness for long-running tasks
Making Your First API Call
Let's start with a simple request. You'll need an Anthropic API key and the SDK for your preferred language.
Python Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude' }
    ]
  });
  console.log(message.content[0].text);
}

main();
Understanding the Response
The API returns a structured response object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields:
- id: Unique identifier for the message
- content: Array of content blocks (text, images, tool use, etc.)
- stop_reason: Why Claude stopped generating (end_turn, max_tokens, stop_sequence, or tool_use)
- usage: Token counts for billing and monitoring
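To make the key fields concrete, here is a minimal sketch that pulls them out of a response represented as a plain dictionary, as you would get from parsing the JSON above. The function name summarize_response and the shape of its return value are illustrative choices, not part of the SDK.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Collect the text and bookkeeping fields from a Messages API response dict."""
    # Only "text" blocks carry a "text" key; other block types are skipped.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "id": response["id"],
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# The sample response shown above, reduced to the fields used here.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
summary = summarize_response(sample)
print(summary["text"], summary["total_tokens"])
```

Joining every text block, rather than reading only content[0], keeps the helper working when a response contains more than one block.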
Building Multi-Turn Conversations
Because the Messages API is stateless, you must maintain conversation history yourself. Each request includes the full history of messages.
Python Example
import anthropic

client = anthropic.Anthropic()

# First turn
message1 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Second turn - include history
message2 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": message1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message2.content[0].text)
Important Notes
- Always send the complete history: Claude has no memory between requests
- Synthetic assistant messages: You can insert pre-written assistant responses for context (e.g., for few-shot examples)
- Token limits: Longer histories consume more input tokens, so be mindful of context windows
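Rebuilding the message list by hand gets tedious after a few turns. One possible approach, sketched here, is a small wrapper that owns the history and resends it on every call. The Conversation class and the send callable are hypothetical; in a real application send would wrap client.messages.create and return the reply text, while the stand-in below lets the sketch run offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the running message list that each request must resend."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the full history
        # and returns the assistant's reply text.
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        # Record the user turn, send everything so far, then record the reply.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call so the sketch runs without a network.
def fake_send(history: list[Message]) -> str:
    return f"reply #{len(history)}"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
```

Passing a copy of the list to send keeps the callable from mutating the stored history by accident.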
Prefilling Claude's Response
Prefilling lets you "put words in Claude's mouth" by providing the beginning of the assistant's response. This is powerful for:
- Enforcing response formats
- Guiding Claude's reasoning
- Creating structured outputs (though structured outputs are preferred for newer models)
Example: Multiple Choice Answer
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"
Prefill Limitations
Prefilling is not supported on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
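On models that do support prefilling, the response contains only the continuation (the "C" in the example above), so the prefill has to be joined back on if you want the full sentence. The sketch below shows one way to package that. The helper names are illustrative, and the trailing-whitespace check reflects an assumption about API validation rather than anything stated in this guide.

```python
def with_prefill(user_text: str, prefill: str) -> list[dict[str, str]]:
    """Build a message list whose last entry is a partial assistant turn."""
    if prefill != prefill.rstrip():
        # Assumption: a prefill ending in whitespace is rejected by the API,
        # so fail early with a clearer local error.
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefill},
    ]


def full_answer(prefill: str, completion: str) -> str:
    """Rejoin the prefill with the continuation returned by the model."""
    return prefill + completion


prefill = "The answer is ("
messages = with_prefill("Pick one: (A), (B), or (C)?", prefill)
# "C" stands in for message.content[0].text from a real call.
print(full_answer(prefill, "C"))
```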
Working with Vision
The Messages API supports image inputs, enabling Claude to analyze and describe visual content.
Python Example
import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Image Formats
- JPEG, PNG, GIF, WebP
- Maximum size: 100MB per image
- Claude processes images at varying resolutions; larger images use more tokens
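Because media_type must match the actual image format, it can help to derive it from the file name instead of hard-coding it. The following is a minimal sketch under that assumption; image_block and the extension table are illustrative names, and the check looks only at the extension, not at the file's bytes.

```python
import base64
from pathlib import Path

# Extensions for the formats listed above, mapped to their media types.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, picking the media type from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# A few placeholder bytes stand in for real file contents.
block = image_block("diagram.png", b"\x89PNG")
print(block["source"]["media_type"])
```

The returned dictionary can be placed in a user message's content list next to a text block, in the same position as the hand-written image block in the example above.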
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Hit token limit | Increase max_tokens or truncate |
| stop_sequence | Found a stop sequence | Continue or process result |
| tool_use | Claude wants to use a tool | Execute tool and return result |
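In application code, the table above often turns into a small dispatch step. The sketch below maps each stop reason to a coarse action label; the labels and the next_step function are hypothetical, and the fallback branch is there only as a precaution in case a response carries a value this table does not list.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason to a coarse action label for the calling loop."""
    steps = {
        "end_turn": "finish",
        "stop_sequence": "finish",
        "max_tokens": "handle_truncation",
        "tool_use": "run_tools",
    }
    # Unknown values fall through to a safe default instead of raising.
    return steps.get(stop_reason, "inspect_manually")


for reason in ("end_turn", "max_tokens", "tool_use"):
    print(reason, "->", next_step(reason))
```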
Example: Handling Tool Use
if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            # execute_tool is a placeholder for your own dispatch function
            tool_result = execute_tool(block.name, block.input)
            # Append a tool_result for block.id to the conversation and continue
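The last step, adding the result to the conversation, can be sketched as follows. The function works on content blocks as plain dictionaries and returns the two messages to append before the next request: Claude's own turn, followed by a user turn carrying one tool_result block per tool call. The names tool_result_turns and get_weather and the sample ID are illustrative.

```python
from typing import Any, Callable


def tool_result_turns(
    content: list[dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Return the messages to append after a response that stopped for tool use."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue
        # Look up the local function by name and call it with the model's arguments.
        output = tools[block["name"]](**block["input"])
        results.append(
            {"type": "tool_result", "tool_use_id": block["id"], "content": str(output)}
        )
    return [
        {"role": "assistant", "content": content},  # Claude's turn, unchanged
        {"role": "user", "content": results},  # results go back as a user turn
    ]


def get_weather(city: str) -> str:
    """Hypothetical tool used only for this illustration."""
    return f"Sunny in {city}"


content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_123", "name": "get_weather", "input": {"city": "Paris"}},
]
turns = tool_result_turns(content, {"get_weather": get_weather})
print(turns[1]["content"][0]["content"])
```

Each tool_result refers back to its request through tool_use_id, which is how results are matched to calls when a response asks for more than one tool.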
Best Practices
- Manage context windows: Keep conversation histories within Claude's context limit. Use techniques like summarization or sliding windows for long conversations.
- Use system prompts: For consistent behavior, define Claude's persona and constraints in the top-level system parameter rather than in a user message.
- Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs and stay within limits.
- Handle errors gracefully: Implement retry logic for rate limits and network issues. Check for 400 errors on invalid requests.
- Stream responses: For better user experience, use streaming to show Claude's response as it's generated.
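As an illustration of the sliding-window idea from the first practice, the sketch below keeps only the most recent messages and then drops any leading assistant turn, on the assumption that a conversation must open with a user message. trim_history and the message-count limit are illustrative; a production version would more likely budget by tokens than by message count.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most `max_messages` recent turns, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    window = messages[-max_messages:]
    # Drop leading assistant turns so the window still opens with a user message.
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]
print([m["content"] for m in trim_history(history, 4)])
```

Trimming discards the older turns entirely, so anything Claude still needs from them has to be carried forward some other way, for example through the summarization technique mentioned in the same practice.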
Next Steps
Now that you understand the Messages API basics, explore:
- Streaming Messages for real-time responses
- Tool Use to let Claude call functions and external services you define
- Prompt Caching to reduce costs
Key Takeaways
- The Messages API is stateless—always send the full conversation history with each request
- Prefilling lets you guide Claude's responses but is not supported on all models (use structured outputs for newer models)
- Vision support enables image analysis by sending base64-encoded images in the content array
- Handle stop reasons (end_turn, max_tokens, tool_use) to build robust conversational flows
- Monitor token usage and manage context windows to control costs and maintain quality