Mastering the Messages API: Build Conversational AI with Claude
Learn how to use Claude's Messages API for multi-turn conversations, response prefilling, and vision tasks. Includes Python and TypeScript code examples.
This guide teaches you how to use Claude's Messages API to build conversational AI applications. You'll learn basic requests, multi-turn conversations, prefilling responses, and vision capabilities with practical code examples.
Introduction
Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a vision-enabled assistant, the Messages API gives you direct access to Claude's powerful language and reasoning capabilities.
This guide covers the essential patterns you'll need to work with the Messages API effectively: basic requests, multi-turn conversations, prefilling responses, and vision capabilities. By the end, you'll be able to build sophisticated conversational applications with Claude.
Basic Request and Response
Let's start with the simplest interaction: sending a single message and getting a response. Here's how it looks in Python and TypeScript:
Python
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)
TypeScript
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message.content[0].text);
The response includes the model's reply, along with metadata like the stop_reason and token usage. The stop_reason tells you why Claude stopped generating—commonly "end_turn" (natural completion) or "max_tokens" (hit the token limit).
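To see those fields directly, here is a short sketch that continues from the Python example above; stop_reason and usage (with input_tokens and output_tokens) are the metadata fields exposed on the SDK's response object:

# Continuing from the Python example above
print(message.stop_reason)  # e.g. "end_turn" or "max_tokens"
print(message.usage.input_tokens, message.usage.output_tokens)  # token accounting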
Multi-Turn Conversations
The Messages API is stateless—it doesn't remember previous interactions. To maintain a conversation, you must send the full history with each request. This gives you complete control over the context.
Here's how to build a two-turn conversation:
Python
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello! How can I help you today?"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
TypeScript
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' },
    { role: 'assistant', content: 'Hello! How can I help you today?' },
    { role: 'user', content: 'Can you describe LLMs to me?' }
  ]
});
console.log(message.content[0].text);
Notice that you include the assistant's previous response as part of the input. This pattern allows you to build long-running conversations by appending each new turn to the message array.
Pro tip: You can also inject synthetic assistant messages—they don't have to come from Claude. This is useful for guiding the conversation or providing context.
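As a concrete sketch, here is one way to drive a multi-turn conversation in Python by appending every turn to a running history; the loop and the example prompts are illustrative, not part of the SDK:

import anthropic

client = anthropic.Anthropic()
history = []

for user_input in ["Hello, Claude", "Can you describe LLMs to me?"]:
    # Add the new user turn to the running history
    history.append({"role": "user", "content": user_input})

    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=history,
    )

    # Add Claude's reply so the next request carries the full context
    reply = message.content[0].text
    history.append({"role": "assistant", "content": reply})
    print(reply)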
Putting Words in Claude's Mouth (Prefilling)
Prefilling lets you start Claude's response for it. You include a partial assistant message at the end of the input, and Claude continues from there. This is powerful for:
- Constraining responses (e.g., multiple choice answers)
- Setting the tone or format
- Guiding Claude toward a specific structure
Python
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text) # Outputs: "C"
TypeScript
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1,
  messages: [
    {
      role: 'user',
      content: 'What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae'
    },
    {
      role: 'assistant',
      content: 'The answer is ('
    }
  ]
});
console.log(message.content[0].text); // Outputs: "C"
Setting max_tokens=1 limits Claude to generating a single token, in this case the letter "C". The prefilled text "The answer is (" sets up the context so Claude completes it naturally.
Important: Prefilling is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. For these models, use structured outputs or system prompt instructions instead.
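For those models, a rough equivalent of the multiple-choice example is to move the constraint into the system parameter; the instruction wording below is just one possibility, not a prescribed pattern:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",  # swap in the model you're targeting
    max_tokens=5,
    # The system prompt carries the formatting constraint instead of a prefill
    system="Answer multiple-choice questions with the single letter only, for example: C",
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        }
    ]
)
print(message.content[0].text)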
Vision Capabilities
The Messages API also supports image inputs. You can send images as base64-encoded data or via URLs. Here's an example:
Python
import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
TypeScript
import Anthropic from '@anthropic-ai/sdk';
import * as fs from 'fs';
const client = new Anthropic();
// Read and encode image
const imageBuffer = fs.readFileSync('chart.png');
const base64Image = imageBuffer.toString('base64');
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: base64Image
          }
        },
        {
          type: 'text',
          text: 'Describe this chart in detail.'
        }
      ]
    }
  ]
});
});
console.log(message.content[0].text);
You can also use image URLs:
{
  "type": "image",
  "source": {
    "type": "url",
    "url": "https://example.com/chart.png"
  }
}
Supported media types include image/jpeg, image/png, image/gif, and image/webp.
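If you load images from disk, a small helper that maps file extensions to these media types can keep the request-building code tidy; this function is purely illustrative and not part of the SDK:

import base64
from pathlib import Path

# Map file extensions to the supported media types listed above
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def image_block(path: str) -> dict:
    """Build a base64 image content block for the Messages API."""
    p = Path(path)
    data = base64.b64encode(p.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[p.suffix.lower()],
            "data": data,
        },
    }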
Handling Stop Reasons
Every response includes a stop_reason field. Understanding these helps you handle different scenarios:
- end_turn: Claude finished naturally. The response is complete.
- max_tokens: Claude hit the token limit. The response may be truncated. You can continue by sending the partial response back with a follow-up request.
- stop_sequence: Claude encountered a custom stop sequence you defined.
- tool_use: Claude wants to call a tool (if you've enabled tools).
When you receive stop_reason: "max_tokens", you can append the partial response and ask Claude to continue:
# After getting a truncated response
messages.append({"role": "assistant", "content": partial_response})
messages.append({"role": "user", "content": "Please continue."})
Best Practices
- Manage context length: Since you send the full history, be mindful of token limits. For long conversations, consider summarizing earlier turns or using prompt caching.
- Use system prompts for instructions: For general behavior guidelines, use the system parameter instead of repeating instructions in every user message.
- Handle errors gracefully: The API may return errors for invalid requests (e.g., unsupported model for prefilling). Always check the response status.
- Stream for real-time applications: Use streaming to get tokens as they're generated, improving perceived responsiveness (see the sketch after this list).
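As a sketch of those last two points, the Python SDK's streaming helper prints text as it arrives, with general behavior guidelines carried in the system parameter rather than repeated in every user message; the system text here is only an example:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You are a concise assistant. Answer in plain language.",
    messages=[{"role": "user", "content": "Can you describe LLMs to me?"}],
) as stream:
    # Print text as it's generated for a responsive UI
    for text in stream.text_stream:
        print(text, end="", flush=True)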
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with each request to maintain context.
- Prefilling lets you start Claude's response, which is useful for constraining outputs or guiding format.
- Vision capabilities allow you to send images as base64 or URLs for analysis.
- Monitor stop_reason to handle truncated responses or tool calls appropriately.
- Always check model compatibility for advanced features like prefilling.