Mastering the Messages API: Building Conversational Apps with Claude
This guide covers the Messages API patterns: basic requests, multi-turn conversations by sending full history, prefilling Claude's responses to shape output, and using vision. You'll get Python/TypeScript examples for each pattern.
Claude’s Messages API is the primary way to interact with the model programmatically. Whether you’re building a simple chatbot, a multi-turn assistant, or a vision-powered app, understanding how to structure your API calls is essential. This guide walks you through the core patterns—basic requests, multi-turn conversations, prefill techniques, and vision capabilities—with practical code examples in Python and TypeScript.
Understanding the Messages API vs. Managed Agents
Anthropic offers two paths for building with Claude:
- Messages API: Direct model access. You control every aspect of the conversation loop. Best for custom agents, fine-grained control, and real-time interactions.
- Claude Managed Agents: A pre-built, configurable agent harness that runs on managed infrastructure. Best for long-running tasks and asynchronous work.
Basic Request and Response
At its simplest, you send a list of messages and receive a response. Each message has a role (user or assistant) and content.
Python Example
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
TypeScript Example
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-1',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
```
Response Structure
The API returns a structured response:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-1",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields:
- `content`: An array of content blocks (`text`, `tool_use`, etc.).
- `stop_reason`: Why the model stopped (`end_turn`, `max_tokens`, `stop_sequence`, or `tool_use`).
- `usage`: Token counts for billing and monitoring.
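In the SDK these fields are plain attributes on the response object; with raw JSON (for example, from a request log) the same values can be pulled out by hand. A minimal sketch, parsing a response like the one above:

```python
import json

# A raw response like the example above
raw = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""
response = json.loads(raw)

# Concatenate only the text blocks; tool_use blocks carry no "text" field
text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
print(text, response["stop_reason"], total_tokens)  # Hello! end_turn 18
```

The same text-block filtering is worth keeping even in simple apps: once you add tools, `content` is no longer guaranteed to be a single text block.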
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage the history yourself.
Building a Conversation Over Time
```python
import anthropic

client = anthropic.Anthropic()

# First turn
response1 = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Second turn: include the previous exchange
response2 = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": response1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(response2.content[0].text)
```
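In a real app you will want the history management in one place. The sketch below only builds the request payload; `build_request` and `record_reply` are helper names invented here, and the actual `client.messages.create` call is left as a comment:

```python
def build_request(history, user_text, model="claude-sonnet-4-5", max_tokens=1024):
    """Return the kwargs for client.messages.create with a new user turn appended."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": history + [{"role": "user", "content": user_text}],
    }

def record_reply(history, user_text, reply_text):
    """Store both sides of an exchange so the next request carries full context."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply_text})

history = []
req = build_request(history, "Hello, Claude")
# reply = client.messages.create(**req).content[0].text
record_reply(history, "Hello, Claude", "Hi there! How can I help?")

req2 = build_request(history, "Can you describe LLMs to me?")
# req2["messages"] now holds both sides of turn one plus the new question
```

Keeping request construction pure like this also makes it easy to trim or summarize old turns before they hit the context window.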
Synthetic Assistant Messages
You can inject synthetic assistant messages—they don’t have to come from Claude. This is useful for:
- Providing example responses (few-shot prompting)
- Guiding conversation flow
- Simulating a persona
```python
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Italy?"}
]
```
Putting Words in Claude’s Mouth (Prefill)
Prefilling lets you start Claude’s response by including an assistant message with partial content at the end of your messages array. This is powerful for:
- Constraining output format (e.g., JSON, multiple choice)
- Guiding tone or style
- Reducing token usage on predictable responses
Example: Multiple Choice Answer
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is the Latin name for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: C
```
By setting `max_tokens=1` and prefilling "The answer is (", Claude only needs to output a single character. This is efficient and tightly constrains the output format.
Prefill Best Practices
- Match the tone: If you prefill with formal language, Claude continues formally.
- Don’t contradict: Prefilling something the model wouldn’t naturally say can cause confusion.
- Use for structure: Prefill JSON keys or XML tags to enforce output format.
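For JSON output, a common trick is to prefill the opening of the object so Claude skips any conversational preamble. The response then contains only the continuation, so you must re-attach the prefill before parsing. A sketch, where the `completion` string stands in for Claude's actual output (note that the API rejects a prefill that ends in trailing whitespace):

```python
import json

prefill = '{"capital": "'
# The final message in the request would be:
#   {"role": "assistant", "content": prefill}
# Claude continues from the prefill; a plausible continuation:
completion = 'Paris", "country": "France"}'

data = json.loads(prefill + completion)  # re-attach the prefill before parsing
print(data["capital"])  # Paris
```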
Vision Capabilities
The Messages API supports image inputs. You can send images as base64-encoded data or via URLs (if hosted publicly).
Sending an Image (Python)
```python
import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported Media Types
- `image/jpeg`
- `image/png`
- `image/gif`
- `image/webp`
Vision Tips
- Image size matters: Larger images consume more tokens. Resize to 1024x1024 or less if possible.
- Combine with text: Always include a text prompt to guide Claude’s analysis.
- Multiple images: You can include multiple images in a single message.
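For publicly hosted images, the API also accepts a URL source, which avoids the base64 step entirely. A sketch of the content blocks (the URL here is a placeholder):

```python
# Content blocks for a URL-sourced image (placeholder URL)
content = [
    {
        "type": "image",
        "source": {"type": "url", "url": "https://example.com/chart.png"},
    },
    {"type": "text", "text": "Describe this chart in detail."},
]
# client.messages.create(model="claude-sonnet-4-5", max_tokens=1024,
#                        messages=[{"role": "user", "content": content}])
```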
Handling Stop Reasons
Understanding stop_reason helps you build robust applications:
| `stop_reason` | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Continue or end conversation |
| `max_tokens` | Output was truncated | Increase `max_tokens` or split response |
| `stop_sequence` | A custom stop sequence was hit | Handle as needed |
| `tool_use` | Claude wants to call a tool | Execute tool and return result |
Example: Handling max_tokens
```python
if message.stop_reason == "max_tokens":
    # Continue the conversation to get more output
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    # Call the API again with the extended messages list
```
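This continuation pattern can be wrapped in a loop. In the sketch below, `create_fn` is a stand-in for the API call (it takes a messages list and returns the generated text plus the stop reason), so the stitching logic can be seen and tested on its own:

```python
def continue_until_done(create_fn, messages, max_rounds=5):
    """Stitch together output that was truncated by max_tokens."""
    parts = []
    for _ in range(max_rounds):
        text, stop_reason = create_fn(messages)
        parts.append(text)
        if stop_reason != "max_tokens":
            break
        # Feed the partial answer back and ask the model to keep going
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Please continue."},
        ]
    return "".join(parts)

# Fake create_fn for illustration: truncates once, then finishes
replies = iter([("Once upon a ti", "max_tokens"), ("me, the end.", "end_turn")])
story = continue_until_done(
    lambda msgs: next(replies),
    [{"role": "user", "content": "Tell me a story"}],
)
print(story)  # Once upon a time, the end.
```

The `max_rounds` cap matters in production: without it, a model that keeps hitting `max_tokens` would loop indefinitely.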
Streaming for Real-Time Responses
For a better user experience, use streaming. The API sends chunks as they’re generated.
Python Streaming
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with every request. Manage context on your side.
- Prefilling lets you shape Claude’s responses by providing a partial assistant message. Use it for format control and efficiency.
- Vision is supported via base64-encoded images or URLs. Combine images with text prompts for best results.
- Streaming improves user experience by delivering tokens in real time. Use it for chat applications.
- Handle stop reasons appropriately—especially `max_tokens` and `tool_use`—to build robust, production-ready applications.