Mastering the Messages API: A Practical Guide to Building with Claude
This guide covers the core patterns of Claude's Messages API, including sending basic requests, building multi-turn conversations, using prefill to shape responses, and handling images with vision. You'll get practical code examples in Python and TypeScript.
Introduction
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or a complex agent, understanding the Messages API is essential. This guide walks you through the most common patterns—from a simple request to multi-turn conversations, prefill techniques, and vision capabilities.
Anthropic offers two paths for building with Claude: the Messages API for direct model access and fine-grained control, and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, which is ideal for custom agent loops and applications that need precise control over the conversation flow.
Basic Request and Response
At its simplest, the Messages API accepts a list of messages (each with a role and content) and returns a response from Claude. Here's a minimal example in Python and TypeScript.
Python Example
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude' }
    ]
  });
  console.log(message);
}

main();
Understanding the Response
The API returns a structured JSON object. Here's a typical response:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks. The main type is text, but you'll also see tool_use when using tools.
- stop_reason: Indicates why the response ended. Common values are "end_turn" (Claude finished naturally), "max_tokens" (hit the token limit), or "tool_use" (Claude wants to call a tool).
- usage: Token counts for billing and monitoring.
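As a rough sketch of how these fields might be consumed, the helper below works on the response as a plain dictionary shaped like the JSON above. The name `summarize_response` is hypothetical and not part of the SDK; with the Python SDK you would read the same fields as attributes, for example `message.stop_reason`.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Collect the text, stop status, and token usage from a response dict."""
    # Join every text block; other block types (such as tool_use) are skipped.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "truncated": response["stop_reason"] == "max_tokens",
        "needs_tool": response["stop_reason"] == "tool_use",
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# The sample response from above, reduced to the fields the helper reads.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
summary = summarize_response(sample)
print(summary)
```

Joining all text blocks, rather than reading only the first one, keeps the helper from failing when a response carries more than one content block.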
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over the context but requires you to manage the message list on your end.
Example: Two-Turn Conversation
import anthropic

client = anthropic.Anthropic()

# First turn
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=messages
)

# Append assistant response to history
messages.append({"role": "assistant", "content": response.content[0].text})

# Second turn
messages.append({"role": "user", "content": "What is its population?"})
response2 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=messages
)
print(response2.content[0].text)
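The bookkeeping in this example can be wrapped in a small helper. The sketch below is one possible shape, not an SDK feature: `Conversation` and its `send` parameter are hypothetical names, and `send` stands in for whatever function actually calls `client.messages.create` and returns the reply text. Injecting it this way lets the history logic run without network access.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        self.send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the callee cannot mutate the stored history.
        reply = self.send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: list[Message]) -> str:
    # Stand-in for a real API call; reports how much history it received.
    return f"(reply after seeing {len(history)} message(s))"


chat = Conversation(fake_send)
first = chat.ask("What is the capital of France?")
second = chat.ask("What is its population?")
print(second)
```

The second call receives three messages (user, assistant, user), which is the statelessness described above: every request carries the whole history.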
Synthetic Assistant Messages
You can also inject synthetic assistant messages—messages that weren't actually generated by Claude. This is useful for:
- Providing context: "You previously said X."
- Guiding the conversation: "Now, let's move on to topic Y."
- Simulating a multi-step process: "Step 1 is complete. Here's the result."
messages = [
{"role": "user", "content": "What's the weather in Tokyo?"},
{"role": "assistant", "content": "I don't have real-time data, but I can help you find a weather API."},
{"role": "user", "content": "Okay, show me how to call it."}
]
Note: Earlier turns don't need to originate from Claude. You can use synthetic messages to simulate a conversation history or provide structured context.
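One way to assemble such a history is from pairs of hand-written turns. This is a minimal sketch; `history_from_pairs` is a hypothetical helper name, and it only builds the list that would be passed as the messages parameter.

```python
def history_from_pairs(
    pairs: list[tuple[str, str]], next_user_text: str
) -> list[dict[str, str]]:
    """Turn (user, assistant) text pairs plus a new user turn into a messages list."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in pairs:
        messages.append({"role": "user", "content": user_text})
        # This assistant turn is written by the developer, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": next_user_text})
    return messages


messages = history_from_pairs(
    [(
        "What's the weather in Tokyo?",
        "I don't have real-time data, but I can help you find a weather API.",
    )],
    "Okay, show me how to call it.",
)
print([m["role"] for m in messages])
```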
Putting Words in Claude's Mouth (Prefill)
Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for:
- Constraining output format: Force Claude to start with a specific word or phrase.
- Multiple choice questions: Get a single letter answer by setting max_tokens to 1.
- Structured outputs: Begin a JSON object or XML block.
Example: Multiple Choice
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # allow a single token: the answer letter
    messages=[
        {"role": "user", "content": "What is the capital of France? (A) London (B) Paris (C) Berlin (D) Madrid"},
        # The prefill stops just before the letter, so Claude's one token is the choice itself.
        {"role": "assistant", "content": "The answer is ("}
    ]
)
print(message.content[0].text)  # Expected output: B
Important Limitations
- Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests with prefill on these models return a 400 error.
- Alternative: Use structured outputs or system prompt instructions for these models. See the migration guide for patterns.
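A request builder can switch between prefill and a system prompt instruction depending on the model. The sketch below is illustrative only: `build_request` and `NO_PREFILL` are hypothetical names, the set of models is supplied by the caller, and it lists just the one model ID used elsewhere in this guide. A system prompt instruction is a softer constraint than prefill, so the output should still be validated.

```python
from typing import Any


def build_request(
    model: str,
    question: str,
    prefill: str,
    no_prefill_models: frozenset[str],
) -> dict[str, Any]:
    """Return keyword arguments for messages.create, with or without prefill."""
    messages = [{"role": "user", "content": question}]
    request: dict[str, Any] = {"model": model, "max_tokens": 1024, "messages": messages}
    if model in no_prefill_models:
        # Fallback: describe the desired opening in the system prompt instead.
        request["system"] = f"Begin your reply with exactly: {prefill}"
    else:
        messages.append({"role": "assistant", "content": prefill})
    return request


# Assumption: only the model ID that appears in this guide is listed; extend as needed.
NO_PREFILL = frozenset({"claude-opus-4-7"})

with_fallback = build_request("claude-opus-4-7", "List three colors as JSON.", "{", NO_PREFILL)
with_prefill = build_request("claude-sonnet-4-5", "List three colors as JSON.", "{", NO_PREFILL)
print(with_fallback["system"])
print(len(with_prefill["messages"]))
```

The returned dictionary could then be passed as `client.messages.create(**request)`.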
Prefill for JSON Output
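# Reuses the client created in the previous example.
messages = [
    {"role": "user", "content": "Extract the name and age from: 'John is 30 years old.'"},
    # Prefill opens the JSON object and the first string value.
    {"role": "assistant", "content": "{\"name\": \""}
]
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=50,
    messages=messages
)
# The response holds only the continuation, not the prefill itself.
print(response.content[0].text)  # Output: John", "age": 30}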
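Because the response contains only the continuation, the prefill has to be joined back on before the text can be parsed. A minimal sketch, using the prefill and the example output shown above; `parse_prefilled_json` is a hypothetical helper name.

```python
import json


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rejoin the prefill with Claude's continuation and parse the result as JSON."""
    # The API response contains only the continuation, so the prefill is re-added here.
    return json.loads(prefill + continuation)


prefill = '{"name": "'
continuation = 'John", "age": 30}'  # the example output from above
record = parse_prefilled_json(prefill, continuation)
print(record)
```

In practice `json.loads` raises `json.JSONDecodeError` if the reply was cut off by max_tokens or continued past the closing brace, so real code would likely catch that case.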
Vision: Working with Images
Claude can process images in requests using the Messages API. You can supply images via:
- base64: Base64-encoded image data.
- url: A publicly accessible URL.
- file: An image uploaded through the Files API.
Supported media types: image/jpeg, image/png, image/gif, image/webp.
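The base64 and url source types can be built with two small helpers. This is a sketch under stated assumptions: the function names and the `MEDIA_TYPES` table are hypothetical, and the media type is guessed from the file extension, which a mislabeled file would defeat.

```python
import base64
from pathlib import PurePath

# Extension-to-media-type table covering the four supported types.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_bytes(data: bytes, filename: str) -> dict:
    """Build a base64 image content block from raw bytes and a file name."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that points at a publicly accessible URL."""
    return {"type": "image", "source": {"type": "url", "url": url}}


url_block = image_block_from_url("https://example.com/ant.jpg")
# Placeholder bytes, not a real image; real code would read them from a file.
b64_block = image_block_from_bytes(b"abc", "pixel.PNG")
print(b64_block["source"]["media_type"])
```

Either block can be placed in a user message's content list next to a text block.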
Example: Image from URL
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/ant.jpg"
                    }
                },
                {
                    "type": "text",
                    "text": "What is in this image?"
                }
            ]
        }
    ]
)
print(message.content[0].text)
Example: Image from Base64
import base64
import anthropic

client = anthropic.Anthropic()

# Read the local file and encode its bytes as a base64 string.
with open("ant.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Vision Response
Claude returns a text description of the image. For example:
{
  "content": [
    {
      "type": "text",
      "text": "This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible."
    }
  ]
}
Best Practices
- Manage context windows carefully: Since the API is stateless, you control the context. Be mindful of token limits—trim older messages if needed.
- Use max_tokens wisely: For constrained outputs (like multiple choice), set max_tokens low. For open-ended responses, set it higher.
- Handle stop_reason: Always check stop_reason in the response. If it's "max_tokens", the response was truncated. If it's "tool_use", you need to execute a tool and continue the conversation.
- Prefill for structure: Use prefill to enforce output formats, but remember it's not supported on all models. Fall back to system prompts or structured outputs for newer models.
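- Image optimization: For vision, resize large images before sending them. Token cost grows with an image's pixel dimensions, so larger images consume more tokens. Compressing to JPEG or WebP mainly reduces request size and upload time.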
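Trimming older messages can be sketched as follows. This is a simplified approach that counts messages rather than tokens, and it assumes the kept history should open with a user turn; `trim_history` is a hypothetical helper name. A token-based budget would be more precise.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most max_messages recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    trimmed = messages[-max_messages:]
    # Drop leading assistant turns so the kept history opens with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "(first answer)"},
    {"role": "user", "content": "What is its population?"},
    {"role": "assistant", "content": "(second answer)"},
    {"role": "user", "content": "And the metro area?"},
]
recent = trim_history(history, max_messages=4)
print([m["role"] for m in recent])
```

Here the four most recent messages begin with an assistant turn, so that turn is dropped and three messages remain. Trimming this way can remove context that later questions depend on, which is the trade-off to weigh against the token savings.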
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with every request. Manage the message list on your end.
- Prefill lets you shape Claude's response by providing the beginning of its answer. Use it for multiple choice, JSON output, or any constrained format—but check model compatibility.
- Vision support is built-in via base64, url, or file sources. Claude can analyze images alongside text in a single request.
- Synthetic assistant messages allow you to inject context or guide the conversation without requiring actual Claude responses.
- Always check stop_reason to understand why Claude stopped and handle truncation or tool calls appropriately.