Mastering the Messages API: A Practical Guide to Building with Claude
This guide teaches you how to use Claude's Messages API to send basic requests, build multi-turn conversations, prefill Claude's responses, and work with images. You'll get practical Python and TypeScript examples for each pattern.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent, understanding the Messages API is essential. This guide walks you through the most common patterns—from basic requests to advanced techniques like prefill and vision—with practical code examples you can use today.
What Is the Messages API?
The Messages API gives you direct access to Claude's language model. You send a list of messages (your conversation history) and receive Claude's response. It's stateless, meaning you manage the conversation context yourself by sending the full history with each request.
Anthropic offers two ways to build with Claude:
- Messages API: Direct model access, best for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure, ideal for long-running tasks.
Basic Request and Response
Let's start with the simplest possible interaction: sending a single message and getting a reply.
Python Example
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
Understanding the Response
The API returns a structured response object. Here's what you get:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields:
- content: An array of content blocks (usually text, but can include tool use blocks).
- stop_reason: Why Claude stopped generating. Common values are "end_turn" (Claude finished naturally) and "max_tokens" (hit the token limit).
- usage: Token counts for billing and monitoring.
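As a minimal sketch of reading these fields, the helpers below work on a plain dictionary shaped like the JSON response shown above. The function names extract_text and was_truncated are illustrative, not part of any SDK; with the SDK's response object you would use attribute access (for example message.content[0].text) instead of dictionary lookups.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every "text" block in the response's content array."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def was_truncated(response: dict[str, Any]) -> bool:
    """Return True when generation stopped because max_tokens was reached."""
    return response.get("stop_reason") == "max_tokens"


# Sample data mirroring the response structure shown above.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))
print(was_truncated(sample))

# Total tokens for this request: input plus output.
total = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
print(total)
```

Filtering on the block type keeps the helper from failing when the content array also holds non-text blocks such as tool use.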
Building Multi-Turn Conversations
Because the Messages API is stateless, you must send the entire conversation history with each request. This gives you full control over context.
Python Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ],
)
print(message.content[0].text)
Important Notes
- You control the history: Earlier turns don't need to come from Claude. You can inject synthetic assistant messages (e.g., from a database or previous session).
- Order matters: Messages must alternate between user and assistant roles, starting with user.
- Context window: Be mindful of the total token count. Claude's context window varies by model (typically 200K tokens).
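Before sending a stored or synthetic history, it can help to check the ordering rule locally. The sketch below assumes the rule exactly as stated in these notes (first message from user, roles strictly alternating); check_alternation is a hypothetical helper, and the API itself may be more lenient than this check.

```python
def check_alternation(messages: list[dict]) -> list[str]:
    """Return a list of ordering problems; an empty list means the history passes."""
    if not messages:
        return ["history is empty"]

    problems = []
    if messages[0]["role"] != "user":
        problems.append("first message must have role 'user'")

    # Compare each message with the one before it.
    for i in range(1, len(messages)):
        if messages[i]["role"] == messages[i - 1]["role"]:
            role = messages[i]["role"]
            problems.append(f"messages {i - 1} and {i} both have role {role!r}")
    return problems


good = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
]
bad = [
    {"role": "assistant", "content": "Hello!"},
    {"role": "assistant", "content": "Still me."},
]

print(check_alternation(good))
print(check_alternation(bad))
```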
Practical Tip: Managing Conversation State
In a real application, you'll store messages in a list and append new ones as the conversation progresses:
conversation = [
    {"role": "user", "content": "Hello, Claude"}
]

# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)

# Add Claude's response to history
conversation.append({"role": "assistant", "content": response.content[0].text})

# Add user's next message
conversation.append({"role": "user", "content": "Tell me more about yourself."})

# Second turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
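The append-then-send pattern above can be wrapped in one small function so every turn updates the history the same way. This is a sketch, not an SDK feature: chat_turn and fake_send are hypothetical names, and the send callable stands in for whatever wraps the real API call, which lets the example run without network access.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(
    history: list[Message],
    user_text: str,
    send: Callable[[list[Message]], str],
) -> str:
    """Append the user turn, send the full history, and store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages: list[Message]) -> str:
    """Stand-in for the API call: reports how many messages it received."""
    return f"(reply after seeing {len(messages)} messages)"


history: list[Message] = []
chat_turn(history, "Hello, Claude", fake_send)
chat_turn(history, "Tell me more about yourself.", fake_send)

print(len(history))
print(history[-1]["content"])
```

In a real application, send would call client.messages.create with the history as the messages argument and return response.content[0].text, as in the example above.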
Prefilling Claude's Response
Prefilling lets you start Claude's response for it. You place an assistant message with partial content at the end of your messages array, and Claude continues from there.
Use Case: Multiple Choice Questions
This pattern is great for getting structured, constrained outputs:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ],
)
print(message.content[0].text)  # Outputs: "C"
Important Limitations
- Not supported on all models: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests with these models return a 400 error.
- Alternatives: For unsupported models, use structured outputs or system prompt instructions instead.
- Token limit: Set max_tokens appropriately. In the example above, max_tokens=1 ensures Claude only outputs the letter.
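One way to handle the compatibility limitation is to build the request differently depending on the model. The sketch below is an assumption-laden illustration: build_choice_request is a hypothetical helper, the NO_PREFILL_MODELS set contains only the one model ID that this guide both uses and lists as unsupported, and the system prompt wording and the max_tokens value of 16 are arbitrary choices.

```python
# Assumption: extend this set with the IDs of the models you actually target.
# Only "claude-opus-4-7" is taken from this guide.
NO_PREFILL_MODELS = {"claude-opus-4-7"}


def build_choice_request(model: str, question: str) -> dict:
    """Build keyword arguments for a single-letter multiple-choice request."""
    request = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    if model in NO_PREFILL_MODELS:
        # No prefill available: constrain the output through the system prompt.
        request["system"] = "Answer with only the letter of the correct option."
        request["max_tokens"] = 16
    else:
        # Prefill: the trailing assistant message starts Claude's reply.
        request["messages"].append(
            {"role": "assistant", "content": "The answer is ("}
        )
        request["max_tokens"] = 1
    return request


question = "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
with_prefill = build_choice_request("claude-sonnet-4-5", question)
without_prefill = build_choice_request("claude-opus-4-7", question)

print(with_prefill["messages"][-1]["role"])
print("system" in without_prefill)
```

The resulting dictionary can be passed to the SDK as client.messages.create(**request).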
Other Prefill Patterns
- JSON completion: Prefill with {"response": to get structured JSON.
- Sentence completion: Prefill with "In summary," to guide Claude toward a conclusion.
- Role playing: Prefill with "As a helpful assistant, I would say:" to reinforce persona.
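With the JSON completion pattern, the response holds only the continuation (just as the multiple-choice example returned "C" without the prefilled text), so the prefill has to be joined back on before parsing. A minimal sketch, where parse_prefilled_json is a hypothetical helper and the continuation string is made-up sample output rather than a real model response:

```python
import json


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rejoin the prefill with Claude's continuation and parse the result."""
    return json.loads(prefill + continuation)


prefill = '{"response":'
# Hypothetical continuation, standing in for message.content[0].text.
continuation = ' "Formicidae is the ant family."}'

result = parse_prefilled_json(prefill, continuation)
print(result["response"])
```

json.loads raises json.JSONDecodeError if the continuation was cut off by max_tokens, so it is worth catching that error in real code.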
Working with Images (Vision)
Claude can analyze images. You include image content blocks in your user messages.
Python Example
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ],
)
print(message.content[0].text)
Supported Image Formats
- JPEG
- PNG
- GIF (first frame only)
- WebP
Tips for Vision Requests
- Use base64 encoding: The API accepts base64-encoded image data.
- Combine with text: Always include a text prompt alongside your image to tell Claude what to do.
- Resolution matters: Higher resolution images use more tokens. For simple tasks, consider resizing images.
- Token cost: Images are tokenized based on size and resolution. Check the usage field in the response to monitor costs.
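The first two tips can be folded into a small builder that encodes the image, picks the media type, and always pairs the image with a text prompt. This is a sketch under assumptions: image_block and image_message are hypothetical helpers, and the media type is guessed from the file extension only, which will be wrong if a file is misnamed.

```python
import base64
from pathlib import Path

# File extensions for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block from raw image bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image extension: {filename!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_message(filename: str, data: bytes, prompt: str) -> dict:
    """Build a user message that pairs one image with a text instruction."""
    return {
        "role": "user",
        "content": [image_block(filename, data), {"type": "text", "text": prompt}],
    }


# Placeholder bytes keep the example self-contained; use real file contents in practice.
message = image_message("chart.PNG", b"abc", "Describe this chart in detail.")
print(message["content"][0]["source"]["media_type"])
```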
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications.
| stop_reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue conversation or end |
| max_tokens | Hit the token limit | Increase max_tokens or split response |
| stop_sequence | Found a stop sequence | Handle based on your logic |
| tool_use | Claude wants to use a tool | Execute the tool and return results |
Example: Handling max_tokens
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a long essay on AI."}],
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
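The same check extends to all four stop reasons in the table. The sketch below maps each value to an action label; next_step and the labels themselves are hypothetical names chosen for illustration, and unrecognized values fall through to "unknown" so that new stop reasons do not crash the application.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason value to an application-level action label."""
    actions = {
        "end_turn": "done",                      # Claude finished naturally
        "max_tokens": "retry_with_more_tokens",  # output was truncated
        "stop_sequence": "custom_handling",      # a configured stop sequence matched
        "tool_use": "run_tool",                  # execute the tool, then return results
    }
    return actions.get(stop_reason, "unknown")


for reason in ("end_turn", "max_tokens", "stop_sequence", "tool_use", "something_new"):
    print(reason, "->", next_step(reason))
```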
Best Practices
- Manage context window: Keep conversation history within the model's context limit. Use techniques like summarization for long conversations.
- Use system prompts: For persistent instructions, use the system parameter instead of repeating instructions in every user message.
- Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs.
- Handle errors gracefully: Implement retry logic for transient errors and check for 400 errors on invalid requests.
- Stream responses: For real-time applications, use streaming to get tokens as they're generated.
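The retry advice above can be sketched as a generic wrapper with exponential backoff. Everything here is an assumption for illustration: TransientError is a hypothetical marker class (your own code would need to translate the SDK's retryable errors into it), and the attempt count and delays are arbitrary defaults. The official SDKs may already retry some errors on their own, so check their documentation before layering this on top.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limits, overload)."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, with up to 10% random jitter added.
            sleep(base_delay * (2 ** attempt) * (1 + random.random() * 0.1))
    raise AssertionError("unreachable")
```

Other errors, such as a 400 for an invalid request, are deliberately not caught: retrying them would fail the same way every time. The sleep parameter exists so tests can replace the real delay.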
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with each request.
- Multi-turn conversations are built by maintaining a list of alternating user and assistant messages.
- Prefilling lets you guide Claude's response by providing a partial assistant message, but check model compatibility.
- Vision capabilities allow Claude to analyze images by including base64-encoded image content blocks.
- Always check stop_reason and usage in the response to handle truncation and monitor costs effectively.