Mastering the Messages API: A Practical Guide to Building with Claude
Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
Claude's Messages API is the primary way to interact with Anthropic's language models programmatically. Whether you're building a chatbot, an agent, or an automation tool, understanding how to structure requests and handle responses is essential. This guide covers everything from basic API calls to advanced patterns like multi-turn conversations, prefill techniques, and vision capabilities.
Understanding the Messages API vs. Managed Agents
Before diving into code, it's important to understand the two main ways to build with Claude:
- Messages API: Direct access to the model. Best for custom agent loops and fine-grained control. You manage the conversation state and logic yourself.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
Making Your First API Request
Let's start with a simple request. The Messages API expects a model, max_tokens, and an array of messages with alternating user and assistant roles.
Python Example
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
Understanding the Response
The API returns a structured response:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks (usually text, but can include other block types such as tool_use when Claude calls a tool).
- stop_reason: Why the response ended (end_turn, max_tokens, stop_sequence, or tool_use).
- usage: Token counts for billing and context management.
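Because content is a list of blocks rather than a single string, code that reads only the first block can miss text when other block types are present. The following is a minimal sketch, assuming the response has been converted to a plain dict shaped like the JSON above. The helper names extract_text and total_tokens are hypothetical and are not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Add the input and output token counts from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# The sample response shown above, trimmed to the fields used here.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
print(total_tokens(sample))  # 18
```

With SDK response objects the same fields are read as attributes, for example message.content[0].text and message.usage.input_tokens.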
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.
Example: Two-Turn Conversation
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Important: The conversation history doesn't need to be real. You can inject synthetic assistant messages to guide Claude's behavior or provide context from external systems.
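One common use of synthetic history is few-shot prompting: example inputs and outputs are written as earlier turns so that Claude follows the same pattern. A minimal sketch, assuming the examples are available as (input, output) pairs. The function name few_shot_messages and the translation examples are illustrative only.

```python
def few_shot_messages(
    examples: list[tuple[str, str]], question: str
) -> list[dict[str, str]]:
    """Turn (input, output) pairs into alternating user/assistant turns."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question goes last, as a user turn.
    messages.append({"role": "user", "content": question})
    return messages


messages = few_shot_messages(
    [("Translate to French: cat", "chat"), ("Translate to French: dog", "chien")],
    "Translate to French: bird",
)
print([m["role"] for m in messages])
# ['user', 'assistant', 'user', 'assistant', 'user']
```

The resulting list can be passed as the messages argument of client.messages.create.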
Managing Conversation State
In production, you'll want to store conversation history in a database or cache:
conversation = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"}
]
# Later...
conversation.append({"role": "user", "content": "What's the weather?"})
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation
)
# Store Claude's reply so the next request includes it
conversation.append({"role": "assistant", "content": response.content[0].text})
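To persist that history between requests, the message list has to be serialized somewhere. The sketch below is one possible shape, assuming text-only messages and JSON storage. The Conversation class and its method names are hypothetical, and the database or cache itself is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Holds the message list that is resent on every request."""

    messages: list[dict[str, str]] = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def to_json(self) -> str:
        """Serialize for storage in a database row or cache entry."""
        return json.dumps(self.messages)

    @classmethod
    def from_json(cls, raw: str) -> "Conversation":
        """Rebuild a conversation from a previously stored string."""
        return cls(messages=json.loads(raw))


convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")

stored = convo.to_json()  # persist this string
restored = Conversation.from_json(stored)
restored.add_user("What's the weather?")
print(len(restored.messages))  # 3
```

On each turn, restored.messages would be passed as the messages argument, and the reply text recorded with add_assistant before saving again.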
Prefilling Claude's Response
One powerful technique is prefilling—putting words in Claude's mouth by including an assistant message at the end of your input. This shapes the response and can enforce specific formats.
Example: Multiple Choice Answer
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
Because max_tokens is set to 1, Claude generates only the next token: the letter "C". Prefill combined with a tight token limit suits classification tasks, and prefill on its own helps enforce structured output formats.
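As the example shows, the response contains only the continuation, not the prefill text, so the two have to be joined when the full answer is needed (for instance when prefilling the opening bracket of a JSON array). A minimal sketch follows. The helper names with_prefill and join_prefill are hypothetical, and stripping trailing whitespace from the prefill reflects an assumption that the API rejects a final assistant turn ending in whitespace.

```python
def with_prefill(
    messages: list[dict[str, str]], prefill: str
) -> list[dict[str, str]]:
    """Return a copy of messages ending with a partial assistant turn."""
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("prefill must follow a user message")
    # Assumption: a prefill with trailing whitespace is rejected, so strip it.
    return messages + [{"role": "assistant", "content": prefill.rstrip()}]


def join_prefill(prefill: str, completion: str) -> str:
    """Rebuild the full answer; the response holds only the continuation."""
    return prefill.rstrip() + completion


request_messages = with_prefill(
    [{"role": "user", "content": "List three colors as a JSON array."}],
    "[",
)
print(request_messages[-1])
# {'role': 'assistant', 'content': '['}

# The second argument stands in for text returned by the model.
print(join_prefill("[", '"red", "green", "blue"]'))
# ["red", "green", "blue"]
```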
Prefill Limitations
Prefill is not supported on the following models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| Stop Reason | Meaning |
|---|---|
| end_turn | Claude finished naturally |
| max_tokens | Response was cut off due to token limit |
| stop_sequence | A custom stop sequence was encountered |
| tool_use | Claude wants to call a tool |
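In application code, these values typically drive a branch that decides what happens next. The sketch below maps each stop reason from the table to a follow-up action. The function name next_action and the action labels are illustrative choices, not API values.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason to what the caller should do next."""
    actions = {
        "end_turn": "done",
        "max_tokens": "continue_or_raise_limit",
        "stop_sequence": "done",
        "tool_use": "run_tool_and_reply",
    }
    # Fail loudly on values this code was not written to handle.
    if stop_reason not in actions:
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}")
    return actions[stop_reason]


print(next_action("end_turn"))    # done
print(next_action("max_tokens"))  # continue_or_raise_limit
```

Raising on unknown values, rather than treating them as success, keeps a new or unexpected stop reason from being silently ignored.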
Example: Handling max_tokens
If Claude stops due to max_tokens, you can continue the conversation by sending the partial response back:
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[
        {"role": "user", "content": "Write a long story"}
    ]
)
if response.stop_reason == "max_tokens":
    # Continue from where Claude left off
    continuation = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=100,
        messages=[
            {"role": "user", "content": "Write a long story"},
            {"role": "assistant", "content": response.content[0].text},
            {"role": "user", "content": "Please continue"}
        ]
    )
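The same pattern can be repeated in a loop until the response is no longer truncated. The following sketch generalizes the example above under a few assumptions: the function name generate_until_done and the max_rounds cap are hypothetical, each reply is assumed to hold its text in the first content block, and a stand-in for client.messages.create is used so the code runs offline.

```python
from types import SimpleNamespace
from typing import Any, Callable


def generate_until_done(
    create: Callable[..., Any],
    prompt: str,
    max_rounds: int = 5,
    **params: Any,
) -> str:
    """Call create repeatedly while the reply is truncated by max_tokens."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        response = create(messages=messages, **params)
        text = response.content[0].text
        parts.append(text)
        if response.stop_reason != "max_tokens":
            break
        # Feed the partial reply back and ask for more, as in the example above.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Please continue"})
    return "".join(parts)


# Offline stand-in for client.messages.create: the first two replies are
# "truncated" and the third one ends normally.
_replies = iter([
    ("Once upon a time, ", "max_tokens"),
    ("there was a model ", "max_tokens"),
    ("that finished its story.", "end_turn"),
])


def fake_create(**kwargs: Any) -> SimpleNamespace:
    text, reason = next(_replies)
    return SimpleNamespace(content=[SimpleNamespace(text=text)], stop_reason=reason)


story = generate_until_done(fake_create, "Write a long story", max_tokens=100)
print(story)
# Once upon a time, there was a model that finished its story.
```

With the real SDK, client.messages.create would be passed as create, together with model and max_tokens. The cap on rounds bounds cost if the model never reaches a natural stop, and the joined text may still need cleanup where one part ends and the next begins.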
Working with Images (Vision)
The Messages API supports image inputs. You can send images as base64-encoded data or URLs.
Sending an Image
import base64
with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's in this image?"
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported media types: image/jpeg, image/png, image/gif, image/webp.
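Small helpers can build these image blocks and reject unsupported formats before a request is sent. The sketch below assumes the media type can be inferred from the file extension. The helper names are hypothetical, and the shape of the URL source (type "url" with a url field) is not shown in this guide, so it should be checked against the API reference.

```python
import base64
from pathlib import Path

# Extension-to-media-type map for the four supported formats listed above.
SUPPORTED = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(filename: str) -> str:
    """Look up the media type from the extension, rejecting other formats."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported image type: {suffix or filename!r}")
    return SUPPORTED[suffix]


def image_block(data: bytes, media_type: str) -> dict:
    """Build a base64 image content block like the one in the example above."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def image_url_block(url: str) -> dict:
    """Build a URL-based image block (source shape is an assumption)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


block = image_block(b"\x89PNG", media_type_for("chart.PNG"))
print(block["source"]["media_type"])  # image/png
```

Either kind of block goes in the content list of a user message, next to a text block that carries the question.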
Best Practices
- Manage context windows: Keep conversation history within Claude's context window. Use techniques like summarization or sliding windows for long conversations.
- Use system prompts: For persistent instructions, use the system parameter instead of repeating instructions in every user message.
- Handle errors gracefully: Implement retry logic for rate limits and network errors.
- Monitor token usage: Track usage.input_tokens and usage.output_tokens to optimize costs.
- Stream responses: For real-time applications, use streaming to get partial responses as they're generated.
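Two of these practices, sliding-window history and retry logic, can be sketched without any SDK calls. The code below is one possible approach under stated assumptions: trim_history counts messages rather than tokens, with_retries uses exponential backoff with a little jitter, and both names and all default values are hypothetical. The sleep function is injectable so the demonstration runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Sliding window: keep the newest messages, starting on a user turn."""
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    window = messages[-max_messages:]
    # Drop leading assistant turns so the window starts with a user message.
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call, retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries.
            sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
    raise RuntimeError("max_attempts must be at least 1")


history = [
    {"role": "user", "content": "A"},
    {"role": "assistant", "content": "B"},
    {"role": "user", "content": "C"},
    {"role": "assistant", "content": "D"},
    {"role": "user", "content": "E"},
]
print(len(trim_history(history, 3)))  # 3
print(len(trim_history(history, 2)))  # 1 (the leading assistant turn is dropped)

attempts = {"count": 0}


def flaky() -> str:
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list[float] = []
result = with_retries(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, attempts["count"])  # ok 3
```

In real use, call would wrap client.messages.create and retry_on would list the SDK's rate-limit and connection error classes. The SDK may already retry some failures itself, so an outer wrapper like this is worth adding only after checking its defaults.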
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create powerful applications that leverage Claude's intelligence. Remember that the API is stateless—you control the conversation flow, which gives you maximum flexibility but also requires careful state management.
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with every request.
- Prefill allows you to shape Claude's responses by providing partial assistant messages, but it's not supported on all models.
- Use stop_reason to handle different response endings, especially max_tokens for truncated responses.
- The API supports multimodal inputs, including images (base64 or URL) alongside text.
- Always monitor token usage and manage context windows to optimize performance and costs.