Mastering the Messages API: Building Conversational AI with Claude
This guide teaches you how to use the Claude Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
Introduction
The Claude Messages API is the primary interface for building conversational AI applications with Anthropic's Claude models. Whether you're creating a chatbot, a content generation tool, or a complex agent system, understanding how to work with messages is essential.
This guide covers the core patterns you'll use daily: making basic requests, managing multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities. By the end, you'll have a solid foundation for building production-ready applications with Claude.
Basic Request and Response
At its simplest, the Messages API takes a list of messages and returns Claude's response. Each message has a role (either "user" or "assistant") and content.
Here's a minimal example in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)
The response includes:
- id: Unique identifier for the message
- role: Always "assistant" for responses
- content: Array of content blocks (usually text)
- model: The model used
- stop_reason: Why generation stopped ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
- usage: Token counts for input and output
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
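As a minimal sketch of reading these fields, the helpers below work on the response as a plain dictionary shaped like the JSON above. The function names `extract_text` and `total_tokens` are illustrative, not part of any SDK, and the sketch assumes you already have the response body decoded into a dict.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Add input and output token counts from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# Sample copied from the response shown above
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
print(total_tokens(sample))  # 18
```

Iterating over every block, rather than reading only the first one, keeps the helper from failing when a response contains blocks that are not text.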
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context but requires you to manage state on your side.
To continue a conversation, append new messages to the history:
import anthropic

client = anthropic.Anthropic()

# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Second turn: include previous exchange
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": response.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(response.content[0].text)
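Because the history has to be resent on every turn, it can help to keep it in one place. The sketch below is one possible way to do that, not a prescribed pattern: `Conversation` and its `send` callable are hypothetical names, and `fake_send` stands in for a real API call so the example runs without network access.

```python
from typing import Callable

Message = dict[str, str]
# Hypothetical transport: receives the full history, returns the reply text.
SendFn = Callable[[list[Message]], str]


class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    def __init__(self, send: SendFn) -> None:
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        try:
            reply = self._send(list(self.messages))
        except Exception:
            # Drop the unanswered user turn so the history stays consistent
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: list[Message]) -> str:
    """Stand-in for a real request; reports how many messages it received."""
    return f"(reply after seeing {len(history)} message(s))"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
# ['user', 'assistant', 'user', 'assistant']
```

In a real application, `send` would wrap `client.messages.create(...)` with the history as `messages` and return `response.content[0].text`, as in the example above.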
Synthetic Assistant Messages
You can inject synthetic assistant messages—messages that didn't actually come from Claude. This is useful for:
- Few-shot prompting: Showing examples of desired responses
- Guiding behavior: Demonstrating tone or format
- Correcting context: Providing "correct" answers in history
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the capital of Italy?"}
]
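For few-shot prompting with more than one example, a history like this can be generated from question and answer pairs. The helper below is an illustrative sketch; `build_few_shot` is a hypothetical name, and the example pairs are the ones used above.

```python
def build_few_shot(
    examples: list[tuple[str, str]], question: str
) -> list[dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by you, not generated by Claude
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


few_shot = build_few_shot(
    [("What's the capital of France?", "The capital of France is Paris.")],
    "What's the capital of Italy?",
)
for m in few_shot:
    print(m["role"], "->", m["content"])
```

The resulting list always ends on a user turn, so it can be passed directly as the `messages` argument of a request.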
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response by including an assistant message with partial content at the end of your input. This is powerful for:
- Constraining output format (e.g., JSON, multiple choice)
- Guiding the start of a response
- Reducing token usage for structured outputs
Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6. On those models, use structured outputs or system prompt instructions instead. The examples below use claude-sonnet-4-5.
Example: Multiple Choice Answer
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"
By setting max_tokens=1 and pre-filling "The answer is (", Claude only needs to output the single letter "C". This is efficient and predictable.
Example: JSON Output
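import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old."
        },
        {
            "role": "assistant",
            "content": "Here's the JSON:\n{"
        }
    ]
)
# The response continues after the prefill, so prepend the opening brace
print("{" + message.content[0].text)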
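Since the response continues from the prefill, the returned text starts after the opening brace and has to be reassembled before parsing. The sketch below shows one way to do that; `parse_prefilled_json` is a hypothetical helper, and the `continuation` string is a made-up stand-in for what a model might return, not real output.

```python
import json
from typing import Any


def parse_prefilled_json(continuation: str, json_prefix: str = "{") -> Any:
    """Rebuild the JSON document from the prefilled prefix and the continuation.

    raw_decode parses the first complete JSON value and ignores anything
    after it, so trailing commentary does not break parsing.
    """
    text = json_prefix + continuation
    obj, _end = json.JSONDecoder().raw_decode(text)
    return obj


# Hypothetical continuation, including trailing text after the JSON object
continuation = '"name": "John", "age": 30}\nLet me know if you need anything else.'
print(parse_prefilled_json(continuation))
# {'name': 'John', 'age': 30}
```

Only the JSON part of the prefill (here the `{`) is prepended; the introductory words in the prefill are not part of the document to parse.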
Vision Capabilities
The Messages API supports images as input. You can send base64-encoded images or image URLs, and Claude can analyze them.
import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
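When images come from different files, the content block can be built by a small helper that also picks the media type. This is a sketch under assumptions: `image_block` and `image_message` are hypothetical names, and the extension-to-media-type table is an assumed list that should be checked against the current API documentation.

```python
import base64
from pathlib import Path

# Assumed set of accepted image types; verify against the API documentation.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block from a file on disk."""
    p = Path(path)
    media_type = MEDIA_TYPES.get(p.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image extension: {p.suffix!r}")
    data = base64.b64encode(p.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


def image_message(path: str, prompt: str) -> dict:
    """Build a user message with the image first and the text prompt second."""
    return {
        "role": "user",
        "content": [image_block(path), {"type": "text", "text": prompt}],
    }
```

With these helpers, the request above could pass `messages=[image_message("chart.png", "Describe this chart in detail.")]`.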
Handling Stop Reasons
Understanding why Claude stopped generating helps you build robust applications:
| stop_reason | Meaning | Action |
|---|---|---|
| "end_turn" | Claude finished naturally | Return response to user |
| "max_tokens" | Hit token limit | Increase max_tokens or continue |
| "stop_sequence" | Hit a custom stop sequence | Handle as needed |
| "tool_use" | Claude wants to use a tool | Execute tool and continue |
response = client.messages.create(...)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "tool_use":
    print("Claude requested a tool call.")
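The table above can also be expressed as a lookup, which keeps the handling of every stop reason in one place. This is an illustrative sketch: `next_action` and the action labels are hypothetical, and the fallback for unrecognized values is a design choice rather than documented behavior.

```python
# Action labels are illustrative; they mirror the "Action" column of the table.
ACTIONS = {
    "end_turn": "return_response",
    "max_tokens": "increase_limit_or_continue",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool_and_continue",
}


def next_action(stop_reason: str) -> str:
    """Map a stop_reason to the next step, flagging values we do not recognize."""
    return ACTIONS.get(stop_reason, "log_and_inspect")


for reason in ("end_turn", "max_tokens", "tool_use", "something_new"):
    print(f"{reason} -> {next_action(reason)}")
```

Falling back to an explicit "inspect" action, instead of raising, lets the application keep running if a stop reason appears that the code was not written for.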
Best Practices
- Manage context window: Keep conversation history within the model's context window. Use techniques like summarization or sliding windows for long conversations.
- Use system prompts: For persistent instructions, use the system parameter rather than repeating instructions in every user message.
- Handle errors gracefully: The API may return errors for invalid requests, rate limits, or server issues. Implement retry logic with exponential backoff.
- Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs and optimize prompts.
- Stream responses: For better user experience, use streaming to show responses as they're generated.
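Two of the practices above, a sliding window over the history and retries with exponential backoff, can be sketched in a few lines. Both helpers are hypothetical illustrations: `trim_history` counts messages rather than tokens, which is only a rough proxy for context size, and `with_backoff` retries on any `Exception` by default, which you would likely narrow to the specific errors your client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep the most recent messages, dropping leading non-user turns."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = messages[-max_messages:]
    # The trimmed history should still begin with a user message
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


def with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Run call(), retrying failures with exponentially growing, jittered delays."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent clients
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("retries must be non-negative")
```

A request could then be wrapped as `with_backoff(lambda: client.messages.create(...))`, with `trim_history` applied to the history before each call.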
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated conversational AI applications. Remember that the API is stateless, so you manage the conversation history, and that prefill, on models that support it, gives you fine-grained control over Claude's output.
Key Takeaways
- The Messages API is stateless; you must send the full conversation history with each request
- Prefill allows you to start Claude's response, useful for constraining output format or guiding behavior
- Vision capabilities let you send images for Claude to analyze alongside text
- Always check stop_reason to understand why generation ended and handle appropriately
- Monitor token usage to control costs and optimize your prompts for efficiency