Mastering the Messages API: Building Conversational AI with Claude
Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
Introduction
The Claude Messages API is the primary interface for building conversational AI applications with Anthropic's Claude models. Whether you're creating a simple chatbot or a complex multi-turn assistant, understanding how to work with messages effectively is essential.
This guide covers the core patterns you'll use daily: basic requests, managing conversation history, pre-filling responses, and working with images. By the end, you'll have a solid foundation for building production-ready applications with Claude.
Basic Request and Response
At its simplest, the Messages API takes a list of messages and returns Claude's response. Here's the minimal example in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
The response includes several important fields:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
- content: An array of content blocks (usually text, but can include tool use blocks)
- stop_reason: Why Claude stopped generating ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
- usage: Token counts for billing and context window management
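As a minimal sketch of reading these fields, the helper below joins every text block in a response and skips any other block type. It works on the plain-dict form of the response shown above; `extract_text` is a hypothetical name for illustration, not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every text block, ignoring other block types."""
    parts = [
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    ]
    return "".join(parts)


# A trimmed copy of the sample response from this section.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!

# Total tokens for this request, useful for cost tracking.
total_tokens = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
print(total_tokens)  # 18
```

With the SDK's response object, the same idea applies through attribute access, as in the `message.content[0].text` calls used elsewhere in this guide.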
Multi-Turn Conversations
The Messages API is stateless — each request must include the full conversation history. This gives you complete control over context but requires you to manage the conversation state on your end.
Here's how to build a multi-turn conversation:
import anthropic

client = anthropic.Anthropic()

# First turn
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Extract Claude's response
assistant_response = message.content[0].text

# Second turn: include the full history
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": assistant_response},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
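Because the history grows with every turn, it can help to wrap the bookkeeping in a small object. The sketch below is one possible shape, not an SDK feature: `Conversation` and its `send` callable are hypothetical names, and `send` stands in for whatever function actually calls the API and returns the reply text.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        # Build the candidate history first, so a failed request
        # does not leave a dangling user turn behind.
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = self._send(pending)
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


# Stand-in for a real API call, used here only to show the flow.
def fake_send(messages: list[Message]) -> str:
    return f"(reply to message {len(messages)})"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
```

In a real application, `send` would call `client.messages.create(...)` with the given messages and return `message.content[0].text`, as in the example above.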
Synthetic Assistant Messages
You can inject synthetic assistant messages into the history. This is useful for:
- Providing few-shot examples
- Guiding conversation flow
- Implementing system-like behavior without the system prompt
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Italy?"}
]
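For few-shot prompting, the same pattern can be generated from a list of example pairs. This is a small sketch under the assumption that every example is a plain (question, answer) string pair; `build_few_shot` is a hypothetical helper name.

```python
def build_few_shot(
    examples: list[tuple[str, str]], question: str
) -> list[dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # The assistant turn is synthetic: written by us, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_few_shot(
    [("What's the capital of France?", "The capital of France is Paris.")],
    "What about Italy?",
)
for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list alternates user and assistant turns and ends on a user turn, so it can be passed directly as the `messages` argument.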
Prefill: Putting Words in Claude's Mouth
Prefilling allows you to start Claude's response for it. This is powerful for:
- Enforcing response format (e.g., JSON, multiple choice)
- Guiding the tone or structure
- Reducing output tokens for constrained tasks
Basic Prefill Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # "C"
By setting max_tokens=1 and prefilling with "The answer is (", Claude only needs to output the letter. This is perfect for multiple-choice classification tasks.
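Note that the response contains only the continuation, not the prefill itself, so the two have to be joined if the full sentence is needed. The sketch below shows that step with two hypothetical helpers. The trailing-whitespace check reflects an assumption about API behavior (a final assistant turn ending in whitespace is understood to be rejected), so treat it as a precaution rather than a documented rule.

```python
def with_prefill(messages: list[dict[str, str]], prefill: str) -> list[dict[str, str]]:
    """Return a new message list that ends with a partial assistant turn."""
    # Assumption: a prefill ending in whitespace is rejected by the API.
    if prefill != prefill.rstrip():
        raise ValueError("prefill must not end with whitespace")
    return [*messages, {"role": "assistant", "content": prefill}]


def full_reply(prefill: str, completion: str) -> str:
    """Rebuild the complete assistant sentence from prefill and continuation."""
    return prefill + completion


question = {
    "role": "user",
    "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
}
request_messages = with_prefill([question], "The answer is (")

# Suppose the API returned "C" as the continuation.
print(full_reply("The answer is (", "C"))  # The answer is (C
```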
Prefill Limitations
Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
For models that don't support prefill, consider:
- Structured outputs: Define a JSON schema for the response
- System prompt instructions: Use clear formatting instructions in the system prompt
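As a rough sketch of the system prompt route, the snippet below asks for a small JSON object and validates the reply before using it. The prompt wording, the `build_request` and `parse_answer` helpers, and the `{"answer": ...}` shape are all assumptions made for illustration; `system` is the top-level request parameter for system prompts.

```python
import json
from typing import Any

SYSTEM_PROMPT = (
    'Answer with a single JSON object of the form {"answer": "A"}, '
    "where the value is one of A, B, or C. Output nothing else."
)


def build_request(question: str) -> dict[str, Any]:
    """Keyword arguments for client.messages.create(**kwargs)."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 50,
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": question}],
    }


def parse_answer(reply_text: str) -> str:
    """Validate the reply instead of trusting the format blindly."""
    data = json.loads(reply_text)
    if not isinstance(data, dict) or data.get("answer") not in {"A", "B", "C"}:
        raise ValueError(f"unexpected reply: {reply_text!r}")
    return data["answer"]


print(parse_answer('{"answer": "C"}'))  # C
```

Since nothing forces the model to follow formatting instructions exactly, the validation step matters: a malformed reply surfaces as an exception the caller can retry on.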
Working with Images (Vision)
Claude can analyze images sent via the Messages API. This enables use cases like document analysis, screenshot interpretation, and visual question answering.
Sending an Image
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported media types: image/jpeg, image/png, image/gif, image/webp.
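To avoid hard-coding the media type, a helper can pick it from the file extension and refuse anything outside the supported list. This is a sketch: `image_block` and the extension table are hypothetical, and it trusts the extension rather than inspecting the file contents.

```python
import base64
from pathlib import Path
from typing import Any

# Extension-to-media-type table for the four supported formats.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(raw: bytes, filename: str) -> dict[str, Any]:
    """Build an image content block from raw bytes and the file's name."""
    media_type = MEDIA_TYPES.get(Path(filename).suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image type: {filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("utf-8"),
        },
    }


block = image_block(b"not really a png", "chart.PNG")
print(block["source"]["media_type"])  # image/png
```

A typical call would be `image_block(Path("chart.png").read_bytes(), "chart.png")`, with the result placed in the `content` list next to a text block.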
Image Size Limits
Claude processes images at different resolutions depending on size:
- Images under 1,950 pixels on the longest side are processed at original resolution
- Larger images are scaled down to fit within 1,950 pixels
- Very large images (over 8,000 pixels) may be rejected
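To estimate what resolution an image will be processed at, the scaling rule above can be written out as arithmetic. The sketch below takes the 1,950-pixel figure from the list as given and assumes the aspect ratio is preserved with dimensions rounded down; the actual resizing behavior may differ in detail.

```python
def fit_within(width: int, height: int, max_side: int = 1950) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Small enough already: processed at original resolution.
        return width, height
    # Integer math keeps the result deterministic; rounding down is an assumption.
    return (
        max(1, width * max_side // longest),
        max(1, height * max_side // longest),
    )


print(fit_within(1000, 500))   # (1000, 500)
print(fit_within(3900, 1950))  # (1950, 975)
```

Resizing large images on the client before upload, using the same target, reduces request size without changing what the model is expected to see.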
Handling Stop Reasons
Understanding stop_reason helps you build robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue conversation |
| max_tokens | Output hit token limit | Increase max_tokens or truncate |
| stop_sequence | Custom stop sequence triggered | Handle as needed |
| tool_use | Claude wants to call a tool | Execute tool and continue |
if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
    # Handle tool execution...
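When more than a couple of stop reasons need handling, a lookup table with an explicit fallback can be easier to maintain than a growing if/elif chain. The action labels below are hypothetical names chosen for this sketch; the fallback exists because values beyond the four listed here may appear.

```python
from typing import Optional

# Maps each stop reason from the table to an application-level action label.
ACTIONS = {
    "end_turn": "continue_conversation",
    "max_tokens": "raise_limit_or_accept_truncation",
    "stop_sequence": "handle_custom_stop",
    "tool_use": "run_tool_and_continue",
}


def next_action(stop_reason: Optional[str]) -> str:
    """Pick an action for a stop reason, never failing on unknown values."""
    if stop_reason is None:
        return "unknown"
    return ACTIONS.get(stop_reason, "unknown")


print(next_action("tool_use"))    # run_tool_and_continue
print(next_action("other_value")) # unknown
```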
Best Practices
1. Manage Token Usage
Always check usage.input_tokens and usage.output_tokens to track costs. For long conversations, consider:
- Summarizing older messages
- Using prompt caching for repeated system instructions
- Trimming history when approaching context limits
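Trimming can be sketched as keeping the newest messages that fit a token budget. The estimator below uses a rough four-characters-per-token heuristic, which is an assumption and not how the API counts tokens; in practice the `usage` figures from earlier responses give a better signal. The helper also assumes each message's content is a plain string.

```python
def estimate_tokens(text: str) -> int:
    """Very rough size estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict[str, str]], budget: int) -> list[dict[str, str]]:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept: list[dict[str, str]] = []
    total = 0
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        # Always keep the newest message, even if it alone exceeds the budget.
        if kept and total + cost > budget:
            break
        kept.append(message)
        total += cost
    kept.reverse()
    # The history should open with a user turn, so drop a leading assistant turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
print(len(trim_history(history, budget=20)))   # 1
print(len(trim_history(history, budget=100)))  # 3
```

Dropping turns loses information, so this pairs naturally with summarizing the removed messages into a single earlier turn.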
2. Handle Errors Gracefully
try:
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError as e:
    # Catch this first: APIConnectionError is a subclass of APIError,
    # so it would never be reached if listed after it.
    print(f"Connection error: {e}")
    # Retry after delay
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Implement retry logic or fallback
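The "retry after delay" step can be sketched as a small wrapper with exponential backoff. `with_retries` is a hypothetical helper, not part of the SDK; the attempt count and delays are illustrative defaults, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retryable: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exception types with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller see the error.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: list[float] = []
print(with_retries(flaky, (ConnectionError,), sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

With the client from this guide, the call might look like `with_retries(lambda: client.messages.create(...), retryable=(anthropic.APIConnectionError,))`. Only errors that are plausibly transient belong in `retryable`; retrying an invalid request just repeats the failure.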
3. Use Streaming for Responsive UIs
For chat applications, use streaming to show tokens as they're generated:
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
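In a multi-turn chat, the streamed text usually needs to be kept as well as displayed, so it can be appended to the history as the assistant turn. A minimal sketch, with `collect_stream` as a hypothetical helper that accepts any iterable of text chunks:

```python
from typing import Callable, Iterable


def collect_stream(chunks: Iterable[str], on_text: Callable[[str], None]) -> str:
    """Forward each chunk to the UI callback and return the full text."""
    parts: list[str] = []
    for text in chunks:
        on_text(text)       # Show the chunk as soon as it arrives.
        parts.append(text)  # Keep it for the conversation history.
    return "".join(parts)


# Stand-in chunks; in the example above this would be stream.text_stream.
shown: list[str] = []
reply = collect_stream(["Once ", "upon ", "a time"], shown.append)
print(reply)  # Once upon a time
```

Inside the `with client.messages.stream(...)` block, the call would be `collect_stream(stream.text_stream, lambda t: print(t, end="", flush=True))`.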
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create sophisticated conversational AI applications. Remember that the API is stateless — you manage the conversation history — and always check stop reasons to handle different scenarios appropriately.
Key Takeaways
- The Messages API is stateless — always send the full conversation history with each request
- Prefill allows you to guide Claude's responses by starting its reply, but check model compatibility
- Vision capabilities let Claude analyze images sent as base64-encoded data
- Always check stop_reason to understand why Claude stopped generating and handle edge cases
- Use streaming for real-time user interfaces and track token usage to manage costs