Mastering the Messages API: Build Conversational AI with Claude
Learn how to use the Claude Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
This guide teaches you to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques to shape responses, and vision capabilities, with code examples in Python and TypeScript.
Introduction
The Claude Messages API is the primary interface for building conversational AI applications with Anthropic's Claude models. Whether you're creating a chatbot, a content generation tool, or an intelligent assistant, understanding how to work with messages is essential.
This guide covers the core patterns for using the Messages API effectively: from simple single-turn requests to complex multi-turn conversations, prefill techniques for controlling responses, and leveraging vision capabilities. By the end, you'll have a solid foundation for building production-ready applications with Claude.
Basic Request and Response
At its simplest, the Messages API accepts a list of messages and returns a response. Here's a minimal example in Python:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)
```
And the equivalent in TypeScript:
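```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

// content is a union of block types; narrow to a text block before reading .text
const block = message.content[0];
if (block.type === 'text') {
  console.log(block.text);
}
```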
The API response includes:
- id: Unique identifier for the message
- role: Always "assistant" for responses
- content: Array of content blocks (typically text)
- model: The model used
- stop_reason: Why generation stopped (e.g., "end_turn", "max_tokens")
- usage: Token counts for input and output
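Because `content` is a list of blocks, indexing `content[0]` assumes the first block is text. A slightly more defensive sketch joins every text block and skips anything else, such as tool-use blocks. It assumes each block exposes a `type` attribute and that text blocks carry `text`, as the Python SDK's response objects do; the helper name `response_text` is hypothetical.

```python
from types import SimpleNamespace


def response_text(message) -> str:
    """Concatenate the text of every text block in a Messages API response.

    Assumes blocks have a `type` attribute and text blocks have `text`,
    matching the objects returned by the Python SDK.
    """
    return "".join(
        block.text for block in message.content if getattr(block, "type", None) == "text"
    )


# Usage with a stand-in response object (no API call needed)
fake = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="Hello! "),
        SimpleNamespace(type="tool_use", name="lookup", input={}),
        SimpleNamespace(type="text", text="How can I help?"),
    ]
)
print(response_text(fake))  # Hello! How can I help?
```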
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context but requires you to manage state on your end.
Building a Conversation
To continue a conversation, simply append new messages to the history:
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello! How can I help you today?"},
        {"role": "user", "content": "Can you explain how transformers work?"}
    ]
)
```
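Since the API keeps no state, a small wrapper that owns the history list is one common way to organize a chat loop. The sketch below is illustrative and not part of the SDK: the `Conversation` class and its `send_fn` parameter are hypothetical, with `send_fn` standing in for a real API call that takes the full history and returns the reply text. Injecting it keeps the history logic testable without network access.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Hypothetical helper that keeps the full history for a stateless API."""

    def __init__(self, send_fn: Callable[[List[Message]], str]):
        # send_fn receives the full history and returns the assistant's reply text
        self.send_fn = send_fn
        self.history: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        try:
            # The entire history is sent on every call, because the API keeps no state
            reply = self.send_fn(self.history)
        except Exception:
            self.history.pop()  # roll back the unanswered user turn
            raise
        self.history.append({"role": "assistant", "content": reply})
        return reply


# Usage with a stand-in for the API that reports how many messages it received
convo = Conversation(lambda msgs: f"I received {len(msgs)} message(s).")
print(convo.ask("Hello, Claude"))                           # I received 1 message(s).
print(convo.ask("Can you explain how transformers work?"))  # I received 3 message(s).
```

With the Python SDK, `send_fn` could be `lambda msgs: client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024, messages=msgs).content[0].text`.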
Synthetic Assistant Messages
You can insert synthetic assistant messages—responses that didn't actually come from Claude. This is useful for:
- Providing examples: Show Claude the format you want
- Correcting behavior: Guide Claude toward a specific response style
- Simulating scenarios: Test how Claude handles different situations
```python
messages = [
    {"role": "user", "content": "Summarize this article"},
    {"role": "assistant", "content": "I'll provide a concise summary with bullet points."},
    {"role": "user", "content": "Here's the article: ..."}
]
```
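For the "providing examples" case, synthetic turns can be generated from a list of (input, ideal output) pairs so the real query that follows inherits their format. A minimal sketch; the function `few_shot_messages` and the sample pairs are hypothetical.

```python
from typing import Dict, List, Tuple


def few_shot_messages(examples: List[Tuple[str, str]], query: str) -> List[Dict[str, str]]:
    """Turn (user, assistant) example pairs into alternating turns, then add the real query."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by us, not generated by Claude
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


msgs = few_shot_messages(
    [("Summarize: The meeting moved to Friday.", "- Meeting rescheduled to Friday")],
    "Summarize: The launch is delayed two weeks due to testing.",
)
for m in msgs:
    print(m["role"], "->", m["content"])
```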
Managing Conversation History
For long conversations, the resent history grows with every turn, which raises cost and latency and eventually approaches the model's context window. Strategies include:
- Summarization: Periodically summarize older turns
- Sliding window: Keep only the most recent N messages
- Selective retention: Keep the system prompt (passed separately through the `system` parameter) and recent exchanges; drop older ones
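The sliding-window strategy can be sketched as follows. The system prompt lives in the separate `system` parameter, so trimming `messages` never drops it. The sketch also skips leading assistant turns so the window does not open with a reply whose question was trimmed away. The function name and the `max_messages` parameter are hypothetical.

```python
from typing import Dict, List


def sliding_window(messages: List[Dict[str, str]], max_messages: int) -> List[Dict[str, str]]:
    """Keep at most the last `max_messages` messages, starting on a user turn."""
    window = messages[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant turns so the window never opens with an orphaned reply
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is a transformer?"},
    {"role": "assistant", "content": "A neural network architecture built on attention."},
    {"role": "user", "content": "And attention?"},
]
trimmed = sliding_window(history, 4)
print([m["content"] for m in trimmed])
```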
Prefill: Putting Words in Claude's Mouth
Prefill allows you to start Claude's response by providing the beginning of its answer. This powerful technique lets you shape the response format, enforce structure, or guide Claude toward specific outputs.
Basic Prefill Example
Here's how to use prefill to get a single-letter answer from a multiple-choice question:
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is the Latin name for ant? (A) Apoidea (B) Rhopalocera (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
```
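The response contains only the continuation, not the prefilled text, so to reconstruct the full answer you join the two yourself. The sketch below also rejects a prefill that ends in whitespace, because the API returns an error when the final assistant message ends with trailing whitespace. The helper names `with_prefill` and `full_answer` are hypothetical.

```python
from typing import Dict, List


def with_prefill(messages: List[Dict[str, str]], prefill: str) -> List[Dict[str, str]]:
    """Return a new message list whose final assistant turn is the prefill."""
    if prefill != prefill.rstrip():
        # The final assistant content must not end with whitespace
        raise ValueError("prefill must not end with whitespace")
    return messages + [{"role": "assistant", "content": prefill}]


def full_answer(prefill: str, continuation: str) -> str:
    """Claude's reply continues the prefill, so prepend it for the complete text."""
    return prefill + continuation


question = [{
    "role": "user",
    "content": "What is the Latin name for ant? (A) Apoidea (B) Rhopalocera (C) Formicidae",
}]
request_messages = with_prefill(question, "The answer is (")
print(full_answer("The answer is (", "C"))  # The answer is (C
```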
Practical Applications
- JSON extraction: Prefill with `{"result":` to get structured JSON output
- Format enforcement: Start with `Here's your summary:` to ensure a summary format
- Chain-of-thought: Prefill with `Let me think step by step:` to encourage reasoning
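For JSON extraction, the prefill is part of the JSON document, so parse the prefill and the completion together. In this hedged sketch, `parse_prefilled_json` is a hypothetical helper and the `completion` string stands in for `message.content[0].text` from a real response. Because the model may add prose after the closing brace, the sketch uses `json.JSONDecoder.raw_decode`, which parses the first complete JSON value and ignores whatever follows.

```python
import json


def parse_prefilled_json(prefill: str, completion: str):
    """Parse JSON whose opening characters were supplied as a prefill."""
    text = prefill + completion
    # raw_decode parses the first JSON value and reports where it ended,
    # so trailing commentary after the closing brace is ignored
    obj, _end = json.JSONDecoder().raw_decode(text)
    return obj


completion = ' "positive", "confidence": 0.9}\nLet me know if you need more.'
print(parse_prefilled_json('{"result":', completion))
```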
Important Notes
- Prefill is not supported on Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, or Claude Mythos Preview
- For these models, use structured outputs or system prompt instructions instead
- When using prefill, set `max_tokens` appropriately to leave room for the completion
Vision Capabilities
Claude can process images through the Messages API. This enables use cases like image analysis, document processing, and visual question answering.
Sending an Image
```python
import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail"
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
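When images come from several files, a small helper that builds the content block and picks `media_type` from the file extension avoids hard-coding `image/png`. This is a sketch rather than an SDK function: `image_block` and the extension table are assumptions, with the table limited to the PNG, JPEG, WEBP, and GIF formats this guide lists.

```python
import base64
import os
from typing import Any, Dict

# Assumed mapping from file extension to the media types listed in this guide
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(path: str) -> Dict[str, Any]:
    """Build a base64 image content block for the Messages API from a file."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {ext or path}")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[ext], "data": data},
    }
```

A message's content list then becomes `[image_block("chart.png"), {"type": "text", "text": "Describe this chart in detail"}]`.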
Supported Image Formats
- PNG, JPEG, WEBP, GIF (non-animated)
- Maximum size: 5 MB per image via the API (smaller images also upload and process faster)
- Claude can extract text from images, analyze diagrams, and describe visual content
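An oversized image fails the whole request, so a pre-flight check can catch it before upload. In this minimal sketch, the `MAX_IMAGE_BYTES` value of 5 * 1024 * 1024 is an assumption based on Anthropic's published per-image API limit; confirm it against the current vision documentation. Because it is also an assumption here whether the limit counts raw or base64-encoded bytes, the check measures the encoded size, the stricter of the two (base64 turns every 3 input bytes into 4 characters). The function names are hypothetical.

```python
import base64

# Assumption: per-image API limit; verify against the current documentation
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def encoded_size(raw_len: int) -> int:
    """Length of the base64 encoding of `raw_len` bytes (4 chars per 3-byte group, padded)."""
    return 4 * ((raw_len + 2) // 3)


def check_image(raw: bytes, limit: int = MAX_IMAGE_BYTES) -> str:
    """Return base64 data for `raw`, or raise if the encoded payload exceeds `limit`."""
    size = encoded_size(len(raw))
    if size > limit:
        raise ValueError(f"encoded image is {size} bytes; limit is {limit}")
    return base64.b64encode(raw).decode("ascii")


print(encoded_size(3), encoded_size(4))  # 4 8
```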
Handling Stop Reasons
The stop_reason field tells you why Claude stopped generating. Understanding these helps you handle different scenarios:
| stop_reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished its response naturally | Continue the conversation |
| max_tokens | The output reached the max_tokens limit you set | Increase max_tokens or request a continuation |
| stop_sequence | Claude generated one of your custom stop sequences | Check message.stop_sequence to see which one matched |
| tool_use | Claude wants to call a tool | Execute the tool and continue with its result |
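The table maps naturally onto a dispatch on `stop_reason`. This sketch only returns a description of the next step; the function name `next_action` and its action strings are hypothetical, and real code would call your own continuation or tool-execution logic in each branch. It reads `message.stop_sequence`, the response field that reports which custom stop sequence matched.

```python
from types import SimpleNamespace


def next_action(message) -> str:
    """Map a response's stop_reason to the follow-up step from the stop-reason table."""
    reason = message.stop_reason
    if reason == "end_turn":
        return "done: show the reply and wait for the next user message"
    if reason == "max_tokens":
        return "truncated: raise max_tokens or request a continuation"
    if reason == "stop_sequence":
        return f"stopped at custom sequence {message.stop_sequence!r}"
    if reason == "tool_use":
        return "run the requested tool and send back a tool_result"
    # Other stop reasons may appear; fail loudly rather than guess
    raise ValueError(f"unhandled stop_reason: {reason}")


print(next_action(SimpleNamespace(stop_reason="stop_sequence", stop_sequence="###")))
```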
Example: Handling max_tokens
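```python
# `messages` is the list that was sent in the request that produced `message`
if message.stop_reason == "max_tokens":
    # Record the truncated reply, then ask Claude to continue from there
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue"})
    # Make another API call with the extended history
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=messages,
    )
```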
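When a long output may need several continuations, the same pattern can run in a loop with a cap on rounds. In this sketch, `create_fn` is a hypothetical stand-in for `client.messages.create` with the model and `max_tokens` already bound (for example with `functools.partial`), and `generate_complete` is an illustrative name. The pieces are joined directly, so real output may need light cleanup where one reply ends and the next begins.

```python
from types import SimpleNamespace
from typing import Callable, Dict, List


def generate_complete(
    create_fn: Callable, messages: List[Dict[str, str]], max_rounds: int = 3
) -> str:
    """Keep requesting continuations while replies stop at max_tokens."""
    history = list(messages)  # copy so the caller's list is left unchanged
    parts: List[str] = []
    for _ in range(max_rounds):
        message = create_fn(messages=history)
        text = message.content[0].text
        parts.append(text)
        if message.stop_reason != "max_tokens":
            break  # the reply finished on its own
        # Record the partial reply, then ask for more
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue"})
    return "".join(parts)


# Usage with a stand-in for the API that truncates once, then finishes
replies = iter([("Once upon ", "max_tokens"), ("a time.", "end_turn")])


def fake_create(messages):
    text, reason = next(replies)
    return SimpleNamespace(content=[SimpleNamespace(type="text", text=text)], stop_reason=reason)


prompt = [{"role": "user", "content": "Write a story"}]
story = generate_complete(fake_create, prompt)
print(story)  # Once upon a time.
```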
Best Practices
1. Manage Token Usage
- Monitor `usage.input_tokens` and `usage.output_tokens` in responses
- Use shorter conversation histories when possible
- Consider prompt caching for repeated system prompts
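To monitor usage across a session, you can accumulate the `usage` object from each response. A minimal sketch; `UsageTracker` is a hypothetical helper that reads only `input_tokens` and `output_tokens`.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Running totals of the token counts reported in each response's `usage`."""

    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, usage) -> None:
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.calls += 1

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


tracker = UsageTracker()
# In real code: tracker.record(message.usage) after each API call
tracker.record(SimpleNamespace(input_tokens=12, output_tokens=30))
tracker.record(SimpleNamespace(input_tokens=50, output_tokens=20))
print(tracker.total, tracker.calls)  # 112 2
```

Because the full history is resent on each turn, `input_tokens` typically grows with every call in a conversation, which is where trimming history pays off.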
2. Handle Errors Gracefully
```python
import anthropic

try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    # RateLimitError subclasses APIError, so it must be caught before the generic handler
    print(f"Rate limited: {e}")
    # Wait and retry
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Retry with backoff
except anthropic.APIError as e:
    # Catch-all for any other API error
    print(f"API error: {e}")
    # Implement retry logic or fallback
```
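The "retry with backoff" comments can be filled in with a small wrapper like the one below. It is a generic sketch, not part of the SDK: `with_backoff`, its parameters, and the injectable `sleep` are hypothetical. In real use you might call `with_backoff(lambda: client.messages.create(...), (anthropic.RateLimitError, anthropic.APIConnectionError))`. The Python SDK already retries some failures on its own (tunable with the client's `max_retries` option), so a custom loop mainly adds control over timing and logging.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying the listed exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base, 2*base, 4*base, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Usage: a function that fails twice before succeeding; record delays instead of sleeping
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
result = with_backoff(flaky, (ConnectionError,), sleep=delays.append)
print(result, len(delays))  # ok 2
```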
3. Use System Messages for Instructions
For persistent instructions, use the system parameter instead of repeating instructions in every user message:
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant that always responds in French.",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)
```
4. Streaming for Better UX
For long responses, use streaming to show output incrementally:
```python
stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True
)
for event in stream:
    # Only text deltas carry .text; other delta types (e.g. tool input JSON) do not
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)
```
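Besides printing deltas, you usually want the complete text once the stream ends. This sketch assumes events shaped like the SDK's streaming events: a `type` field and, for `content_block_delta`, a `delta` with its own `type`. It keeps only `text_delta` pieces. `collect_stream` is a hypothetical helper. The Python SDK also offers a higher-level `client.messages.stream(...)` context manager whose `text_stream` iterator yields just the text.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(events: Iterable, echo: bool = True) -> str:
    """Optionally print text deltas as they arrive, and return the concatenated text."""
    pieces = []
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            pieces.append(event.delta.text)
            if echo:
                print(event.delta.text, end="", flush=True)
    return "".join(pieces)


# Stand-in events; with the SDK, pass the iterator returned by create(..., stream=True)
events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="Once ")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="upon a time")),
    SimpleNamespace(type="message_stop"),
]
collected = collect_stream(events, echo=False)
print(collected)  # Once upon a time
```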
Conclusion
The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated AI applications that leverage Claude's full potential.
Remember that the API is stateless—you manage conversation history. Use prefill judiciously for response shaping, and always handle stop reasons and errors appropriately. With these patterns, you're ready to build production-ready conversational AI.
Key Takeaways
- The Messages API is stateless: You must send the full conversation history with each request, giving you complete control over context.
- Prefill shapes responses: Starting Claude's response lets you enforce formats, extract structured data, and guide behavior—but check model compatibility.
- Vision capabilities are powerful: Send images as base64-encoded data for analysis, description, and text extraction.
- Handle stop reasons: Different stop reasons (end_turn, max_tokens, tool_use) require different handling strategies.
- Stream for better UX: Use streaming to show responses incrementally and improve user experience.