Mastering the Messages API: A Practical Guide to Building with Claude
This guide teaches you how to use the Claude Messages API to send requests, manage multi-turn conversations, prefill responses, and work with images. You'll get practical code examples and best practices for building robust AI applications.
Introduction
The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a complex agent, understanding how to structure your API calls is essential. This guide walks you through the core patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.
Basic Request and Response
At its simplest, a Messages API call requires three things:
- model: The Claude model you want to use (e.g., claude-opus-4-7)
- max_tokens: The maximum number of tokens in Claude's response
- messages: An array of message objects, each with a role and content
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
The response includes:
- id: Unique message identifier
- role: Always "assistant"
- content: Array of content blocks (usually text)
- stop_reason: Why Claude stopped ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
- usage: Token counts for input and output
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [{"type": "text", "text": "Hello!"}],
"model": "claude-opus-4-7",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {"input_tokens": 12, "output_tokens": 6}
}
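Because content is an array of blocks rather than a single string, reading the reply usually means picking out the text blocks. The sketch below works on the response body as a plain dictionary, mirroring the JSON above; extract_text is a hypothetical helper name, not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block in a Messages API response body."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# The sample response shown above, as a Python dictionary.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

reply_text = extract_text(sample)
total_tokens = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
print(reply_text, total_tokens)
```

With the Python SDK object, the equivalent fields are available as attributes, such as message.content[0].text and message.stop_reason.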
Multi-Turn Conversations
The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context but requires careful management.
Building a Conversation
To continue a conversation, append new messages to the history:
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ]
)
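Since the API keeps no state, the application has to hold the history itself. One way to do that is a small wrapper around the message list, sketched here; the Conversation class and its method names are assumptions made for illustration, not part of the SDK.

```python
class Conversation:
    """Hypothetical helper that stores the running message history."""

    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []

    def add_user(self, text: str) -> None:
        # Record what the user said on this turn.
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        # Record Claude's reply so it is resent on the next request.
        self.messages.append({"role": "assistant", "content": text})


convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")
convo.add_user("Can you describe LLMs to me?")

# convo.messages now matches the list passed as messages= in the request above.
print(convo.messages)
```

After each call, append the assistant's reply with add_assistant before adding the next user turn, so that roles keep alternating.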
Synthetic Assistant Messages
You can inject pre-written assistant messages into the history. This is useful for:
- Providing examples: Show Claude how you want it to respond
- Correcting behavior: Insert a corrected response to steer future replies
- Simulating context: Create scenarios without real interactions
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Germany?"},
]
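When the synthetic turns serve as examples, the history can be generated from a list of question and answer pairs. This is a minimal sketch; few_shot_messages is a hypothetical helper, and the example pairs are illustrative.

```python
def few_shot_messages(
    examples: list[tuple[str, str]], question: str
) -> list[dict[str, str]]:
    """Turn (user, assistant) example pairs plus a new question into a message list."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Pre-written assistant turn that shows the desired answer style.
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question always goes last, as a user turn.
    messages.append({"role": "user", "content": question})
    return messages


shots = few_shot_messages(
    [("What is the capital of France?", "The capital of France is Paris.")],
    "What about Germany?",
)
print(shots)
```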
Managing Token Limits
Because the full history is resent on every request, input token usage grows with each turn. Consider:
- Summarizing earlier turns
- Using prompt caching for repeated system instructions
- Setting appropriate max_tokens to control response length
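Another simple option, not listed above, is to drop the oldest turns once the history passes a budget. The sketch below uses character count as a rough stand-in for tokens, which is an assumption rather than an exact measure, and it assumes every content value is a plain string. trim_history is a hypothetical helper.

```python
def trim_history(messages: list[dict], max_chars: int) -> list[dict]:
    """Drop the oldest messages until the history fits a rough character budget."""
    kept = list(messages)  # work on a copy so the caller's list is untouched

    def size(msgs: list[dict]) -> int:
        return sum(len(m["content"]) for m in msgs)

    # Remove from the front while over budget, always keeping the latest message.
    while len(kept) > 1 and size(kept) > max_chars:
        kept.pop(0)
    # Make sure the trimmed history still starts with a user turn.
    while len(kept) > 1 and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 50},
    {"role": "assistant", "content": "b" * 50},
    {"role": "user", "content": "c" * 50},
    {"role": "assistant", "content": "d" * 50},
    {"role": "user", "content": "e" * 50},
]
trimmed = trim_history(history, max_chars=160)
print(len(trimmed))
```

Dropping turns loses information, so for long-running sessions it may be worth replacing the removed turns with a short summary instead.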
Prefill: Putting Words in Claude's Mouth
Prefill lets you start Claude's response by providing the beginning of its answer. This is powerful for:
- Constraining output format (e.g., JSON, multiple choice)
- Guiding tone or style
- Ensuring specific phrasing
Basic Prefill Example
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
With max_tokens=1, Claude generates only a single token, here the letter "C", giving you a clean multiple-choice answer.
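The response contains only the continuation, not the prefilled text, so the application has to validate the letter and, if needed, join it back onto the prefill. A minimal sketch, assuming the continuation has already been read from the response; parse_choice and PREFILL are hypothetical names.

```python
from typing import Optional

PREFILL = "The answer is ("


def parse_choice(continuation: str, valid: str = "ABC") -> Optional[str]:
    """Return the choice letter if the continuation starts with a valid one, else None."""
    letter = continuation.strip()[:1].upper()
    # An empty string is "in" any string, so check for emptiness explicitly.
    return letter if letter and letter in valid else None


choice = parse_choice("C")
full_sentence = PREFILL + "C"  # the prefill plus the generated continuation
print(choice, full_sentence)
```

Returning None for anything unexpected makes it easy to retry or fall back rather than trusting a malformed answer.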
Important Limitations
Prefill is not supported on these models:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
Migration from Prefill
If you're moving away from prefill, here are alternatives:
Structured outputs (recommended):
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You must respond in JSON format with keys: 'answer', 'explanation'",
    messages=[
        {"role": "user", "content": "What is Latin for Ant?"}
    ]
)
System prompt instructions:
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="Always start your response with 'The answer is: ' followed by the letter of the correct choice.",
    messages=[
        {"role": "user", "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"}
    ]
)
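When the format is requested through instructions alone, it is prudent to validate the reply before using it. The sketch below parses a reply that should be a JSON object with the keys answer and explanation; parse_json_reply is a hypothetical helper and the sample reply text is made up for illustration.

```python
import json

REQUIRED_KEYS = {"answer", "explanation"}


def parse_json_reply(text: str) -> dict:
    """Parse a reply that should contain a JSON object with the required keys."""
    # Tolerate surrounding prose by taking the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(text[start:end + 1])  # raises ValueError on invalid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


reply = 'Here you go: {"answer": "C", "explanation": "Formicidae is the ant family."}'
parsed = parse_json_reply(reply)
print(parsed["answer"])
```

A ValueError here is a reasonable signal to retry the request or to surface the raw text for inspection.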
Vision: Working with Images
Claude can analyze images sent via the Messages API. This enables use cases like:
- Image captioning
- Document analysis
- Visual question answering
Sending an Image
Images are sent as base64-encoded data in the content array:
import base64

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
Supported Image Formats
| Format | Media Type |
|---|---|
| PNG | image/png |
| JPEG | image/jpeg |
| WebP | image/webp |
| GIF | image/gif |
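The table above can be turned into a small lookup so that the media type is derived from the file name and unsupported formats are rejected before a request is sent. This is a sketch under the assumption that the file extension reflects the real format; image_block is a hypothetical helper.

```python
import base64
from pathlib import Path

# Extension-to-media-type mapping taken from the table above.
SUPPORTED_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, rejecting unsupported formats."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": SUPPORTED_MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# Typical use: image_block("diagram.png", Path("diagram.png").read_bytes())
block = image_block("diagram.PNG", b"\x89PNG")
print(block["source"]["media_type"])
```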
Best Practices for Vision
- Use high-resolution images when details matter
- Combine with text prompts for specific instructions
- Keep image files small: the API enforces a per-image size limit (5 MB at the time of writing), and large files add latency
- Consider token cost: Images consume significant input tokens
Handling Stop Reasons
Understanding why Claude stopped helps you handle responses correctly:
| stop_reason | Meaning | Action |
|---|---|---|
| "end_turn" | Claude finished naturally | Return response to user |
| "max_tokens" | Response was cut off | Increase max_tokens or continue conversation |
| "stop_sequence" | A custom stop sequence was hit | Check your stop sequences |
| "tool_use" | Claude wants to call a tool | Execute the tool and return results |
Example: Handling Max Tokens
if message.stop_reason == "max_tokens":
    # Continue the conversation with the partial response.
    # `messages` is the list that was sent in the truncated request.
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages
    )
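A continuation can itself be cut off, so it can help to loop with an upper bound and join the parts. The sketch below takes the API call as a parameter so the logic can be exercised without network access; collect_full_reply, the send callable, and fake_send are all hypothetical names, and send is assumed to return a (text, stop_reason) pair.

```python
from typing import Callable


def collect_full_reply(
    send: Callable[[list], tuple], messages: list, max_rounds: int = 3
) -> str:
    """Call send repeatedly while the reply is cut off by max_tokens."""
    history = list(messages)  # copy so the caller's list is not modified
    parts = []
    for _ in range(max_rounds):
        text, stop_reason = send(history)
        parts.append(text)
        if stop_reason != "max_tokens":
            break
        # Feed the partial reply back and ask for the rest.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue."})
    return "".join(parts)


# Stand-in for a real API call: the first reply is truncated, the second finishes.
_replies = iter([("Part one, ", "max_tokens"), ("part two.", "end_turn")])


def fake_send(history: list) -> tuple:
    return next(_replies)


full_reply = collect_full_reply(fake_send, [{"role": "user", "content": "Write"}])
print(full_reply)
```

In a real application, send would wrap client.messages.create and return message.content[0].text together with message.stop_reason.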
Error Handling
Common API errors and how to handle them:
- 400 Bad Request: Invalid parameters (e.g., prefill on unsupported model)
- 401 Unauthorized: Invalid API key
- 429 Rate Limit: Too many requests — implement exponential backoff
- 500 Internal Server Error: Temporary issue — retry with backoff
import time

from anthropic import Anthropic, APIError, InternalServerError, RateLimitError

client = Anthropic()

for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except (RateLimitError, InternalServerError):
        # 429 and 5xx responses are transient: wait 1s, 2s, 4s, then retry
        time.sleep(2 ** attempt)
    except APIError as e:
        # Other API errors (e.g., 400, 401) will not succeed on retry
        print(f"API error: {e}")
        break
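The same retry pattern can be factored into a reusable function. The sketch below adds random jitter to the exponential delay, a common refinement that the loop above does not use, and it is deliberately independent of the SDK: call_with_backoff and its parameters are hypothetical, and the flaky function stands in for an API call.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    fn: Callable[[], object],
    retryable: tuple = (Exception,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fn, retrying on retryable exceptions with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Wait a random time up to base_delay * 2^attempt before retrying.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Stand-in for an API call that fails twice before succeeding.
_state = {"calls": 0}


def flaky() -> str:
    _state["calls"] += 1
    if _state["calls"] < 3:
        raise TimeoutError("temporary failure")
    return "ok"


delays: list[float] = []
result = call_with_backoff(flaky, retryable=(TimeoutError,), sleep=delays.append)
print(result, len(delays))
```

With the SDK, fn would be a lambda wrapping client.messages.create, and retryable would list the exception classes treated as transient.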
Key Takeaways
- The Messages API is stateless — always send the full conversation history. Manage context carefully to avoid token waste.
- Prefill is powerful but limited — use it for constrained outputs, but migrate to structured outputs or system prompts for unsupported models.
- Vision capabilities let Claude analyze images — combine with text prompts for best results, and be mindful of token costs.
- Handle stop reasons to build robust applications, especially max_tokens for long responses and tool_use for agent workflows.
- Implement error handling with retry logic for rate limits and transient errors to ensure reliable API usage.