Mastering the Claude Messages API: From Basic Requests to Advanced Patterns
Learn how to use the Claude Messages API effectively—covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.
This guide teaches you how to work with the Claude Messages API, including stateless multi-turn conversations, prefill techniques to shape responses, and vision capabilities for image analysis.
Introduction
Anthropic offers two primary ways to build with Claude: the Messages API for direct model access and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, giving you fine-grained control over your interactions with Claude.
Whether you're building a chatbot, an analysis tool, or a complex agent loop, understanding the Messages API patterns is essential. Let's dive into the core concepts and practical patterns you'll use every day.
Basic Request and Response
At its simplest, a Messages API call sends a user message and receives Claude's response. Here's the canonical example in Python:
```python
import anthropic

# The client reads your API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
Response:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- `content`: An array of content blocks (usually text).
- `stop_reason`: Why Claude stopped; `"end_turn"` means the model finished naturally.
- `usage`: Token counts for billing and optimization.
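Because `content` is a list of blocks rather than a single string, code that wants "the text" of a reply has to collect it. The sketch below assumes blocks arrive either as plain dicts (like the JSON above) or as SDK objects with `type` and `text` attributes; the helper names `block_field` and `response_text` are hypothetical.

```python
from typing import Any, Iterable


def block_field(block: Any, name: str) -> Any:
    """Read a field from a content block given as a dict or an SDK object."""
    if isinstance(block, dict):
        return block.get(name)
    return getattr(block, name, None)


def response_text(content: Iterable[Any]) -> str:
    """Concatenate the text of every "text" block, skipping other block types."""
    return "".join(
        block_field(block, "text") or ""
        for block in content
        if block_field(block, "type") == "text"
    )


# The response shown above, as a plain dict
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(response_text(sample["content"]))  # Hello!
```

With the Python SDK you could pass `message.content` directly, for example `response_text(message.content)`.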
Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage the history yourself.
Building a Conversation
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message)
```
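Since the API keeps no state, a small wrapper can own the history and replay it on every call. This is a minimal sketch, not part of the SDK: `Conversation` and `claude_caller` are hypothetical names, and the model call is injected as a function so the history logic can be exercised with a stub instead of a live request.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
ModelCall = Callable[[List[Message]], str]


class Conversation:
    """Hypothetical helper: keeps the full history for a stateless API."""

    def __init__(self, call_model: ModelCall) -> None:
        # call_model receives the entire history and returns Claude's reply text
        self.call_model = call_model
        self.history: List[Message] = []

    def send(self, user_text: str) -> str:
        pending = self.history + [{"role": "user", "content": user_text}]
        reply = self.call_model(pending)
        # Commit both turns only after the call succeeds
        self.history = pending + [{"role": "assistant", "content": reply}]
        return reply


def claude_caller(model: str = "claude-opus-4-7", max_tokens: int = 1024) -> ModelCall:
    """Wrap client.messages.create; needs the anthropic package and an API key."""
    import anthropic  # imported lazily so the rest of the sketch runs without the SDK

    client = anthropic.Anthropic()

    def call(messages: List[Message]) -> str:
        message = client.messages.create(
            model=model, max_tokens=max_tokens, messages=messages
        )
        return "".join(b.text for b in message.content if b.type == "text")

    return call


def stub_model(messages: List[Message]) -> str:
    # Stand-in for Claude: numbers its replies by the count of user turns
    return f"reply #{len(messages) // 2 + 1}"


chat = Conversation(stub_model)
chat.send("Hello, Claude")
chat.send("Can you describe LLMs to me?")
print([m["role"] for m in chat.history])  # ['user', 'assistant', 'user', 'assistant']
```

Committing the user and assistant turns together only after the call succeeds keeps a failed request from leaving a dangling user message in the history. To talk to Claude, construct `Conversation(claude_caller())` instead of using the stub (this needs the `anthropic` package and an API key).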
Synthetic Assistant Messages
You can inject synthetic assistant messages—they don't have to come from Claude. This is useful for:
- Providing examples (few-shot prompting)
- Guiding conversation flow
- Simulating multi-step reasoning
```python
messages = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},  # synthetic example
    {"role": "user", "content": "What is 5+3?"}
]
```
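Few-shot examples are just alternating user and assistant turns placed ahead of the real question, so they are easy to generate from data. A sketch, with `few_shot_messages` as a hypothetical helper; given a single example pair it reproduces the list above.

```python
from typing import Dict, List, Sequence, Tuple


def few_shot_messages(
    examples: Sequence[Tuple[str, str]], question: str
) -> List[Dict[str, str]]:
    """Turn (question, answer) pairs into alternating user/assistant turns."""
    messages: List[Dict[str, str]] = []
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        # Synthetic assistant turn: written by us, not generated by Claude
        messages.append({"role": "assistant", "content": example_answer})
    # The real question always comes last, as a user turn
    messages.append({"role": "user", "content": question})
    return messages


messages = few_shot_messages([("What is 2+2?", "4")], "What is 5+3?")
```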
Prefill: Putting Words in Claude's Mouth
Prefilling lets you start Claude's response by providing the beginning of its answer. This is powerful for:
- Constraining output format (e.g., JSON, multiple choice)
- Setting tone or style
- Skipping preambles, which saves output tokens
Example: Multiple Choice with Single Token
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # one token is enough for the answer letter
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("  # prefill: Claude continues from here
        }
    ]
)
print(message)
```
Response:
```json
{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
```
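Since the reply is only the continuation of the prefill (here, a single letter), the application still has to map it back to an option. A sketch under the assumption that options are labeled with single letters; `multiple_choice_messages` and `parse_choice` are hypothetical helpers, and the prefill approach only applies to models that support it.

```python
import re
from typing import Dict, List, Optional

CHOICE_PREFILL = "The answer is ("


def multiple_choice_messages(question: str, options: Dict[str, str]) -> List[Dict[str, str]]:
    """Build the user turn plus an assistant prefill ending in an open parenthesis."""
    listed = ", ".join(f"({letter}) {text}" for letter, text in options.items())
    return [
        {"role": "user", "content": f"{question} {listed}"},
        {"role": "assistant", "content": CHOICE_PREFILL},
    ]


def parse_choice(completion: str, options: Dict[str, str]) -> Optional[str]:
    """Map the continuation (e.g. "C") back to its option, or None if unrecognized."""
    match = re.match(r"\s*([A-Za-z])", completion)
    if match is None:
        return None
    return options.get(match.group(1).upper())


options = {"A": "Apoidea", "B": "Rhopalocera", "C": "Formicidae"}
messages = multiple_choice_messages("What is Latin for ant?", options)
print(parse_choice("C", options))  # Formicidae
```

Returning `None` for anything unexpected lets the caller decide whether to retry or fall back, rather than trusting a malformed answer.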
Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6. Use structured outputs or system prompt instructions instead.
Prefill for JSON Output
```python
messages = [
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": '{"name": "'}
]
```
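This forces Claude's reply to begin inside a JSON object, which makes parsing more reliable. Note that the response contains only the text Claude generated after the prefill, so prepend the prefill before parsing the result.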
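A minimal sketch of the parsing step: glue the prefill back onto the returned text and hand the result to `json.loads`. The continuation string here is hypothetical (it is what a well-behaved reply to the prompt above would look like, not a recorded response), and `complete_json` is an illustrative name.

```python
import json
from typing import Any

JSON_PREFILL = '{"name": "'


def complete_json(prefill: str, continuation: str) -> Any:
    """Rebuild the full JSON text from prefill + continuation and parse it.

    Raises json.JSONDecodeError if the combined text is not valid JSON.
    """
    return json.loads(prefill + continuation)


# Hypothetical continuation Claude might return for the prompt above
continuation = 'John", "age": 30}'
print(complete_json(JSON_PREFILL, continuation))  # {'name': 'John', 'age': 30}
```

Letting `json.JSONDecodeError` propagate keeps malformed output visible instead of silently returning partial data.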
Vision Capabilities
The Messages API supports image inputs, enabling Claude to analyze visual content. You can pass images as base64-encoded data or as URLs.
Example: Image Analysis
```python
import anthropic
import base64

client = anthropic.Anthropic()

# Read the image and base64-encode it for the request body
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
```
Supported media types: image/jpeg, image/png, image/gif, image/webp.
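When images come from files of varying types, it helps to derive `media_type` instead of hard-coding `"image/png"`. This sketch assumes the file extension reliably reflects the format (it does not inspect file contents); `base64_image_block`, `url_image_block`, and the extension table are hypothetical, while the base64 block shape follows the example above. The URL variant uses a source of type `"url"`; check the current API reference before relying on it.

```python
import base64
from pathlib import Path
from typing import Any, Dict

# Supported media types listed above; the extension mapping is our assumption
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def base64_image_block(data: bytes, filename: str) -> Dict[str, Any]:
    """Build a base64 image content block, inferring media_type from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def url_image_block(url: str) -> Dict[str, Any]:
    """Build an image block that references a publicly reachable URL."""
    return {"type": "image", "source": {"type": "url", "url": url}}


block = base64_image_block(b"\x89PNG", "chart.png")
print(block["source"]["media_type"])  # image/png
```

Either block can go in a user message's `content` list next to a text block, in the same position as the image block in the example above.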
Handling Stop Reasons
Understanding stop_reason helps you build robust applications:
| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Continue or end the conversation |
| `max_tokens` | Hit the token limit | Increase `max_tokens` or continue |
| `stop_sequence` | Custom stop sequence triggered | Handle as designed |
| `tool_use` | Claude wants to call a tool | Execute the tool and return the result |
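In code, the table reduces to a lookup on `stop_reason`. A minimal sketch over a response shaped like the JSON earlier: the action labels and the names `next_action` and `handle` are hypothetical, and unknown values map to a fallback because the API can return stop reasons beyond the four shown here.

```python
from typing import Any, Dict

# Actions mirror the table above; the labels themselves are hypothetical
ACTIONS: Dict[str, str] = {
    "end_turn": "done",
    "max_tokens": "truncated",
    "stop_sequence": "stop_sequence",
    "tool_use": "run_tool",
}


def next_action(stop_reason: str) -> str:
    """Map a stop_reason to what the caller should do next."""
    # Unlisted stop reasons fall through to "review" instead of crashing
    return ACTIONS.get(stop_reason, "review")


def handle(response: Dict[str, Any]) -> str:
    action = next_action(response["stop_reason"])
    if action == "truncated":
        # Output was cut off: retry with a larger max_tokens, or surface a warning
        print(f"Truncated after {response['usage']['output_tokens']} output tokens")
    return action
```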
Streaming Responses
For real-time applications, use streaming to receive tokens as they're generated:
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
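Applications often need both the live output and the finished text (for logging, or to append to the conversation history). The sketch below works on any iterable of strings; in the example it is fed a hypothetical list standing in for `stream.text_stream`, so it runs without an API call. `echo_and_collect` is an illustrative name.

```python
import sys
from typing import Iterable, List, TextIO


def echo_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each text chunk as it arrives and return the assembled text."""
    parts: List[str] = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show the chunk immediately instead of waiting for a newline
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# Stand-in for stream.text_stream so the sketch runs without an API call
fake_stream = ["Once ", "upon ", "a time."]
story = echo_and_collect(fake_stream)
```

Inside the `with` block above you would call `echo_and_collect(stream.text_stream)`. The SDK's stream object also offers `get_final_message()` for the complete message, including `usage` and `stop_reason`.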
Best Practices
- Manage context windows: Keep conversation history within Claude's context limit (200K tokens for most models).
- Use prompt caching: For repeated system prompts or large contexts, enable prompt caching to reduce costs and latency.
- Handle errors gracefully: Implement retry logic for rate limits and network issues.
- Monitor token usage: Track `usage.input_tokens` and `usage.output_tokens` to optimize your prompts.
- Use structured outputs: For reliable JSON parsing, prefer structured outputs over prefill when possible.
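The retry advice above can be sketched as exponential backoff with jitter. Note that the official Python SDK already retries some failures on its own (configurable via `max_retries` on the client), so a wrapper like this is mainly for custom policies. `with_retries` is hypothetical; with the SDK you might pass exception classes such as `anthropic.RateLimitError` and `anthropic.APIConnectionError` as `retry_on`. The `sleep` parameter exists so the sketch can be tested without waiting, and `FlakyError`/`flaky` are stand-ins for a real failing call.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff plus jitter on retry_on errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays of 1s, 2s, 4s, ... plus up to base_delay of random jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


class FlakyError(Exception):
    pass


attempts = {"count": 0}


def flaky() -> str:
    # Fails twice, then succeeds
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FlakyError("temporary failure")
    return "ok"


delays: List[float] = []
result = with_retries(flaky, (FlakyError,), sleep=delays.append)
```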
Key Takeaways
- The Messages API is stateless—always send the full conversation history with each request.
- Prefill lets you shape Claude's responses by providing the beginning of its answer, but check model compatibility.
- Vision capabilities allow image analysis via base64 or URL inputs.
- Streaming enables real-time token-by-token output for better user experience.
- Monitor stop reasons and token usage to build robust, cost-effective applications.