
Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use Claude's Messages API for multi-turn conversations, response prefilling, and vision capabilities. Includes Python code examples and best practices for developers.

Quick Answer

This guide teaches you how to use Claude's Messages API for basic requests, multi-turn conversations, response prefilling, and vision capabilities, with practical Python code examples.

Tags: Messages API, Claude API, Multi-turn conversations, Prefill, Vision


Anthropic offers two primary ways to build with Claude: the Messages API for direct model access and Claude Managed Agents for pre-built agent harnesses. This guide focuses on the Messages API—the foundation for custom agent loops, fine-grained control, and integrating Claude into your applications.

Whether you're building a chatbot, content generator, or vision-powered tool, understanding the Messages API is essential. Let's dive into the patterns that will help you get the most out of Claude.

Understanding the Messages API

The Messages API is a stateless, RESTful interface that lets you send conversational turns to Claude and receive responses. Unlike some chat APIs that maintain session state, you must send the full conversation history with every request. This design gives you complete control over context management.

Basic Request Structure

Every request to the Messages API requires three core parameters:

  • model: The Claude model identifier (e.g., claude-opus-4-7, claude-sonnet-4-5)
  • max_tokens: Maximum tokens in the response
  • messages: An array of message objects with role and content
Here's a minimal example in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes the model's reply, usage statistics, and a stop_reason indicating why generation ended:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
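
In the Python SDK, these fields are exposed as attributes on the returned Message object. Continuing the example above:

# Pull the fields you usually care about off the response object
reply_text = message.content[0].text
print(reply_text)                   # "Hello!"
print(message.stop_reason)          # "end_turn"
print(message.usage.output_tokens)  # 6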

Building Multi-Turn Conversations

Since the Messages API is stateless, you build conversations by appending each turn to the messages array. This pattern allows you to maintain context across multiple exchanges.

The Conversation Loop Pattern

import anthropic

client = anthropic.Anthropic()

# Start with the initial user message
messages = [
    {"role": "user", "content": "What are the three primary colors?"}
]

# First API call
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages
)

# Append Claude's response to history
messages.append({"role": "assistant", "content": response.content[0].text})

# Add the next user turn
messages.append({"role": "user", "content": "Can you mix them to make secondary colors?"})

# Second API call with full history
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages
)

print(response.content[0].text)

Synthetic Assistant Messages

A powerful feature: earlier assistant turns don't need to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context that Claude didn't generate:

messages = [
    {"role": "user", "content": "Summarize our previous discussion about project timelines."},
    {"role": "assistant", "content": "Based on our discussion, the project has three phases: research (weeks 1-2), development (weeks 3-6), and testing (weeks 7-8)."},
    {"role": "user", "content": "What are the key milestones for the development phase?"}
]

This is particularly useful for:

  • Injecting system-generated context
  • Simulating conversation history from other sources
  • Providing structured data summaries
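
As a concrete sketch (the summary string here is a stand-in for whatever your system actually produces), an injected summary simply becomes another assistant turn before the next real question:

import anthropic

client = anthropic.Anthropic()

# Summary produced outside this conversation (a database, another job, your own code)
generated_summary = (
    "Based on our discussion, the project has three phases: "
    "research (weeks 1-2), development (weeks 3-6), and testing (weeks 7-8)."
)

messages = [
    {"role": "user", "content": "Summarize our previous discussion about project timelines."},
    {"role": "assistant", "content": generated_summary},  # synthetic assistant turn
    {"role": "user", "content": "What are the key milestones for the development phase?"},
]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages,
)
print(response.content[0].text)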

Putting Words in Claude's Mouth: Prefill Technique

The prefill technique lets you start Claude's response by including assistant content in the input messages. This shapes the model's output by providing a starting point.

Use Case: Multiple Choice Questions

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # Only need one token for the answer letter
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?\nA) London\nB) Paris\nC) Berlin\nD) Madrid"
        },
        {
            "role": "assistant",
            "content": "The answer is"  # Prefill steers Claude toward a bare letter
        }
    ]
)

print(message.content[0].text)  # Prints just the answer letter, e.g. " B"

Important Prefill Limitations

Prefilling is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
Requests using prefill with these models return a 400 error. For these models, use structured outputs or system prompt instructions instead.
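
For example, if prefill isn't available, a system prompt can often impose a similar constraint. This is a rough sketch (adjust the instruction to your task; the model name is taken from the list above):

import anthropic

client = anthropic.Anthropic()

# On models without prefill support, push the formatting constraint into the system prompt
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=5,
    system="Answer multiple-choice questions with only the letter of the correct option.",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?\nA) London\nB) Paris\nC) Berlin\nD) Madrid"
        }
    ],
)
print(message.content[0].text)  # ideally just "B"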

When to Use Prefill

  • Constrained outputs: Force Claude to start with a specific format (JSON, YAML, etc.)
  • Multiple choice: Get single-token answers for classification tasks
  • Controlled generation: Guide the tone or direction of the response
  • Chain-of-thought prompting: Start Claude's reasoning process
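
For the constrained-output case, a common pattern is to prefill the opening brace so Claude continues with raw JSON instead of preamble. A sketch (the extraction task and schema here are made up):

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Extract the name, city, and age from this sentence as JSON: "
                       "'Maria, 34, lives in Lisbon.'"
        },
        # Prefilling "{" makes Claude skip any preamble and continue the JSON object
        {"role": "assistant", "content": "{"},
    ],
)

# The prefilled "{" is part of the input, so prepend it before parsing the output
json_text = "{" + response.content[0].text
print(json_text)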

Vision Capabilities: Sending Images to Claude

Claude can analyze images alongside text. You can supply images using three source types:

  • base64: Base64-encoded image data
  • url: Publicly accessible image URL
  • file: Image uploaded via the Files API

Supported Image Formats

Format | MIME Type
JPEG   | image/jpeg
PNG    | image/png
GIF    | image/gif
WebP   | image/webp
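
When you load local files, the media_type you send should match the actual format. A small helper built on Python's standard mimetypes module (the helper name and set are ours, purely illustrative) can derive it from the file extension:

import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def image_media_type(path: str) -> str:
    """Guess the MIME type for an image file and check it is one Claude accepts."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported or unknown image format: {path}")
    return media_type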

Example: Analyzing an Image from URL

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/ant-photo.jpg"
                    }
                }
            ]
        }
    ]
)

print(message.content[0].text)

Output: "This image shows an ant, specifically a close-up view..."

Example: Using Base64 Images

import anthropic
import base64

client = anthropic.Anthropic()

with open("photo.jpg", "rb") as f: image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create( model="claude-sonnet-4-5", max_tokens=1024, messages=[ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in detail." }, { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": image_data } } ] } ] )

Handling Stop Reasons

Every response includes a stop_reason field that tells you why Claude stopped generating. Understanding these helps you build robust applications:

Stop Reason   | Meaning                      | Action
end_turn      | Claude finished naturally    | Continue or end conversation
max_tokens    | Hit the token limit          | Increase max_tokens or truncate response
stop_sequence | Found a custom stop sequence | Handle based on your logic
tool_use      | Claude wants to use a tool   | Process the tool call and continue

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a 500-word essay"}]
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "end_turn":
    print("Response completed successfully.")

Best Practices for the Messages API

1. Manage Token Usage

Monitor the usage field in responses to track costs and optimize your prompts:

print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

2. Use Prompt Caching for Long Histories

For conversations with extensive context, enable prompt caching to reduce costs and latency on repeated prefixes.
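
The exact mechanics are covered in Anthropic's prompt caching documentation; as a minimal sketch, you mark the long, stable prefix (for example a large system prompt) as cacheable with a cache_control block. The reference text below is a placeholder:

import anthropic

client = anthropic.Anthropic()

long_reference_text = "...many thousands of tokens of stable reference material..."

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": long_reference_text,
            # Mark this block as a cacheable prefix; later requests that share it
            # can reuse the cached prompt instead of reprocessing it
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the reference material above."}],
)

print(response.usage)  # cache-related token counts appear here when caching applies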

3. Handle Errors Gracefully

Always implement retry logic with exponential backoff for API errors:

import time
from anthropic import Anthropic, APIError

client = Anthropic()
max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.messages.create(...)
        break
    except APIError as e:
        if attempt == max_retries - 1:
            raise
        time.sleep(2 ** attempt)
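
Separately, recent versions of the official Python SDK retry some transient failures (connection errors, rate limits, 5xx responses) on their own; if that behavior is available in your SDK version, configuring it on the client may be simpler than a hand-rolled loop:

from anthropic import Anthropic

# Let the SDK apply its built-in retry-with-backoff behavior
client = Anthropic(max_retries=3)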

4. Stream Responses for Better UX

For long responses, use streaming to show tokens as they're generated:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
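
If you also need usage statistics or the stop reason once streaming finishes, the stream helper can assemble the complete final message for you (for example via get_final_message(), called inside the with block), so you don't have to re-request the response.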

Key Takeaways

  • Stateless by design: Always send the full conversation history with each request. This gives you complete control over context management.
  • Prefill shapes responses: Use synthetic assistant messages to guide Claude's output, but be aware of model limitations—prefill is not supported on Opus 4.7, Sonnet 4.6, and others.
  • Vision is straightforward: Send images via base64, URL, or file reference using the image content type block. Supported formats are JPEG, PNG, GIF, and WebP.
  • Monitor stop reasons: The stop_reason field tells you why generation ended—use it to handle truncation, tool calls, or natural completion.
  • Stream for better UX: Use streaming for real-time token display, especially for long responses or chat applications.