Claude Guide
2026-05-04

Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude

Learn how to use Claude's Messages API for multi-turn conversations, prefill techniques, and vision. Includes code examples in Python and TypeScript.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or an AI assistant, understanding how to structure your requests and handle responses is essential. This guide walks you through the core patterns of the Messages API, from simple one-shot queries to complex multi-turn conversations and advanced techniques like prefill and vision.

Understanding the Messages API

The Messages API is stateless—each request must include the full conversation history. This design gives you complete control over the context Claude sees, which makes it ideal for custom agent loops and fine-grained interaction management.

Basic Request and Response

A minimal request requires three things: a model name, a max_tokens limit, and an array of messages. Here's how it looks in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message.content[0].text)

The response includes useful metadata:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • stop_reason: Indicates why the response ended (end_turn, max_tokens, stop_sequence, or tool_use).
  • usage: Token counts for billing and context management.
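As a small illustration of working with these fields, the helpers below pull the text and token totals out of a response. This is a sketch that treats the response as the plain dict shown above rather than the SDK's typed object; the function names are illustrative, not part of the SDK:

```python
def extract_text(response: dict) -> str:
    """Concatenate the text from all text-type content blocks."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )

def total_tokens(response: dict) -> int:
    """Sum input and output tokens, e.g. for usage tracking."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]

response = {
    "content": [{"type": "text", "text": "Hello!"}],
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(extract_text(response))  # Hello!
print(total_tokens(response))  # 18
```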

Building Multi-Turn Conversations

Since the API is stateless, you must send the entire conversation history with each request. This pattern lets you build up context over multiple turns.

import anthropic

client = anthropic.Anthropic()

conversation = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"}
]

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation
)

print(message.content[0].text)

Synthetic Assistant Messages

You're not limited to real conversations. You can inject synthetic assistant messages to guide Claude's behavior. For example, you might pre-populate a conversation with a persona or context:

conversation = [
    {"role": "user", "content": "You are a helpful math tutor. Start by asking me a question."},
    {"role": "assistant", "content": "Sure! Let's start with algebra. What is 2x + 3 = 7?"},
    {"role": "user", "content": "x = 2"}
]

This is particularly useful for:

  • Setting up role-playing scenarios
  • Providing few-shot examples
  • Maintaining character consistency
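One way to package the few-shot pattern above is a small helper that interleaves example pairs as synthetic user/assistant turns before the real query. The function name here is illustrative, not part of the SDK:

```python
def build_few_shot(examples, query):
    """Interleave (input, output) pairs as synthetic turns, then append the real query."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

conversation = build_few_shot(
    [("Sentiment: 'I love this!'", "positive"),
     ("Sentiment: 'Terrible service.'", "negative")],
    "Sentiment: 'It was fine, I guess.'",
)
```

The resulting list alternates roles correctly and always ends on a user turn, which is what the API expects.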

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. You include an assistant message at the end of your input with partial content, and Claude completes it. This is powerful for constraining outputs.

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {"role": "assistant", "content": "The answer is ("}
    ]
)

print(message.content[0].text) # Outputs: "C"

By setting max_tokens=1, you force Claude to output just the letter. The prefill "The answer is (" guides the model to complete the pattern.

Important Limitations

Prefill is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6

Requests using prefill with these models return a 400 error. For these models, use structured outputs or system prompt instructions instead.

Use Cases for Prefill

  • Constrained generation: Force JSON prefixes or specific formats
  • Chain-of-thought: Start with "Let me think step by step:" to encourage reasoning
  • Classification: Prefill with a category label
  • Completion tasks: Provide the beginning of a sentence or code block
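For instance, to force JSON output you can prefill the opening brace. Note that the prefill text is not echoed back in the response (as the multiple-choice example above shows, the output starts where the prefill left off), so you prepend it yourself when parsing. This is a sketch; `with_prefill` is a hypothetical helper, not an SDK function:

```python
def with_prefill(user_prompt: str, prefill: str) -> list:
    """Build a messages array ending in a partial assistant turn."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]

messages = with_prefill(
    "List three primes as a JSON object under the key 'primes'.",
    "{",
)
# After calling the API, recombine the prefill with the completion:
#   full_json = "{" + message.content[0].text
```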

Vision: Working with Images

Claude can analyze images sent through the Messages API. This enables use cases like image captioning, document analysis, and visual Q&A.

Sending an Image

Images are sent as base64-encoded data in the content array:

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Formats

  • PNG
  • JPEG
  • WebP
  • GIF (static, first frame only)

Best Practices for Vision

  • Use appropriate resolution: Claude works best with images between 200x200 and 2048x2048 pixels
  • Compress when possible: Smaller file sizes reduce latency
  • Combine with text: Always include a text prompt to guide Claude's analysis
  • One image per message: For complex scenes, send one image at a time
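The loading-and-encoding steps above can be wrapped in one helper that also infers the media_type from the file extension. This is an illustrative sketch, not part of the SDK; the format table maps the supported formats listed earlier:

```python
import base64
from pathlib import Path

# Supported formats and their MIME types (assumption: extension matches content).
_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

def image_block(path: str) -> dict:
    """Build a base64 image content block, inferring media_type from the extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in _MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": _MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dict drops straight into a message's content array alongside a text block.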

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

stop_reason      Meaning                          Action
end_turn         Claude finished naturally        Continue or end conversation
max_tokens       Output hit the token limit       Increase max_tokens or truncate
stop_sequence    A custom stop sequence was hit   Handle as needed
tool_use         Claude wants to call a tool      Execute tool and send result back

Example handling:

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
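The max_tokens case can also drive a simple continuation loop: if the response is truncated, resend the conversation with the accumulated text as a partial assistant turn (the prefill technique from earlier, so the same model-support caveats apply) and let the model pick up where it left off. The sketch below takes the create call as a parameter so the loop itself is API-agnostic and works on plain dict responses:

```python
def generate_until_done(create, messages, max_rounds=5):
    """Keep requesting until stop_reason != 'max_tokens', stitching the pieces together."""
    messages = list(messages)
    full_text = ""
    for _ in range(max_rounds):
        response = create(messages)
        full_text += response["content"][0]["text"]
        if response["stop_reason"] != "max_tokens":
            break
        # Resend with the accumulated text as a partial assistant turn
        # so the model continues where it left off.
        if messages and messages[-1]["role"] == "assistant":
            messages[-1] = {"role": "assistant", "content": full_text}
        else:
            messages.append({"role": "assistant", "content": full_text})
    return full_text
```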

Error Handling and Best Practices

Common Errors

  • 400 Bad Request: Invalid parameters or unsupported prefill model
  • 401 Unauthorized: Invalid API key
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Temporary server issue

Retry Strategy

import time
from anthropic import Anthropic, APIError

client = Anthropic()

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=messages
            )
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

Token Management

  • Monitor usage.input_tokens and usage.output_tokens to stay within limits
  • Use prompt caching for repeated system prompts
  • Consider compaction for long conversations
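A minimal running tally along these lines might look as follows. This is a sketch that records usage as plain dicts; the SDK's response objects expose the same counts on their usage attribute:

```python
class UsageTracker:
    """Accumulate token counts across requests for cost and limit monitoring."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage counts to the running totals."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 40, "output_tokens": 120})
print(tracker.total)  # 178
```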

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated AI applications. Remember that the API is stateless—you control the context. Use prefill wisely (avoiding unsupported models), handle stop reasons appropriately, and always monitor token usage.

Key Takeaways

  • Stateless design: Always send the full conversation history; you control the context Claude sees.
  • Prefill is powerful but limited: Use it to constrain outputs, but avoid models that don't support it (Opus 4.7, Sonnet 4.6, etc.).
  • Vision is straightforward: Send base64-encoded images with a text prompt for analysis.
  • Handle stop reasons: end_turn, max_tokens, and tool_use each require different responses.
  • Monitor token usage: Track input and output tokens to manage costs and stay within limits.