Claude Guide
2026-05-02

Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision. Includes code examples and best practices.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, Conversational AI, Prefill, Vision


Claude's Messages API is the foundation for building custom AI applications. Whether you're creating a chatbot, a content generator, or a vision-enabled assistant, understanding how to work with messages is essential. This guide walks you through the core patterns, from simple requests to advanced techniques like prefill and vision.

Understanding the Messages API vs. Managed Agents

Anthropic offers two primary ways to build with Claude:

  • Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control.
  • Claude Managed Agents: Pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you full control over every aspect of the conversation.

Basic Request and Response

Let's start with the simplest interaction: sending a single message to Claude and receiving a response.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message)

Response Structure

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to understand:

  • content: An array of content blocks (text, images, tool use, etc.)
  • stop_reason: Why the response ended ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
  • usage: Token counts for billing and optimization
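These fields can be read directly from the response. The sketch below uses the raw JSON shape from above as a plain dict; the Python SDK exposes the same fields as attributes (e.g. message.stop_reason), so the access pattern carries over:

```python
# The response shown above, as a plain dict (the SDK returns an object
# with the same fields as attributes).
response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

# Pull the text out of the first text content block.
text = next(b["text"] for b in response["content"] if b["type"] == "text")

# Total tokens billed for this exchange.
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(text)                      # Hello!
print(response["stop_reason"])   # end_turn
print(total_tokens)              # 18
```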

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context.

Python Example: Multi-Turn

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message.content[0].text)

Important Notes

  • Conversational history is explicit: You must include all previous messages in each request.
  • Synthetic assistant messages: Earlier turns don't need to come from Claude. You can inject pre-written assistant responses to guide the conversation.
  • Role alternation: Messages must alternate between "user" and "assistant" roles. The last message must be from the user.
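A minimal sketch of managing that history client-side. The helper name add_turn is invented here for illustration (it is not part of the SDK); it enforces the alternation rule before you ever send a request:

```python
def add_turn(history, role, content):
    """Append a turn, enforcing the user/assistant alternation rule."""
    if not history and role != "user":
        raise ValueError("The first message must be from the user")
    if history and history[-1]["role"] == role:
        raise ValueError(f"Consecutive {role!r} messages are not allowed")
    history.append({"role": role, "content": content})
    return history

history = []
add_turn(history, "user", "Hello, Claude")
add_turn(history, "assistant", "Hello!")  # a synthetic assistant turn
add_turn(history, "user", "Can you describe LLMs to me?")

# history is now ready to pass as messages=history in client.messages.create(...)
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```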

Prefill Technique: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. This is powerful for:

  • Forcing structured outputs (e.g., JSON, multiple choice)
  • Guiding tone or style
  • Reducing output tokens for specific tasks

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is the Latin name for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {"role": "assistant", "content": "The answer is ("},
    ],
)

print(message.content[0].text) # Outputs: "C"
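One detail worth noting: the response contains only the continuation, not the prefill text itself. If you need the full sentence, concatenate the two yourself:

```python
prefill = "The answer is ("
continuation = "C"  # message.content[0].text from the request above

# The API never echoes the prefill back, so reassemble manually.
full_text = prefill + continuation
print(full_text)  # The answer is (C
```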

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
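To fail fast in client code, you can validate a request before sending it. The model ID strings below are assumptions inferred from the display names in this guide; treat the official model list as the source of truth:

```python
# Assumed model IDs for the models listed above as rejecting prefill;
# verify these against the official model list before relying on them.
NO_PREFILL_MODELS = {
    "claude-mythos-preview",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
}

def uses_prefill(messages):
    """A request uses prefill when its last message is from the assistant."""
    return bool(messages) and messages[-1]["role"] == "assistant"

def check_prefill_support(model, messages):
    """Raise locally instead of letting the API reject the request with a 400."""
    if uses_prefill(messages) and model in NO_PREFILL_MODELS:
        raise ValueError(f"Prefill is not supported on {model}")

messages = [
    {"role": "user", "content": "Pick A, B, or C."},
    {"role": "assistant", "content": "The answer is ("},
]
check_prefill_support("claude-sonnet-4-5", messages)  # passes silently
```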

Vision: Working with Images

Claude can analyze images sent via the Messages API. This enables use cases like:

  • Document analysis
  • Screenshot interpretation
  • Visual Q&A

Python Example: Image Analysis

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

Supported Image Formats

  • JPEG
  • PNG
  • GIF
  • WebP

Images are resized and compressed by the API to optimize processing. The API supports up to 20 images per request (or more with certain configurations).
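When loading images from disk, the media_type field has to match the file's actual format. A small helper, sketched here with the standard mimetypes module, can derive it from the filename:

```python
import mimetypes

# The four formats the API accepts.
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

# Older Python versions don't know .webp, so register it explicitly.
mimetypes.add_type("image/webp", ".webp")

def image_media_type(path):
    """Guess the media_type string for an image block, or raise if unsupported."""
    guessed, _ = mimetypes.guess_type(path)
    if guessed not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {path} ({guessed})")
    return guessed

print(image_media_type("chart.png"))   # image/png
print(image_media_type("photo.jpeg"))  # image/jpeg
```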

Handling Stop Reasons

Understanding why Claude stopped generating helps you handle the response appropriately:

  • "end_turn": Claude finished naturally. Action: process the response.
  • "max_tokens": The response hit the token limit. Action: increase max_tokens or continue the conversation.
  • "stop_sequence": A custom stop sequence was hit. Action: handle based on your logic.
  • "tool_use": Claude wants to use a tool. Action: execute the tool and return the results.
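These cases can be collapsed into a simple dispatcher. The action names returned here are illustrative placeholders for your own handling code:

```python
def next_action(stop_reason):
    """Map a stop_reason to the handling described above."""
    actions = {
        "end_turn": "process_response",
        "max_tokens": "continue_or_increase_limit",
        "stop_sequence": "apply_custom_logic",
        "tool_use": "execute_tool_and_return_results",
    }
    try:
        return actions[stop_reason]
    except KeyError:
        # Surface unknown values loudly rather than mishandling them.
        raise ValueError(f"Unexpected stop_reason: {stop_reason!r}")

print(next_action("end_turn"))  # process_response
```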

Best Practices

  • Manage token usage: Track usage.input_tokens and usage.output_tokens to optimize costs.
  • Use system prompts: For persistent instructions, use the system parameter instead of repeating instructions in every user message.
  • Handle errors gracefully: Implement retry logic for rate limits and timeouts.
  • Stream responses: For long outputs, use streaming to improve user experience.
  • Validate prefill compatibility: Check model support before using prefill.
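As a sketch of the retry advice above, here is a generic exponential-backoff wrapper. It is SDK-agnostic: you pass in the exception types to retry. With the official Python SDK, anthropic.RateLimitError and anthropic.APITimeoutError are the likely candidates, but verify those names against your installed version; the SDK may also retry some failures automatically on its own.

```python
import random
import time

def with_retries(request_fn, max_attempts=5, base_delay=1.0,
                 retryable=(Exception,)):
    """Call request_fn, retrying retryable errors with exponential backoff.

    In real code, narrow `retryable` to the SDK's rate-limit and timeout
    exceptions rather than catching everything.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller handle it
            # Exponential backoff with a little jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            time.sleep(delay)
```

Usage is as simple as wrapping the API call in a closure, e.g. `with_retries(lambda: client.messages.create(...))`.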

Key Takeaways

  • The Messages API is stateless—always send the full conversation history with each request.
  • Multi-turn conversations require alternating user and assistant messages, with the last message from the user.
  • Prefill lets you start Claude's response, but check model compatibility (not supported on Opus 4.7, Sonnet 4.6, etc.).
  • Vision capabilities allow Claude to analyze images sent as base64-encoded data.
  • Always check stop_reason to determine why the response ended and handle it appropriately.