Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

Claude's Messages API is the primary interface for building conversational AI applications. Whether you're creating a simple chatbot, a complex agent system, or a vision-enabled application, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and multi-turn conversations.

Understanding the Messages API

The Messages API is a stateless, RESTful API that accepts a list of messages and returns a model-generated response. The server keeps no conversation state between calls, so you must send the full conversation history with each request. This design gives you complete control over what context Claude sees.

Key Concepts

Messages: An array of conversation turns, each with a role (user or assistant) and content.
Roles: user for human messages, assistant for Claude's responses. System instructions are not a message role; they go in the separate top-level system parameter.
Stateless: Each request is independent; you manage conversation state on your end.
Stop Reasons: Indicates why Claude stopped generating (e.g., end_turn, max_tokens, stop_sequence).

Basic Request and Response

Let's start with the simplest possible request: a single user message.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
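const block = message.content[0];
// content is a union of block types; narrow to a text block before reading .text
if (block.type === 'text') {
  console.log(block.text);
}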

Response Structure

The API returns a structured response:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-5",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

content: An array of content blocks (text, tool_use, etc.)
stop_reason: Why generation stopped (end_turn, max_tokens, stop_sequence, tool_use)
usage: Token counts for billing and context management
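Indexing content[0].text, as the examples in this guide do, assumes the first block is text. When a response may also contain other block types such as tool_use, it is safer to collect every text block. A minimal sketch, assuming the response is available as a plain dict shaped like the JSON above; response_text is a hypothetical helper name, not part of the SDK:

```python
import json
from typing import Any


def response_text(response: dict[str, Any]) -> str:
    """Join the text of every "text" block; skip tool_use and other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# The example response from above, as raw JSON.
raw = """{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
  "model": "claude-sonnet-4-5",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}"""
response = json.loads(raw)
print(response_text(response))  # Hello! How can I help you today?

# usage gives input and output counts separately; add them for a per-call total.
total = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
print(total)  # 18
```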

Building Multi-Turn Conversations

Since the Messages API is stateless, you must send the entire conversation history with each request. Claude only sees the turns you include, so resending earlier turns is what lets it answer follow-up questions in context.

Python Example

import anthropic
client = anthropic.Anthropic()
# First turn
message1 = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
# Second turn: include the previous messages
message2 = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": message1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message2.content[0].text)

Important Notes

Synthetic Messages: You can inject synthetic assistant messages (e.g., from a database or previous session) to continue conversations seamlessly.
Context Window: Be mindful of the context window limit. Each turn adds tokens to the input.
Message Order: Structure the history as alternating user and assistant turns, starting with a user message. The API combines consecutive turns that share the same role into a single turn.
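To see the stateless pattern in isolation, the sketch below keeps the history in a small class and resends all of it on every turn. The Conversation class and the send callable are hypothetical: send stands in for a function that would call client.messages.create(...), so the history logic can run without an API key.

```python
from typing import Callable

# Hypothetical sender type: receives the full message list, returns reply text.
# A real one might wrap client.messages.create(...) and return
# message.content[0].text.
Sender = Callable[[list[dict]], str]


class Conversation:
    """Holds the history that the stateless API needs on every request."""

    def __init__(self, send: Sender) -> None:
        self.send = send
        self.messages: list[dict] = []

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        try:
            # Send a copy of the whole history, not just the new turn.
            reply = self.send(list(self.messages))
        except Exception:
            # Roll back the unanswered user turn so the history stays valid.
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: list[dict]) -> str:
    # Offline stand-in for the API: reports how much history it received.
    return f"I can see {len(messages)} message(s)."


chat = Conversation(fake_send)
print(chat.ask("Hello, Claude"))                 # I can see 1 message(s).
print(chat.ask("Can you describe LLMs to me?"))  # I can see 3 message(s).
```

Because the class only stores plain dicts, the same list can be saved to a database and reloaded later, which is how the synthetic-message pattern above resumes a previous session.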

Prefill Technique: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is useful for:

Guiding response format (e.g., JSON, multiple choice)
Enforcing specific phrasing
Reducing token usage for constrained outputs

Important: Model Support

Prefill is not supported on:

Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6
Claude Mythos Preview

For these models, use structured outputs or system prompt instructions instead.

Python Example: Multiple Choice

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # Only need one token for the answer
    messages=[
        {
            "role": "user",
            "content": "What is the Latin name for the ant family? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Expected: "C"

How Prefill Works

The assistant message in the last position contains your prefill text.
Claude continues generating from that point. The response contains only the continuation, not the prefill text itself.
Combined with max_tokens, you can get very constrained outputs.
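A common use of prefill is opening a JSON object with "{". Because the response holds only what Claude generated after the prefill, you put the prefill back in front before parsing. A sketch under these assumptions; with_prefill and full_answer are hypothetical helpers, and the continuation string is made up for illustration rather than a real model output:

```python
import json


def with_prefill(messages: list[dict], prefill: str) -> list[dict]:
    """Return a new message list ending in an assistant prefill turn."""
    if prefill != prefill.rstrip():
        # The API rejects a final assistant turn that ends in whitespace.
        raise ValueError("prefill must not end with whitespace")
    return [*messages, {"role": "assistant", "content": prefill}]


def full_answer(prefill: str, continuation: str) -> str:
    """Responses hold only the continuation, so put the prefill back in front."""
    return prefill + continuation


messages = with_prefill(
    [{"role": "user", "content": "Return Ada Lovelace's name and birth year as JSON."}],
    "{",
)
# messages can now be passed as client.messages.create(..., messages=messages).

# Illustrative continuation text, not a real model response:
continuation = '"name": "Ada Lovelace", "birth_year": 1815}'
data = json.loads(full_answer("{", continuation))
print(data["birth_year"])  # 1815
```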

Best Practices for Prefill

Use with max_tokens: Set a low max_tokens value to limit Claude's completion.
Natural Continuation: Make the prefill text a natural lead-in to the desired response, and do not end it with trailing whitespace, which the API rejects.
Fallback Strategy: For unsupported models, use system prompts like "Always respond with a single letter A, B, or C."
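The fallback can be folded into one request builder that uses a prefill when the model supports it and a system prompt otherwise. This is a sketch: the supports_prefill flag is supplied by you (for example from the model list above), and multiple_choice_request, parse_choice, and the max_tokens buffer of 5 on the fallback path are illustrative choices, not SDK features. The returned dict is meant to be passed as client.messages.create(**request).

```python
import re
from typing import Optional

FALLBACK_SYSTEM = "Always respond with a single letter A, B, or C."


def multiple_choice_request(model: str, question: str, supports_prefill: bool) -> dict:
    """Build keyword arguments for client.messages.create(**request)."""
    messages = [{"role": "user", "content": question}]
    if supports_prefill:
        # Prefill path: Claude only has to emit the letter.
        return {
            "model": model,
            "max_tokens": 1,
            "messages": messages + [{"role": "assistant", "content": "The answer is ("}],
        }
    # Fallback path: steer the format with the system prompt instead.
    return {
        "model": model,
        "max_tokens": 5,  # small buffer in case the reply includes extra characters
        "system": FALLBACK_SYSTEM,
        "messages": messages,
    }


def parse_choice(text: str) -> Optional[str]:
    """Return the first standalone A, B, or C in the reply, or None."""
    match = re.search(r"\b([ABC])\b", text)
    return match.group(1) if match else None


request = multiple_choice_request(
    "claude-sonnet-4-5",
    "What is the Latin name for the ant family? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
    supports_prefill=True,
)
print(request["messages"][-1]["content"])  # The answer is (
print(parse_choice("C"), parse_choice("(C) Formicidae"))  # C C
```

Parsing the letter out of the reply, rather than trusting the raw text, keeps both paths working even if the fallback model adds punctuation.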

Vision Capabilities

Claude can process images through the Messages API. This enables use cases like image analysis, document processing, and visual question answering.

Python Example

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode the image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Media Types

image/jpeg
image/png
image/gif (first frame only)
image/webp
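When images come from disk, the media type can be derived from the file extension and checked against the list above before anything is sent. A minimal sketch; image_block, image_block_from_file, and the extension table are hypothetical, and the check trusts the file name, so a mislabeled file would be sent with the wrong media_type:

```python
import base64
from pathlib import Path

# Extensions mapped to the media types listed above (hypothetical table).
EXTENSION_TO_MEDIA_TYPE = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = Path(filename).suffix.lower()
    media_type = EXTENSION_TO_MEDIA_TYPE.get(suffix)
    if media_type is None:
        raise ValueError(f"unsupported image extension: {suffix or '(none)'}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def image_block_from_file(path: str) -> dict:
    # Convenience wrapper for real files on disk.
    return image_block(path, Path(path).read_bytes())


block = image_block("diagram.png", b"\x89PNG fake bytes")
print(block["source"]["media_type"])  # image/png
```

The resulting dict drops straight into the content list of a user message, in place of the hand-written image block in the example above.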

Image Size Limits

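Maximum image size: 5 MB per image when sent through the API
Images larger than 8000x8000 pixels are rejected; images whose long edge exceeds 1568 pixels are scaled down before processing, which adds latency without improving results
For best results, resize large images before sending them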
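A cheap pre-flight check catches oversized files before upload. This sketch assumes a 5 MB per-image limit (the figure Anthropic's vision documentation gives for the API; confirm the current value) and checks the raw file size. It also shows how much base64 encoding grows the payload. MAX_IMAGE_BYTES, check_image_size, and base64_length are hypothetical names.

```python
import base64

# Assumed per-image API limit, taken here as 5 * 1024 * 1024 bytes;
# confirm the current figure in Anthropic's vision documentation.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def check_image_size(data: bytes, limit: int = MAX_IMAGE_BYTES) -> None:
    """Raise before upload if the raw image is larger than the limit."""
    if len(data) > limit:
        raise ValueError(f"image is {len(data):,} bytes; limit is {limit:,} bytes")


def base64_length(num_bytes: int) -> int:
    """Length of the base64 text for num_bytes of input (4 chars per 3 bytes, padded)."""
    return 4 * ((num_bytes + 2) // 3)


check_image_size(b"\x89PNG small test image")  # passes silently
print(base64_length(3))                # 4
print(base64_length(MAX_IMAGE_BYTES))  # 6990508
```

The second print shows that a file right at the assumed limit becomes roughly 7 million characters of base64 text in the request body.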

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications.

Stop Reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue conversation
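`max_tokens`	Output hit the `max_tokens` limit	Increase `max_tokens` or ask Claude to continue in a new turn
`stop_sequence`	Generation hit one of your `stop_sequences`	Read the `stop_sequence` field to see which one matched
`tool_use`	Claude wants to call a tool	Run the tool, send back a `tool_result`, and call the API again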

Python Example: Handling Stop Reasons

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a 2000-word essay on AI"}
    ]
)
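if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
    print("Response completed successfully.")
elif message.stop_reason == "stop_sequence":
    print(f"Stopped at custom sequence: {message.stop_sequence!r}")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call; run it and send back the result.")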
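When stop_reason is max_tokens, one option besides raising the limit is to ask for the rest in a follow-up turn. The sketch below resends the partial answer as an assistant turn followed by a user turn asking Claude to continue, which avoids relying on prefill. generate_until_done and the send callable are hypothetical, the replies are canned so it runs offline, and joining chunks naively may leave an awkward seam where one reply ends and the next begins.

```python
from typing import Callable, Tuple

# Hypothetical sender: gets the message list, returns (text, stop_reason).
# A real one would call client.messages.create(...) and read
# message.content[0].text and message.stop_reason.
TurnSender = Callable[[list[dict]], Tuple[str, str]]


def generate_until_done(send: TurnSender, prompt: str, max_rounds: int = 3) -> Tuple[str, str]:
    """Keep asking for more while replies stop on max_tokens.

    Returns the joined text and the last stop_reason, which is still
    "max_tokens" if max_rounds ran out before Claude finished.
    """
    messages = [{"role": "user", "content": prompt}]
    parts: list[str] = []
    stop_reason = ""
    for _ in range(max_rounds):
        text, stop_reason = send(messages)
        parts.append(text)
        if stop_reason != "max_tokens":
            break
        # Continue via a new user turn rather than a prefill.
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts), stop_reason


replies = iter([("Part one, ", "max_tokens"), ("part two.", "end_turn")])
history_sizes = []


def fake_send(messages: list[dict]) -> Tuple[str, str]:
    # Offline stand-in: records how much history it got, then replays a reply.
    history_sizes.append(len(messages))
    return next(replies)


text, reason = generate_until_done(fake_send, "Write a 2000-word essay on AI")
print(text)           # Part one, part two.
print(reason)         # end_turn
print(history_sizes)  # [1, 3]
```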

Error Handling

Common API errors and how to handle them:

400 Bad Request: Invalid parameters (e.g., prefill on unsupported model)
401 Unauthorized: Invalid API key
429 Rate Limit: Too many requests; implement exponential backoff
500 Internal Server Error: Transient server issue; retry with backoff

Python Example: Retry with Backoff

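import time

import anthropic

# The SDK retries some failures on its own by default; max_retries=0 turns
# that off so this loop is the only retry logic in play.
client = anthropic.Anthropic(max_retries=0)
max_attempts = 3
for attempt in range(max_attempts):
    try:
        message = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except (anthropic.RateLimitError, anthropic.InternalServerError):
        # 429 and 5xx responses are transient; 400 and 401 are not retried
        if attempt < max_attempts - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1 s, then 2 s
        else:
            raise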

Best Practices

Manage Context Window: Track token usage and implement conversation summarization for long conversations.
Use System Prompts: For consistent behavior, put instructions in the top-level system parameter rather than in a message.
Use Streaming: For real-time applications, stream the response to receive tokens as they are generated.
Cache Prompts: For repeated system prompts or large context, use prompt caching to reduce costs.
Monitor Usage: Track input_tokens and output_tokens for billing and optimization.
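Two of these practices can be combined in a request builder: the system prompt goes in the top-level system parameter, and old turns are dropped once the history grows. This sketch trims by message count, a much cruder stand-in for the summarization suggested above; trim_history and build_request are hypothetical helpers, and the result is meant to be passed as client.messages.create(**request).

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most max_messages recent turns, starting on a user turn."""
    trimmed = messages[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant turns so the history still opens with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


def build_request(
    model: str,
    system: str,
    messages: list[dict],
    max_messages: int = 20,
    max_tokens: int = 1024,
) -> dict:
    """Keyword arguments for client.messages.create(**request)."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,  # a top-level parameter, not a message role
        "messages": trim_history(messages, max_messages),
    }


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello! How can I help you today?"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
]
request = build_request(
    "claude-sonnet-4-5", "You are a concise technical tutor.", history, max_messages=2
)
print([m["role"] for m in request["messages"]])  # ['user']
```

In practice you would pick max_messages (or switch to summarizing) based on the input_tokens values reported in usage rather than a fixed count.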

Conclusion

The Messages API is the foundation for building conversational AI with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated applications that leverage Claude's full potential. Remember that the API is stateless, so you control the conversation context—giving you maximum flexibility.

Key Takeaways

The Messages API is stateless; you must send the full conversation history with each request for multi-turn conversations.
Prefill allows you to guide Claude's responses by providing the beginning of its answer, but it's not supported on all models.
Vision capabilities enable image analysis by sending base64-encoded images in the content array.
Always handle stop reasons (end_turn, max_tokens, tool_use) to build robust applications.
Implement proper error handling with exponential backoff for rate limits and transient errors.