Guide · Beginner · API · 2026-05-22

Mastering the Messages API: Build Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision tasks. Includes Python and TypeScript code examples.

Quick Answer

This guide shows how to use Claude's Messages API to build conversational AI applications: making basic requests, managing multi-turn conversations, prefilling Claude's responses, and handling images. It includes practical code examples in Python and TypeScript.

Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

The Messages API is the core interface for building with Claude. Whether you're creating a chatbot, a document analysis tool, or an AI-powered assistant, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.

Understanding the Messages API

Anthropic offers two primary ways to build with Claude:

Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control.
Claude Managed Agents: Pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you full control over every request and response.

Basic Request and Response

Let's start with the simplest possible interaction: sending a single message to Claude and getting a response.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
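// content is a union of block types, so narrow to a text block before reading .text
const first = message.content[0];
if (first.type === 'text') {
  console.log(first.text);
}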

Understanding the Response

The API returns a structured JSON object containing:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

content: An array of content blocks (usually text)
stop_reason: Why the model stopped generating (e.g., "end_turn", "max_tokens", "stop_sequence")
usage: Token counts for billing and optimization
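
To make these fields concrete, here is a minimal sketch that reads the text and token counts from a response shaped like the JSON above. It works on a plain dict (for example, one parsed from the raw HTTP response), and the helper names extract_text and total_tokens are illustrative, not part of the SDK.

```python
from typing import Any

# A response shaped like the JSON example above (trimmed to the fields used here).
sample_response: dict[str, Any] = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"] for block in response["content"] if block.get("type") == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Sum the input and output tokens reported in the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


print(extract_text(sample_response))  # Hello!
print(total_tokens(sample_response))  # 18
```

Filtering on the block type rather than indexing content[0] keeps the helper safe when a response also contains non-text blocks.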

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context.

Building a Conversation

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)

Important Notes

You don't need to use actual Claude responses for assistant messages. You can inject synthetic assistant messages to guide the conversation or provide context.
Always alternate between user and assistant roles. The conversation must start with a user message.
The entire history counts toward your input token usage, so be mindful of context length.
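
The notes above can be sketched as two small helpers: one that checks the role rules before a request is sent, and one that keeps the history short. Both names (validate_history, trim_history) are hypothetical, and trimming by message count is only a stand-in for a real token budget.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


def validate_history(messages: list[Message]) -> None:
    """Raise ValueError unless the history starts with a user turn and roles alternate."""
    if not messages:
        raise ValueError("history is empty")
    if messages[0]["role"] != "user":
        raise ValueError("history must start with a user message")
    for prev, curr in zip(messages, messages[1:]):
        if prev["role"] == curr["role"]:
            raise ValueError(f"two consecutive {curr['role']} messages")


def trim_history(messages: list[Message], max_messages: int) -> list[Message]:
    """Keep at most max_messages of the most recent turns, starting on a user turn."""
    recent = messages[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant turns so the trimmed history still starts with "user".
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history: list[Message] = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
]
validate_history(history)
print(len(trim_history(history, 2)))  # 1
```

Because the API is stateless, whatever list these helpers return is exactly the context Claude sees on the next request.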

Putting Words in Claude's Mouth (Prefill)

Prefilling allows you to start Claude's response for it. This is useful for:

Forcing structured outputs (e.g., JSON, multiple choice answers)
Guiding the tone or style of the response
Reducing token usage by constraining the output

Example: Multiple Choice Answer

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
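
One detail worth illustrating: the response contains only the continuation, not the prefill itself, so the two must be joined before parsing structured output. The sketch below uses a hard-coded string in place of a real API reply, and the helper names with_prefill and join_prefill are hypothetical. The trailing-whitespace check reflects the API's rule that a prefill must not end in whitespace.

```python
import json


def with_prefill(prompt: str, prefill: str) -> list[dict[str, str]]:
    """Build a messages list whose last entry is the partial assistant turn."""
    if prefill != prefill.rstrip():
        raise ValueError("prefill should not end with whitespace")
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def join_prefill(prefill: str, completion: str) -> str:
    """Reassemble the full assistant text from the prefill and the continuation."""
    return prefill + completion


messages = with_prefill("Return the user as JSON: Ada, age 36.", "{")
# Stand-in for message.content[0].text; a real call would return the continuation.
completion = '"name": "Ada", "age": 36}'
print(json.loads(join_prefill("{", completion)))  # {'name': 'Ada', 'age': 36}
```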

Prefill Limitations

Important: Prefilling is not supported on Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

For models that don't support prefill, consider:

Structured outputs: Define a JSON schema for Claude to follow
System prompt instructions: Use the system parameter to specify output format
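
As a rough sketch of the system prompt route, the helper below builds the keyword arguments for client.messages.create() with a format instruction in the system parameter, and a second helper parses the reply defensively. The function names, the wording of the instruction, and the "your-model-id" placeholder are all assumptions made for illustration.

```python
import json
from typing import Any, Optional


def build_json_request(model: str, question: str) -> dict[str, Any]:
    """Return keyword arguments for client.messages.create() with a format-setting system prompt."""
    return {
        "model": model,
        "max_tokens": 256,
        "system": (
            "Reply with a single JSON object of the form "
            '{"answer": "<letter>"} and no other text.'
        ),
        "messages": [{"role": "user", "content": question}],
    }


def parse_answer(text: str) -> Optional[str]:
    """Return the "answer" field, or None if the reply is not the expected JSON."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and isinstance(data.get("answer"), str):
        return data["answer"]
    return None


# "your-model-id" is a placeholder; pass the kwargs as client.messages.create(**kwargs).
kwargs = build_json_request("your-model-id", "Which option is correct: (A), (B), or (C)?")
print(parse_answer('{"answer": "C"}'))  # C
```

Unlike prefill, a system prompt instruction is a request rather than a constraint, so the parser returns None instead of assuming the reply is well-formed.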

Vision: Working with Images

The Messages API supports image inputs, enabling visual understanding and analysis.

Sending an Image

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

JPEG
PNG
GIF
WebP
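
The media_type in the image block has to match one of these formats. Here is a minimal sketch of a helper that picks it from the file extension and rejects anything else. The name image_block is hypothetical, and checking the extension (rather than inspecting the file's bytes) is a simplifying assumption.

```python
import base64
from pathlib import Path

# Media types for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


block = image_block("chart.PNG", b"\x89PNG")
print(block["source"]["media_type"])  # image/png
```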

Best Practices for Vision

Use appropriate resolution: Images up to 8,000x8,000 pixels are supported
Combine with text: Always include a text prompt alongside images for best results
Consider token cost: Images consume tokens proportional to their size
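
For a rough sense of the token cost, the sketch below assumes the approximation given in Anthropic's vision documentation, tokens ≈ (width × height) / 750, for images that are not resized by the API. Treat the result as an estimate only; the function name is hypothetical and the usage field of the response remains the authoritative count.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Rough token estimate for an image, assuming tokens ~ width * height / 750."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("dimensions must be positive")
    return round(width_px * height_px / 750)


print(estimate_image_tokens(1000, 1000))  # 1333
print(estimate_image_tokens(200, 200))    # 53
```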

Handling Stop Reasons

Understanding why Claude stopped generating helps you handle different scenarios:

Stop Reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue or end conversation
`max_tokens`	Output hit token limit	Increase `max_tokens` or truncate
`stop_sequence`	Custom stop sequence triggered	Handle as designed
`tool_use`	Claude wants to use a tool	Execute tool and return result
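
The table above could be turned into a small dispatch function along these lines. The action labels are placeholders chosen for this sketch; a real application would call its own handlers instead of returning strings.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason to a label for the action listed in the table above."""
    actions = {
        "end_turn": "done",
        "max_tokens": "raise_limit_or_continue",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "run_tool",
    }
    # Unknown values are surfaced rather than silently treated as success.
    return actions.get(stop_reason, "unexpected")


print(next_step("end_turn"))    # done
print(next_step("max_tokens"))  # raise_limit_or_continue
```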

Error Handling

Common errors and how to handle them:

import anthropic
from anthropic import APIError, APIConnectionError, RateLimitError
# Create the client used by the request below
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limited. Implement exponential backoff.")
except APIConnectionError:
    print("Network issue. Retry with backoff.")
except APIError as e:
    print(f"API error: {e}")
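
The messages above mention exponential backoff without showing it. Here is one possible sketch, written generically so it runs without network access: retry_with_backoff is a hypothetical helper, and the simulated ConnectionError stands in for the SDK's RateLimitError and APIConnectionError.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each retryable failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the last error.
            # Delay grows 1x, 2x, 4x, ... of base_delay, plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky() -> str:
    """Fail twice with a simulated network error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


delays: list[float] = []  # Record the waits instead of actually sleeping.
print(retry_with_backoff(flaky, (ConnectionError,), sleep=delays.append))  # ok
```

With the SDK, the same helper could wrap a lambda around client.messages.create(...) and be given (RateLimitError, APIConnectionError) as retry_on. The SDK client also has its own max_retries setting, so it is worth checking whether that already covers your case before adding a second retry layer.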

Streaming Responses

For real-time applications, use streaming to receive tokens as they're generated:

import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming is ideal for chatbots and any application where low latency matters.
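
In practice you often want both the live output and the complete text afterwards, for example to append it to the conversation history. A minimal sketch, using a plain list of strings as a stand-in for stream.text_stream (collect_stream is a hypothetical helper):

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Print each chunk as it arrives and return the full text at the end."""
    parts: list[str] = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()  # Finish the line once the stream ends.
    return "".join(parts)


# Stand-in for stream.text_stream; a real stream yields strings the same way.
story = collect_stream(["Once ", "upon ", "a time."])
```

Inside the with block from the example above, the call would be collect_stream(stream.text_stream).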

Key Takeaways

The Messages API is stateless — always send the full conversation history with each request. This gives you complete control over context.
Use prefill carefully — it's powerful for constraining outputs but not supported on all models. Consider structured outputs as an alternative.
Vision capabilities allow you to send images alongside text prompts for multimodal understanding. Always pair images with descriptive text.
Handle stop reasons to build robust applications — end_turn, max_tokens, and tool_use each require different responses.
Stream for real-time applications — streaming reduces perceived latency and improves user experience in interactive applications.