Claude Guide · Beginner · 2026-05-06

Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision. Includes code examples in Python and TypeScript.

Quick Answer

This guide teaches you how to build conversational AI with Claude's Messages API, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical Python and TypeScript code examples.

Tags: Messages API · Claude API · Conversational AI · Prefill · Vision

Introduction

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a custom chatbot, an AI assistant, or integrating Claude into your application, understanding the Messages API is essential. This guide covers the most common patterns—from simple requests to advanced techniques like prefill and vision—so you can get the most out of Claude.

Basic Request and Response

At its core, the Messages API is straightforward: you send a list of messages and receive a response. Here's a minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes the model's reply, metadata, and token usage:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
  • stop_reason: Indicates why the response ended (end_turn means Claude finished naturally).
  • usage: Tracks input and output tokens for billing and optimization.
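With the Python SDK these fields are attributes on the returned object (message.content[0].text, message.usage.input_tokens, and so on). To make the shape concrete, here is a sketch that pulls the reply text and token counts out of a response shaped like the JSON above, parsed from a string purely for illustration:

```python
import json

# A response shaped like the sample above (for illustration only)
raw = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""
response = json.loads(raw)

# The reply text lives in the first content block
reply = response["content"][0]["text"]

# Input + output tokens is what you pay for
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(reply)         # Hello!
print(total_tokens)  # 18
```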

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over the conversation context.

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Important: The assistant messages don't have to come from Claude—you can inject synthetic assistant responses to guide the conversation. This is useful for:
  • Providing example responses
  • Correcting or redirecting Claude
  • Simulating multi-turn interactions
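Because the API is stateless, a common pattern is a thin wrapper that accumulates turns and replays the full history on every request. A minimal sketch (the Conversation class and its method names are illustrative, not part of the SDK):

```python
class Conversation:
    """Accumulates turns and replays the full history on each request."""

    def __init__(self, model="claude-opus-4-7", max_tokens=1024):
        self.model = model
        self.max_tokens = max_tokens
        self.history = []

    def add_user(self, text):
        self.history.append({"role": "user", "content": text})

    def add_assistant(self, text):
        # Synthetic assistant turns are allowed, e.g. for few-shot examples
        self.history.append({"role": "assistant", "content": text})

    def send(self, client, text):
        """Append the user turn, call the API, record and return the reply."""
        self.add_user(text)
        message = client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=self.history,
        )
        reply = message.content[0].text
        self.add_assistant(reply)
        return reply

# Usage (requires an API key):
# client = anthropic.Anthropic()
# convo = Conversation()
# print(convo.send(client, "Hello, Claude"))
# print(convo.send(client, "Can you describe LLMs to me?"))
```

Each call to send replays everything accumulated so far, which is exactly what the stateless API expects.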

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for controlling output format, enforcing structure, or getting concise answers.

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

Response:

{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
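Note that the returned text contains only the continuation, never the prefill itself. If you need the complete answer, concatenate the two yourself (here the completion string stands in for message.content[0].text):

```python
prefill = "The answer is ("
completion = "C"  # stands in for message.content[0].text from the response above

# With max_tokens=1 the closing parenthesis is never generated, so add it
full_answer = prefill + completion + ")"
print(full_answer)  # The answer is (C)
```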

Prefill Limitations

Prefill is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

When to Use Prefill

  • Format control: Force JSON, XML, or specific output structures
  • Constrained generation: Get single-token answers (yes/no, multiple choice)
  • Role-playing: Set the tone or persona from the first word
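As a format-control example, prefilling an opening brace nudges Claude straight into JSON; remember to prepend the prefill before parsing. A sketch, with a simulated continuation standing in for the real API response:

```python
import json

prefill = "{"
messages = [
    {
        "role": "user",
        "content": "List three primary colors as a JSON object with a 'colors' array.",
    },
    # Claude continues generating from the opening brace
    {"role": "assistant", "content": prefill},
]

# Simulated continuation, standing in for message.content[0].text
completion = '"colors": ["red", "yellow", "blue"]}'

# The prefill is not part of the returned text, so re-attach it before parsing
data = json.loads(prefill + completion)
print(data["colors"])
```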

Vision: Sending Images to Claude

Claude can analyze images sent via the Messages API. This enables use cases like document analysis, image description, and visual Q&A.

Base64 Image Example (Python)

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: 100MB per image
  • Claude processes images at varying resolutions; larger images use more tokens
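A small helper can map a file extension to the right media_type and build the image content block in one step. A sketch (the extension-to-type table is an assumption derived from the supported formats above; image_block is not an SDK function):

```python
import base64
import pathlib

# Assumed mapping based on the supported formats listed above
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def image_block(path):
    """Build a base64 image content block for the Messages API."""
    p = pathlib.Path(path)
    suffix = p.suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    data = base64.b64encode(p.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

You can then drop the returned dict straight into a message's content list alongside a text block.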

Vision Best Practices

  • Combine with text: Always include a text prompt alongside images for best results
  • Use appropriate resolution: High-resolution images provide more detail but cost more tokens
  • One image per message: For complex analysis, send images one at a time

Handling Stop Reasons

The stop_reason field tells you why Claude stopped generating. Common values:

Stop Reason      Meaning
end_turn         Claude finished naturally
max_tokens       Response hit the token limit
stop_sequence    Claude encountered a stop sequence
tool_use         Claude wants to use a tool

Pro tip: If stop_reason is max_tokens, consider raising the max_tokens parameter or breaking your request into smaller chunks.
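In application code this usually becomes a small dispatch after each call. A sketch (handle_stop and its return labels are illustrative, not part of the SDK):

```python
def handle_stop(stop_reason):
    """Map a stop_reason value to a next action for the caller."""
    if stop_reason == "end_turn":
        return "done"        # Claude finished naturally
    if stop_reason == "max_tokens":
        return "truncated"   # cut off: raise max_tokens or continue the turn
    if stop_reason == "stop_sequence":
        return "stopped"     # hit a caller-supplied stop sequence
    if stop_reason == "tool_use":
        return "run_tool"    # execute the requested tool, then reply
    return "unknown"

# e.g. action = handle_stop(message.stop_reason)
```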

Streaming Responses

For real-time applications, use streaming to get Claude's response incrementally:

stream = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="", flush=True)

Streaming is ideal for:

  • Chat interfaces with real-time display
  • Long responses where users expect immediate feedback
  • Reducing perceived latency
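The same event loop can accumulate the full text while displaying it incrementally. A sketch tested against fake events (the SimpleNamespace objects merely mimic the shape of content_block_delta events; collect_text is illustrative):

```python
from types import SimpleNamespace

def collect_text(stream):
    """Print deltas as they arrive and return the accumulated text."""
    chunks = []
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
            chunks.append(event.delta.text)
    return "".join(chunks)

# Fake events standing in for a real stream (for illustration)
fake_stream = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="Once upon ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="a time...")),
]
```

With a real stream you would pass the object returned by client.messages.create(..., stream=True) in place of fake_stream.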

Error Handling

Common API errors and how to handle them:

Error              Cause               Solution
400 Bad Request    Invalid parameters  Check model name, message format
401 Unauthorized   Invalid API key     Verify your API key
429 Rate Limit     Too many requests   Implement exponential backoff
529 Overloaded     Server overload     Retry with delay

import time
from anthropic import Anthropic, APIError, RateLimitError

client = Anthropic()

for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except RateLimitError:
        time.sleep(2 ** attempt)
    except APIError as e:
        print(f"API error: {e}")
        break
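The retry loop above can be factored into a reusable helper. A sketch with exponential backoff plus jitter (with_retries is illustrative, not part of the SDK; jitter spreads out retries so many clients don't hammer the server in lockstep):

```python
import random
import time

def with_retries(fn, retries=3, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying retryable errors with exponential backoff + jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            # base_delay, 2*base_delay, 4*base_delay, ... plus jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Usage (requires an API key):
# message = with_retries(
#     lambda: client.messages.create(
#         model="claude-sonnet-4-5",
#         max_tokens=1024,
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retryable=(RateLimitError,),
# )
```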

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated AI applications. Remember:

  • Always send the full conversation history (stateless API)
  • Use prefill for output control (but check model compatibility)
  • Stream responses for better user experience
  • Handle errors gracefully with retries

Key Takeaways

  • Stateless design: You must send the full conversation history with each request—this gives you complete control over context.
  • Prefill for precision: Use prefill to control output format and get concise answers, but avoid it on newer models (Opus 4.7, Sonnet 4.6) where it's unsupported.
  • Vision integration: Claude can analyze images via base64 encoding; always pair images with text prompts for best results.
  • Stream for speed: Streaming reduces perceived latency and is essential for real-time chat interfaces.
  • Error handling matters: Implement retry logic with exponential backoff to handle rate limits and server overloads gracefully.