Guide | Beginner | Best Practices | 2026-05-21

Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision. Includes code examples and best practices.

Quick Answer

This guide teaches you how to use the Claude Messages API to send requests, manage multi-turn conversations, prefill responses, and work with images. You'll get practical code examples and best practices for building robust AI applications.

Tags: Messages API, Claude API, Multi-turn conversations, Prefill, Vision

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a complex agent, understanding how to structure your API calls is essential. This guide walks you through the core patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Basic Request and Response

At its simplest, a Messages API call requires three things:

model: The Claude model you want to use (e.g., claude-opus-4-7)
max_tokens: The maximum number of tokens in Claude's response
messages: An array of message objects, each with a role and content

Here's a minimal example in Python:

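import anthropic

# With no arguments, the client reads the API key from the
# ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)  # prints the full response object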
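Before sending a request, it can help to check that the three required fields are present and well-formed. The following is a minimal sketch that works on a plain dictionary of parameters; the function name `find_request_problems` and the specific checks are illustrative assumptions, not part of the SDK, and the API itself remains the authority on what is valid.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("model", "max_tokens", "messages")
VALID_ROLES = {"user", "assistant"}


def find_request_problems(params: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in params]
    if problems:
        return problems

    max_tokens = params["max_tokens"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(max_tokens, int) or isinstance(max_tokens, bool) or max_tokens < 1:
        problems.append("max_tokens must be a positive integer")

    messages = params["messages"]
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        return problems

    for index, message in enumerate(messages):
        if message.get("role") not in VALID_ROLES:
            problems.append(f"messages[{index}] has an invalid role")
        if "content" not in message:
            problems.append(f"messages[{index}] is missing content")
    return problems
```

A check like this catches simple mistakes locally, which is cheaper than waiting for a 400 response.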

The response includes:

id: Unique message identifier
role: Always "assistant"
content: Array of content blocks (usually text)
stop_reason: Why Claude stopped ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
usage: Token counts for input and output

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
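Because content is an array of blocks rather than a single string, most applications need a small step to pull the text out. The sketch below works on the JSON form of the response shown above, parsed into a dictionary; `extract_text` and `total_tokens` are hypothetical helper names introduced for illustration.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: Dict[str, Any]) -> int:
    """Sum input and output token counts from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))   # Hello!
print(total_tokens(sample))   # 18
```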

Multi-Turn Conversations

The Messages API is stateless: you must send the full conversation history with every request. This gives you complete control over context, but the history grows with every turn and you are responsible for managing it.

Building a Conversation

To continue a conversation, append new messages to the history:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ]
)
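Since the history has to be resent each time, it is convenient to keep it in one place and append to it after every exchange. The following is a minimal sketch of that bookkeeping. The `Conversation` class and its `send` callable are assumptions made for illustration: `send` stands in for whatever function actually calls the API and returns the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running history and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self._send = send            # hypothetical: wraps the real API call
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        # Build the outgoing history first, and commit it only if the call
        # succeeds, so a failed request does not leave a dangling user turn.
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = self._send(pending)
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply
```

In a real application, `send` might wrap `client.messages.create(...)` and return the text of the first content block.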

Synthetic Assistant Messages

You can inject pre-written assistant messages into the history. This is useful for:

Providing examples: Show Claude how you want it to respond
Correcting behavior: Insert a corrected response to steer future replies
Simulating context: Create scenarios without real interactions

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Germany?"},
]
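When the synthetic turns serve as examples, the history can be generated from a list of question and answer pairs. This is a small sketch under that assumption; `build_few_shot` is a hypothetical helper name, not an SDK function.

```python
from typing import Dict, List, Sequence, Tuple


def build_few_shot(
    examples: Sequence[Tuple[str, str]], question: str
) -> List[Dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question goes last, so the reply follows the example pattern.
    messages.append({"role": "user", "content": question})
    return messages


messages = build_few_shot(
    [("What is the capital of France?", "The capital of France is Paris.")],
    "What about Germany?",
)
```

The result is the same three-message list as the hand-written example above.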

Managing Token Limits

Because the full history is resent on every request, input token usage grows with each turn. To keep it in check, consider:

Summarizing earlier turns
Using prompt caching for repeated system instructions
Setting appropriate max_tokens to control response length
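The first option, summarizing earlier turns, can be sketched as splitting the history into an older part to be summarized and a recent part to be sent verbatim. The code below is one possible approach under stated assumptions: `compact_history` is a hypothetical helper, the `summarize` callable is supplied by you (it could itself call Claude), and the summary is returned as a string that you might place in the system prompt.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> Tuple[str, List[Message]]:
    """Return (summary_of_older_turns, recent_messages)."""
    if len(messages) <= keep_last:
        return "", list(messages)

    cut = len(messages) - keep_last
    # Move the cut forward so the retained history starts with a user turn.
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1

    older, recent = messages[:cut], messages[cut:]
    return summarize(older), recent
```

Keeping the most recent turns verbatim preserves the immediate context, while the summary stands in for everything earlier.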

Prefill: Putting Words in Claude's Mouth

Prefill lets you start Claude's response by providing the beginning of its answer. This is powerful for:

Constraining output format (e.g., JSON, multiple choice)
Guiding tone or style
Ensuring specific phrasing

Basic Prefill Example

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

With max_tokens=1, Claude generates only a single token (here the letter "C"), which gives you a clean multiple-choice answer.
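The response contains only the continuation, not the prefilled text, so the application still has to check that the single token is one of the expected choices. A minimal sketch, assuming the generated text has already been extracted as a string; `parse_choice` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def parse_choice(completion: str, choices: Sequence[str] = ("A", "B", "C")) -> Optional[str]:
    """Return the chosen letter, or None if the completion is not a valid choice."""
    # Tolerate surrounding whitespace, parentheses, and lowercase letters.
    letter = completion.strip().strip("()").upper()[:1]
    return letter if letter in choices else None


print(parse_choice("C"))    # C
print(parse_choice("x"))    # None
```

Returning None for anything unexpected lets the caller retry or fall back instead of acting on a malformed answer.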

Important Limitations

Prefill is not supported on these models:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6

Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
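A request uses prefill when its last message has the assistant role, so this condition can be detected before the request is sent. The sketch below is an assumption-laden illustration: `prefill_problem` is a hypothetical helper, and the default prefix list contains only the model ID that appears in this guide's examples. Add the exact IDs of the other models listed above from the official model documentation.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: only the ID used in this guide's examples is listed here.
# Extend this tuple with the real IDs of the other models that reject prefill.
NO_PREFILL_PREFIXES = ("claude-opus-4-7",)


def uses_prefill(messages: List[Dict[str, object]]) -> bool:
    """A request is a prefill request when it ends with an assistant message."""
    return bool(messages) and messages[-1].get("role") == "assistant"


def prefill_problem(
    model: str,
    messages: List[Dict[str, object]],
    no_prefill_prefixes: Sequence[str] = NO_PREFILL_PREFIXES,
) -> Optional[str]:
    """Return an explanation if this request would use unsupported prefill."""
    if uses_prefill(messages) and model.startswith(tuple(no_prefill_prefixes)):
        return f"{model} does not support prefill; the API would return a 400 error"
    return None
```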

Migration from Prefill

If you're moving away from prefill, here are alternatives:

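Structured outputs (recommended). Note that the example below only asks for JSON through the system prompt; it does not enforce a schema, so validate the reply before relying on it: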

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You must respond in JSON format with keys: 'answer', 'explanation'",
    messages=[
        {"role": "user", "content": "What is Latin for Ant?"}
    ]
)
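A system prompt asks for JSON but does not guarantee it, so the reply should be parsed defensively. The following sketch checks that the reply text is a JSON object containing the two keys named in the system prompt; `parse_answer` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Optional

REQUIRED_KEYS = ("answer", "explanation")


def parse_answer(text: str) -> Optional[Dict[str, Any]]:
    """Return the parsed object, or None if it is not valid JSON with both keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(key not in data for key in REQUIRED_KEYS):
        return None
    return data


reply = '{"answer": "formica", "explanation": "Formica is the Latin word for ant."}'
parsed = parse_answer(reply)
print(parsed["answer"] if parsed else "invalid reply")   # formica
```

When `parse_answer` returns None, a common response is to retry the request or surface an error rather than guess at the content.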

System prompt instructions:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="Always start your response with 'The answer is: ' followed by the letter of the correct choice.",
    messages=[
        {"role": "user", "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"}
    ]
)

Vision: Working with Images

Claude can analyze images sent via the Messages API. This enables use cases like:

Image captioning
Document analysis
Visual question answering

Sending an Image

Images are sent as base64-encoded data in the content array:

import base64
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)

Supported Image Formats

Format	Media Type
PNG	`image/png`
JPEG	`image/jpeg`
WebP	`image/webp`
GIF	`image/gif`
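The media type in the request has to match the file's actual format. One way to keep this consistent is to derive it from the file extension using the table above. This is a sketch with hypothetical helper names (`media_type_for`, `image_block`); it trusts the extension and does not inspect the file's bytes.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Optional

# Extension-to-media-type mapping taken from the table of supported formats.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def media_type_for(filename: str) -> Optional[str]:
    """Return the media type for a supported extension, or None otherwise."""
    return MEDIA_TYPES.get(Path(filename).suffix.lower())


def image_block(filename: str, raw: bytes) -> Dict[str, Any]:
    """Build an image content block from a file name and its raw bytes."""
    media_type = media_type_for(filename)
    if media_type is None:
        raise ValueError(f"unsupported image format: {filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("utf-8"),
        },
    }
```

For a file on disk, the call would look like `image_block("diagram.png", Path("diagram.png").read_bytes())`.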

Best Practices for Vision

Use high-resolution images when details matter
Combine with text prompts for specific instructions
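Keep each image within the API's per-image size limit (the published limit for API requests has been 5 MB, so confirm the current figure) and downscale large files before encoding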
Consider token cost: Images consume significant input tokens

Handling Stop Reasons

Understanding why Claude stopped helps you handle responses correctly:

stop_reason	Meaning	Action
`"end_turn"`	Claude finished naturally	Return response to user
`"max_tokens"`	Response was cut off	Increase `max_tokens` or continue conversation
`"stop_sequence"`	A custom stop sequence was hit	Check your stop sequences
`"tool_use"`	Claude wants to call a tool	Execute the tool and return results
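The table above maps naturally onto a small dispatch function. The sketch below returns a label for the next step; the action labels and the `next_action` name are illustrative assumptions, and any value not in the table falls through to "unknown" so that new stop reasons are not silently treated as success.

```python
from typing import Optional

# Mirrors the table: stop_reason -> what the application should do next.
ACTIONS = {
    "end_turn": "return_to_user",
    "max_tokens": "continue_or_raise_limit",
    "stop_sequence": "check_stop_sequence",
    "tool_use": "run_tool",
}


def next_action(stop_reason: Optional[str]) -> str:
    """Return the action label for a stop_reason, or "unknown" if unrecognized."""
    return ACTIONS.get(stop_reason, "unknown")


print(next_action("end_turn"))   # return_to_user
print(next_action("tool_use"))   # run_tool
```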

Example: Handling Max Tokens

if message.stop_reason == "max_tokens":
    # Continue the conversation with the partial response
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages
    )
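A single continuation may be cut off again, so the same step is often wrapped in a bounded loop. The following sketch assumes a `send` callable (hypothetical, standing in for the real API call) that takes the message list and returns the reply text together with its stop_reason; `collect_full_reply` is likewise an illustrative name.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Send = Callable[[List[Message]], Tuple[str, str]]  # returns (text, stop_reason)


def collect_full_reply(
    send: Send, messages: List[Message], max_rounds: int = 3
) -> Tuple[str, bool]:
    """Request continuations until the reply completes or max_rounds is reached.

    Returns (combined_text, completed).
    """
    history = list(messages)      # work on a copy; the caller's list is untouched
    parts: List[str] = []
    for _ in range(max_rounds):
        text, stop_reason = send(history)
        parts.append(text)
        if stop_reason != "max_tokens":
            return "".join(parts), True
        # Feed the partial reply back and ask for the rest.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue."})
    return "".join(parts), False
```

The round limit keeps a runaway response from generating unbounded cost. Continuations may repeat a few words or add a short preamble, so the joined text can need light cleanup.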

Error Handling

Common API errors and how to handle them:

400 Bad Request: Invalid parameters (e.g., prefill on unsupported model)
401 Unauthorized: Invalid API key
429 Rate Limit: Too many requests — implement exponential backoff
500 Internal Server Error: Temporary issue — retry with backoff

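import time
from anthropic import Anthropic, APIError, InternalServerError, RateLimitError

client = Anthropic()
message = None
for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except (RateLimitError, InternalServerError):
        # 429 and 5xx errors are transient: wait 1s, 2s, 4s, then retry.
        time.sleep(2 ** attempt)
    except APIError as e:
        # Other errors (400, 401, ...) will not succeed on retry.
        print(f"API error: {e}")
        break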
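The retry loop can be factored into a reusable helper with a capped delay and random jitter, which helps avoid many clients retrying at the same instant. This is a generic sketch, not an SDK feature: `backoff_delay` and `call_with_retry` are hypothetical names, and the `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential delay for a zero-based attempt number, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exception types with jittered backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Full jitter: wait a random time up to the exponential delay.
            sleep(random.uniform(0, backoff_delay(attempt)))
    raise RuntimeError("attempts must be at least 1")
```

With the SDK, `retry_on` could be `(RateLimitError, InternalServerError)` and `fn` a lambda wrapping `client.messages.create(...)`. The Python SDK also retries some failures on its own, controlled by the client's `max_retries` setting, so check how the two layers combine before adding your own.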

Key Takeaways

The Messages API is stateless — always send the full conversation history. Manage context carefully to avoid token waste.
Prefill is powerful but limited — use it for constrained outputs, but migrate to structured outputs or system prompts for unsupported models.
Vision capabilities let Claude analyze images — combine with text prompts for best results, and be mindful of token costs.
Handle stop reasons to build robust applications — especially max_tokens for long responses and tool_use for agent workflows.
Implement error handling with retry logic for rate limits and transient errors to ensure reliable API usage.