Guide | Beginner | API | 2026-05-18

Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision

Learn how to build with Claude using the Messages API. Covers basic requests, multi-turn conversations, prefill techniques, and vision capabilities with code examples.

Quick Answer

This guide teaches you how to use the Claude Messages API for basic requests, multi-turn conversations, prefill to shape responses, and vision capabilities to analyze images.

Tags: Messages API, Claude API, Prefill, Vision, Conversational AI

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a vision-powered application, understanding the Messages API is essential. This guide walks you through the most common patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Basic Request and Response

At its simplest, a Messages API call sends a user message and receives Claude's response. Here's a minimal example in Python:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes:

id: A unique message identifier
content: An array of content blocks (usually text)
stop_reason: Why the generation stopped (end_turn, max_tokens, stop_sequence, etc.)
usage: Token counts for input and output

Example output:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
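
Because content is an array of blocks rather than a single string, it can help to collect the text blocks explicitly. The following is a minimal sketch, assuming the response has been converted to a plain dict shaped like the example output above. The helper name extract_text is hypothetical and not part of the SDK; with the SDK's response object, the same fields are read as attributes (message.content, block.type, block.text).

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block in a Messages API response dict."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Sample response copied from the example output above
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
```

Filtering on the block type keeps the helper from failing if a response ever contains non-text blocks.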

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.

Building a Conversation

To continue a conversation, append new messages to the messages array:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
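
Since the API keeps no state, one way to manage history is a small wrapper that owns the messages list. This is only a sketch: the Conversation class and the send callable are hypothetical names, and fake_send stands in for a real call such as client.messages.create so that the example runs offline.

```python
from typing import Callable


class Conversation:
    """Keeps the running message history for a stateless chat API."""

    def __init__(self) -> None:
        self.messages: list[dict] = []

    def ask(self, user_text: str, send: Callable[[list[dict]], str]) -> str:
        """Append the user turn, call `send` with the full history, store the reply."""
        self.messages.append({"role": "user", "content": user_text})
        reply = send(list(self.messages))  # pass a copy of the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Hypothetical stand-in for a real API call; it only reports the history length
def fake_send(history: list[dict]) -> str:
    return f"reply to turn {len(history)}"


chat = Conversation()
chat.ask("Hello, Claude", fake_send)
chat.ask("Can you describe LLMs to me?", fake_send)
print([m["role"] for m in chat.messages])
```

In a real application, send would call the API with the history it receives and return the text of the reply.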

Synthetic Assistant Messages

You can inject synthetic assistant messages — they don't need to be actual Claude responses. This is useful for:

Setting up a scenario or persona
Providing example interactions (few-shot prompting)
Guiding Claude's behavior without system prompts

messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Italy?"}
]
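
For few-shot prompting, the alternating user/assistant structure can be generated from a list of example pairs. A minimal sketch, where few_shot_messages is a hypothetical helper rather than an SDK function:

```python
def few_shot_messages(examples: list[tuple[str, str]], question: str) -> list[dict]:
    """Turn (user, assistant) example pairs plus a final question into a messages list."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question always goes last, as a user turn
    messages.append({"role": "user", "content": question})
    return messages


messages = few_shot_messages(
    [("What's the capital of France?", "The capital of France is Paris.")],
    "What about Italy?",
)
```

The result matches the hand-written list above and can be passed as the messages argument of a request.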

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by including an assistant message at the end of your input. This shapes the output — Claude will continue from where you left off.

Use Case: Multiple Choice

A common pattern is using prefill with max_tokens=1 to get a single-character answer:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

Claude will complete the response with C, giving you a clean, parseable answer.
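
Even with a one-token limit, it is worth validating the completion before trusting it. The sketch below assumes the completion text has already been read from the response (for example from message.content[0].text); parse_choice is a hypothetical helper name.

```python
def parse_choice(completion: str, valid: str = "ABC") -> str:
    """Return the answer letter from a prefilled completion, or raise ValueError."""
    letter = completion.strip()[:1].upper()
    # An empty string is "in" every string, so check for it explicitly
    if not letter or letter not in valid:
        raise ValueError(f"Unexpected completion: {completion!r}")
    return letter


print(parse_choice("C"))  # C
```

Raising on anything outside the option set makes a malformed answer visible instead of silently passing it downstream.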

Important Notes

Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Using it with these models returns a 400 error.
For unsupported models, use structured outputs or system prompt instructions instead.
Prefill works best for short, constrained outputs like classifications or single tokens.
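
One way to handle both cases from the notes above is to build the request differently depending on whether the target model accepts prefill. This is a sketch under assumptions: multiple_choice_request and the supports_prefill flag are hypothetical, the caller decides the flag, and the 16-token limit and the system prompt wording in the fallback branch are arbitrary illustrative choices.

```python
def multiple_choice_request(model: str, question: str, supports_prefill: bool) -> dict:
    """Build keyword arguments for a single-letter multiple-choice request."""
    request = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    if supports_prefill:
        # Prefill the start of the reply and allow a single output token
        request["max_tokens"] = 1
        request["messages"].append({"role": "assistant", "content": "The answer is ("})
    else:
        # No prefill: constrain the format through the system prompt instead
        request["max_tokens"] = 16
        request["system"] = "Reply with only the letter of the correct option."
    return request


question = "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
with_prefill = multiple_choice_request("claude-sonnet-4-5", question, supports_prefill=True)
without_prefill = multiple_choice_request("claude-opus-4-7", question, supports_prefill=False)
```

Either dict can then be unpacked into the call, for example client.messages.create(**with_prefill).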

Vision: Analyzing Images

Claude can analyze images sent via the Messages API. This enables use cases like document analysis, image description, and visual Q&A.

Sending an Image

Images are sent as base64-encoded data in a content block:

import base64
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)

Supported Image Formats

JPEG, PNG, GIF, WebP
Maximum size: 5 MB per image when sent through the API
Claude processes images at varying resolutions; larger images may be downscaled
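
To avoid hard-coding the media type, the content block can be built from the file extension, restricted to the four formats listed above. A minimal sketch; image_block and the MEDIA_TYPES mapping are hypothetical names, and the extension check is only a convenience that does not inspect the file contents.

```python
import base64
from pathlib import Path

# Media types for the formats this guide lists as supported, keyed by extension
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Read an image file and return a base64 image content block."""
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    data = base64.b64encode(file.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }
```

A user message would then use a content list such as [image_block("chart.png"), {"type": "text", "text": "Describe this chart in detail."}].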

Best Practices for Vision

Combine with text: Always include a text prompt alongside the image to guide Claude's analysis.
Use high-quality images: Blurry or low-resolution images reduce accuracy.
Be specific: Instead of "What's in this image?", ask "What are the quarterly sales trends shown in this bar chart?"
Consider token cost: Images consume input tokens based on their resolution. As a rule of thumb, an image costs about (width × height) / 750 tokens, so a 1024x1024 image uses roughly 1,400 tokens.
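
For budgeting, a rough estimate can be computed before sending an image. The sketch below assumes a rule of thumb of about (width × height) / 750 tokens; this is an approximation, the exact accounting may differ by model and after any downscaling, and estimate_image_tokens is a hypothetical helper name.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Rough token estimate: pixel count divided by 750 (an assumed rule of thumb)."""
    return round(width_px * height_px / 750)


print(estimate_image_tokens(1024, 1024))  # 1398
```

For exact figures, read the usage field of the actual response rather than relying on this estimate.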

Handling Stop Reasons

Claude's response includes a stop_reason field that tells you why generation stopped:

Stop Reason	Meaning
`end_turn`	Claude finished naturally
`max_tokens`	Output hit the `max_tokens` limit
`stop_sequence`	Claude encountered a custom stop sequence
`tool_use`	Claude wants to call a tool (if tools are enabled)

For max_tokens, you can continue the conversation by appending Claude's partial response and asking it to continue:

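# `messages` is the list that was sent in the original request
if message.stop_reason == "max_tokens":
    # Keep the partial reply in the history, then ask Claude to pick up where it stopped
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    message = client.messages.create(
        model="claude-opus-4-7", max_tokens=1024, messages=messages
    )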
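
The same idea can be wrapped in a bounded loop so that a long answer is stitched together across several calls. This is a sketch, not a definitive implementation: complete_fully and its create callable are hypothetical, create is assumed to return an object with stop_reason and content[0].text like the SDK response, and fake_create simulates truncation so the example runs offline.

```python
from types import SimpleNamespace
from typing import Callable


def complete_fully(
    create: Callable[[list[dict]], object],
    messages: list[dict],
    max_rounds: int = 3,
) -> str:
    """Call `create` repeatedly, asking to continue while stop_reason is "max_tokens"."""
    history = list(messages)  # copy so the caller's list is not mutated
    parts = []
    for _ in range(max_rounds):
        message = create(history)
        text = message.content[0].text
        parts.append(text)
        if message.stop_reason != "max_tokens":
            break
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue."})
    return "".join(parts)


def fake_create(history: list[dict]) -> SimpleNamespace:
    # One truncated reply, then a natural finish (simulated, no API call)
    chunks = [("Once upon ", "max_tokens"), ("a time", "end_turn")]
    text, reason = chunks[len(history) // 2]
    return SimpleNamespace(content=[SimpleNamespace(text=text)], stop_reason=reason)


story = complete_fully(fake_create, [{"role": "user", "content": "Tell me a story"}])
print(story)  # Once upon a time
```

The max_rounds cap keeps a response that never finishes from triggering an unbounded number of calls.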

Streaming Responses

For real-time applications, use streaming to receive tokens as they're generated:

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
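for event in stream:
    # Only text deltas carry a .text attribute; other delta types (e.g. tool input JSON) do not
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)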

Streaming is ideal for chatbots, live transcription, and any UI that shows incremental progress.
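
Applications often need the full text once the stream ends, in addition to the incremental display. The sketch below accumulates text from a sequence of events. The helper collect_text is hypothetical, and the events are simulated with SimpleNamespace objects that mimic the type and delta fields of real stream events so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(events: Iterable) -> str:
    """Concatenate the text carried by text_delta events, ignoring everything else."""
    parts = []
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            parts.append(event.delta.text)
    return "".join(parts)


# Simulated event sequence (not produced by a real API call)
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Once")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="input_json_delta", partial_json="{}")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text=" upon a time")),
    SimpleNamespace(type="message_stop"),
]

print(collect_text(fake_events))  # Once upon a time
```

With a real stream, the same function could be given the iterator returned by a request made with stream=True.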

Error Handling

Common API errors and how to handle them:

400 Bad Request: Invalid parameters (e.g., unsupported model with prefill)
401 Unauthorized: Invalid API key
429 Too Many Requests: Rate limit exceeded — implement exponential backoff
529 Overloaded: Temporary server overload — retry with backoff

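import random
import time

import anthropic

def call_with_retry(client, **kwargs):
    """Retry on 429 (rate limit) and 529 (overloaded) with exponential backoff and jitter."""
    max_retries = 5
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.APIStatusError as err:
            # Re-raise errors that are not retryable, and give up after the final attempt
            if err.status_code not in (429, 529) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.uniform(0, 1))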
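
The retry decision and the wait time can also be separated into small functions, which makes them easy to test without calling the API. A sketch under assumptions: backoff_delay, should_retry, and RETRYABLE_STATUS are hypothetical names, and the 30-second cap is an arbitrary illustrative value.

```python
import random

# The two status codes this guide describes as worth retrying
RETRYABLE_STATUS = {429, 529}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter, capped so waits do not grow without bound."""
    return min(cap, base * 2 ** attempt) + random.uniform(0, 1)


def should_retry(status_code: int, attempt: int, max_retries: int = 5) -> bool:
    """Retry only retryable status codes, and only while attempts remain."""
    return status_code in RETRYABLE_STATUS and attempt < max_retries - 1


for attempt in range(3):
    print(attempt, should_retry(429, attempt), round(backoff_delay(attempt), 2))
```

The jitter term spreads out retries from many clients so they do not all hit the server at the same moment.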

Conclusion

The Messages API is the foundation for all Claude integrations. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated applications that leverage Claude's full capabilities. Remember that the API is stateless — manage conversation history on your end — and always handle stop reasons and errors gracefully.

Key Takeaways

The Messages API is stateless — you must send the full conversation history with every request, giving you complete control over context.
Prefill lets you shape responses by starting Claude's reply, but it's not supported on all models (use structured outputs as an alternative).
Vision capabilities allow Claude to analyze images sent as base64 data; always pair images with specific text prompts for best results.
Streaming provides real-time token delivery, ideal for interactive applications.
Handle stop reasons like max_tokens to gracefully continue interrupted responses.