BeClaude
Guide | Beginner | Best Practices | 2026-05-21

Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision. Includes code examples and best practices.

Quick Answer

This guide teaches you how to use the Claude Messages API to send requests, manage multi-turn conversations, prefill responses, and work with images. You'll get practical code examples and best practices for building robust AI applications.

Messages API, Claude API, Multi-turn conversations, Prefill, Vision

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a complex agent, understanding how to structure your API calls is essential. This guide walks you through the core patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Basic Request and Response

At its simplest, a Messages API call requires three things:

  • model: The Claude model you want to use (e.g., claude-opus-4-7)
  • max_tokens: The maximum number of tokens in Claude's response
  • messages: An array of message objects, each with a role and content
Here's a minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes:

  • id: Unique message identifier
  • role: Always "assistant"
  • content: Array of content blocks (usually text)
  • stop_reason: Why Claude stopped ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
  • usage: Token counts for input and output
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
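
To make the response shape concrete, here is a minimal sketch that pulls the text and token totals out of the sample response above, treated as a plain dict. The helper name extract_text is hypothetical and not part of the SDK; with the SDK object the same fields are available as attributes (for example message.content[0].text).

```python
def extract_text(response: dict) -> str:
    """Join the text of every "text" block in a Messages API response body."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# The sample response shown above, as a plain Python dict.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

text = extract_text(sample)
# Input and output tokens are reported separately; add them for a total.
total_tokens = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
print(text, total_tokens)
```

Filtering on the block type rather than indexing content[0] keeps the helper safe when a response contains non-text blocks.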

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context but requires careful management.

Building a Conversation

To continue a conversation, append new messages to the history:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ]
)
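
Because the API is stateless, the history has to live in your application. The sketch below shows one way to keep it: a small Conversation class (a hypothetical name) that records both sides of each turn. The send argument is an assumption made so the example runs offline; in practice it would wrap client.messages.create and return the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full message history, since the API itself stores nothing."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the whole history
        # and returns the assistant's reply as a string.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        # Record the user turn, send the full history, then record the reply.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    """Offline stand-in for a real API call."""
    return f"(reply after {len(history)} messages)"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
roles = [m["role"] for m in chat.messages]
print(roles)
```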

Synthetic Assistant Messages

You can inject pre-written assistant messages into the history. This is useful for:

  • Providing examples: Show Claude how you want it to respond
  • Correcting behavior: Insert a corrected response to steer future replies
  • Simulating context: Create scenarios without real interactions
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Germany?"},
]
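
When you have several examples, building the alternating history by hand gets repetitive. A minimal sketch, assuming each example is a (question, answer) pair; the function name few_shot_messages is hypothetical:

```python
from typing import Dict, List, Tuple


def few_shot_messages(
    examples: List[Tuple[str, str]], question: str
) -> List[Dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # The assistant turn is written by you, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question goes last, as a user turn for Claude to answer.
    messages.append({"role": "user", "content": question})
    return messages


history = few_shot_messages(
    [("What is the capital of France?", "The capital of France is Paris.")],
    "What about Germany?",
)
print(history)
```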

Managing Token Limits

Long conversations consume tokens quickly. Consider:

  • Summarizing earlier turns
  • Using prompt caching for repeated system instructions
  • Setting appropriate max_tokens to control response length
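
A simpler alternative to summarizing is to drop the oldest turns once the history grows past a budget. The sketch below is only an illustration: the four-characters-per-token estimate is a rough assumption rather than the real tokenizer, the function names are hypothetical, and message content is assumed to be a plain string.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. This is a crude heuristic,
    # not the tokenizer the API actually uses.
    return max(1, len(text) // 4)


def trim_history(messages, budget):
    """Keep the most recent messages whose estimated tokens fit in `budget`."""
    kept = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    # Drop leading assistant turns so the trimmed history starts with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


# Five turns of 40 characters each, so about 10 estimated tokens per turn.
history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
    {"role": "assistant", "content": "d" * 40},
    {"role": "user", "content": "e" * 40},
]
trimmed = trim_history(history, budget=35)
print(len(trimmed))
```

Trimming loses information that summarizing would keep, so it suits chats where only the recent turns matter.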

Prefill: Putting Words in Claude's Mouth

Prefill lets you start Claude's response by providing the beginning of its answer. This is powerful for:

  • Constraining output format (e.g., JSON, multiple choice)
  • Guiding tone or style
  • Ensuring specific phrasing

Basic Prefill Example

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

Because max_tokens is set to 1, Claude generates only a single token (here the letter "C"), giving you a clean multiple-choice answer.
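
The response contains only the continuation, not the prefill itself, so you usually need to join the two. A minimal sketch with hypothetical helper names; trimming trailing whitespace from the prefill is a precaution taken on the assumption that a final assistant turn ending in whitespace is rejected.

```python
def with_prefill(messages, prefill):
    """Return a new message list ending in an assistant turn that starts the reply."""
    # Assumption: trailing whitespace in the final assistant turn is rejected,
    # so it is trimmed before sending.
    return [*messages, {"role": "assistant", "content": prefill.rstrip()}]


def full_answer(prefill, completion):
    """Prepend the prefill to the generated continuation to get the whole reply."""
    return prefill.rstrip() + completion


question = {
    "role": "user",
    "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
}
request_messages = with_prefill([question], "The answer is (")
# "C" stands in for the single token the model would return.
answer = full_answer("The answer is (", "C")
print(answer)
```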

Important Limitations

Prefill is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.
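
A client-side check can catch this before the request is sent. The sketch below is hedged: the function names are hypothetical, and the set holds only claude-opus-4-7 because that is the only model ID that appears in this guide. The IDs of the other listed models are not given here, so add them after checking their exact spelling.

```python
# Assumption: the guide lists display names, while requests use model IDs.
# Only "claude-opus-4-7" appears as an ID in this guide; extend the set
# yourself once you have confirmed the other IDs.
PREFILL_UNSUPPORTED = {"claude-opus-4-7"}


def ends_with_prefill(messages):
    """True when the last message is an assistant turn, i.e. a prefill."""
    return bool(messages) and messages[-1]["role"] == "assistant"


def check_prefill(model, messages, unsupported=PREFILL_UNSUPPORTED):
    """Raise before sending if the request would use prefill on a listed model."""
    if ends_with_prefill(messages) and model in unsupported:
        raise ValueError(f"prefill is not supported on {model}")


prefilled = [
    {"role": "user", "content": "Pick A, B, or C."},
    {"role": "assistant", "content": "The answer is ("},
]
check_prefill("claude-sonnet-4-5", prefilled)  # passes silently
try:
    check_prefill("claude-opus-4-7", prefilled)
    blocked = False
except ValueError:
    blocked = True
print(blocked)
```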

Migration from Prefill

If you're moving away from prefill, here are alternatives:

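Structured outputs (recommended). The example below approximates this by requesting JSON in the system prompt; the prompt alone does not enforce a schema, so validate the reply before using it: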
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You must respond in JSON format with keys: 'answer', 'explanation'",
    messages=[
        {"role": "user", "content": "What is Latin for Ant?"}
    ]
)
System prompt instructions:
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="Always start your response with 'The answer is: ' followed by the letter of the correct choice.",
    messages=[
        {"role": "user", "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"}
    ]
)
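
A system prompt requests a format but does not guarantee it, so it is worth validating the reply before use. The sketch below parses a JSON reply with the two keys from the example above and tolerates a surrounding code fence; the function name parse_json_reply and the sample reply string are illustrative assumptions, not real model output.

```python
import json

FENCE = "`" * 3  # a three-backtick code fence, built here to keep the source tidy


def parse_json_reply(text: str) -> dict:
    """Parse a JSON object from a reply, tolerating a surrounding code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag)
        # and everything from the closing fence onward.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"answer", "explanation"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Illustrative reply text, not captured from a real call.
reply = '{"answer": "formica", "explanation": "Formica is the Latin word for ant."}'
parsed = parse_json_reply(reply)
fenced = parse_json_reply(FENCE + "json\n" + reply + "\n" + FENCE)
print(parsed["answer"])
```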

Vision: Working with Images

Claude can analyze images sent via the Messages API. This enables use cases like:

  • Image captioning
  • Document analysis
  • Visual question answering

Sending an Image

Images are sent as base64-encoded data in the content array:

import base64

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)

Supported Image Formats

Format | Media Type
PNG    | image/png
JPEG   | image/jpeg
WebP   | image/webp
GIF    | image/gif
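
One way to avoid a mismatched media_type is to derive it from the file name and reject anything outside the formats listed above. This is a sketch under assumptions: image_block is a hypothetical helper, and picking the media type from the file extension (including .jpg as an alias for JPEG) is a convenience, not something the API requires.

```python
import base64
from pathlib import Path

# Extension-to-media-type mapping for the four formats listed above.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build an image content block from raw bytes, validating the format."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# b"abc" stands in for real image bytes read from disk.
block = image_block("diagram.PNG", b"abc")
print(block["source"]["media_type"])
```

The returned dict can be placed in the content array alongside a text block, as in the request shown earlier.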

Best Practices for Vision

  • Use high-resolution images when details matter
  • Combine with text prompts for specific instructions
  • Keep images under 20MB for optimal performance
  • Consider token cost: Images consume significant input tokens

Handling Stop Reasons

Understanding why Claude stopped helps you handle responses correctly:

stop_reason     | Meaning                        | Action
"end_turn"      | Claude finished naturally      | Return response to user
"max_tokens"    | Response was cut off           | Increase max_tokens or continue conversation
"stop_sequence" | A custom stop sequence was hit | Check your stop sequences
"tool_use"      | Claude wants to call a tool    | Execute the tool and return results

Example: Handling Max Tokens

if message.stop_reason == "max_tokens":
    # Continue the conversation with the partial response
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages
    )
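
The same pattern can be wrapped in a bounded loop so that a very long answer is stitched together without looping forever. A minimal sketch: collect_text is a hypothetical name, and send is an assumed callable that returns a dict with "text" and "stop_reason" keys, used here so the example runs without a network call.

```python
def collect_text(send, messages, max_rounds=3):
    """Call `send` until the reply stops for a reason other than max_tokens."""
    messages = list(messages)  # work on a copy; the caller's list is untouched
    parts = []
    for _ in range(max_rounds):
        # Hypothetical: `send` returns {"text": ..., "stop_reason": ...}.
        reply = send(messages)
        parts.append(reply["text"])
        if reply["stop_reason"] != "max_tokens":
            break
        # The reply was cut off: record it and ask for the rest.
        messages.append({"role": "assistant", "content": reply["text"]})
        messages.append({"role": "user", "content": "Please continue."})
    return "".join(parts)


# Canned replies standing in for two API calls.
replies = iter([
    {"text": "LLMs are ", "stop_reason": "max_tokens"},
    {"text": "neural networks.", "stop_reason": "end_turn"},
])
result = collect_text(
    lambda history: next(replies),
    [{"role": "user", "content": "Can you describe LLMs to me?"}],
)
print(result)
```

The max_rounds cap is a design choice: it bounds cost if the model keeps hitting the limit.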

Error Handling

Common API errors and how to handle them:

  • 400 Bad Request: Invalid parameters (e.g., prefill on unsupported model)
  • 401 Unauthorized: Invalid API key
  • 429 Rate Limit: Too many requests — implement exponential backoff
  • 500 Internal Server Error: Temporary issue — retry with backoff
import time
from anthropic import Anthropic, APIError, InternalServerError, RateLimitError

client = Anthropic()

for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except (RateLimitError, InternalServerError):
        # 429 and 5xx errors are transient: wait 1s, 2s, then 4s
        time.sleep(2 ** attempt)
    except APIError as e:
        # Other API errors (e.g., 400 or 401) will not succeed on retry
        print(f"API error: {e}")
        break
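
When many clients back off on the same schedule, they tend to retry at the same moment. Adding random jitter spreads the retries out. The sketch below is a generic illustration with hypothetical names (backoff_delay, call_with_retry, Flaky); it uses a simulated TimeoutError in place of the SDK's exception classes and an injectable sleep so it runs instantly.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(fn, retry_on, attempts=3, sleep=time.sleep):
    """Call `fn`, retrying on the given exception types with jittered backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))


class Flaky:
    """Simulated call that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated transient failure")
        return "ok"


slept = []  # records requested delays instead of actually sleeping
flaky = Flaky(failures=2)
result = call_with_retry(flaky, (TimeoutError,), attempts=3, sleep=slept.append)
print(result, flaky.calls)
```

In real use, retry_on would be the transient exception types from the SDK, and sleep would be left at its default.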

Key Takeaways

  • The Messages API is stateless — always send the full conversation history. Manage context carefully to avoid token waste.
  • Prefill is powerful but limited — use it for constrained outputs, but migrate to structured outputs or system prompts for unsupported models.
  • Vision capabilities let Claude analyze images — combine with text prompts for best results, and be mindful of token costs.
  • Handle stop reasons to build robust applications — especially max_tokens for long responses and tool_use for agent workflows.
  • Implement error handling with retry logic for rate limits and transient errors to ensure reliable API usage.