Claude Guide
2026-05-04

Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude

Learn how to use Claude's Messages API for multi-turn conversations, prefill techniques, and vision. Includes code examples in Python and TypeScript.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or an AI assistant, understanding how to structure your requests and handle responses is essential. This guide walks you through the core patterns of the Messages API, from simple one-shot queries to complex multi-turn conversations and advanced techniques like prefill and vision.

Understanding the Messages API

The Messages API is stateless—each request must include the full conversation history. This design gives you complete control over the context Claude sees, which makes it ideal for custom agent loops and fine-grained interaction management.

Basic Request and Response

A minimal request requires three things: a model name, a max_tokens limit, and an array of messages. Here's how it looks in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message.content[0].text)

The response includes useful metadata:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • stop_reason: Indicates why the response ended (end_turn, max_tokens, stop_sequence, or tool_use).
  • usage: Token counts for billing and context management.
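As a small illustration of working with these fields, the helpers below pull the text and token totals out of a response. This is a sketch that treats the response as the plain dict shown above rather than the SDK's typed object; the function names are illustrative, not part of the SDK:

```python
def extract_text(response: dict) -> str:
    """Concatenate the text from all text-type content blocks."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )

def total_tokens(response: dict) -> int:
    """Sum input and output tokens, e.g. for usage tracking."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]

response = {
    "content": [{"type": "text", "text": "Hello!"}],
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(extract_text(response))  # Hello!
print(total_tokens(response))  # 18
```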

Building Multi-Turn Conversations

Since the API is stateless, you must send the entire conversation history with each request. This pattern lets you build up context over multiple turns.

import anthropic

client = anthropic.Anthropic()

conversation = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"}
]

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation
)

print(message.content[0].text)

Synthetic Assistant Messages

You're not limited to real conversations. You can inject synthetic assistant messages to guide Claude's behavior. For example, you might pre-populate a conversation with a persona or context:

conversation = [
    {"role": "user", "content": "You are a helpful math tutor. Start by asking me a question."},
    {"role": "assistant", "content": "Sure! Let's start with algebra. What is 2x + 3 = 7?"},
    {"role": "user", "content": "x = 2"}
]

This is particularly useful for:

  • Setting up role-playing scenarios
  • Providing few-shot examples
  • Maintaining character consistency
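One way to package the few-shot pattern above is a small helper that interleaves example pairs as synthetic user/assistant turns before the real query. The function name here is illustrative, not part of the SDK:

```python
def build_few_shot(examples, query):
    """Interleave (input, output) pairs as synthetic turns, then append the real query."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

conversation = build_few_shot(
    [("Sentiment: 'I love this!'", "positive"),
     ("Sentiment: 'Terrible service.'", "negative")],
    "Sentiment: 'It was fine, I guess.'",
)
```

The resulting list alternates roles correctly and always ends on a user turn, which is what the API expects.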

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. You include an assistant message at the end of your input with partial content, and Claude completes it. This is powerful for constraining outputs.

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {"role": "assistant", "content": "The answer is ("}
    ]
)

print(message.content[0].text) # Outputs: "C"

By setting max_tokens=1, you force Claude to output just the letter. The prefill "The answer is (" guides the model to complete the pattern.

Important Limitations

Prefill is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6

Requests using prefill with these models return a 400 error. For these models, use structured outputs or system prompt instructions instead.

Use Cases for Prefill

  • Constrained generation: Force JSON prefixes or specific formats
  • Chain-of-thought: Start with "Let me think step by step:" to encourage reasoning
  • Classification: Prefill with a category label
  • Completion tasks: Provide the beginning of a sentence or code block
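For instance, to force JSON output you can prefill the opening brace. Note that the prefill text is not echoed back in the response (as the multiple-choice example above shows, the output starts where the prefill left off), so you prepend it yourself when parsing. This is a sketch; `with_prefill` is a hypothetical helper, not an SDK function:

```python
def with_prefill(user_prompt: str, prefill: str) -> list:
    """Build a messages array ending in a partial assistant turn."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]

messages = with_prefill(
    "List three primes as a JSON object under the key 'primes'.",
    "{",
)
# After calling the API, recombine the prefill with the completion:
#   full_json = "{" + message.content[0].text
```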

Vision: Working with Images

Claude can analyze images sent through the Messages API. This enables use cases like image captioning, document analysis, and visual Q&A.

Sending an Image

Images are sent as base64-encoded data in the content array:

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Formats

  • PNG
  • JPEG
  • WebP
  • GIF (static, first frame only)

Best Practices for Vision

  • Use appropriate resolution: Claude works best with images between 200x200 and 2048x2048 pixels
  • Compress when possible: Smaller file sizes reduce latency
  • Combine with text: Always include a text prompt to guide Claude's analysis
  • One image per message: For complex scenes, send one image at a time
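The loading-and-encoding steps above can be wrapped in one helper that also infers the media_type from the file extension. This is an illustrative sketch, not part of the SDK; the format table maps the supported formats listed earlier:

```python
import base64
from pathlib import Path

# Supported formats and their MIME types (assumption: extension matches content).
_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

def image_block(path: str) -> dict:
    """Build a base64 image content block, inferring media_type from the extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in _MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": _MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dict drops straight into a message's content array alongside a text block.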

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

stop_reason      Meaning                          Action
end_turn         Claude finished naturally        Continue or end conversation
max_tokens       Output hit the token limit       Increase max_tokens or truncate
stop_sequence    A custom stop sequence was hit   Handle as needed
tool_use         Claude wants to call a tool      Execute tool and send result back

Example handling:

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
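The max_tokens case can also drive a simple continuation loop: if the response is truncated, resend the conversation with the accumulated text as a partial assistant turn (the prefill technique from earlier, so the same model-support caveats apply) and let the model pick up where it left off. The sketch below takes the create call as a parameter so the loop itself is API-agnostic and works on plain dict responses:

```python
def generate_until_done(create, messages, max_rounds=5):
    """Keep requesting until stop_reason != 'max_tokens', stitching the pieces together."""
    messages = list(messages)
    full_text = ""
    for _ in range(max_rounds):
        response = create(messages)
        full_text += response["content"][0]["text"]
        if response["stop_reason"] != "max_tokens":
            break
        # Resend with the accumulated text as a partial assistant turn
        # so the model continues where it left off.
        if messages and messages[-1]["role"] == "assistant":
            messages[-1] = {"role": "assistant", "content": full_text}
        else:
            messages.append({"role": "assistant", "content": full_text})
    return full_text
```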

Error Handling and Best Practices

Common Errors

  • 400 Bad Request: Invalid parameters or unsupported prefill model
  • 401 Unauthorized: Invalid API key
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Temporary server issue

Retry Strategy

import time
from anthropic import Anthropic, APIError

client = Anthropic()

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=messages
            )
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

Token Management

  • Monitor usage.input_tokens and usage.output_tokens to stay within limits
  • Use prompt caching for repeated system prompts
  • Consider compaction for long conversations
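A minimal running tally along these lines might look as follows. This is a sketch that records usage as plain dicts; the SDK's response objects expose the same counts on their usage attribute:

```python
class UsageTracker:
    """Accumulate token counts across requests for cost and limit monitoring."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage counts to the running totals."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 40, "output_tokens": 120})
print(tracker.total)  # 178
```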

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated AI applications. Remember that the API is stateless—you control the context. Use prefill wisely (avoiding unsupported models), handle stop reasons appropriately, and always monitor token usage.

Key Takeaways

  • Stateless design: Always send the full conversation history; you control the context Claude sees.
  • Prefill is powerful but limited: Use it to constrain outputs, but avoid models that don't support it (Opus 4.7, Sonnet 4.6, etc.).
  • Vision is straightforward: Send base64-encoded images with a text prompt for analysis.
  • Handle stop reasons: end_turn, max_tokens, and tool_use each require different responses.
  • Monitor token usage: Track input and output tokens to manage costs and stay within limits.