BeClaude
Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

Claude's Messages API is the primary interface for building conversational AI applications. Whether you're creating a simple chatbot, a complex agent system, or a vision-enabled application, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and multi-turn conversations.

Understanding the Messages API

The Messages API is a stateless, RESTful API that accepts a list of messages and returns a model-generated response. Unlike APIs that keep session state on the server, it requires you to send the full conversation history with each request. This design gives you complete control over the conversation context.

Key Concepts

  • Messages: An array of conversation turns, each with a role (user or assistant) and content.
  • Roles: user for human messages, assistant for Claude's responses.
  • Stateless: Each request is independent; you manage conversation state on your end.
  • Stop Reasons: Indicates why Claude stopped generating (e.g., end_turn, max_tokens, stop_sequence).
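To make the concepts above concrete, here is a minimal sketch of how a request payload might be assembled before it is sent. The helper names `make_turn` and `build_request` are hypothetical and are not part of any SDK; they only illustrate the shape of the `messages` array and the fact that the caller owns the conversation state.

```python
from typing import Dict, List

# The two roles described above.
VALID_ROLES = {"user", "assistant"}


def make_turn(role: str, content: str) -> Dict[str, str]:
    """Build one conversation turn, rejecting roles the messages list does not use."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}, got {role!r}")
    return {"role": role, "content": content}


def build_request(model: str, max_tokens: int, turns: List[Dict[str, str]]) -> dict:
    """Assemble the arguments for one stateless request (hypothetical helper)."""
    if not turns or turns[0]["role"] != "user":
        raise ValueError("the conversation must start with a user turn")
    # Copy the list so later edits to the caller's history do not change this request.
    return {"model": model, "max_tokens": max_tokens, "messages": list(turns)}


request = build_request(
    "claude-sonnet-4-5",
    1024,
    [make_turn("user", "Hello, Claude")],
)
print(request["messages"])
```

Because every request carries its own `messages` list, nothing needs to be stored on the server between calls; the dictionary built here holds everything the model will see.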

Basic Request and Response

Let's start with the simplest possible request: a single user message.

Python Example

import anthropic

client = anthropic.Anthropic()

# Send a single user message and wait for the full response
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Send a single user message and wait for the full response
const message = await client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message.content[0].text);

Response Structure

The API returns a structured response:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-5",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

  • content: An array of content blocks (text, tool_use, etc.)
  • stop_reason: Why generation stopped (end_turn, max_tokens, stop_sequence, tool_use)
  • usage: Token counts for billing and context management
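Since `content` is an array rather than a single string, code that reads only `content[0]` can miss text when several blocks are returned. The sketch below assumes the response is available as a plain dictionary shaped like the JSON above (for example, parsed from the raw HTTP body); `extract_text` and `total_tokens` are hypothetical helper names.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: Dict[str, Any]) -> int:
    """Add input and output token counts from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# Sample response copied from the structure shown above.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "model": "claude-sonnet-4-5",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello! How can I help you today?
print(total_tokens(sample))  # 18
```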

Building Multi-Turn Conversations

Since the Messages API is stateless, you must send the entire conversation history with each request. This pattern enables rich, context-aware interactions.

Python Example

import anthropic

client = anthropic.Anthropic()

# First turn
message1 = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

# Second turn - include previous messages
message2 = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": message1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message2.content[0].text)

Important Notes

  • Synthetic Messages: You can inject synthetic assistant messages (e.g., from a database or previous session) to continue conversations seamlessly.
  • Context Window: Be mindful of the context window limit. Each turn adds tokens to the input.
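  • Message Order: The conversation must start with a user message, and turns should alternate between user and assistant roles. Consecutive turns with the same role are combined into a single turn, so strict alternation keeps the history predictable.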
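The notes above can be sketched as a small history container that the application owns. This is a minimal illustration, not an SDK feature: the `Conversation` class is hypothetical, it stores plain string content only, and it enforces strict alternation as a conservative choice.

```python
from typing import Dict, List


class Conversation:
    """Client-side conversation state (hypothetical helper, not part of any SDK)."""

    def __init__(self) -> None:
        self._turns: List[Dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        """Append a turn, enforcing user/assistant alternation starting with user."""
        expected = "user" if len(self._turns) % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"expected a {expected} turn, got {role!r}")
        self._turns.append({"role": role, "content": content})

    def messages(self) -> List[Dict[str, str]]:
        """Return a copy of the full history to send with the next request."""
        return [dict(turn) for turn in self._turns]


convo = Conversation()
convo.add("user", "Hello, Claude")
# A synthetic assistant turn, e.g. restored from a database or an earlier session.
convo.add("assistant", "Hello! How can I help you today?")
convo.add("user", "Can you describe LLMs to me?")

print(len(convo.messages()))  # 3
```

The list returned by `convo.messages()` is what would be passed as the `messages` argument on the next request; after each response, the assistant's text is appended with `convo.add("assistant", ...)`.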

Prefill Technique: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is useful for:

  • Guiding response format (e.g., JSON, multiple choice)
  • Enforcing specific phrasing
  • Reducing token usage for constrained outputs

Important: Model Support

Prefill is not supported on:

  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
  • Claude Mythos Preview

For these models, use structured outputs or system prompt instructions instead.

Python Example: Multiple Choice

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # Only need one token for the answer
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {
            # Prefill: Claude continues generating from the end of this text
            "role": "assistant",
            "content": "The answer is (",
        },
    ],
)

print(message.content[0].text) # Outputs: "C"

How Prefill Works

  • The assistant message in the last position contains your prefill text.
  • Claude continues generating from that point.
  • Combined with max_tokens, you can get very constrained outputs.

Best Practices for Prefill

  • Use with max_tokens: Set a low max_tokens value to limit Claude's completion.
  • Natural Continuation: Make the prefill text a natural lead-in to the desired response.
  • Fallback Strategy: For unsupported models, use system prompts like "Always respond with a single letter A, B, or C."
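One detail worth illustrating for the format-guiding use case: the response contains only the continuation, not the prefill text, so the two have to be joined before parsing. The sketch below uses a simulated completion string in place of a real API call, and the helper names `with_prefill` and `join_prefill` are hypothetical. The trailing-whitespace check reflects my understanding that the API rejects a final assistant turn ending in whitespace; treat it as an assumption.

```python
import json
from typing import Dict, List


def with_prefill(messages: List[Dict[str, str]], prefill: str) -> List[Dict[str, str]]:
    """Return a new history whose last turn is an assistant prefill."""
    if prefill != prefill.rstrip():
        # Assumption: a prefill ending in whitespace is rejected by the API.
        raise ValueError("prefill must not end with whitespace")
    return [*messages, {"role": "assistant", "content": prefill}]


def join_prefill(prefill: str, completion: str) -> str:
    """Rebuild the full answer: the response holds only the continuation."""
    return prefill + completion


PREFILL = "{"
messages = with_prefill(
    [{"role": "user", "content": "Return the ant family as JSON with a 'family' key."}],
    PREFILL,
)

# Stand-in for message.content[0].text from a real call (simulated, not real output).
simulated_completion = '"family": "Formicidae"}'

data = json.loads(join_prefill(PREFILL, simulated_completion))
print(data["family"])  # Formicidae
```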

Vision Capabilities

Claude can process images through the Messages API. This enables use cases like image analysis, document processing, and visual question answering.

Python Example

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

# Send the image block followed by a text block in a single user turn
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail.",
                },
            ],
        }
    ],
)

print(message.content[0].text)

Supported Media Types

  • image/jpeg
  • image/png
  • image/gif (first frame only)
  • image/webp

Image Size Limits

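  • Maximum image size: about 5 MB per image for API requests at the time of writing; check the current vision documentation, since limits differ by platform and can change
  • Claude automatically downscales images with very large pixel dimensions before processing them, which adds latency without improving results
  • For best results, resize and compress images before upload so they stay well under the size limit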
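A small pre-flight check can catch unsupported formats and oversized files before a request is sent. The following is a sketch under stated assumptions: `image_block` is a hypothetical helper, the media type is inferred from the file extension only (not from the file contents), and the `max_bytes` default of 5 MB is taken from the guidance above rather than from a verified hard limit.

```python
import base64
from pathlib import Path
from typing import Any, Dict

# Extensions mapped to the supported media types listed above.
EXTENSION_TO_MEDIA_TYPE = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, raw: bytes, max_bytes: int = 5 * 1024 * 1024) -> Dict[str, Any]:
    """Build a base64 image content block, validating type and size first."""
    suffix = Path(filename).suffix.lower()
    media_type = EXTENSION_TO_MEDIA_TYPE.get(suffix)
    if media_type is None:
        raise ValueError(f"unsupported image extension: {suffix or '(none)'}")
    if len(raw) > max_bytes:
        raise ValueError(f"image is {len(raw)} bytes, over the {max_bytes}-byte budget")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("ascii"),
        },
    }


# Placeholder bytes stand in for a real file read with open(..., "rb").
block = image_block("diagram.png", b"\x89PNG placeholder")
print(block["source"]["media_type"])  # image/png
```

The returned dictionary can be placed in the `content` list of a user message, next to a text block, in the same way as the example above.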

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications.

Stop Reason   | Meaning                    | Action
------------- | -------------------------- | -------------------------------
end_turn      | Claude finished naturally  | Continue conversation
max_tokens    | Output hit token limit     | Increase max_tokens or truncate
stop_sequence | Hit a custom stop sequence | Handle as needed
tool_use      | Claude wants to use a tool | Execute tool and continue

Python Example: Handling Stop Reasons

import anthropic

client = anthropic.Anthropic()

message = client.messages.create( model="claude-sonnet-4-5", max_tokens=1024, messages=[ {"role": "user", "content": "Write a 2000-word essay on AI"} ] )

# Branch on why generation stopped
if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
    print("Response completed successfully.")
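As the number of branches grows, a lookup table can be easier to maintain than an if/elif chain. The sketch below maps each stop reason listed above to an action label; the labels and the `next_action` function are hypothetical, and unknown values fall through to a safe default in case new stop reasons are introduced.

```python
from typing import Dict

# Action labels are illustrative names chosen for this sketch.
ACTIONS: Dict[str, str] = {
    "end_turn": "continue_conversation",
    "max_tokens": "raise_limit_or_accept_truncation",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool_and_continue",
}


def next_action(stop_reason: str) -> str:
    """Return the action label for a stop reason, or a default for unknown values."""
    return ACTIONS.get(stop_reason, "log_and_inspect")


for reason in ("end_turn", "max_tokens", "something_new"):
    print(reason, "->", next_action(reason))
```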

Error Handling

Common API errors and how to handle them:

  • 400 Bad Request: Invalid parameters (e.g., prefill on unsupported model)
  • 401 Unauthorized: Invalid API key
  • 429 Rate Limit: Too many requests; implement exponential backoff
  • 500 Internal Server Error: Transient server issue; retry with backoff

Python Example: Retry with Backoff

import anthropic
import time

client = anthropic.Anthropic()

max_retries = 3
for attempt in range(max_retries):
    try:
        message = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}],
        )
        break  # Success: stop retrying
    except anthropic.RateLimitError:
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff
        else:
            raise  # Out of retries: surface the error
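The same pattern can be factored into a reusable wrapper with random jitter, so that many clients do not retry at the same instant. This is a sketch, not the SDK's built-in retry logic: `call_with_backoff` is a hypothetical helper, and the `FakeRateLimit` exception and `flaky` function exist only to demonstrate it without a network call.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # Out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt)
            # Add up to 50% random jitter on top of the exponential delay.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_retries must be at least 1")


class FakeRateLimit(Exception):
    """Stand-in for a rate-limit error, used only in this demonstration."""


attempts = {"count": 0}


def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit()
    return "ok"


delays = []  # Record delays instead of actually sleeping
result = call_with_backoff(flaky, (FakeRateLimit,), sleep=delays.append)
print(result, len(delays))  # ok 2
```

With a real client, `fn` would wrap the `client.messages.create(...)` call and `retryable` would list the exception classes you consider transient, such as `anthropic.RateLimitError`.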

Best Practices

  • Manage Context Window: Track token usage and implement conversation summarization for long conversations.
  • Use System Prompts: For consistent behavior, set the top-level system parameter on the request rather than adding instructions as a message (not shown in the examples here).
  • Handle Streaming: For real-time applications, use streaming to get tokens as they're generated.
  • Cache Prompts: For repeated system prompts or large context, use prompt caching to reduce costs.
  • Monitor Usage: Track input_tokens and output_tokens for billing and optimization.
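For the context-window practice above, one simple approach is to drop the oldest turns once an estimated token budget is exceeded. The sketch below is deliberately rough: `trim_history` and `estimate_tokens` are hypothetical helpers, the four-characters-per-token ratio is an assumption rather than a real tokenizer, and message content is assumed to be a plain string. Summarizing old turns instead of dropping them is a reasonable alternative.

```python
from typing import Callable, Dict, List


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(
    messages: List[Dict[str, str]],
    budget: int,
    estimate: Callable[[str], int] = estimate_tokens,
) -> List[Dict[str, str]]:
    """Drop the oldest turns until the estimated total fits within the budget."""
    kept = list(messages)
    # Remove from the front while over budget, always keeping the latest turn.
    while len(kept) > 1 and sum(estimate(m["content"]) for m in kept) > budget:
        kept.pop(0)
    # The history must still begin with a user turn after trimming.
    while len(kept) > 1 and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # about 100 tokens
    {"role": "assistant", "content": "b" * 400},  # about 100 tokens
    {"role": "user", "content": "c" * 40},        # about 10 tokens
]

trimmed = trim_history(history, budget=120)
print([m["role"] for m in trimmed])  # ['user']
```

For accurate budgeting, the `input_tokens` value reported in each response's `usage` field is a better guide than a character-based estimate.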

Conclusion

The Messages API is the foundation for building conversational AI with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated applications that leverage Claude's full potential. Remember that the API is stateless, so you control the conversation context, which gives you maximum flexibility.

Key Takeaways

  • The Messages API is stateless; you must send the full conversation history with each request for multi-turn conversations.
  • Prefill allows you to guide Claude's responses by providing the beginning of its answer, but it's not supported on all models.
  • Vision capabilities enable image analysis by sending base64-encoded images in the content array.
  • Always handle stop reasons (end_turn, max_tokens, tool_use) to build robust applications.
  • Implement proper error handling with exponential backoff for rate limits and transient errors.