BeClaude
Guide · Beginner · API · 2026-05-22

Mastering the Messages API: Build Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision tasks. Includes Python and TypeScript code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications. You'll learn to make basic requests, manage multi-turn conversations, prefill Claude's responses, and handle images. It includes practical code examples in Python and TypeScript.

Introduction

The Messages API is the core interface for building with Claude. Whether you're creating a chatbot, a document analysis tool, or an AI-powered assistant, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.

Understanding the Messages API

Anthropic offers two primary ways to build with Claude:

  • Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control.
  • Claude Managed Agents: Pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you full control over every request and response.

Basic Request and Response

Let's start with the simplest possible interaction: sending a single message to Claude and getting a response.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

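const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

// Content blocks are a union type, so narrow to a text block before reading .text
const firstBlock = message.content[0];
if (firstBlock.type === 'text') {
  console.log(firstBlock.text);
}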

Understanding the Response

The API returns a structured JSON object such as:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks (usually text)
  • stop_reason: Why the model stopped generating (e.g., "end_turn", "max_tokens", "stop_sequence")
  • usage: Token counts for billing and optimization
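
The fields above can be read straight from the parsed response. The sketch below works on a plain dict shaped like the sample JSON (the SDK returns an object with attribute access instead, as in `message.content[0].text`). The helper names `extract_text` and `total_tokens` are illustrative and not part of the SDK.

```python
from typing import Any

# A dict mirroring the sample response shown above.
sample_response: dict[str, Any] = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-sonnet-4-20250514",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Sum the input and output token counts reported in the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


print(extract_text(sample_response))  # Hello!
print(total_tokens(sample_response))  # 18
```

Joining all text blocks rather than reading only index 0 keeps the helper working when a response contains more than one block.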

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context.

Building a Conversation

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message.content[0].text)

Important Notes

  • You don't need to use actual Claude responses for assistant messages. You can inject synthetic assistant messages to guide the conversation or provide context.
  • Always alternate between user and assistant roles. The conversation must start with a user message.
  • The entire history counts toward your input token usage, so be mindful of context length.
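
Because the API is stateless, the client has to own the history. One way to sketch that is a small wrapper that appends each turn and resends the whole list. The `Conversation` class and the `send` callable are hypothetical: `send` stands in for whatever function calls `client.messages.create` and returns the reply text, and `fake_send` exists only so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # `send` is an assumed stand-in for the real API call.
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the callee cannot mutate the stored history.
        reply = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: list[Message]) -> str:
    """Offline stand-in: reports how many messages it received."""
    return f"received {len(messages)} message(s)"


chat = Conversation(fake_send)
print(chat.ask("Hello, Claude"))                 # received 1 message(s)
print(chat.ask("Can you describe LLMs to me?"))  # received 3 message(s)
```

The second call receives three messages because the first exchange (one user turn, one assistant turn) is sent again along with the new question, which is also why input token usage grows with every turn.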

Putting Words in Claude's Mouth (Prefill)

Prefilling allows you to start Claude's response for it. This is useful for:

  • Forcing structured outputs (e.g., JSON, multiple choice answers)
  • Guiding the tone or style of the response
  • Reducing token usage by constraining the output

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {
            # The partial assistant turn is the prefill; Claude continues from here
            "role": "assistant",
            "content": "The answer is (",
        },
    ],
)

print(message.content[0].text)  # Output: "C"
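
The same idea applies to structured output: prefill the opening brace, then glue it back onto the completion before parsing, since the returned text continues from the prefill and does not repeat it. This is a sketch only. `with_prefill` and `parse_prefilled_json` are illustrative helpers, and the completion string is made up to stand in for a model reply.

```python
import json

PREFILL = "{"


def with_prefill(user_text: str, prefill: str = PREFILL) -> list[dict[str, str]]:
    """Build a messages list whose last turn is a partial assistant reply."""
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefill},
    ]


def parse_prefilled_json(completion: str, prefill: str = PREFILL) -> dict:
    """Rejoin the prefill and the model's continuation, then parse as JSON."""
    return json.loads(prefill + completion)


messages = with_prefill("Return the Latin family name for ants as JSON.")
# Made-up continuation standing in for message.content[0].text:
completion = '"family": "Formicidae"}'
print(parse_prefilled_json(completion))  # {'family': 'Formicidae'}
```

`json.loads` raises `json.JSONDecodeError` if the continuation is cut off by `max_tokens`, so real code would want to catch that case.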

Prefill Limitations

Important: Prefilling is not supported on Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

For models that don't support prefill, consider:

  • Structured outputs: Define a JSON schema for Claude to follow
  • System prompt instructions: Use the system parameter to specify output format

Vision: Working with Images

The Messages API supports image inputs, enabling visual understanding and analysis.

Sending an Image

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

Supported Image Formats

  • JPEG
  • PNG
  • GIF
  • WebP

Best Practices for Vision

  • Use appropriate resolution: Images up to 8,000x8,000 pixels are supported
  • Combine with text: Always include a text prompt alongside images for best results
  • Consider token cost: Images consume tokens proportional to their size
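
A small helper can enforce the format list and the "always pair with text" advice before a request is sent. This is a sketch under assumptions: `image_block`, `image_message`, and the extension map are illustrative names, and the media type is inferred from the file extension rather than from the file contents.

```python
import base64
from pathlib import Path
from typing import Any

# Extension-to-media-type map for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict[str, Any]:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_message(filename: str, data: bytes, prompt: str) -> dict[str, Any]:
    """Pair the image with a text prompt in a single user message."""
    return {
        "role": "user",
        "content": [image_block(filename, data), {"type": "text", "text": prompt}],
    }


# Placeholder bytes stand in for real file contents here.
msg = image_message("chart.PNG", b"\x89PNG", "Describe this chart in detail.")
print(msg["content"][0]["source"]["media_type"])  # image/png
```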

Handling Stop Reasons

Understanding why Claude stopped generating helps you handle different scenarios:

| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Output hit token limit | Increase max_tokens or truncate |
| stop_sequence | Custom stop sequence triggered | Handle as designed |
| tool_use | Claude wants to use a tool | Execute tool and return result |
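
In code, the stop reasons listed here usually turn into a dispatch on `stop_reason`. A minimal sketch, assuming you only need a coarse decision; the action labels and the `next_action` helper are illustrative, not part of the API.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason to a coarse action label (labels are illustrative)."""
    actions = {
        "end_turn": "done",
        "max_tokens": "raise_limit_or_accept_truncation",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "run_tool_and_return_result",
    }
    # Unknown values are surfaced rather than silently treated as success.
    return actions.get(stop_reason, "unexpected")


for reason in ("end_turn", "max_tokens", "tool_use", "something_else"):
    print(reason, "->", next_action(reason))
```

Returning an explicit "unexpected" label for unrecognized values keeps the application from treating a stop reason it does not know about as a normal completion.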

Error Handling

Common errors and how to handle them:

import anthropic
from anthropic import APIError, APIConnectionError, RateLimitError

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError:
    # Catch the specific subclasses before the general APIError
    print("Rate limited. Implement exponential backoff.")
except APIConnectionError:
    print("Network issue. Retry with backoff.")
except APIError as e:
    print(f"API error: {e}")
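
The handlers above only print a message. One way to sketch the exponential backoff they mention is a generic retry wrapper. `with_backoff` is a hypothetical helper, not an SDK function; with the SDK you would pass `retry_on=(RateLimitError, APIConnectionError)` and a callable that wraps `client.messages.create`. The example uses `ConnectionError` and a fake sleep so it runs offline and instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on the given exceptions, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the error.
            # 1x, 2x, 4x ... the base delay, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky() -> str:
    """Fails twice with a simulated network error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


delays: list[float] = []  # Records requested sleeps instead of waiting.
print(with_backoff(flaky, retry_on=(ConnectionError,), sleep=delays.append))  # ok
```

Injecting `sleep` keeps the wrapper testable, and the jitter spreads out retries so that many clients hitting a rate limit at once do not all retry at the same moment.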

Streaming Responses

For real-time applications, use streaming to receive tokens as they're generated:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    # Print each text chunk as soon as it arrives
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming is ideal for chatbots and any application where low latency matters.
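
A chatbot typically needs the complete reply as well as the live output, for example to append it to the conversation history. A minimal sketch, assuming the chunks arrive as plain strings the way `stream.text_stream` yields them; `consume_stream` is an illustrative helper, and a list stands in for the real stream.

```python
from typing import Iterable


def consume_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Print each chunk as it arrives and return the full text at the end."""
    parts: list[str] = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()  # Finish the line once the stream ends.
    return "".join(parts)


# A plain list stands in for stream.text_stream here.
story = consume_stream(["Once ", "upon ", "a time."])
```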

Key Takeaways

  • The Messages API is stateless — always send the full conversation history with each request. This gives you complete control over context.
  • Use prefill carefully — it's powerful for constraining outputs but not supported on all models. Consider structured outputs as an alternative.
  • Vision capabilities allow you to send images alongside text prompts for multimodal understanding. Always pair images with descriptive text.
  • Handle stop reasons to build robust applications — end_turn, max_tokens, and tool_use each require different responses.
  • Stream for real-time applications — streaming reduces perceived latency and improves user experience in interactive applications.