
Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use Claude's Messages API for multi-turn conversations, response prefilling, and vision capabilities. Includes Python code examples and best practices for developers.

Quick Answer

This guide teaches you how to use Claude's Messages API for basic requests, multi-turn conversations, response prefilling, and vision capabilities, with practical Python code examples.

Tags: Messages API, Claude API, Multi-turn conversations, Prefill, Vision


Anthropic offers two primary ways to build with Claude: the Messages API for direct model access and Claude Managed Agents for pre-built agent harnesses. This guide focuses on the Messages API—the foundation for custom agent loops, fine-grained control, and integrating Claude into your applications.

Whether you're building a chatbot, content generator, or vision-powered tool, understanding the Messages API is essential. Let's dive into the patterns that will help you get the most out of Claude.

Understanding the Messages API

The Messages API is a stateless, RESTful interface that lets you send conversational turns to Claude and receive responses. Unlike some chat APIs that maintain session state, you must send the full conversation history with every request. This design gives you complete control over context management.

Basic Request Structure

Every request to the Messages API requires three core parameters:

  • model: The Claude model identifier (e.g., claude-opus-4-7, claude-sonnet-4-5)
  • max_tokens: Maximum tokens in the response
  • messages: An array of message objects with role and content
Here's a minimal example in Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes the model's reply, usage statistics, and a stop_reason indicating why generation ended:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
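
In the Python SDK, these fields are exposed as attributes on the returned Message object. Continuing the example above:

# Pull the fields you usually care about off the response object
reply_text = message.content[0].text
print(reply_text)                   # "Hello!"
print(message.stop_reason)          # "end_turn"
print(message.usage.output_tokens)  # 6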

Building Multi-Turn Conversations

Since the Messages API is stateless, you build conversations by appending each turn to the messages array. This pattern allows you to maintain context across multiple exchanges.

The Conversation Loop Pattern

import anthropic

client = anthropic.Anthropic()

# Start with the initial user message
messages = [
    {"role": "user", "content": "What are the three primary colors?"}
]

# First API call
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages
)

# Append Claude's response to history
messages.append({"role": "assistant", "content": response.content[0].text})

# Add the next user turn
messages.append({"role": "user", "content": "Can you mix them to make secondary colors?"})

# Second API call with full history
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages
)

print(response.content[0].text)

Synthetic Assistant Messages

A powerful feature: earlier assistant turns don't need to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context that Claude didn't generate:

messages = [
    {"role": "user", "content": "Summarize our previous discussion about project timelines."},
    {"role": "assistant", "content": "Based on our discussion, the project has three phases: research (weeks 1-2), development (weeks 3-6), and testing (weeks 7-8)."},
    {"role": "user", "content": "What are the key milestones for the development phase?"}
]

This is particularly useful for:

  • Injecting system-generated context
  • Simulating conversation history from other sources
  • Providing structured data summaries
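
As a concrete sketch (the summary string here is a stand-in for whatever your system actually produces), an injected summary simply becomes another assistant turn before the next real question:

import anthropic

client = anthropic.Anthropic()

# Summary produced outside this conversation (a database, another job, your own code)
generated_summary = (
    "Based on our discussion, the project has three phases: "
    "research (weeks 1-2), development (weeks 3-6), and testing (weeks 7-8)."
)

messages = [
    {"role": "user", "content": "Summarize our previous discussion about project timelines."},
    {"role": "assistant", "content": generated_summary},  # synthetic assistant turn
    {"role": "user", "content": "What are the key milestones for the development phase?"},
]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages,
)
print(response.content[0].text)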

Putting Words in Claude's Mouth: Prefill Technique

The prefill technique lets you start Claude's response by including assistant content in the input messages. This shapes the model's output by providing a starting point.

Use Case: Multiple Choice Questions

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # Only need one token for the answer letter
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?\nA) London\nB) Paris\nC) Berlin\nD) Madrid"
        },
        {
            "role": "assistant",
            "content": "The answer is"  # Prefill steers Claude toward a bare letter
        }
    ]
)

print(message.content[0].text)  # Prints just the answer letter, e.g. " B"

Important Prefill Limitations

Prefilling is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
Requests using prefill with these models return a 400 error. For these models, use structured outputs or system prompt instructions instead.
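
For example, if prefill isn't available, a system prompt can often impose a similar constraint. This is a rough sketch (adjust the instruction to your task; the model name is taken from the list above):

import anthropic

client = anthropic.Anthropic()

# On models without prefill support, push the formatting constraint into the system prompt
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=5,
    system="Answer multiple-choice questions with only the letter of the correct option.",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?\nA) London\nB) Paris\nC) Berlin\nD) Madrid"
        }
    ],
)
print(message.content[0].text)  # ideally just "B"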

When to Use Prefill

  • Constrained outputs: Force Claude to start with a specific format (JSON, YAML, etc.)
  • Multiple choice: Get single-token answers for classification tasks
  • Controlled generation: Guide the tone or direction of the response
  • Chain-of-thought prompting: Start Claude's reasoning process
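
For the constrained-output case, a common pattern is to prefill the opening brace so Claude continues with raw JSON instead of preamble. A sketch (the extraction task and schema here are made up):

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Extract the name, city, and age from this sentence as JSON: "
                       "'Maria, 34, lives in Lisbon.'"
        },
        # Prefilling "{" makes Claude skip any preamble and continue the JSON object
        {"role": "assistant", "content": "{"},
    ],
)

# The prefilled "{" is part of the input, so prepend it before parsing the output
json_text = "{" + response.content[0].text
print(json_text)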

Vision Capabilities: Sending Images to Claude

Claude can analyze images alongside text. You can supply images using three source types:

  • base64: Base64-encoded image data
  • url: Publicly accessible image URL
  • file: Image uploaded via the Files API

Supported Image Formats

Format | MIME Type
JPEG   | image/jpeg
PNG    | image/png
GIF    | image/gif
WebP   | image/webp
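
When you load local files, the media_type you send should match the actual format. A small helper built on Python's standard mimetypes module (the helper name and set are ours, purely illustrative) can derive it from the file extension:

import mimetypes

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def image_media_type(path: str) -> str:
    """Guess the MIME type for an image file and check it is one Claude accepts."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported or unknown image format: {path}")
    return media_type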

Example: Analyzing an Image from URL

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/ant-photo.jpg"
                    }
                }
            ]
        }
    ]
)

print(message.content[0].text)

Output: "This image shows an ant, specifically a close-up view..."

Example: Using Base64 Images

import anthropic
import base64

client = anthropic.Anthropic()

with open("photo.jpg", "rb") as f: image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create( model="claude-sonnet-4-5", max_tokens=1024, messages=[ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in detail." }, { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": image_data } } ] } ] )

Handling Stop Reasons

Every response includes a stop_reason field that tells you why Claude stopped generating. Understanding these helps you build robust applications:

Stop Reason   | Meaning                      | Action
end_turn      | Claude finished naturally    | Continue or end conversation
max_tokens    | Hit the token limit          | Increase max_tokens or truncate response
stop_sequence | Found a custom stop sequence | Handle based on your logic
tool_use      | Claude wants to use a tool   | Process the tool call and continue

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a 500-word essay"}]
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "end_turn":
    print("Response completed successfully.")

Best Practices for the Messages API

1. Manage Token Usage

Monitor the usage field in responses to track costs and optimize your prompts:

print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

2. Use Prompt Caching for Long Histories

For conversations with extensive context, enable prompt caching to reduce costs and latency on repeated prefixes.
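
The exact mechanics are covered in Anthropic's prompt caching documentation; as a minimal sketch, you mark the long, stable prefix (for example a large system prompt) as cacheable with a cache_control block. The reference text below is a placeholder:

import anthropic

client = anthropic.Anthropic()

long_reference_text = "...many thousands of tokens of stable reference material..."

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": long_reference_text,
            # Mark this block as a cacheable prefix; later requests that share it
            # can reuse the cached prompt instead of reprocessing it
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the reference material above."}],
)

print(response.usage)  # cache-related token counts appear here when caching applies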

3. Handle Errors Gracefully

Always implement retry logic with exponential backoff for API errors:

import time
from anthropic import Anthropic, APIError

client = Anthropic()
max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.messages.create(...)
        break
    except APIError as e:
        if attempt == max_retries - 1:
            raise
        time.sleep(2 ** attempt)
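
Separately, recent versions of the official Python SDK retry some transient failures (connection errors, rate limits, 5xx responses) on their own; if that behavior is available in your SDK version, configuring it on the client may be simpler than a hand-rolled loop:

from anthropic import Anthropic

# Let the SDK apply its built-in retry-with-backoff behavior
client = Anthropic(max_retries=3)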

4. Stream Responses for Better UX

For long responses, use streaming to show tokens as they're generated:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
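
If you also need usage statistics or the stop reason once streaming finishes, the stream helper can assemble the complete final message for you (for example via get_final_message(), called inside the with block), so you don't have to re-request the response.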

Key Takeaways

  • Stateless by design: Always send the full conversation history with each request. This gives you complete control over context management.
  • Prefill shapes responses: Use synthetic assistant messages to guide Claude's output, but be aware of model limitations—prefill is not supported on Opus 4.7, Sonnet 4.6, and others.
  • Vision is straightforward: Send images via base64, URL, or file reference using the image content type block. Supported formats are JPEG, PNG, GIF, and WebP.
  • Monitor stop reasons: The stop_reason field tells you why generation ended—use it to handle truncation, tool calls, or natural completion.
  • Stream for better UX: Use streaming for real-time token display, especially for long responses or chat applications.