BeClaude Guide
2026-04-28

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers the Claude Messages API: making basic requests, building multi-turn conversations by sending full history, pre-filling Claude's responses for structured output, and using vision with images. You'll get Python and TypeScript examples for each pattern.

Messages API · Claude API · Multi-turn conversations · Prefill · Vision


Anthropic offers two primary ways to build with Claude: the Messages API and Claude Managed Agents. The Messages API gives you direct, stateless access to the model—ideal for custom agent loops, fine-grained control, and real-time interactions. This guide walks you through the most common patterns for working with the Messages API, from a simple hello to multi-turn conversations, prefill techniques, and vision capabilities.

Understanding the Messages API

The Messages API is a stateless API: every request must include the full conversation history. This design gives you complete control over context, allowing you to inject synthetic assistant messages, prefill responses, or even restart conversations from any point. The API returns a structured response containing the model's reply, usage statistics, and a stop reason that tells you why generation ended.

Basic Request and Response

Let's start with the simplest possible request—a single user message. Here's how to send it using the Python SDK:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes the model's reply, metadata, and token usage:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks. For text-only responses, it contains a single block with type: "text".
  • stop_reason: Indicates why the model stopped. "end_turn" means Claude finished naturally; "max_tokens" means it hit the token limit.
  • usage: Tracks input and output tokens, essential for cost monitoring and debugging.
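With the SDKs these fields are attributes on the returned Message object (message.content[0].text, message.usage.input_tokens, and so on). The sketch below works on a plain dict shaped like the JSON above so the extraction logic stays visible; summarize_response is a hypothetical helper, not part of the SDK:

```python
# Pull the reply text, stop reason, and total token count out of a
# Messages API response represented as a plain dict.
def summarize_response(response: dict) -> dict:
    # Concatenate all text blocks; non-text blocks are skipped.
    text = "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }

example = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(summarize_response(example))
# {'text': 'Hello!', 'stop_reason': 'end_turn', 'total_tokens': 18}
```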

Building Multi-Turn Conversations

Because the Messages API is stateless, you must send the entire conversation history with each request. This pattern lets you build up a conversation over multiple turns:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)

Important: Earlier turns don't need to come from Claude. You can inject synthetic assistant messages—for example, to set up a scenario or provide context that Claude didn't generate. This is useful for:
  • Role-playing: Pre-seeding a character's persona.
  • Context injection: Providing background information without making the user repeat it.
  • Error recovery: Correcting a previous assistant response.
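In practice, a chat loop simply appends each user message and each reply to one list and resends the whole list on every turn. A minimal sketch, with ask_claude standing in for the real client.messages.create call:

```python
# Maintain conversation history client-side: every turn appends both
# the user message and the assistant reply, and the full list is sent
# with each request.
def ask_claude(history):
    # Placeholder for: client.messages.create(model=..., max_tokens=...,
    # messages=history), returning the reply text.
    return "(model reply)"

def send_turn(history, user_text):
    history.append({"role": "user", "content": user_text})
    reply = ask_claude(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
send_turn(history, "Hello, Claude")
send_turn(history, "Can you describe LLMs to me?")
print(len(history))  # 4: two user turns, two assistant turns
```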

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude' },
      { role: 'assistant', content: 'Hello!' },
      { role: 'user', content: 'Can you describe LLMs to me?' }
    ]
  });
  console.log(message.content[0].text);
}

main();

Putting Words in Claude's Mouth: Prefill Technique

One of the most powerful features of the Messages API is prefilling—you can start Claude's response by including an assistant message with partial content in the input. This shapes the model's output, making it ideal for structured responses, multiple-choice questions, or format control.

Example: Forcing a Single-Character Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {"role": "assistant", "content": "The answer is ("}
    ]
)

print(message.content[0].text) # Output: "C"

By setting max_tokens=1 and prefilling with "The answer is (", Claude only needs to output a single character—"C". This pattern is excellent for:

  • Classification tasks: Get a single label or category.
  • Multiple-choice QA: Extract the exact answer letter.
  • Format control: Force JSON prefixes like {"answer": ".

Prefill for JSON Output

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old."
        },
        {
            "role": "assistant",
            "content": "Here is the JSON: {"
        }
    ]
)

This nudges Claude to continue in JSON format, reducing the chance of markdown or explanatory text.
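Because the prefilled "{" is part of the input, Claude's completion begins after it, so prepend the brace before parsing. A sketch, using a hypothetical completion string in place of a live API response:

```python
import json

# The model's completion continues after the prefilled "{", so the
# opening brace must be re-attached before json.loads will accept it.
# This completion string is an illustrative example, not real output.
completion = '"name": "John", "age": 30}'

data = json.loads("{" + completion)
print(data)  # {'name': 'John', 'age': 30}
```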

Vision Capabilities: Working with Images

The Messages API supports image inputs, enabling Claude to analyze visual content. You can pass images as base64-encoded data or via URLs (where supported).

Sending an Image (Python)

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)

print(message.content[0].text)

Key points:
  • The content field becomes an array of content blocks.
  • Image blocks require type: "image" with a source containing type, media_type, and data.
  • Supported media types: image/jpeg, image/png, image/gif, image/webp.
  • You can mix text and images in the same message.
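The block-building step is easy to factor out. Below is a hypothetical helper (image_block is not an SDK function) that reads a file, base64-encodes it, and guesses the media type from the extension; pass media_type explicitly when the guess could be wrong:

```python
import base64
import mimetypes

# Build a base64 image content block from a local file. The media
# type is guessed from the file extension via mimetypes.
def image_block(path, media_type=None):
    if media_type is None:
        media_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": data,
        },
    }
```

The returned dict drops straight into a message's content array alongside a {"type": "text", ...} block.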

Handling Stop Reasons

Every response includes a stop_reason field. Understanding these helps you build robust applications:

  • end_turn: Claude finished naturally. Action: return the response.
  • max_tokens: the output hit the token limit. Action: increase max_tokens or truncate.
  • stop_sequence: a custom stop sequence was encountered. Action: handle as needed.
  • tool_use: Claude wants to call a tool. Action: execute the tool and continue.
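In an agent loop this mapping usually becomes a small dispatch. A minimal sketch; next_action and its returned labels are illustrative, not SDK names:

```python
# Map each stop_reason to the next step an application should take.
# In a real agent loop each branch would carry out the action instead
# of returning a label.
def next_action(stop_reason):
    if stop_reason == "end_turn":
        return "return_response"
    if stop_reason == "max_tokens":
        return "retry_with_higher_limit"
    if stop_reason == "stop_sequence":
        return "handle_stop_sequence"
    if stop_reason == "tool_use":
        return "execute_tool"
    raise ValueError(f"unexpected stop_reason: {stop_reason}")

print(next_action("end_turn"))  # return_response
```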

Best Practices

  • Always include full history: Since the API is stateless, omitting previous turns loses context.
  • Use prefilling for structured output: It reduces hallucinations and enforces format compliance.
  • Monitor token usage: The usage field helps you optimize costs and avoid surprises.
  • Set appropriate max_tokens: For classification, use 1–5 tokens; for generation, use 500–4096.
  • Handle max_tokens stop reason: If Claude's response is cut off, consider increasing the limit or making a follow-up request.

Key Takeaways

  • The Messages API is stateless—you must send the full conversation history with every request, giving you complete control over context.
  • Prefilling Claude's response with partial content is a powerful technique for structured output, classification, and format control.
  • Vision capabilities allow you to send images alongside text, enabling multimodal analysis.
  • Always check stop_reason to understand why generation ended and handle max_tokens appropriately.
  • Monitor usage tokens to optimize costs and debug unexpected behavior.