BeClaude Guide · 2026-04-29

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use the Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers the Messages API for building conversational AI apps with Claude, including stateless multi-turn conversations, prefill techniques to shape responses, and vision support for image analysis.

Tags: Messages API · Claude API · Multi-Turn Conversations · Prefill · Vision


Claude's Messages API is the primary interface for building conversational AI applications. Whether you're creating a chatbot, a code assistant, or a document analysis tool, understanding how to structure requests and manage conversation state is essential.

This guide walks you through the core patterns of the Messages API—from basic requests to advanced techniques like prefill and vision—with practical code examples you can use immediately.

Understanding the Messages API vs. Managed Agents

Anthropic offers two ways to build with Claude:

  • Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you full control over every request and response.

Basic Request and Response

At its simplest, the Messages API takes a list of messages and returns Claude's response. Here's a minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks (text, tool_use, etc.)
  • stop_reason: Why Claude stopped generating (end_turn, max_tokens, stop_sequence, or tool_use)
  • usage: Token counts for billing and monitoring
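Because content is an array of blocks rather than a single string, extracting the reply takes one small step. Here's a minimal sketch that joins the text blocks of a raw response dict (the helper name is illustrative, not part of the SDK):

```python
def response_text(response_json):
    """Join the text from all text blocks in a raw Messages API response dict."""
    return "".join(
        block["text"]
        for block in response_json["content"]
        if block["type"] == "text"
    )

# Using the sample response shown above:
sample = {"content": [{"type": "text", "text": "Hello!"}]}
print(response_text(sample))  # Hello!
```

With the Python SDK's typed response objects, the equivalent is iterating message.content and reading each block's text attribute.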

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.

Example: Two-Turn Conversation

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

Notice that the assistant's previous response ("Hello!") is included in the messages array. This is how Claude maintains context across turns.
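In practice, this means keeping a running list on your side and appending each completed exchange before the next request. A minimal sketch of that bookkeeping (append_turn is an illustrative helper, not an SDK function):

```python
def append_turn(history, user_text, assistant_text):
    """Record one completed user/assistant exchange in the running history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = []
append_turn(history, "Hello, Claude", "Hello!")
# For the next turn, send: history + [{"role": "user", "content": "Can you describe LLMs to me?"}]
```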

Important: Synthetic Assistant Messages

Earlier conversational turns don't need to actually originate from Claude. You can inject synthetic assistant messages to guide the conversation or provide context. For example:

messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Tell me more about its landmarks."}
]

This is useful for:

  • Providing example interactions in few-shot prompting
  • Correcting or steering the conversation history
  • Building multi-step reasoning chains
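The few-shot case can be reduced to a small builder that turns example pairs into synthetic turns. A sketch, with an illustrative helper name:

```python
def build_few_shot_messages(examples, question):
    """Build a messages list from synthetic (user, assistant) example pairs
    followed by the real question."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages

messages = build_few_shot_messages(
    examples=[("Sentiment of 'I love it'?", "positive")],
    question="Sentiment of 'This is terrible'?",
)
```

The example pairs demonstrate the exact output format you want, so Claude's real answer tends to match it.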

Putting Words in Claude's Mouth: The Prefill Technique

One of the most powerful features of the Messages API is prefilling—you can start Claude's response by including an assistant message with partial content in the last position of the input messages list.

Use Case: Constrained Multiple Choice

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is the Latin name for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

Output:

{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}

By setting max_tokens=1 and prefilling with "The answer is (", we force Claude to complete only the letter. This is perfect for:

  • Classification tasks
  • Multiple-choice questions
  • Yes/no decisions
  • Structured output extraction

Use Case: Shaping Response Style

You can also prefill to control tone or format:

messages = [
    {"role": "user", "content": "Explain quantum computing in one sentence."},
    {"role": "assistant", "content": "Quantum computing is a revolutionary approach that "}
]

This ensures Claude continues your thought rather than starting from scratch.
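The same idea works for structured output: prefill the opening of a JSON value and Claude completes it. One thing to remember is that the prefill text is not echoed back, so you must prepend it yourself when reassembling the output. A sketch (with_prefill is an illustrative helper):

```python
def with_prefill(messages, prefill):
    """Return messages with a partial assistant turn appended as the final entry."""
    return messages + [{"role": "assistant", "content": prefill}]

request_messages = with_prefill(
    [{"role": "user", "content": "List three primary colors as a JSON array of strings."}],
    '["',
)
# After the API call, the complete output is the prefill string
# concatenated with response.content[0].text.
```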

Vision Capabilities: Working with Images

The Messages API supports image inputs, enabling Claude to analyze visual content. Here's how to send an image:

import base64

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: 5 MB per image via the API
  • Claude processes images at various resolutions for optimal performance
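If you send several images, wrapping the base64 encoding in a helper keeps request construction tidy. A sketch (image_block is an illustrative helper, not part of the SDK):

```python
import base64

def image_block(image_bytes, media_type="image/png"):
    """Build a base64-encoded image content block for the Messages API."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }
```

Each block produced this way can be placed alongside text blocks in a single user message's content array.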

Handling Stop Reasons

Understanding stop_reason is crucial for building robust applications:

stop_reason      Meaning                           Action
end_turn         Claude finished naturally         Return response to user
max_tokens       Output exceeded the token limit   Increase max_tokens or split the response
stop_sequence    A custom stop sequence was hit    Handle as needed
tool_use         Claude wants to call a tool       Execute the tool and continue the conversation

Example: Handling max_tokens

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a long essay on AI."}]
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")

Streaming for Real-Time Responses

For a better user experience, use streaming to receive tokens as they're generated:

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    if chunk.type == "content_block_delta":
        print(chunk.delta.text, end="", flush=True)

Streaming is essential for:

  • Chat interfaces with real-time token display
  • Long responses where users expect immediate feedback
  • Reducing perceived latency
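If you also need the full response after streaming finishes, accumulate the deltas as they arrive. The sketch below models events as plain dicts for illustration; the real SDK yields typed event objects with the same fields:

```python
def collect_text(events):
    """Accumulate output text from a sequence of streaming events.

    Events are modeled here as dicts mirroring content_block_delta events;
    the real SDK yields typed objects with attribute access instead.
    """
    parts = []
    for event in events:
        if event.get("type") == "content_block_delta":
            parts.append(event["delta"]["text"])
    return "".join(parts)

simulated = [
    {"type": "message_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Once upon"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " a time"}},
    {"type": "message_stop"},
]
print(collect_text(simulated))  # Once upon a time
```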

Best Practices

  • Manage context windows carefully: Token limits apply per request. Use the usage field to monitor consumption.
  • Use synthetic messages for few-shot prompting: Inject example assistant responses to demonstrate desired behavior.
  • Prefill for structured outputs: When you need JSON, XML, or specific formats, prefill the opening tags.
  • Handle errors gracefully: Always check stop_reason and implement retry logic for transient failures.
  • Optimize with prompt caching: For repeated system prompts, use prompt caching to reduce costs and latency.
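For the prompt-caching point above, caching is requested by attaching a cache_control marker to a system prompt block. The field names below follow Anthropic's prompt-caching documentation at the time of writing; verify them against the current API reference before relying on them:

```python
def cached_system_prompt(system_text):
    """Build a system prompt block list marked as cacheable.

    The cache_control shape is taken from Anthropic's prompt-caching docs;
    confirm field names against the current API reference.
    """
    return [
        {
            "type": "text",
            "text": system_text,
            "cache_control": {"type": "ephemeral"},
        }
    ]
```

The resulting list is passed as the system parameter of messages.create, letting repeated requests reuse the cached prefix.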

Key Takeaways

  • The Messages API is stateless—you must send full conversation history with each request, giving you complete control over context.
  • Prefill techniques let you shape Claude's responses by starting its reply, enabling constrained outputs like multiple-choice answers or structured data.
  • Vision support allows Claude to analyze images sent as base64-encoded data, opening up document analysis and visual reasoning use cases.
  • Always check the stop_reason field to determine why Claude stopped and handle truncation or tool calls appropriately.
  • Streaming provides real-time token delivery for better user experiences in chat applications.

Ready to build? Start with a simple request, then layer in multi-turn conversations, prefilling, and streaming as your application grows.