Claude Guide
2026-04-27

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples in Python and TypeScript.

Quick Answer

This guide teaches you how to use Claude's Messages API to build stateless multi-turn conversations, prefill assistant responses for structured outputs, and integrate vision capabilities with image inputs.

Tags: Messages API, Claude API, multi-turn conversations, prefill, vision

Introduction

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, an agent, or a content generation tool, understanding how to structure requests and handle responses is essential. This guide walks you through the core patterns—from simple prompts to complex multi-turn conversations and vision tasks.

Basic Request and Response

At its simplest, a Messages API call sends a user message and receives an assistant reply. Here's a minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes the model's reply, metadata, and token usage:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • stop_reason: Indicates why the model stopped ("end_turn" for natural completion, "max_tokens" if it hit the limit).
  • usage: Tracks input and output tokens for billing and optimization.
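If you route on these fields often, a small helper makes the checks explicit. The sketch below is purely illustrative (summarize_response is not part of the SDK); it operates on the parsed response dictionary shown above:

```python
def summarize_response(response: dict) -> str:
    """Summarize why the model stopped and how many tokens were used.

    `response` is assumed to be the parsed Messages API response shown above.
    """
    stop = response["stop_reason"]
    usage = response["usage"]
    total = usage["input_tokens"] + usage["output_tokens"]
    if stop == "max_tokens":
        note = "output was truncated; consider raising max_tokens"
    else:
        note = "completed normally"
    return f"stop_reason={stop} ({note}), {total} tokens billed"

example = {
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(summarize_response(example))  # stop_reason=end_turn (completed normally), 18 tokens billed
```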

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.

Example: Two-Turn Conversation

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

Notice that the assistant's previous reply ("Hello!") is included as a synthetic message. This pattern allows you to:

  • Continue conversations across multiple API calls.
  • Inject pre-written assistant responses (e.g., from a database or fallback logic).
  • Control the flow of dialogue without relying on Claude's memory.
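Because you own the history, a thin wrapper that appends each turn keeps call sites tidy. This Conversation class is an illustrative sketch, not an SDK feature:

```python
class Conversation:
    """Accumulates alternating user/assistant turns for a stateless API."""

    def __init__(self):
        self.messages = []

    def add_user(self, text: str):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str):
        self.messages.append({"role": "assistant", "content": text})

convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")  # reply saved from a previous API call
convo.add_user("Can you describe LLMs to me?")
# convo.messages is now ready to pass as messages=... in the next request
print(len(convo.messages))  # 3
```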

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude' },
      { role: 'assistant', content: 'Hello!' },
      { role: 'user', content: 'Can you describe LLMs to me?' }
    ]
  });
  console.log(message);
}

main();

Prefilling Claude's Response

A powerful technique is to prefill part of Claude's response by including an assistant message at the end of the input. This shapes the model's output, especially useful for structured responses or multiple-choice questions.

Example: Multiple Choice with Prefill

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

By prefilling "The answer is (", Claude will complete with a single token—likely "C". Setting max_tokens=1 ensures a concise answer. This pattern works well for:

  • Classification tasks
  • Yes/no questions
  • Extracting specific data points

Important Considerations

  • Prefilling does not count toward output tokens, but it does consume input tokens.
  • The model cannot alter the prefilled text itself, but its continuation may contradict a prefill that conflicts with the surrounding context. Use prefill to guide, not force.
  • For longer prefills, ensure consistency with the user's request to avoid confusion.
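One related subtlety: the prefilled text is not echoed back in the response, which contains only the continuation. To reconstruct the full answer, prepend the prefill yourself. A minimal sketch, where completion stands in for message.content[0].text:

```python
def assemble_answer(prefill: str, completion: str) -> str:
    """Join the prefilled fragment with the model's continuation."""
    return prefill + completion

# For the multiple-choice example above, the model returns just the continuation.
completion = "C"  # stand-in for message.content[0].text
print(assemble_answer("The answer is (", completion))  # The answer is (C
```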

Handling Stop Reasons

Every response includes a stop_reason field. Understanding these helps you build robust applications:

  • "end_turn": Claude finished naturally. Continue or end the conversation.
  • "max_tokens": Output hit the token limit. Increase max_tokens or truncate.
  • "stop_sequence": A custom stop sequence was triggered. Handle as designed.
  • "tool_use": Claude wants to call a tool. Process the tool call and continue.

Example: If you get "max_tokens", you can retry with a higher limit or split the response.
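The mapping above can be expressed as a small dispatcher; the action names here are illustrative placeholders for your own handlers:

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason to a follow-up action (illustrative names)."""
    actions = {
        "end_turn": "done",
        "max_tokens": "retry_with_higher_limit",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "run_tool_and_continue",
    }
    return actions.get(stop_reason, "unknown")

print(next_action("max_tokens"))  # retry_with_higher_limit
```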

Vision Capabilities

The Messages API supports image inputs for vision tasks. You can send images as base64-encoded data or via URLs.

Sending an Image (Python)

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)
print(message.content[0].text)

Supported media types: image/jpeg, image/png, image/gif, image/webp. Maximum image size is 5MB.
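A helper that validates these constraints before sending can catch bad inputs early. In this sketch, image_block is our own name (not an SDK function) that builds the content block shown in the request above:

```python
import base64
import os

MEDIA_TYPES = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".png": "image/png", ".gif": "image/gif", ".webp": "image/webp",
}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5MB API limit

def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, validating type and size."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {ext}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5MB limit")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[ext],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }

block = image_block("chart.png", b"\x89PNG...")  # placeholder bytes
print(block["source"]["media_type"])  # image/png
```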

Streaming Responses

For real-time applications, streaming reduces perceived latency. Use the stream parameter:

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.type == "content_block_delta":
        print(chunk.delta.text, end="", flush=True)

Streaming is ideal for chatbots, code assistants, and any UI that benefits from incremental output.

Best Practices

  • Manage Context Windows: Keep conversation history within Claude's context window (200K tokens for most models). Use summarization or sliding windows for long conversations.
  • Use Prompt Caching: For repeated system prompts or large context, enable prompt caching to reduce costs and latency.
  • Handle Errors Gracefully: Implement retry logic for rate limits and timeouts. The API returns standard HTTP status codes.
  • Optimize Token Usage: Prefill sparingly, trim unnecessary history, and use max_tokens wisely to avoid waste.
  • Test with Different Models: Claude Opus, Sonnet, and Haiku have different speed/cost profiles. Choose based on your use case.
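The context-window advice above can be sketched as a simple sliding window. The token estimate here is a rough characters-per-token heuristic, not the API's real tokenizer, and the sketch assumes string message content:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token (not the real tokenizer)."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages that fit within a token budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "x" * 400},       # ~100 tokens
    {"role": "assistant", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "z" * 40},        # ~10 tokens
]
print(len(trim_history(history, budget=120)))  # 2
```

A real implementation would count tokens with the API's own counting endpoint and summarize dropped turns rather than discarding them outright.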

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefilling, and vision, you can create sophisticated applications that leverage Claude's full capabilities. Remember that the API is stateless—you own the conversation state, which gives you maximum flexibility.

Key Takeaways

  • Stateless design: Always send full conversation history; manage state on your end.
  • Prefill for control: Use assistant messages at the end of input to guide Claude's responses, especially for structured outputs.
  • Monitor stop reasons: Handle "end_turn", "max_tokens", and "tool_use" appropriately in your application logic.
  • Vision is straightforward: Send images as base64 or URLs with a text prompt for analysis.
  • Stream for UX: Use streaming to reduce perceived latency in interactive applications.