BeClaude
Guide · Beginner · API · 2026-05-12

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational applications, including sending basic requests, managing multi-turn dialogues, pre-filling responses, and processing images.

Tags: Messages API, Claude API, multi-turn conversations, prefill, vision

Introduction

The Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a document analysis tool, or a creative writing assistant, understanding how to structure your API calls is essential. This guide covers the core patterns you'll use daily: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Basic Request and Response

At its simplest, a Messages API call requires three things:

  • model: The Claude model you want to use (e.g., claude-opus-4-7)
  • max_tokens: The maximum number of tokens in Claude's response
  • messages: An array of message objects, each with a role and content
Here's a minimal example in Python:

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message)

The response includes:

  • id: Unique message identifier
  • role: Always "assistant"
  • content: Array of content blocks (typically text)
  • model: The model used
  • stop_reason: Why generation stopped ("end_turn", "max_tokens", etc.)
  • usage: Token counts for input and output
Example output:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
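
Because content is an array of blocks rather than a single string, it helps to have one small function that pulls the text out. The sketch below assumes the response is available as a plain dict shaped like the example output above; extract_text is a hypothetical helper name, not part of the SDK.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every "text" block in a response dict."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Sample payload copied from the example output in this guide.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))
```

Filtering on the block type keeps the helper from failing if a response ever contains blocks that are not text.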

Multi-Turn Conversations

The Messages API is stateless: you must send the full conversation history with each request. This gives you complete control over context, but every prior turn counts toward input tokens on each call, so long conversations need to be managed deliberately.

Building a Conversation

To continue a conversation, append both Claude's previous response and the user's new message to the messages array:

import anthropic

client = anthropic.Anthropic()

# First turn
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=messages,
)

# Add Claude's response to history
messages.append({"role": "assistant", "content": response.content[0].text})

# Add user's follow-up
messages.append({"role": "user", "content": "What about Italy?"})

# Second turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=messages,
)

print(response.content[0].text)
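
The append-then-send pattern above can be wrapped in a small class so the history stays consistent across turns. This is a minimal sketch, not an SDK feature: Conversation and the send callable are hypothetical names, and the API call is injected as a function so the example runs without network access.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self._send = send  # function that takes the history and returns reply text
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        try:
            reply = self._send(list(self.messages))
        except Exception:
            # Drop the unanswered user turn so roles keep alternating.
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    """Stand-in for the real API call; echoes the latest user message."""
    return "echo: " + history[-1]["content"]


chat = Conversation(fake_send)
chat.ask("What is the capital of France?")
chat.ask("What about Italy?")
print([m["role"] for m in chat.messages])
```

With the real client, send could be a function that calls client.messages.create(model=..., max_tokens=..., messages=history) and returns response.content[0].text, as in the example above.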

Synthetic Assistant Messages

You can inject pre-written assistant messages into the history. This is useful for:

  • Setting up a scenario or persona
  • Providing example responses (few-shot prompting)
  • Correcting or editing Claude's past responses
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."},
    {"role": "assistant", "content": "Quantum computing uses qubits that can be 0 and 1 simultaneously, unlike classical bits."},
    {"role": "user", "content": "Give me an analogy."}
]
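
For few-shot prompting, the same idea can be generated from a list of example pairs. The following is a sketch under assumptions: build_few_shot is a hypothetical helper, and the sentiment examples are made up purely for illustration.

```python
from typing import Dict, List, Tuple


def build_few_shot(
    examples: List[Tuple[str, str]], question: str
) -> List[Dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by us, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


# Illustrative examples only.
examples = [
    ("Review: 'Loved it, would buy again.'", "positive"),
    ("Review: 'Broke after two days.'", "negative"),
]
messages = build_few_shot(examples, "Review: 'Does what it says.'")
print(len(messages), messages[-1]["role"])
```

The resulting list always ends with a user message, so it can be passed directly as the messages argument.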

Prefill: Putting Words in Claude's Mouth

Prefill lets you start Claude's response yourself: end the messages array with an assistant message, and Claude continues from that text. This is useful for:

  • Enforcing a specific format (e.g., JSON, multiple choice)
  • Guiding the tone or direction
  • Reducing token usage by constraining output

Example: Multiple Choice

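import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is the best programming language for beginners?\n(A) Python\n(B) Java\n(C) C++\n(D) Rust",
        },
        # The prefill stops just before the letter, so the one generated
        # token is the choice itself rather than text that follows it.
        {"role": "assistant", "content": "The answer is ("},
    ],
)

# The response holds only the continuation of the prefill (here, a single token).
print(message.content[0].text)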

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. For these models, use structured outputs or system prompt instructions instead.
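
On models that do support prefill, a common use is forcing JSON by prefilling an opening brace. The response contains only the continuation, so the prefill has to be added back before parsing. This is a minimal sketch: PREFILL, build_messages, and parse_prefilled_json are hypothetical names, and the continuation string is a stand-in for message.content[0].text.

```python
import json

PREFILL = "{"


def build_messages(prompt: str) -> list:
    """End the messages array with an assistant turn holding the prefill."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": PREFILL},
    ]


def parse_prefilled_json(continuation: str) -> dict:
    """Rejoin the prefill with the model's continuation, then parse the result."""
    return json.loads(PREFILL + continuation)


# Stand-in for the text Claude would return after the prefilled "{".
continuation = '"name": "Ada", "language": "Python"}'
print(parse_prefilled_json(continuation))
```

If the continuation is cut off by max_tokens, json.loads raises json.JSONDecodeError, so check stop_reason before parsing.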

Vision: Working with Images

Claude can process images alongside text. You can supply images in three ways:

  • base64: Inline base64-encoded image data
  • url: Publicly accessible image URL
  • file: Reference to a file uploaded via the Files API
Supported media types: image/jpeg, image/png, image/gif, image/webp

Example with Base64

import anthropic
import base64

client = anthropic.Anthropic()

# Read the image and encode it as a base64 string
with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
            ],
        }
    ],
)

print(message.content[0].text)

Example with URL

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/photo.jpg"
                    }
                }
            ]
        }
    ]
)
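
The two image block shapes above can be produced by small helpers that also reject unsupported media types before a request is sent. This is a sketch, not part of the SDK: the function names are hypothetical, and the media type is guessed from the filename extension with the standard mimetypes module, which is an assumption about how your files are named.

```python
import base64
import mimetypes

SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def guess_media_type(filename: str) -> str:
    """Guess the media type from the file extension and validate it."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported image type for {filename!r}: {media_type}")
    return media_type


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Build a base64 image content block from raw bytes."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that references a public URL."""
    return {"type": "image", "source": {"type": "url", "url": url}}


block = image_block_from_url("https://example.com/photo.jpg")
print(block["source"]["type"], guess_media_type("photo.jpg"))
```

Either block can be placed in the content list of a user message next to a text block.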

Best Practices

  • Manage token usage: Monitor usage.input_tokens and usage.output_tokens to control costs. Use max_tokens to limit response length.
  • Handle stop reasons: Check stop_reason in responses. "end_turn" means Claude finished naturally; "max_tokens" means the response was cut off.
  • Use streaming for long responses: For real-time applications, enable streaming to get partial results as Claude generates them.
  • Cache frequent prefixes: Use prompt caching for system prompts or long conversation histories to reduce latency and cost.
  • Validate image sizes: Large images consume more tokens. Resize or compress images before sending to optimize performance.
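
The first two practices can be combined in a few lines. The sketch below assumes responses are available as plain dicts with the stop_reason and usage fields described earlier; UsageTotals and was_truncated are hypothetical names, and the two sample responses are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals across several responses."""

    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: dict) -> None:
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)


def was_truncated(response: dict) -> bool:
    """True when generation stopped because it hit the max_tokens limit."""
    return response.get("stop_reason") == "max_tokens"


# Invented sample responses, reduced to the fields used here.
responses = [
    {"stop_reason": "end_turn", "usage": {"input_tokens": 12, "output_tokens": 6}},
    {"stop_reason": "max_tokens", "usage": {"input_tokens": 20, "output_tokens": 50}},
]

totals = UsageTotals()
for response in responses:
    totals.add(response["usage"])
    if was_truncated(response):
        print("Response was cut off; consider raising max_tokens.")

print(totals.input_tokens, totals.output_tokens)
```

Tracking totals per conversation makes it easier to see when resending a growing history starts to dominate cost.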

Key Takeaways

  • The Messages API is stateless—always send the full conversation history with each request.
  • Prefill lets you control the beginning of Claude's response, useful for formatting and guidance.
  • Claude supports vision with images in base64, URL, or file reference formats.
  • Synthetic assistant messages allow you to inject example responses or correct past interactions.
  • Always check stop_reason and usage fields to monitor response completeness and costs.