Mastering the Messages API: Build Multi-Turn Conversations with Claude
This guide covers how to use Claude's Messages API to build conversational apps, including basic requests, multi-turn dialogues, prefill techniques to shape responses, and vision capabilities for image analysis.
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or a vision-powered application, understanding how to structure your API calls is essential. This guide walks you through everything from a simple "Hello, Claude" to advanced techniques like prefill and multi-turn conversations.
What Is the Messages API?
The Messages API gives you direct access to Claude's intelligence. You send a list of messages (the conversation history), and Claude responds with a new message. It's stateless—meaning you must send the full conversation history with each request. This design gives you complete control over context and conversation flow.
Anthropic offers two paths for building with Claude:
- Messages API: Direct model access for custom agent loops and fine-grained control.
- Claude Managed Agents: A pre-built, configurable agent harness for long-running tasks.
Basic Request and Response
Let's start with the simplest possible interaction: sending a single message and getting a response.
Python Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);
Understanding the Response
The API returns a structured JSON object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields:
- content: An array of content blocks (usually text).
- stop_reason: Why the model stopped; "end_turn" means Claude finished naturally.
- usage: Token counts for billing and context management.
Building Multi-Turn Conversations
Because the Messages API is stateless, you must send the entire conversation history with each request. This makes it easy to build up a conversation over multiple turns.
Example: Two-Turn Conversation
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
Notice that the second turn includes Claude's previous response ("Hello!") as part of the input. This maintains context.
Important Notes
- Synthetic assistant messages: The assistant messages don't have to come from Claude. You can insert pre-written assistant responses to guide the conversation.
- Context window: Be mindful of the total token count. Each turn adds to the context, and you may hit limits with long conversations.
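Because every request carries the whole history, most applications keep a small helper around the message list. A minimal sketch (the Conversation class here is illustrative, not part of the SDK):

```python
class Conversation:
    """Minimal history manager for the stateless Messages API."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")  # synthetic assistant turn, pre-written by you
convo.add_user("Can you describe LLMs to me?")

# convo.messages can be passed directly as the messages= argument
# to client.messages.create(...)
```

After each real API call, append Claude's reply with add_assistant so the next request sees the full context.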
Putting Words in Claude's Mouth (Prefill)
One powerful technique is prefilling—starting Claude's response for it. You include a partial assistant message at the end of your input, and Claude continues from there.
Use Case: Multiple Choice Questions
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"
By setting max_tokens=1, you force Claude to output just a single token—the letter of the correct answer. The prefill "The answer is (" shapes the response format.
Other Prefill Applications
- JSON mode: Prefill with {"response": " to force structured output.
- Roleplay: Start Claude's response with a character's name or action.
- Code generation: Prefill with def or function to get a function definition.
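JSON-mode prefill pairs naturally with a small parser that re-joins the prefill and the model's completion before decoding. A sketch (the parse_prefilled helper and the sample completion string are illustrative):

```python
import json

PREFILL = '{"sentiment":'

messages = [
    {
        "role": "user",
        "content": 'Classify the sentiment of "I love this!" '
                   'Reply as JSON with a "sentiment" key.',
    },
    # Trailing assistant message: Claude continues from here,
    # so its output is forced into the JSON shape.
    {"role": "assistant", "content": PREFILL},
]


def parse_prefilled(prefill: str, completion: str) -> dict:
    """Claude's reply continues the prefill, so re-join before parsing."""
    return json.loads(prefill + completion)


# If the model's completion were ' "positive"}', re-joining yields valid JSON:
result = parse_prefilled(PREFILL, ' "positive"}')
print(result)  # {'sentiment': 'positive'}
```

Remember that the text Claude returns does not repeat the prefill, so you must concatenate the two halves yourself before parsing.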
Vision Capabilities
The Messages API also supports image inputs. You can send images as base64-encoded data or as URLs.
Example: Analyze an Image
import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
Supported Image Types
- JPEG, PNG, GIF, WebP
- Maximum size: 5 MB per image for API requests
- Claude can analyze images for descriptions, OCR, data extraction, and more.
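The base64 example above can also be written with a URL image source, which skips the encoding step. A sketch of the request payload (the image URL is a placeholder):

```python
# URL-based image source: the API fetches the image itself,
# so no base64 encoding is needed on the client.
image_block = {
    "type": "image",
    "source": {
        "type": "url",
        "url": "https://example.com/chart.png",  # placeholder URL
    },
}
text_block = {"type": "text", "text": "Describe this chart in detail."}

messages = [{"role": "user", "content": [image_block, text_block]}]

# messages can then be passed to client.messages.create(...) as before.
print(messages[0]["content"][0]["source"]["type"])  # url
```

This is convenient when images are already hosted somewhere publicly reachable; use base64 for local or private files.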
Streaming Responses
For real-time applications, you can stream Claude's response token by token. This is ideal for chatbots where you want to show the response as it's being generated.
Python Streaming Example
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a short story."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="")
Streaming reduces perceived latency and improves user experience.
Best Practices
1. Manage Context Window
Each conversation turn adds tokens. For long conversations, use prompt caching or context compaction to stay within limits.
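When caching or compaction is not an option, a cruder fallback is trimming old turns before each request. A minimal sketch (the 20-message cap is an arbitrary choice):

```python
def truncate_history(messages, max_messages=20):
    """Keep only the most recent turns, always starting on a user message."""
    recent = messages[-max_messages:]
    # The Messages API requires the first message to have the user role,
    # so drop any leading assistant turns left over from the cut.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [{"role": "assistant", "content": "orphaned turn"}] + [
    {"role": "user", "content": f"question {i}"} for i in range(30)
]
trimmed = truncate_history(history, max_messages=5)
```

Trimming loses information from early turns, so production systems often summarize dropped turns into a single message instead of discarding them outright.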
2. Use Prefill for Consistency
When you need structured output (JSON, specific formats), always use prefill to guide Claude's response.
3. Handle Stop Reasons
Check stop_reason in the response:
- "end_turn": Claude finished naturally.
- "max_tokens": Response was cut off; increase max_tokens or continue the conversation.
- "stop_sequence": Claude hit a custom stop sequence.
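One way to act on stop_reason is a loop that keeps requesting until the model finishes naturally, feeding the partial answer back as a trailing assistant message (the prefill technique from earlier). A sketch, with the API call injected as a function so it can be stubbed out:

```python
def collect_full_response(create_fn, messages, max_tokens=1024):
    """Stitch together an answer that keeps getting cut off at max_tokens.

    create_fn wraps client.messages.create; responses are expected to look
    like the SDK's Message objects (resp.stop_reason, resp.content[0].text).
    """
    parts = []
    while True:
        resp = create_fn(messages=messages, max_tokens=max_tokens)
        parts.append(resp.content[0].text)
        if resp.stop_reason != "max_tokens":
            break
        # Truncated: send the partial answer back as a trailing assistant
        # message so Claude picks up where it left off.
        messages = messages + [{"role": "assistant", "content": "".join(parts)}]
    return "".join(parts)


# Real usage would pass a thin wrapper such as:
# lambda **kw: client.messages.create(model="claude-sonnet-4-5", **kw)
```

In production you would also cap the number of iterations so a pathological request cannot loop forever.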
4. Batch Processing
For high-volume tasks, use the Batch API to send multiple requests asynchronously.
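A sketch of how batch requests are assembled for the Python SDK's batches endpoint (the prompts and the custom_id naming scheme are illustrative):

```python
def build_batch_requests(prompts, model="claude-sonnet-4-5", max_tokens=256):
    """Build the request list expected by client.messages.batches.create.

    Each entry pairs a custom_id (used to match results back to inputs)
    with the same params a normal messages.create call would take.
    """
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(["Summarize doc A", "Summarize doc B"])
# batch = client.messages.batches.create(requests=requests)
# Results arrive asynchronously; poll the batch and match on custom_id.
```

Because results can come back in any order, the custom_id field is what ties each result to its original prompt.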
Common Pitfalls
- Forgetting history: Always send the full conversation history, or Claude will lose context.
- Exceeding token limits: Monitor usage.input_tokens and usage.output_tokens to avoid surprises.
- Ignoring errors: Handle API errors (rate limits, authentication) gracefully in production.
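For the last point, a generic retry-with-backoff wrapper covers transient failures; pass in whichever SDK error classes you treat as retryable (the anthropic error names in the comment are from the Python SDK):

```python
import time


def with_retries(fn, retryable=(Exception,), max_attempts=3, base_delay=1.0):
    """Call fn, retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)


# Usage with the SDK (error classes from the anthropic package):
# reply = with_retries(
#     lambda: client.messages.create(...),
#     retryable=(anthropic.RateLimitError, anthropic.APIConnectionError),
# )
```

Authentication errors are deliberately not retried in this pattern: a bad API key will not fix itself, so those should fail fast.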
Conclusion
The Messages API is the foundation for building any application with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create powerful, interactive experiences. Start with simple calls, then layer in streaming and advanced techniques as your needs grow.
Key Takeaways
- The Messages API is stateless—always send the full conversation history with each request.
- Prefill allows you to shape Claude's response by starting its message for it.
- Vision capabilities let you send images for analysis alongside text.
- Streaming reduces latency and improves user experience for real-time apps.
- Always check stop_reason and token usage to manage conversations effectively.