Guide | Beginner | API | 2026-05-20

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational applications, including stateless multi-turn chats, prefill techniques to shape responses, and vision capabilities for image analysis.

Tags: Messages API, Claude API, multi-turn conversations, prefill, vision

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or an AI-powered assistant, understanding the Messages API is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.

Understanding the Messages API vs. Claude Managed Agents

Anthropic offers two paths for building with Claude:

Messages API: Direct model access for custom agent loops and fine-grained control. You manage the conversation state and logic.
Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you full control over every request and response.

Making Your First API Call

Let's start with the simplest possible request: sending a single message to Claude and getting a response.

Basic Request (Python)

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

Response Structure

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to understand:

content: An array of content blocks (usually text, but can include tool use blocks).
stop_reason: Why the response ended. Common values are "end_turn", "max_tokens", "stop_sequence", and "tool_use"; other values such as "pause_turn" and "refusal" also exist.
usage: Token counts for billing and monitoring.
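
As a minimal sketch of reading these fields, the helper below joins the text blocks of a response and skips any other block type. It works on the plain JSON shape shown above (a dict); with the Python SDK the same fields are read as attributes, as in message.content and message.usage. The name extract_text is hypothetical and not part of the SDK.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every text block in a Messages API response dict."""
    parts = []
    for block in response.get("content", []):
        # Non-text blocks (for example tool_use) have no "text" field, so skip them.
        if block.get("type") == "text":
            parts.append(block["text"])
    return "".join(parts)


# The sample response from above, trimmed to the fields used here.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

text = extract_text(sample)
total_tokens = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
```

Joining all text blocks, rather than indexing content[0], avoids an error when the first block is not a text block.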

Building Multi-Turn Conversations

The Messages API is stateless: you must send the full conversation history with every request. This gives you complete control over context, but you have to store and manage the conversation state yourself.

Sending Conversation History

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
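
The request above hard-codes the history. In an application you would usually accumulate it turn by turn. The sketch below shows one way to do that, assuming a hypothetical send callable that takes the full history and returns the assistant's reply text (for example a thin wrapper around client.messages.create). A stand-in function replaces the real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]):
        # `send` is a hypothetical callable: full history in, reply text out.
        self.send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(list(self.messages))  # pass a copy of the whole history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    # Stand-in for a real API call; it only reports how much history it received.
    return f"reply to turn {len(history)}"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
```

After two calls to ask, chat.messages holds four entries in user, assistant, user, assistant order, which is the list the next request would send.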

Important Notes

The conversation must alternate between user and assistant roles.
You can include synthetic assistant messages; they don't need to have come from Claude. This is useful for providing examples or guiding behavior.
Always start with a user message.
The last message must have the user role (unless you're using prefill).
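
These notes can be checked before a request is sent. The following is a minimal sketch that encodes the notes above as written; validate_history and its allow_prefill flag are hypothetical names, and the API's own validation remains the authority on what is accepted.

```python
from typing import Dict, List


def validate_history(messages: List[Dict[str, str]], allow_prefill: bool = False) -> List[str]:
    """Return a list of rule violations; an empty list means the history looks valid."""
    if not messages:
        return ["history is empty"]
    problems = []
    if messages[0]["role"] != "user":
        problems.append("first message must have the user role")
    # Compare each message with its successor to detect repeated roles.
    for prev, curr in zip(messages, messages[1:]):
        if prev["role"] == curr["role"]:
            problems.append(f"two consecutive {curr['role']} messages")
    if messages[-1]["role"] != "user" and not allow_prefill:
        problems.append("last message must have the user role unless prefilling")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a stored conversation in one pass.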

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. This is useful for:

Forcing a specific format (e.g., JSON, multiple choice)
Guiding the tone or structure
Reducing token usage by constraining the output

Example: Multiple Choice Answer

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"

Prefill Limitations

Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6
Using prefill with these models returns a 400 error
Alternative: Use structured outputs or system prompt instructions instead
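
One way to apply this in code is to decide per model whether to prefill or to fall back to a system prompt instruction. The sketch below is illustrative only: the model ID strings are assumptions (only "claude-opus-4-7" appears in this guide), build_request is a hypothetical helper, and the wording of the fallback instruction is just an example.

```python
from typing import Dict, List

# Assumed model IDs for the models listed above. Only "claude-opus-4-7"
# appears in this guide; the other strings are illustrative guesses.
NO_PREFILL_MODELS = {
    "claude-mythos-preview",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
}


def build_request(model: str, question: str, prefill: str) -> Dict[str, object]:
    """Build keyword arguments for messages.create, prefilling only where allowed."""
    messages: List[Dict[str, str]] = [{"role": "user", "content": question}]
    kwargs: Dict[str, object] = {"model": model, "max_tokens": 16, "messages": messages}
    if model in NO_PREFILL_MODELS:
        # Fall back to a system prompt instruction instead of an assistant prefill.
        kwargs["system"] = f"Begin your reply with exactly: {prefill}"
    else:
        messages.append({"role": "assistant", "content": prefill})
    return kwargs
```

The returned dict could be passed as client.messages.create(**kwargs). A system prompt instruction is a weaker constraint than prefill, so the output format should still be checked.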

When to Use Prefill vs. System Prompts

Technique	Best For
Prefill	Short, constrained outputs (multiple choice, yes/no, single word)
System prompt	Longer instructions, tone setting, behavior guidelines
Structured outputs	JSON schemas, typed responses

Vision Capabilities: Analyzing Images

The Messages API supports image inputs, enabling Claude to analyze and describe visual content.

Sending an Image

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode the image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

JPEG, PNG, GIF, WebP
Maximum size: 100MB (though larger images are downscaled)
Optimal resolution: 1568x1568 pixels (Claude processes at this resolution)
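
A small guard can reject unsupported formats before any request is made. This sketch infers the media type from the file extension, which is an assumption of convenience (the extension may not match the actual file contents); media_type_for and image_block are hypothetical helper names.

```python
import base64
from pathlib import Path

# Extension-to-media-type map for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(filename: str) -> str:
    """Return the media type for a filename, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or 'no extension'}")
    return MEDIA_TYPES[suffix]


def image_block(data: bytes, filename: str) -> dict:
    """Build a base64 image content block from raw bytes and the source filename."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type_for(filename),
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

The resulting dict has the same shape as the image block in the request above, so it can be placed directly in a user message's content list.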

Vision Use Cases

Document analysis: Extract text from scanned documents
UI/UX review: Analyze screenshots for design feedback
Data visualization: Interpret charts and graphs
Product photography: Generate alt text or descriptions

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

stop_reason	Meaning	Action
`"end_turn"`	Claude finished naturally	Continue conversation or end
`"max_tokens"`	Hit the token limit	Increase `max_tokens` or continue
`"stop_sequence"`	Hit a custom stop sequence	Handle based on your logic
`"tool_use"`	Claude wants to use a tool	Execute the tool and return results

Example: Handling max_tokens

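# Assumes `messages` is the list that was passed to client.messages.create()
if message.stop_reason == "max_tokens":
    # Add the truncated reply to the history, then ask Claude to pick up where it stopped
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    # Send the new request with the extended history...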
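
The same idea can be wrapped in a loop with an upper bound on the number of rounds. This is a sketch under assumptions: send is a hypothetical wrapper that takes the history and returns a (text, stop_reason) pair, and simply concatenating the pieces may not produce a seamless result if Claude restates part of its earlier output.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def generate_until_done(
    send: Callable[[List[Message]], Tuple[str, str]],
    messages: List[Message],
    max_rounds: int = 3,
) -> str:
    """Call `send` repeatedly while the reply is cut off by max_tokens."""
    history = list(messages)  # work on a copy so the caller's list is untouched
    pieces = []
    for _ in range(max_rounds):
        text, stop_reason = send(history)
        pieces.append(text)
        if stop_reason != "max_tokens":
            break
        # Record the partial reply, then ask for the rest.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue."})
    return "".join(pieces)
```

The max_rounds cap keeps a response that never finishes from generating requests indefinitely.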

Best Practices for Production

1. Manage Context Window

Keep conversation history within Claude's context window (varies by model)
Use prompt caching for frequently repeated system instructions
Summarize or truncate old messages when approaching limits
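
As a rough sketch of the truncation approach, the helper below keeps the most recent messages that fit a token budget. The four-characters-per-token estimate is a crude assumption, not the real tokenizer, and both function names are hypothetical; for exact counts, use the API's token counting instead of this estimate.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Not the real tokenizer.
    return max(1, len(text) // 4)


def truncate_history(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep the most recent messages whose estimated tokens fit within `budget`."""
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    # The history must start with a user message, so drop leading assistant turns.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Dropping old turns loses their content entirely, so summarizing them into a single early message is often the better choice when earlier context still matters.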

2. Handle Errors Gracefully

# Catch the specific subclasses first: anthropic.APIError is their base class,
# so listing it first would make the other handlers unreachable.
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Wait and retry
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Check network and retry
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Implement retry logic with exponential backoff
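
The retry comments above can be filled in with a small generic helper. This is a minimal sketch of exponential backoff with jitter, not the SDK's own retry mechanism: retry_with_backoff and its parameters are hypothetical, and the exception types to retry are passed in by the caller, so the helper does not import anthropic.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retryable` errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("loop always returns or raises")
```

With the SDK, fn would be a lambda that calls client.messages.create with your arguments, and retryable could be (anthropic.RateLimitError, anthropic.APIConnectionError). The sleep parameter exists so tests can replace time.sleep.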

3. Monitor Token Usage

Track usage.input_tokens and usage.output_tokens to:

Estimate costs
Detect unexpectedly long conversations
Optimize prompts for efficiency
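
A running total is enough for basic monitoring. In the sketch below, UsageTracker is a hypothetical class and the prices are placeholders, not real pricing; replace them with the current rates for your model.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts across requests and estimates cost."""

    # Placeholder prices in USD per million tokens. These are NOT real prices.
    input_price_per_mtok: float = 1.0
    output_price_per_mtok: float = 5.0
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage: dict) -> None:
        # `usage` has the shape of the "usage" object in the response JSON.
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]

    def estimated_cost(self) -> float:
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000
```

Because every request resends the full history, input_tokens grows with each turn of a conversation, which is usually the first thing such a tracker reveals.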

4. Use Streaming for Responsive UIs

For chat applications, use streaming to show Claude's response as it's generated:

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
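
Since the API is stateless, the streamed text still has to be stored for the next turn. The sketch below forwards each chunk to a display callback and returns the assembled text; render_stream is a hypothetical helper that accepts any iterable of strings, such as stream.text_stream from the example above.

```python
from typing import Callable, Iterable


def render_stream(chunks: Iterable[str], write: Callable[[str], None]) -> str:
    """Forward each chunk to `write` as it arrives and return the full text."""
    pieces = []
    for chunk in chunks:
        write(chunk)          # show the chunk immediately, e.g. update the UI
        pieces.append(chunk)  # keep it so the full reply can go into the history
    return "".join(pieces)
```

The returned string is what you would append to the history as the assistant message before sending the next user turn.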

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated applications that leverage Claude's full capabilities.

Remember these key points:

The API is stateless, so you manage conversation history
Prefill is powerful but has model limitations
Vision enables image analysis workflows
Always handle stop reasons and errors in production

Key Takeaways

Stateless by design: You must send the full conversation history with every request, giving you complete control over context.
Prefill shapes responses: Use prefill to constrain outputs (e.g., multiple choice). It is not supported on the newer models listed under Prefill Limitations, so use structured outputs or system prompt instructions there.
Vision is built-in: Send images as base64-encoded content blocks for document analysis, UI review, and more.
Handle stop reasons: Different stop reasons (end_turn, max_tokens, tool_use) require different handling logic.
Stream for UX: Use streaming for real-time applications to show Claude's response as it's generated.