BeClaude
Guide · Beginner · API · 2026-05-20

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational applications, including stateless multi-turn chats, prefill techniques to shape responses, and vision capabilities for image analysis.

Messages API · Claude API · multi-turn conversations · prefill · vision

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or an AI-powered assistant, understanding the Messages API is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.

Understanding the Messages API vs. Claude Managed Agents

Anthropic offers two paths for building with Claude:

  • Messages API: Direct model access for custom agent loops and fine-grained control. You manage the conversation state and logic.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you full control over every request and response.

Making Your First API Call

Let's start with the simplest possible request: sending a single message to Claude and getting a response.

Basic Request (Python)

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

Response Structure

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to understand:

  • content: An array of content blocks (usually text, but can include tool use blocks).
  • stop_reason: Why the response ended ("end_turn", "max_tokens", "stop_sequence", or "tool_use").
  • usage: Token counts for billing and monitoring.
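
As a rough illustration of reading these fields, the sketch below works on a plain dictionary shaped like the JSON above rather than on a live SDK object. The helper name `summarize_response` is hypothetical, and the sample values are copied from the response shown earlier.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the commonly used fields out of a Messages API response dict."""
    # Join only the text blocks; other block types (e.g. tool use) are skipped.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# Sample data copied from the response shown above.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
summary = summarize_response(sample)
print(summary)
```

With the Python SDK the same values are available as attributes, for example `message.content`, `message.stop_reason`, and `message.usage.input_tokens`.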

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.

Sending Conversation History

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)
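
Because the API keeps no state, the history bookkeeping usually ends up in a small helper of your own. The sketch below is one possible shape, not a prescribed pattern: the `Conversation` class and the `send` callable are hypothetical names, and `fake_send` stands in for a real call to `client.messages.create` so that the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the history locally and resends all of it on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # `send` is a hypothetical callable: it takes the full history and
        # returns the assistant's reply text (for example, a thin wrapper
        # around client.messages.create).
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))  # pass a copy of the history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for the API so the sketch runs offline: it reports how many
# messages it received.
def fake_send(history: list[Message]) -> str:
    return f"received {len(history)} message(s)"


chat = Conversation(fake_send)
first = chat.ask("Hello, Claude")
second = chat.ask("Can you describe LLMs to me?")
print(first, "|", second)
```

The second call receives three messages (user, assistant, user), which is the stateless behavior described above: every request carries everything said so far.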

Important Notes

  • The conversation must alternate between user and assistant roles.
  • You can include synthetic assistant messages—they don't need to have come from Claude. This is useful for providing examples or guiding behavior.
  • Always start with a user message.
  • The last message must be from the user role (unless you're using prefill).
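
These notes can be turned into a quick pre-flight check before a request is sent. The following is a minimal sketch under the assumption that each message is a dict with a `role` key. `validate_history` is a hypothetical helper, not part of the SDK.

```python
def validate_history(messages: list[dict], allow_prefill: bool = False) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    if not messages:
        return ["history is empty"]
    problems = []
    if messages[0]["role"] != "user":
        problems.append("first message is not from the user")
    # A trailing assistant message is only expected when prefilling.
    if messages[-1]["role"] != "user" and not allow_prefill:
        problems.append("last message is not from the user")
    # Compare each message with its successor to check alternation.
    for previous, current in zip(messages, messages[1:]):
        if previous["role"] == current["role"]:
            problems.append(f"two consecutive {current['role']} messages")
    return problems


good = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
]
print(validate_history(good))
```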

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. This is useful for:

  • Forcing a specific format (e.g., JSON, multiple choice)
  • Guiding the tone or structure
  • Reducing token usage by constraining the output

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text)  # Outputs: "C"
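
One detail that is easy to miss when prefilling a format such as JSON: the reply continues from the prefill, so the prefilled text is not repeated in the response and has to be prepended before parsing. Here is a minimal sketch, in which `continuation` is a made-up stand-in for `message.content[0].text`:

```python
import json

prefill = "{"
# Hypothetical continuation, standing in for message.content[0].text.
continuation = '"answer": "C", "taxon": "Formicidae"}'

# The response does not repeat the prefill, so join the two before parsing.
parsed = json.loads(prefill + continuation)
print(parsed["answer"])
```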

Prefill Limitations

  • Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6
  • Using prefill with these models returns a 400 error
  • Alternative: Use structured outputs or system prompt instructions instead
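
If your application targets several models, one option is to decide per request whether to prefill or to move the constraint into the system prompt. The sketch below assumes a hypothetical helper, `build_request`, that returns keyword arguments for `client.messages.create`. The set of models that reject prefill is passed in by the caller, and only the one model ID that appears in this guide is listed in the example.

```python
def build_request(model: str, question: str, prefill: str,
                  no_prefill_models: frozenset[str]) -> dict:
    """Use prefill where allowed; otherwise move the constraint to `system`."""
    request = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": question}],
    }
    if model in no_prefill_models:
        # Fallback: describe the desired opening in the system prompt instead.
        request["system"] = f"Begin your reply with exactly: {prefill}"
    else:
        request["messages"].append({"role": "assistant", "content": prefill})
    return request


# Assumption: only the one ID that appears in this guide is listed here; fill
# in the IDs of the other models named above from the official model list.
NO_PREFILL = frozenset({"claude-opus-4-7"})

with_system = build_request("claude-opus-4-7", "Pick A, B, or C.", "The answer is (", NO_PREFILL)
with_prefill = build_request("claude-sonnet-4-5", "Pick A, B, or C.", "The answer is (", NO_PREFILL)
print(sorted(with_system), sorted(with_prefill))
```

A system prompt instruction is a softer constraint than prefill, so the output may still need validation on your side.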

When to Use Prefill vs. System Prompts

| Technique | Best For |
| --- | --- |
| Prefill | Short, constrained outputs (multiple choice, yes/no, single word) |
| System prompt | Longer instructions, tone setting, behavior guidelines |
| Structured outputs | JSON schemas, typed responses |

Vision Capabilities: Analyzing Images

The Messages API supports image inputs, enabling Claude to analyze and describe visual content.

Sending an Image

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: 100MB (though larger images are downscaled)
  • Optimal resolution: 1568x1568 pixels (Claude processes at this resolution)
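
Since the `media_type` has to match the image format, it can help to derive it in one place and reject anything outside the list above. The helper below is a sketch with hypothetical names (`image_block`, `MEDIA_TYPES`). It trusts the file extension and does not inspect the bytes, which is an assumption you may want to tighten.

```python
import base64
from pathlib import PurePath

# Media types for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, rejecting unsupported formats."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# A few placeholder bytes are enough to show the resulting structure.
block = image_block("diagram.PNG", b"\x89PNG")
print(block["source"]["media_type"])
```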

Vision Use Cases

  • Document analysis: Extract text from scanned documents
  • UI/UX review: Analyze screenshots for design feedback
  • Data visualization: Interpret charts and graphs
  • Product photography: Generate alt text or descriptions

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

| stop_reason | Meaning | Action |
| --- | --- | --- |
| "end_turn" | Claude finished naturally | Continue conversation or end |
| "max_tokens" | Hit the token limit | Increase max_tokens or continue |
| "stop_sequence" | Hit a custom stop sequence | Handle based on your logic |
| "tool_use" | Claude wants to use a tool | Execute the tool and return results |
Example: Handling max_tokens

# `messages` is the history list that was sent with the truncated request.
if message.stop_reason == "max_tokens":
    # Keep the partial reply in the history, then ask Claude to continue
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    # Send the new request with the extended history...
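
The same idea can be wrapped in a bounded loop so that a long answer is stitched together without risking endless requests. This is only a sketch. `generate_until_done` and `send` are hypothetical names, and `send` is assumed to return a `(text, stop_reason)` pair, for example from a small wrapper around `client.messages.create`.

```python
from typing import Callable


def generate_until_done(
    messages: list[dict],
    send: Callable[[list[dict]], tuple[str, str]],
    max_rounds: int = 3,
) -> str:
    """Collect text across requests while the reply is cut off by max_tokens."""
    parts = []
    for _ in range(max_rounds):
        text, stop_reason = send(messages)
        parts.append(text)
        if stop_reason != "max_tokens":
            break
        # Same pattern as above: keep the partial reply, then ask for more.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Please continue."})
    return "".join(parts)


# Offline stand-in for the API: two truncated replies, then a complete one.
replies = iter([
    ("Once ", "max_tokens"),
    ("upon ", "max_tokens"),
    ("a time.", "end_turn"),
])
history = [{"role": "user", "content": "Tell me a story"}]
story = generate_until_done(history, lambda msgs: next(replies))
print(story)
```

The `max_rounds` cap is a design choice: it bounds cost if the model keeps hitting the limit, at the price of possibly returning an incomplete answer.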

Best Practices for Production

1. Manage Context Window

  • Keep conversation history within Claude's context window (varies by model)
  • Use prompt caching for frequently repeated system instructions
  • Summarize or truncate old messages when approaching limits
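
Truncation can be sketched as follows. The four-characters-per-token estimate is a crude assumption rather than the model's tokenizer, the helper names are hypothetical, and the code assumes each message's `content` is a plain string.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token. This is a crude
    # heuristic, not the model's tokenizer.
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # The history must start with a user message, so drop a leading
    # assistant turn left over from the cut.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, budget=25)
print(len(trimmed))
```

For an accurate budget, replace the estimate with real counts, such as the `usage.input_tokens` value reported by earlier responses.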

2. Handle Errors Gracefully

# Catch the specific errors first: anthropic.APIError is their base class,
# so listing it first would make the later handlers unreachable.
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Wait and retry
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Check network and retry
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Implement retry logic with exponential backoff
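
The "retry with exponential backoff" comment can be fleshed out roughly like this. The sketch is deliberately SDK-independent so that it runs offline. `call_with_backoff` is a hypothetical helper, the delays and attempt count are arbitrary defaults, and the built-in `ConnectionError` stands in for the SDK's exceptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retryable: tuple[type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Delays grow as base, 2*base, 4*base, ... plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


# Offline demo: fail twice, then succeed. Delays are recorded, not slept.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: list[float] = []
result = call_with_backoff(flaky, (ConnectionError,), sleep=delays.append)
print(result, len(delays))
```

With the SDK you would pass something like `retryable=(anthropic.RateLimitError, anthropic.APIConnectionError)` and wrap the `client.messages.create` call in a lambda. The SDK client also accepts a `max_retries` setting, which may already cover simple cases.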

3. Monitor Token Usage

Track usage.input_tokens and usage.output_tokens to:

  • Estimate costs
  • Detect unexpectedly long conversations
  • Optimize prompts for efficiency
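
A running tally is often enough to start with. In the sketch below, `UsageTracker` is a hypothetical class, the usage dicts mirror the `usage` object shown earlier, and the prices are placeholders that the caller must supply. No real pricing is assumed.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts across requests."""

    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, usage: dict) -> None:
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.requests += 1

    def estimated_cost(self, input_price: float, output_price: float) -> float:
        # Prices are per million tokens and must be supplied by the caller.
        return (
            self.input_tokens * input_price + self.output_tokens * output_price
        ) / 1_000_000


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 30, "output_tokens": 100})
# Placeholder prices for illustration only.
cost = tracker.estimated_cost(input_price=1.0, output_price=2.0)
print(tracker.input_tokens, tracker.output_tokens, tracker.requests)
```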

4. Use Streaming for Responsive UIs

For chat applications, use streaming to show Claude's response as it's generated:

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
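
In a multi-turn chat you typically also need the complete text once streaming ends, so that the assistant turn can be appended to the history. A minimal sketch of that pattern, with a plain iterator standing in for `stream.text_stream` and a hypothetical helper name:

```python
from typing import Callable, Iterable


def consume_stream(chunks: Iterable[str], on_text: Callable[[str], None]) -> str:
    """Forward each chunk to the UI callback and return the full text."""
    parts = []
    for chunk in chunks:
        on_text(chunk)  # e.g. lambda t: print(t, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


# Offline stand-in for stream.text_stream.
shown: list[str] = []
full_text = consume_stream(iter(["Once ", "upon ", "a time."]), shown.append)
print(full_text)
```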

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated applications that leverage Claude's full capabilities.

Remember these key points:

  • The API is stateless—you manage conversation history
  • Prefill is powerful but has model limitations
  • Vision enables image analysis workflows
  • Always handle stop reasons and errors in production

Key Takeaways

  • Stateless by design: You must send the full conversation history with every request, giving you complete control over context.
  • Prefill shapes responses: Use prefill to constrain outputs (e.g., multiple choice), but avoid it on newer models—use structured outputs instead.
  • Vision is built-in: Send images as base64-encoded content blocks for document analysis, UI review, and more.
  • Handle stop reasons: Different stop reasons (end_turn, max_tokens, stop_sequence, tool_use) require different handling logic.
  • Stream for UX: Use streaming for real-time applications to show Claude's response as it's generated.