BeClaude
Guide · Beginner · API · 2026-05-15

Mastering the Messages API: A Practical Guide to Building Conversations with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques to shape responses, and vision capabilities for image analysis, all with practical Python code examples.

Tags: Messages API, Conversation Design, Claude API, Prefill, Vision

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to structure messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefill and vision.

Understanding the Messages API vs. Claude Managed Agents

Before diving in, it's important to know that Anthropic offers two approaches for building with Claude:

  • Messages API: Direct model prompting access—you control every aspect of the conversation loop. Best for custom agent loops and fine-grained control.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Request

The simplest interaction with Claude involves sending a single user message and receiving a response. Here's how it looks in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes several important fields:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key Response Fields

  • id: Unique identifier for the message
  • content: Array of content blocks (text, tool_use, etc.)
  • stop_reason: Why Claude stopped generating (end_turn, max_tokens, stop_sequence, or tool_use)
  • usage: Token counts for billing and monitoring
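
As a minimal sketch of reading these fields, the helpers below operate on a plain dict shaped like the sample response above. The names `extract_text` and `total_tokens` are hypothetical and not part of the SDK; with the Python SDK the same fields are exposed as attributes (for example `message.content[0].text`).

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every text block, skipping other block types such as tool_use."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Return input plus output tokens as reported in the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


# A trimmed copy of the sample response shown above.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(extract_text(sample))   # Hello!
print(total_tokens(sample))   # 18
```

Filtering on the block type matters once tools are involved, because the content array can then hold blocks that have no text key.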

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires careful management.

Basic Multi-Turn Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)
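
Because the API keeps no state, something on your side has to accumulate the history. One possible shape is sketched below: `ChatSession` is a hypothetical helper, and `send` is an assumed callable that wraps the real API call. A stand-in is used here so the sketch runs offline.

```python
from typing import Callable

Message = dict[str, str]


class ChatSession:
    """Keeps the full history and resends it on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        self._send = send  # hypothetical callable wrapping the API call
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))  # pass a copy of the whole history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call; it only reports how many messages it received.
def fake_send(history: list[Message]) -> str:
    return f"(reply after {len(history)} messages)"


session = ChatSession(fake_send)
session.ask("Hello, Claude")
session.ask("Can you describe LLMs to me?")
print(len(session.messages))  # 4
```

In real use, `send` would call `client.messages.create` with the history as `messages` and return `message.content[0].text`.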

Managing Conversation State

Since you control the history, you can:

  • Trim old messages to stay within context windows
  • Set a system prompt through the top-level system parameter (it is not a message in the list)
  • Add synthetic assistant messages to guide behavior
  • Persist conversations across sessions by storing message arrays

Pro Tip: Earlier turns don't need to originate from Claude. You can inject synthetic assistant messages to provide context or correct behavior without waiting for the API.
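
Two of the ideas above, persisting message arrays and injecting synthetic turns, can be sketched with the standard library alone. The function names and the JSON file layout are assumptions made for illustration, and the sketch assumes message content is JSON-serializable (plain strings or content-block dicts).

```python
import json
from pathlib import Path


def save_history(path: Path, messages: list[dict]) -> None:
    """Write the message array to disk as JSON."""
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")


def load_history(path: Path) -> list[dict]:
    """Load a stored message array, or start empty if none exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def with_synthetic_turn(
    messages: list[dict], user_text: str, assistant_text: str
) -> list[dict]:
    """Return a new history with a hand-written user/assistant exchange appended."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]
```

Returning a new list from `with_synthetic_turn` rather than mutating the input keeps the stored history and the history sent to the API easy to tell apart.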

Putting Words in Claude's Mouth: The Prefill Technique

Prefilling allows you to start Claude's response for it. This is incredibly useful for:

  • Constraining output format (e.g., getting a single letter answer)
  • Guiding response structure (e.g., starting with "The answer is")
  • Reducing token usage for predictable responses

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Outputs: "C"

Important Prefill Limitations

  • Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6
  • These models return a 400 error for prefill requests
  • Alternative: Use structured outputs or system prompt instructions instead
  • Check the migration guide for patterns
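
One way to handle both cases in the same code path is sketched below. It assumes the caller already knows whether the target model accepts prefill (the `supports_prefill` flag), and the fallback system prompt wording is only an illustration, not a recommended prompt. Both function names are hypothetical.

```python
def build_request(question: str, prefill: str, supports_prefill: bool) -> dict:
    """Build request kwargs, using prefill only where the model supports it."""
    messages = [{"role": "user", "content": question}]
    if supports_prefill:
        # A trailing assistant message is the prefill; the model continues from it.
        messages.append({"role": "assistant", "content": prefill})
        return {"messages": messages}
    # Fallback: describe the desired opening in the system prompt instead.
    return {
        "system": f"Begin your reply with exactly: {prefill}",
        "messages": messages,
    }


def full_answer(prefill: str, completion: str, used_prefill: bool) -> str:
    """The response holds only the continuation, so prepend the prefill if one was sent."""
    return prefill + completion if used_prefill else completion


print(full_answer("The answer is (", "C", used_prefill=True))  # The answer is (C
```

The returned dict can be unpacked into `client.messages.create` alongside `model` and `max_tokens`.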

When to Use Prefill vs. System Prompts

| Technique | Best For |
|---|---|
| Prefill | Short, constrained outputs (single tokens, specific formats) |
| System Prompt | General behavior guidance, long instructions |
| Structured Outputs | JSON schemas, complex structured data |

Working with Vision Capabilities

Claude can analyze images sent through the Messages API. This opens up use cases like:

  • Document analysis
  • Image description
  • UI/UX review
  • Visual data extraction

Sending an Image

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail"
                }
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: 5 MB per image via the API (verify against the current vision documentation, since limits can change)
  • Best results with clear, high-contrast images
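
Since the media type must match the file, a small helper can pick it from the extension and reject anything outside the formats listed above. This is a sketch: `image_block` and the extension table are assumptions for illustration, and the check looks only at the file name, not at the file contents.

```python
import base64
from pathlib import Path

# Media types for the four formats listed above, keyed by file extension.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Return a base64 image content block, or raise ValueError for other formats."""
    file = Path(path)
    media_type = MEDIA_TYPES.get(file.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image format: {file.suffix or path}")
    data = base64.b64encode(file.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The result can be placed in a user message's content list next to a text block, in the same position as the inline image dict in the example above.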

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Hit token limit | Increase max_tokens or continue |
| stop_sequence | Found a stop sequence | Process the response |
| tool_use | Claude wants to use a tool | Execute tool and return result |

if message.stop_reason == "max_tokens":
    # Continue the conversation to get more output
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue"})
elif message.stop_reason == "tool_use":
    # Handle tool calls
    pass
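
The max_tokens branch above can be turned into a loop that stitches the pieces together. The sketch below is one possible arrangement: `send` is a hypothetical callable that wraps the API call and returns the reply text and its stop reason, and `max_rounds` is an assumed safety cap.

```python
from typing import Callable

# `send` is a hypothetical callable: it takes the message list and returns
# (text, stop_reason), for example by wrapping client.messages.create.
Send = Callable[[list[dict]], tuple[str, str]]


def complete_with_continuation(
    send: Send, messages: list[dict], max_rounds: int = 3
) -> str:
    """Keep asking for more while the reply is cut off by max_tokens."""
    history = list(messages)  # work on a copy so the caller's list is untouched
    parts: list[str] = []
    for _ in range(max_rounds):
        text, stop_reason = send(history)
        parts.append(text)
        if stop_reason != "max_tokens":
            break
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue"})
    return "".join(parts)
```

The cap on rounds keeps a response that never finishes from looping indefinitely. The joined text may need light cleanup where two pieces meet.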

Best Practices for Production

1. Manage Context Windows

Claude's context window is large but finite. Implement strategies like:

  • Sliding window (keep last N messages)
  • Summarization of old conversations
  • Prompt caching for frequently used context
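
A sliding window is the simplest of these strategies. The sketch below keeps the most recent messages and then drops any leading assistant turns so the trimmed history still opens with a user message, as every conversation in this guide does. The function name and the message-count budget are assumptions; a production version would more likely budget by tokens.

```python
def sliding_window(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most the last `max_messages` messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the kept history begins with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
print([m["content"] for m in sliding_window(history, 4)])  # ['u2', 'a2', 'u3']
```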

2. Handle Errors Gracefully

# Catch the specific subclasses first: anthropic.APIError is their base class,
# so listing it first would make the other handlers unreachable.
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Queue for retry
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Implement retry logic
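
The exponential backoff mentioned in the comments could look roughly like this. It is a generic sketch, not the SDK's own retry logic: `retry_with_backoff`, its parameters, and the jitter range are all assumptions, and `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on `retry_on` with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise ValueError("max_attempts must be at least 1")
```

A call site might pass a lambda wrapping `client.messages.create` as `call` and `(anthropic.RateLimitError, anthropic.APIConnectionError)` as `retry_on`. The official Python SDK also retries some failures on its own, configurable through the client's `max_retries` option, so check that the two layers do not multiply.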

3. Monitor Token Usage

Track usage.input_tokens and usage.output_tokens for cost management and optimization.
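
A running total is often enough to start with. In the sketch below, `UsageTracker` is a hypothetical helper, and the prices passed to `estimated_cost` are supplied by the caller; the figures in the example are placeholders, not real pricing.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of the token counts reported in each response's usage field."""

    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.requests += 1

    def estimated_cost(
        self, input_price_per_mtok: float, output_price_per_mtok: float
    ) -> float:
        """Cost estimate from caller-supplied prices per million tokens."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
tracker.record(input_tokens=12, output_tokens=6)  # values from the sample response
print(tracker.input_tokens, tracker.output_tokens, tracker.requests)  # 12 6 1
```

After each request, call `tracker.record(message.usage.input_tokens, message.usage.output_tokens)`.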

4. Use Streaming for Real-Time UX

For chat applications, enable streaming to show responses as they're generated:

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
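
When streaming, you usually still need the complete text afterwards, for example to append it to the conversation history. A small helper can display chunks as they arrive and return the assembled reply. `render_stream` is a hypothetical name, and the sketch accepts any iterable of strings, so it can be exercised without the API.

```python
from typing import Callable, Iterable


def render_stream(chunks: Iterable[str], write: Callable[[str], None]) -> str:
    """Forward each chunk to `write` as it arrives and return the assembled text."""
    parts: list[str] = []
    for chunk in chunks:
        write(chunk)
        parts.append(chunk)
    return "".join(parts)


# Offline demonstration with a plain list standing in for the stream.
shown: list[str] = []
full_text = render_stream(["Once ", "upon ", "a time"], shown.append)
print(full_text)  # Once upon a time
```

Inside the `with client.messages.stream(...)` block, `stream.text_stream` would take the place of the list, with a `write` function that prints without a trailing newline.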

Key Takeaways

  • The Messages API is stateless—you must send full conversation history with every request, giving you complete control over context management
  • Prefill technique lets you start Claude's response, enabling constrained outputs and format control (but check model compatibility)
  • Multi-turn conversations require careful history management; you can inject synthetic assistant messages to guide behavior
  • Vision capabilities allow image analysis by sending base64-encoded images alongside text prompts
  • Always handle stop reasons (end_turn, max_tokens, tool_use) to build robust, production-ready applications

For complete API specifications, refer to the Messages API reference.