BeClaude
Guide · Beginner · API · 2026-05-23

Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use the Claude Messages API to build conversational applications, including sending basic requests, managing multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities with images.

Messages API · Claude · API Guide · Prefill · Vision

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, an AI assistant, or an automated content generator, understanding how to structure requests and handle responses is essential. This guide covers the most common patterns you'll use when working with the Messages API, from simple queries to advanced techniques like prefill and vision.

Understanding the Basics

The Messages API is stateless—each request must include the full conversation history. This design gives you complete control over the context and allows for flexible conversation management. Every request requires three key components:

  • model: The Claude model you want to use (e.g., claude-opus-4-7, claude-sonnet-4-5)
  • max_tokens: The maximum number of tokens Claude can generate in the response
  • messages: An array of message objects representing the conversation history
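
As a minimal sketch of these three fields, the helper below assembles the keyword arguments for client.messages.create and checks them locally before anything is sent. Note that build_request is a hypothetical name, not part of the SDK, and its validation rules are assumptions chosen for illustration; the API applies its own validation.

```python
from typing import Any

VALID_ROLES = {"user", "assistant"}


def build_request(model: str, max_tokens: int, messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble keyword arguments for client.messages.create (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for msg in messages:
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"unexpected role: {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError("each message needs a 'content' field")
    return {"model": model, "max_tokens": max_tokens, "messages": messages}


request = build_request(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
# With the SDK installed, this could be sent as: client.messages.create(**request)
```

Checking locally like this catches a missing field before a network round trip, though it is no substitute for handling the API's own error responses.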

Basic Request and Response

Here's the simplest possible request—a single user message asking for a greeting:

import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message)

The response includes the model's reply along with metadata:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks (usually text, but can include tool use or thinking blocks)
  • stop_reason: Why Claude stopped generating (e.g., "end_turn", "max_tokens", "stop_sequence")
  • usage: Token counts for billing and context management
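
Because content is an array, it is usually safer to collect every text block than to assume a single one. The sketch below works on a plain dict shaped like the JSON response above; extract_text is a hypothetical helper, and with the SDK's response objects you would read attributes (block.type, block.text) rather than dict keys.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"] for block in response["content"] if block.get("type") == "text"
    )


# Sample response copied from the example above, trimmed to the fields used here.
response = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

text = extract_text(response)
# Input and output tokens are reported separately; sum them for a per-call total.
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
print(text, total_tokens)
```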

Building Multi-Turn Conversations

Since the API is stateless, you must send the entire conversation history with each request. This allows you to build up a conversation over multiple turns:

import anthropic

client = anthropic.Anthropic()

# The full history is resent: two earlier turns plus the new user message.
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message.content[0].text)

Important: The assistant messages don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context. This is useful for:

  • Setting up scenarios
  • Providing examples of desired behavior
  • Simulating previous interactions
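
One way to manage this bookkeeping is a small wrapper that stores the history and resends all of it on every turn. The sketch below is an illustration under stated assumptions: Conversation and fake_send are hypothetical names, and fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable, Optional

Message = dict[str, str]
SendFn = Callable[[list[Message]], str]


class Conversation:
    """Keeps the full history and resends it on every turn (hypothetical helper)."""

    def __init__(self, send: SendFn, seed: Optional[list[Message]] = None) -> None:
        self._send = send
        # Seed messages may be synthetic, e.g. a hand-written assistant reply.
        self.history: list[Message] = list(seed or [])

    def ask(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        reply = self._send(list(self.history))  # the whole history goes out each time
        self.history.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: list[Message]) -> str:
    """Stand-in for a real API call; reports how many messages it received."""
    return f"received {len(messages)} messages"


chat = Conversation(
    fake_send,
    seed=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},  # synthetic assistant turn
    ],
)
print(chat.ask("Can you describe LLMs to me?"))
```

In a real application, the send function would wrap client.messages.create and return the reply text, for example message.content[0].text.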

Prefill: Putting Words in Claude's Mouth

Prefill is a powerful technique where you start Claude's response by including an assistant message with partial content at the end of your messages array. Claude will continue from where you left off.

Use Case: Multiple Choice Questions

A classic use case is getting a single-letter answer from a multiple choice question:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # allow exactly one generated token
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        # Prefill: Claude continues from this partial assistant turn.
        {"role": "assistant", "content": "The answer is ("},
    ],
)

print(message.content[0].text)  # Output: "C"

By setting max_tokens=1, you force Claude to output only the next token, which in this case is the letter "C". This pattern is excellent for classification tasks, quizzes, or any scenario requiring constrained output.
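
The response contains only the continuation, not the prefilled text, so code that needs the whole sentence has to rejoin the two. The helpers below are a hypothetical sketch of that pattern. Stripping trailing whitespace from the prefill reflects an assumption that the API rejects a final assistant message ending in whitespace; verify this against the current API reference.

```python
def with_prefill(messages: list[dict], prefill: str) -> list[dict]:
    """Return a copy of messages ending with a partial assistant turn."""
    # Assumption: trailing whitespace in a prefill is rejected, so strip it.
    return [*messages, {"role": "assistant", "content": prefill.rstrip()}]


def full_answer(prefill: str, continuation: str) -> str:
    """The response holds only the continuation; rejoin it with the prefill."""
    return prefill.rstrip() + continuation


question = {
    "role": "user",
    "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
}
messages = with_prefill([question], "The answer is (")

# "C" stands in for the single token returned by the API in the example above.
print(full_answer("The answer is (", "C"))
```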

Prefill Limitations

Note that prefill is not supported on certain models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6

For these models, use structured outputs or system prompt instructions instead. See the migration guide for alternatives.
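
A rough sketch of the system prompt fallback: the hypothetical function below builds the multiple choice request with prefill when the caller says the model supports it, and with a system instruction otherwise. The supports_prefill flag, the instruction wording, and the max_tokens value of 16 are all assumptions for illustration; decide the flag from the list above rather than from anything in this code.

```python
from typing import Any

QUESTION = "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"


def classification_request(model: str, question: str, supports_prefill: bool) -> dict[str, Any]:
    """Build request kwargs with prefill when available, else a system instruction."""
    request: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    if supports_prefill:
        request["max_tokens"] = 1
        request["messages"].append({"role": "assistant", "content": "The answer is ("})
    else:
        # Illustrative wording and limit; tune both for your own task.
        request["max_tokens"] = 16
        request["system"] = "Answer with only the letter of the correct option."
    return request


with_prefill_request = classification_request("claude-sonnet-4-5", QUESTION, supports_prefill=True)
fallback_request = classification_request("claude-opus-4-7", QUESTION, supports_prefill=False)
```

Without prefill the output is no longer guaranteed to be a bare letter, so the caller should still validate the reply before using it.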

Vision: Working with Images

The Messages API supports image inputs, enabling Claude to analyze and describe visual content. In the example below, the image is sent as base64-encoded data inside the content array:

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode an image file
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Image block first, then the text prompt that refers to it.
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

The media_type should match the image format—supported types include image/png, image/jpeg, image/gif, and image/webp. You can mix images and text in the same message, allowing for rich interactions like "What's in this photo?" or "Read the text from this document."
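
To keep media_type in step with the file, one option is to derive it from the file suffix. The sketch below assumes the suffix reflects the real format, and image_block is a hypothetical helper rather than an SDK function. The demo writes placeholder bytes to a temporary file so it runs without a real image.

```python
import base64
import tempfile
from pathlib import Path

# Media types listed above, keyed by file suffix (the suffix mapping is an assumption).
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block, inferring media_type from the suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or '(none)'}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }


# Demo with a throwaway file; the bytes are a placeholder, not a real image.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "chart.png"
    demo_path.write_bytes(b"\x89PNG placeholder")
    block = image_block(str(demo_path))

# The block can be combined with a text block in a single user message.
content = [block, {"type": "text", "text": "Describe this chart in detail."}]
```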

Handling Stop Reasons

Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field in the response tells you:

Stop Reason        Meaning
"end_turn"         Claude finished naturally
"max_tokens"       Response hit the token limit
"stop_sequence"    Claude encountered a custom stop sequence
"tool_use"         Claude wants to call a tool

For example, if you get "max_tokens", you may need to increase max_tokens or continue the conversation with a follow-up request.
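
A minimal sketch of branching on stop_reason is shown here. The action labels are hypothetical placeholders for whatever your application does next, and treating unknown values as "inspect" is a design assumption so that unrecognized stop reasons are not mistaken for success.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason to a coarse follow-up action (labels are hypothetical)."""
    actions = {
        "end_turn": "done",
        "stop_sequence": "done",
        "max_tokens": "raise_limit_or_continue",
        "tool_use": "run_tool",
    }
    # Unknown values are surfaced rather than silently treated as success.
    return actions.get(stop_reason, "inspect")


for reason in ("end_turn", "max_tokens", "tool_use", "something_new"):
    print(f"{reason} -> {next_action(reason)}")
```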

Best Practices

  • Manage context windows carefully: Since you send the full history, keep track of token usage to avoid hitting limits. Use the usage field in responses to monitor consumption.
  • Use system prompts for instructions: For general behavior guidance, use the system parameter rather than injecting instructions into user messages.
  • Leverage streaming for real-time applications: The API supports streaming responses, which is ideal for chat interfaces where you want to show tokens as they're generated.
  • Handle errors gracefully: The API may return errors for invalid requests, rate limits, or server issues. Retry rate-limit and server errors with exponential backoff; fix invalid requests instead of retrying them, since they will fail the same way again.
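
The retry advice above can be sketched as a generic wrapper. RetryableError and call_with_backoff are hypothetical names. In real code you would catch the SDK's own exception types for rate limits and server errors, and the attempt count and delays here are illustrative defaults. The sleep function is injectable so the demo runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (rate limits, server errors)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying RetryableError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Delay doubles each attempt: base, 2*base, 4*base, ... plus jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("max_attempts must be at least 1")


# Demo: fail twice, then succeed; sleeping is stubbed out to keep it instant.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("simulated rate limit")
    return "ok"


delays: list[float] = []
result = call_with_backoff(flaky, sleep=delays.append)
```

The SDK may already retry some failures on its own, so check its documentation before layering a second retry loop on top.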

Key Takeaways

  • The Messages API is stateless—always send the full conversation history with each request
  • Prefill allows you to start Claude's response, enabling constrained outputs like multiple choice answers (but check model compatibility)
  • Vision capabilities let you send images alongside text for multimodal analysis
  • Monitor stop_reason to understand why Claude stopped and handle edge cases like hitting token limits
  • Use synthetic assistant messages to guide conversations or provide context without requiring real Claude responses