Claude Guide · 2026-05-06

Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples in Python and TypeScript.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications. You'll learn basic requests, multi-turn conversations, prefill techniques to shape responses, and how to handle images with vision capabilities.

Tags: Messages API, Claude API, conversational AI, prefill technique, vision

Introduction

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent, understanding how to structure and send messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision.

Anthropic offers two paths for building with Claude: the Messages API for direct model access and fine-grained control, and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, which is ideal for custom agent loops and applications requiring precise control over the conversation flow.

Basic Request and Response

At its core, the Messages API is straightforward. You send a list of messages, and Claude responds with a completion. Here's a minimal example using the Python SDK:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes the model's reply, metadata, and token usage:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • stop_reason: Indicates why the response ended ("end_turn" means Claude finished naturally).
  • usage: Tracks input and output tokens for billing and context management.
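
In the Python SDK these fields are attributes on the returned message object, but the raw response has the JSON shape shown above. As an illustrative helper (not part of the SDK), the commonly needed fields can be pulled out of that dict in one pass:

```python
# Illustrative helper: extract the useful fields from a raw
# Messages API response, shaped like the JSON example above.
def summarize_response(response: dict) -> dict:
    # Concatenate all text blocks in the content list.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }

example = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(summarize_response(example))
# → {'text': 'Hello!', 'stop_reason': 'end_turn', 'total_tokens': 18}
```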

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This design gives you complete control over context but requires careful management of message arrays.

To build a multi-turn conversation, simply append new messages to the history:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)

Important Notes

  • The conversation history doesn't need to originate from Claude. You can inject synthetic assistant messages to guide behavior.
  • Always alternate between "user" and "assistant" roles. Two consecutive messages with the same role will cause an error.
  • For long conversations, consider using prompt caching to reduce costs and latency.
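
The append-and-resend pattern can be wrapped in a small helper that also enforces the alternating-role rule. This is a sketch; `Conversation` is not part of the SDK:

```python
# Minimal history manager for the stateless Messages API (illustrative).
class Conversation:
    def __init__(self):
        self.messages = []

    def add(self, role: str, content: str):
        # Enforce the alternating-role rule: two consecutive messages
        # with the same role would cause an API error.
        if self.messages and self.messages[-1]["role"] == role:
            raise ValueError(f"two consecutive '{role}' messages")
        self.messages.append({"role": role, "content": content})

conv = Conversation()
conv.add("user", "Hello, Claude")
conv.add("assistant", "Hello!")  # may be a synthetic assistant turn
conv.add("user", "Can you describe LLMs to me?")
print(len(conv.messages))  # → 3
```

On each turn you would pass `conv.messages` as the `messages` argument and then `add("assistant", ...)` with the reply before the next user message.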

Prefill Technique: Putting Words in Claude's Mouth

One of the most powerful features of the Messages API is prefilling—providing the beginning of Claude's response in the last position of the input messages list. This shapes the model's output without using system prompts.

Use Case: Multiple Choice Answers

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {"role": "assistant", "content": "The answer is ("}
    ]
)

print(message.content[0].text) # Output: "C"

By setting max_tokens=1 and prefilling with "The answer is (", Claude is forced to complete the response with a single character—the correct letter.
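
Because the completion continues directly from the prefilled text, the full answer is simply the concatenation of the two (illustrative):

```python
# The prefill you sent plus the single token Claude returned.
prefill = "The answer is ("
completion = "C"  # i.e. message.content[0].text from the call above
full_answer = prefill + completion + ")"
print(full_answer)  # → The answer is (C)
```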

Limitations

Prefilling is not supported on the following models:
  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
Requests using prefill with these models return a 400 error. For these models, use structured outputs or system prompt instructions instead.

Alternative: Structured Outputs

For models that don't support prefill, you can use structured outputs to enforce response formats:

import anthropic
from pydantic import BaseModel

class QuizAnswer(BaseModel):
    answer: str
    explanation: str

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is Latin for ant? Options: A) Apoidea, B) Rhopalocera, C) Formicidae"}
    ],
    tools=[
        {
            "name": "quiz_answer",
            "description": "Provide the answer and explanation",
            "input_schema": QuizAnswer.model_json_schema()
        }
    ],
    tool_choice={"type": "tool", "name": "quiz_answer"}
)

print(message.content[0].input) # Output: {"answer": "C", "explanation": "..."}
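
With tool_choice forcing the tool, the answer arrives as a tool_use content block. A small sketch of pulling it out of the raw content list; the dicts below just mirror the response shape:

```python
# Illustrative: find the input of a forced tool_use block by name.
def extract_tool_input(content_blocks: list, tool_name: str):
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    return None

blocks = [
    {
        "type": "tool_use",
        "name": "quiz_answer",
        "input": {"answer": "C", "explanation": "Formicidae is the ant family."},
    }
]
print(extract_tool_input(blocks, "quiz_answer")["answer"])  # → C
```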

Vision: Working with Images

Claude can analyze images sent via the Messages API. This is useful for applications like document analysis, visual Q&A, and content moderation.

Sending an Image

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Formats

  • JPEG
  • PNG
  • GIF
  • WebP

Best Practices for Vision

  • Use high-resolution images when details matter, but note that very large images are scaled down before processing, so extreme resolutions mainly add cost.
  • Combine images with text for best results. A descriptive prompt helps Claude focus on relevant details.
  • Be mindful of token costs: Images consume tokens based on their resolution. Larger images cost more.
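
The encode-and-wrap steps from the example above can be collected into one helper that also guesses the media type from the file extension. This is a sketch; `encode_image_block` is not an SDK function:

```python
import base64
import mimetypes

def encode_image_block(path: str) -> dict:
    """Build a base64 image content block for the Messages API."""
    # Guess the media type from the extension; it must be one of the
    # supported formats (image/jpeg, image/png, image/gif, image/webp).
    media_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dict can be placed directly in a message's `content` list, followed by a `{"type": "text", ...}` block with your prompt.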

Handling Stop Reasons

Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field in the response can have these values:

  • "end_turn": Claude finished naturally
  • "max_tokens": Response reached the token limit
  • "stop_sequence": A custom stop sequence was encountered
  • "tool_use": Claude wants to call a tool

Example: Handling max_tokens

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=50,
    messages=[
        {"role": "user", "content": "Write a 1000-word essay on AI safety."}
    ]
)

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens or continuing the conversation.")

Streaming Responses

For real-time applications, you can stream responses token by token:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a story about a brave robot."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming is ideal for chat interfaces where you want to show responses as they're generated.

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create powerful AI applications tailored to your specific needs.

Remember these key points:

  • Always send the full conversation history (stateless design)
  • Use prefill to shape responses, but check model compatibility
  • Handle stop reasons to manage conversation flow
  • Stream responses for better user experience
  • Leverage vision for multimodal applications

Key Takeaways

  • The Messages API is stateless: You must send the full conversation history with each request, giving you complete control over context.
  • Prefill shapes responses: By providing the beginning of Claude's response, you can enforce formats like multiple-choice answers, but this technique isn't supported on all models.
  • Vision enables multimodal AI: Claude can analyze images (JPEG, PNG, GIF, WebP) sent via the API, opening up document analysis and visual Q&A use cases.
  • Streaming improves UX: Use the streaming API for real-time token-by-token responses in chat interfaces.
  • Handle stop reasons: Always check stop_reason to determine if a response was truncated or if Claude needs to use a tool.