Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a document analyzer, or a creative writing assistant, understanding how to structure requests and handle responses is essential. This guide walks you through the core patterns—from a simple hello to multi-turn conversations, prefill techniques, and vision capabilities.
Understanding the Basics
At its heart, the Messages API is a stateless REST endpoint. You send a list of messages (the conversation history), and Claude returns a new message. Each request is independent, meaning you must include the full context every time.
Basic Request and Response
Here's the simplest possible call—a single user message:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)
```
Response:
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
```
Key fields to note:
- `role`: Always `"assistant"` in the response.
- `content`: An array of content blocks (text, tool_use, etc.).
- `stop_reason`: Why the model stopped; `"end_turn"` means it finished naturally.
- `usage`: Token counts for billing and context management.
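In the Python SDK these fields are attributes on the returned `Message` object (e.g. `message.content[0].text`, `message.usage.input_tokens`). As a sketch, here is the same extraction done against the raw JSON payload shown above:

```python
# Sample payload, copied from the response shown above.
response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

# Join all text blocks; content can hold multiple blocks (text, tool_use, ...).
text = "".join(
    block["text"] for block in response["content"] if block["type"] == "text"
)
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(text)          # Hello!
print(total_tokens)  # 18
```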
Building Multi-Turn Conversations
Since the API is stateless, you must send the entire conversation history with each request. This gives you full control over context.
Example: A Two-Turn Conversation
```python
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
```
Important: The assistant's previous response ("Hello!") is included verbatim. You can even inject synthetic assistant messages—they don't have to come from Claude. This is useful for:
- Prompting with examples (few-shot learning)
- Guiding the conversation with pre-written assistant turns
- Simulating role-play scenarios
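As a sketch of the few-shot pattern, here is a sentiment prompt built entirely from synthetic turns; the example texts and labels are hypothetical, not Claude output:

```python
def build_few_shot(examples, query):
    """Interleave (input, label) example pairs as synthetic user/assistant turns."""
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

messages = build_few_shot(
    [("I loved this movie!", "positive"), ("Total waste of time.", "negative")],
    "The acting was superb.",
)
# Roles must strictly alternate and the list must end on a user turn.
assert [m["role"] for m in messages] == [
    "user", "assistant", "user", "assistant", "user"
]
```

Pass the resulting list as `messages=` and Claude will tend to continue the pattern, answering with a bare label.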
Managing Conversation State
In a real application, you'll store the message history in a database or session. Each time the user sends a new message, you append it to the history and send the entire array. Claude's response is then appended for the next turn.
```python
conversation = [
    {"role": "user", "content": "What's the capital of France?"}
]

# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
conversation.append({"role": "assistant", "content": response.content[0].text})

# Second turn
conversation.append({"role": "user", "content": "And what is its population?"})
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
```
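The append pattern can be wrapped in a small helper. `Conversation` below is a hypothetical convenience class, not part of the SDK; in production you would back it with a database or session store:

```python
class Conversation:
    """Minimal in-memory history store; swap for persistent storage in production."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})
        return self.messages  # full history, ready to pass as `messages=`

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

convo = Conversation()
history = convo.add_user("What's the capital of France?")
# response = client.messages.create(model=..., max_tokens=1024, messages=history)
convo.add_assistant("The capital of France is Paris.")  # would come from response
convo.add_user("And what is its population?")
print(len(convo.messages))  # 3
```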
Prefilling Claude's Response
One of the most powerful techniques is prefilling—you start Claude's response by including an assistant message with partial content. This shapes the model's output.
Use Case: Forcing a Multiple Choice Answer
```python
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
```
Response:
```json
{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
```
By setting max_tokens=1 and prefilling "The answer is (", Claude only needs to output the letter. This is perfect for classification tasks, quizzes, or structured outputs.
When Prefill Is Not Supported
Prefilling is not supported on:
- Claude Mythos Preview
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
Other Prefill Patterns
- JSON completion: Prefill with `{"response":` to get valid JSON.
- Code generation: Prefill with `def calculate_total():` to start a function.
- Creative writing: Prefill with `"The story begins"` to set the tone.
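One subtlety with JSON prefill: the API returns only the continuation, not the prefill itself, so you must prepend your prefill before parsing. A sketch with a hard-coded stand-in for Claude's output:

```python
import json

prefill = '{"response":'
# Stand-in for response.content[0].text; a real call returns only the continuation.
continuation = ' {"city": "Paris", "confidence": 0.97}}'

parsed = json.loads(prefill + continuation)
print(parsed["response"]["city"])  # Paris
```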
Vision: Sending Images to Claude
Claude can analyze images sent via the Messages API. This unlocks use cases like document scanning, image description, and visual Q&A.
Sending a Base64-Encoded Image
```python
import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
```
Key points:
- The `content` field is an array of content blocks.
- Supported media types: `image/png`, `image/jpeg`, `image/gif`, `image/webp`.
- Image size is limited (check the latest docs for limits).
- Claude can extract text, analyze trends, and describe visual elements.
Vision Use Cases
- Document analysis: Extract data from scanned PDFs or screenshots.
- UI testing: Describe what a webpage looks like.
- Medical imaging: Identify features in X-rays or diagrams.
- E-commerce: Generate product descriptions from photos.
Handling Stop Reasons
The stop_reason field tells you why Claude stopped generating. Understanding this helps you handle edge cases:
| `stop_reason` | Meaning | Action |
|---|---|---|
| `"end_turn"` | Claude finished naturally | Continue conversation |
| `"max_tokens"` | Output hit the token limit | Increase `max_tokens` or truncate |
| `"stop_sequence"` | A custom stop sequence was hit | Handle as needed |
| `"tool_use"` | Claude wants to call a tool | Execute the tool and return the result |
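The table above can be turned into a simple dispatch; the action names here are illustrative, not SDK constants:

```python
def handle_stop_reason(stop_reason):
    """Map a stop_reason to a next action; names are illustrative."""
    if stop_reason == "end_turn":
        return "continue_conversation"
    if stop_reason == "max_tokens":
        return "retry_with_higher_limit"
    if stop_reason == "stop_sequence":
        return "handle_stop_sequence"
    if stop_reason == "tool_use":
        return "execute_tool"
    raise ValueError(f"Unexpected stop_reason: {stop_reason!r}")

print(handle_stop_reason("max_tokens"))  # retry_with_higher_limit
```

Raising on unknown values is deliberate: new stop reasons may be added over time, and failing loudly beats silently dropping a turn.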
Best Practices
- Always include `max_tokens`: Prevents runaway responses and unexpected costs.
- Use the `system` parameter for instructions: Instead of putting instructions in the user message, use the dedicated `system` field for better performance.
- Monitor token usage: The `usage` field helps you track costs and optimize context length.
- Handle errors gracefully: Network issues, rate limits, and invalid requests should be caught and retried.
- Cache frequent prompts: Use prompt caching for repeated system instructions to reduce latency and cost.
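For the retry point specifically, here is a generic exponential-backoff sketch; note that the official Python SDK retries certain transient errors on its own, so in practice you would tune the client's retry settings rather than hand-roll this:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stand-in that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)  # ok
```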
Conclusion
The Messages API is the foundation of any Claude-powered application. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated AI experiences. Remember that the API is stateless—you control the context. Use prefill to guide responses, and leverage vision to unlock multimodal capabilities.
Key Takeaways
- Stateless design: You must send the full conversation history with every request; store and append messages manually.
- Prefill shapes output: Starting Claude's response with partial text forces structured answers—great for classification and JSON generation.
- Vision is powerful: Send images as base64 content blocks for document analysis, UI testing, and more.
- Watch stop reasons: `end_turn` means natural completion, `max_tokens` means you need more capacity, and `tool_use` triggers function calling.
- Check model compatibility: Prefill is not supported on Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview; use structured outputs instead.