Guide | Beginner | Best Practices | 2026-05-15

Mastering the Messages API: Building Conversational AI with Claude

Quick Answer

This guide teaches you how to use the Claude Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, conversational AI, Claude, API integration, prompt engineering

Introduction

The Claude Messages API is the primary interface for building conversational AI applications with Anthropic's Claude models. Whether you're creating a chatbot, a content generation tool, or a complex agent system, understanding how to work with messages is essential.

This guide covers the core patterns you'll use daily: making basic requests, managing multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities. By the end, you'll have a solid foundation for building production-ready applications with Claude.

Basic Request and Response

At its simplest, the Messages API takes a list of messages and returns Claude's response. Each message has a role (either "user" or "assistant") and content.

Here's a minimal example in Python:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)

The response includes:

id: Unique identifier for the message
role: Always "assistant" for responses
content: Array of content blocks (usually text)
model: The model used
stop_reason: Why generation stopped (for example "end_turn", "max_tokens", "stop_sequence", or "tool_use")
usage: Token counts for input and output

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
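
Because content is an array, it can hold more than one block. The following is a minimal sketch, assuming you are working with the raw JSON body as a Python dict rather than the SDK's response object. The helper name `extract_text` is hypothetical, and the sample is the response shown above.

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block in a Messages API response body."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Sample body copied from the response shown above.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!

# Total tokens billed for this exchange, taken from the usage field.
total_tokens = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
print(total_tokens)  # 18
```

Joining all text blocks, instead of reading only the first one, keeps the helper safe when a response mixes text with other block types.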

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context but requires you to manage state on your side.

To continue a conversation, append new messages to the history:

import anthropic
client = anthropic.Anthropic()
# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
# Second turn: include the previous exchange
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": response.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(response.content[0].text)
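
Rebuilding the list by hand gets tedious after a few turns. One way to manage state on your side is a small wrapper that owns the history, sketched below. The `Conversation` class and the `send` callable are hypothetical names, not part of the SDK, and a fake `send` stands in for the network call so the example runs offline.

```python
from typing import Callable


class Conversation:
    """Keeps the history client-side and resends all of it on every turn."""

    def __init__(self, send: Callable[[list], str]):
        # `send` is a hypothetical callable: it receives the full message
        # list and returns the assistant's reply as plain text.
        self._send = send
        self.messages: list = []

    def ask(self, user_text: str) -> str:
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = self._send(pending)
        # Commit the turn only after the call succeeds, so a failed request
        # does not leave a dangling user message in the history.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


def fake_send(messages: list) -> str:
    """Stand-in for the API call; reports how many messages it received."""
    return f"reply to message {len(messages)}"


chat = Conversation(fake_send)
first = chat.ask("Hello, Claude")
second = chat.ask("Can you describe LLMs to me?")

print(second)  # reply to message 3
print([m["role"] for m in chat.messages])
# ['user', 'assistant', 'user', 'assistant']
```

In a real application, `send` would wrap `client.messages.create(...)` and return `response.content[0].text`, as in the example above.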

Synthetic Assistant Messages

You can inject synthetic assistant messages—messages that didn't actually come from Claude. This is useful for:

Few-shot prompting: Showing examples of desired responses
Guiding behavior: Demonstrating tone or format
Correcting context: Providing "correct" answers in history

messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the capital of Italy?"}
]
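
For few-shot prompting with several examples, the alternating list can be generated from question and answer pairs. This is a minimal sketch; the helper name `few_shot_messages` is hypothetical, and the example pair is the one shown above.

```python
def few_shot_messages(examples: list, question: str) -> list:
    """Turn (user_text, assistant_text) pairs plus a final question into messages."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by you, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


examples = [
    ("What's the capital of France?", "The capital of France is Paris."),
]
messages = few_shot_messages(examples, "What's the capital of Italy?")

print(len(messages))         # 3
print(messages[-1]["role"])  # user
```

The list always ends on a user turn, so it can be passed directly as the `messages` argument.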

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by including an assistant message with partial content at the end of your input. This is powerful for:

Constraining output format (e.g., JSON, multiple choice)
Guiding the start of a response
Reducing token usage for structured outputs

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. On those models, use structured outputs or system prompt instructions instead. The examples below use claude-sonnet-4-5, which is not on that list.

Example: Multiple Choice Answer

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"

With max_tokens=1 and the prefill "The answer is (", Claude only needs to output a single token, here the letter "C". This keeps the response cheap and easy to parse.

Example: JSON Output

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: John is 30 years old."
        },
        {
            "role": "assistant",
            "content": "Here's the JSON:\n{"
        }
    ]
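)
# The reply continues after the prefilled "{", so add it back before parsing.
print("{" + message.content[0].text)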
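
The response contains only the continuation, not the prefill, so the opening brace has to be restored before the text can be parsed. The sketch below shows one way to do that. The helper name `parse_prefilled_json` is hypothetical, and the `completion` string is a made-up sample, not real model output.

```python
import json


def parse_prefilled_json(prefill: str, completion: str) -> dict:
    """Rejoin the prefill and the completion, then parse the first JSON object."""
    combined = prefill + completion
    start = combined.index("{")  # raises ValueError if there is no object
    # raw_decode stops at the end of the first JSON value, so any text the
    # model adds after the closing brace is ignored.
    obj, _end = json.JSONDecoder().raw_decode(combined[start:])
    return obj


prefill = "Here's the JSON:\n{"
completion = '"name": "John", "age": 30}'  # illustrative continuation only

result = parse_prefilled_json(prefill, completion)
print(result)  # {'name': 'John', 'age': 30}
```

Malformed output still raises `json.JSONDecodeError`, so callers should be prepared to catch it.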

Vision Capabilities

The Messages API supports images as input. You can send base64-encoded images or image URLs, and Claude can analyze them.

import anthropic
import base64
client = anthropic.Anthropic()
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
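
The image block can be built by small helpers so the media type always matches the file. The sketch below is offered under assumptions: the helper names are hypothetical, the URL variant assumes a source of type "url" with a `url` field, and the extension table assumes JPEG, PNG, GIF, and WebP are the accepted formats.

```python
import base64
from pathlib import Path

# Assumed set of accepted image formats, keyed by file extension.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(filename: str) -> str:
    """Map a file name to its media type, rejecting unknown extensions."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    return MEDIA_TYPES[suffix]


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Build a base64 image content block like the one in the example above."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build a URL-based image block (assumed shape, see the note above)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


block = image_block_from_bytes(b"\x89PNG", media_type_for("chart.png"))
print(block["source"]["media_type"])  # image/png
```

Either block goes in the `content` list of a user message, next to a text block that holds the question.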

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

stop_reason	Meaning	Action
`"end_turn"`	Claude finished naturally	Return response to user
`"max_tokens"`	Hit token limit	Increase `max_tokens` or continue
`"stop_sequence"`	Hit a custom stop sequence	Handle as needed
`"tool_use"`	Claude wants to use a tool	Execute tool and continue

response = client.messages.create(...)  # pass the same arguments as in the earlier examples
if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "tool_use":
    print("Claude requested a tool call.")
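
The table above can also be expressed as a lookup, which makes the fallback for unrecognized values explicit. This is a minimal sketch; the function name `next_action` and the action labels are hypothetical and only mirror the Action column.

```python
# Action labels are illustrative; they mirror the Action column of the table.
ACTIONS = {
    "end_turn": "return_response",
    "max_tokens": "increase_max_tokens_or_continue",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "execute_tool_and_continue",
}


def next_action(stop_reason: str) -> str:
    """Map a stop_reason to the next step, with a safe default for new values."""
    return ACTIONS.get(stop_reason, "log_and_review")


print(next_action("max_tokens"))   # increase_max_tokens_or_continue
print(next_action("unknown"))      # log_and_review
```

An explicit default avoids silently treating an unrecognized stop reason as a normal completion.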

Best Practices

Manage context window: Keep conversation history within the model's context window. Use techniques like summarization or sliding windows for long conversations.

Use system prompts: For persistent instructions, use the system parameter rather than repeating instructions in every user message.

Handle errors gracefully: The API may return errors for invalid requests, rate limits, or server issues. Implement retry logic with exponential backoff.

Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs and optimize prompts.

Stream responses: For better user experience, use streaming to show responses as they're generated.
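
Two of the practices above, sliding-window history and retry with exponential backoff, can be sketched without touching the SDK. The helper names `trim_history` and `call_with_backoff` are hypothetical, the delay values are arbitrary defaults, and a simulated failure stands in for a real API error.

```python
import random
import time


def trim_history(messages: list, max_messages: int) -> list:
    """Sliding window: keep the most recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the window begins with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


def call_with_backoff(fn, retryable=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Simulated transient failure: fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, retryable=(ConnectionError,), sleep=delays.append)
print(result, len(delays))  # ok 2

history = [
    {"role": "user", "content": "1"},
    {"role": "assistant", "content": "2"},
    {"role": "user", "content": "3"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "5"},
]
print(len(trim_history(history, 4)))  # 3
```

In practice, `retryable` would be narrowed to transient errors such as rate limits and connection failures, since retrying an invalid request only repeats the same error.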

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated conversational AI applications. Remember that the API is stateless, so you manage the conversation history yourself, and that prefill, on models that support it, gives you fine-grained control over how Claude's response begins.

Key Takeaways

The Messages API is stateless; you must send the full conversation history with each request
Prefill lets you start Claude's response on models that support it, which is useful for constraining output format or guiding behavior
Vision capabilities let you send images for Claude to analyze alongside text
Always check stop_reason to understand why generation ended and handle appropriately
Monitor token usage to control costs and optimize your prompts for efficiency