Guide | Beginner | Best Practices | 2026-05-22

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, response prefilling, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, response prefilling, and image analysis with practical Python and TypeScript examples.

Messages API, Conversational AI, Claude API, Prompt Engineering, Multimodal

Claude's Messages API is the primary interface for integrating Claude into your applications. Whether you're building a chatbot, a content generator, or a multimodal analysis tool, understanding how to work with messages effectively is essential. This guide walks you through everything from basic requests to advanced patterns like multi-turn conversations, response prefilling, and vision capabilities.

Understanding the Messages API vs. Managed Agents

Anthropic offers two primary ways to build with Claude:

Messages API: Direct access to model prompting. Best for custom agent loops and fine-grained control over every request and response.
Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, giving you full control over the conversation flow.

Making Your First API Request

Let's start with the simplest possible interaction: sending a single message and receiving a response.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
        { role: 'user', content: 'Hello, Claude' }
    ]
});
console.log(message);

Understanding the Response

The API returns a structured response containing:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

content: An array of content blocks (for example, text or tool use)
stop_reason: Indicates why the response ended, for example "end_turn", "max_tokens", "stop_sequence", or "tool_use"
usage: Token counts for billing and monitoring
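
As a rough illustration of reading these fields, the sketch below works on a plain dictionary shaped like the JSON above. The helper names `response_text` and `total_tokens` are hypothetical and not part of the SDK; with the SDK's response object you would read the same fields as attributes, as in `message.content[0].text`.

```python
from typing import Any

# Sample response copied from the JSON shown above.
sample_response: dict[str, Any] = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}


def response_text(response: dict[str, Any]) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Add the input and output token counts from the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


print(response_text(sample_response))
print(total_tokens(sample_response))
```

Filtering on the block type rather than indexing `content[0]` keeps the helper working when a response contains blocks other than text.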

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context but requires you to manage conversation state on your end.

Example: Two-Turn Conversation

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)

Key Patterns for Multi-Turn Conversations

Maintain conversation history: Store all messages in a list or database, appending new user inputs and assistant responses.
Include synthetic messages: Earlier turns don't need to originate from Claude—you can inject pre-written assistant messages to guide the conversation.
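Manage token limits: Longer histories consume more tokens. Prompt caching can lower the cost of resending an unchanged prefix, and compaction (summarizing or dropping older turns) keeps extended conversations within the context window.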

# Example of managing conversation state
import anthropic

client = anthropic.Anthropic()

conversation_history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
]

# Add new user message
conversation_history.append({"role": "user", "content": "What's the weather like?"})

# Send full history
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation_history
)

# Append response to history
conversation_history.append({"role": "assistant", "content": response.content[0].text})
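
One very simple form of compaction is to keep only the most recent messages. The sketch below is one possible approach and not an SDK feature: `trim_history` is a hypothetical helper, and it drops leading assistant messages on the assumption that a conversation should open with a user turn.

```python
def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Return at most max_messages of the newest messages, starting at a user turn."""
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    # Drop leading assistant messages so the window opens with a user turn.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
    {"role": "user", "content": "What's the weather like?"},
    {"role": "assistant", "content": "I can't check live weather."},
    {"role": "user", "content": "Okay, tell me a joke instead."},
]

recent = trim_history(history, max_messages=4)
print([m["role"] for m in recent])
```

Dropping old turns loses whatever they contained, so a summary of the removed messages may serve better when earlier context still matters.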

Prefilling Claude's Response

Prefilling lets you start Claude's response, guiding it toward a specific format or answer. This is powerful for:

Forcing structured outputs (e.g., JSON, multiple choice)
Setting the tone or style of the response
Reducing latency by constraining the output

Example: Multiple Choice Answer

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"

Important Notes on Prefilling

Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. A request that includes a prefill returns a 400 error on these models.
Alternative for unsupported models: Use structured outputs or system prompt instructions instead.
Use max_tokens wisely: Setting max_tokens=1 forces a single-token response, ideal for classification tasks.
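
The compatibility notes above could be handled with a small request builder like the one sketched here. Everything in it is an assumption made for illustration: `build_choice_request` and `MODELS_WITHOUT_PREFILL` are hypothetical names, the identifier spellings are guessed from the model names in this guide, and the fallback wording is only one possible system prompt. The helper also strips trailing whitespace from the prefill, which the API is understood to reject.

```python
# Identifier spellings are assumptions based on the model names in this guide.
# Claude Mythos Preview is omitted because this guide does not give its identifier.
MODELS_WITHOUT_PREFILL = {"claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"}


def build_choice_request(model: str, question: str, prefill: str) -> dict:
    """Build keyword arguments for client.messages.create, prefilled if supported."""
    request = {
        "model": model,
        "max_tokens": 16,  # Arbitrary small cap for a short classification answer.
        "messages": [{"role": "user", "content": question}],
    }
    if model in MODELS_WITHOUT_PREFILL:
        # Fall back to a system prompt instruction instead of a prefill.
        request["system"] = "Answer with only the letter of the correct option."
    else:
        # Strip trailing whitespace from the prefill before sending it.
        request["messages"].append({"role": "assistant", "content": prefill.rstrip()})
    return request


question = "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
with_prefill = build_choice_request("claude-sonnet-4-5", question, "The answer is (")
without_prefill = build_choice_request("claude-opus-4-7", question, "The answer is (")
print(with_prefill["messages"][-1]["role"])
print("system" in without_prefill)
```

The returned dictionary can be passed along as `client.messages.create(**request)`. Keeping the model check in one place means the list only needs updating once when model support changes.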

Working with Images (Vision)

The Messages API supports image inputs, enabling visual analysis and multimodal interactions.

Python Example: Image Analysis

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

PNG
JPEG
WebP
GIF (static only)

Images are sent as content blocks within the message array, alongside text blocks. This allows you to combine visual and textual instructions in a single user message.
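
To avoid hard-coding the media type, the block structure above can be wrapped in a small builder. This is a sketch under stated assumptions: `image_block` and `image_message` are hypothetical helpers, the media type is inferred from the file extension only (the bytes are not inspected), and the bytes in the usage lines are a placeholder, not a real image.

```python
import base64
from pathlib import PurePath

# Media types for the formats listed above, keyed by file extension.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_message(filename: str, data: bytes, prompt: str) -> dict:
    """Combine one image block and one text block into a single user message."""
    return {
        "role": "user",
        "content": [image_block(filename, data), {"type": "text", "text": prompt}],
    }


# Placeholder bytes stand in for the contents of a real image file.
message = image_message("chart.png", b"abc", "Describe this chart in detail.")
print(message["content"][0]["source"]["media_type"])
```

Rejecting unknown extensions before the request is sent surfaces format problems locally instead of as an API error.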

Handling Stop Reasons

Understanding why Claude stopped generating helps you build more robust applications:

Stop Reason	Meaning	Typical Action
`end_turn`	Claude finished naturally	Continue conversation or end
`max_tokens`	Output hit token limit	Increase `max_tokens` and retry, or accept the truncated output
`stop_sequence`	A custom stop sequence was hit	Handle based on sequence
`tool_use`	Claude wants to use a tool	Execute tool and continue

response = client.messages.create(...)
if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif response.stop_reason == "tool_use":
    print("Claude requested a tool call. Handle accordingly.")
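
As the number of cases grows, a lookup table can be easier to maintain than an if/elif chain. The sketch below assumes only that the response exposes a `stop_reason` attribute; `next_action` and the action labels are hypothetical, and `SimpleNamespace` stands in for a real response object so the example runs without an API call.

```python
from types import SimpleNamespace

# Action labels are illustrative; the keys come from the table above.
ACTIONS = {
    "end_turn": "done",
    "max_tokens": "retry_with_more_tokens",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool",
}


def next_action(response) -> str:
    """Map a response's stop_reason to an action label, defaulting to 'inspect'."""
    return ACTIONS.get(response.stop_reason, "inspect")


# Stand-in for a response whose output was cut off at the token limit.
truncated = SimpleNamespace(stop_reason="max_tokens")
print(next_action(truncated))
```

The default label means a stop reason that is not in the table is flagged for inspection rather than silently treated as a normal ending.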

Best Practices

1. Manage Token Usage Efficiently

Use prompt caching for repeated system prompts or large context
Implement conversation compaction for long histories
Monitor usage fields in responses to track costs
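
Monitoring usage can be as simple as keeping running totals. The following is a minimal sketch, assuming usage is available as a dictionary with the `input_tokens` and `output_tokens` keys shown earlier in this guide. `UsageTracker` is a hypothetical class, and the prices in the usage lines are made-up placeholders, not real pricing.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of input and output tokens across requests."""

    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage counts to the totals."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.requests += 1

    def estimated_cost(self, input_price: float, output_price: float) -> float:
        """Estimate cost from caller-supplied prices per million tokens."""
        total = self.input_tokens * input_price + self.output_tokens * output_price
        return total / 1_000_000


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 40, "output_tokens": 100})
print(tracker.input_tokens, tracker.output_tokens, tracker.requests)

# Placeholder prices per million tokens, chosen only to show the arithmetic.
print(tracker.estimated_cost(input_price=1.0, output_price=2.0))
```

Passing prices in as arguments keeps the tracker independent of any particular model's rates.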

2. Handle Errors Gracefully

Implement retry logic with exponential backoff
Validate inputs before sending (e.g., image size, message format)
Check for model-specific limitations (e.g., prefilling support)
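
Retry logic with exponential backoff might look like the sketch below. `with_backoff` is a hypothetical helper, not an SDK function: it takes the exception types to retry on as an argument, and the usage lines simulate a transient failure with `TimeoutError` so the example runs without network access. Production code would usually add random jitter to the delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    """Fail twice with a simulated transient error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


# Record the delays instead of sleeping so the demo finishes instantly.
result = with_backoff(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, delays)
```

With the Anthropic Python SDK you would pass its exception classes (for example `anthropic.RateLimitError`) as `retry_on`. The SDK client also accepts a `max_retries` option, so a wrapper like this is mainly useful when you need a custom policy.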

3. Optimize for Latency

Use streaming for real-time applications (see Streaming Messages docs)
Prefill responses when output format is predictable
Set appropriate max_tokens to avoid unnecessary generation
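
For the streaming option mentioned above, a minimal sketch follows. It assumes the Python SDK's `client.messages.stream(...)` context manager and its `text_stream` iterator; `stream_text` is a hypothetical wrapper. The client is passed in as an argument, so the block makes no API call until you invoke it with a real client.

```python
def stream_text(client, **request) -> str:
    """Print text chunks as they arrive and return the complete text."""
    chunks = []
    with client.messages.stream(**request) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            chunks.append(text)
    print()
    return "".join(chunks)


# Example call (requires a real client and network access):
# import anthropic
# full_text = stream_text(
#     anthropic.Anthropic(),
#     model="claude-opus-4-7",
#     max_tokens=1024,
#     messages=[{"role": "user", "content": "Hello, Claude"}],
# )
```

Collecting the chunks as well as printing them lets you append the complete reply to your conversation history once the stream ends.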

4. Security Considerations

The Messages API is eligible for Zero Data Retention (ZDR); under ZDR, data is not stored after the response is returned
Never send sensitive information in prompts unless you have appropriate data-handling agreements in place
Validate and sanitize user inputs before including them in messages
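
Basic input cleaning could look like the sketch below. The rules are assumptions chosen for illustration (strip control characters, trim whitespace, cap the length at an arbitrary 4000 characters), and `sanitize_user_text` and `user_message` are hypothetical helpers. This only normalizes malformed input; it does not defend against prompt injection.

```python
import unicodedata


def sanitize_user_text(text: str, max_chars: int = 4000) -> str:
    """Remove control characters (except newline and tab), trim, and cap length."""
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    cleaned = cleaned.strip()
    if not cleaned:
        raise ValueError("User input is empty after cleaning")
    return cleaned[:max_chars]


def user_message(text: str) -> dict:
    """Wrap cleaned user text in a message dictionary."""
    return {"role": "user", "content": sanitize_user_text(text)}


print(user_message("  Hi\x00 there\x07\n"))
```

Raising on empty input stops a blank message from reaching the API, where it would otherwise cost a round trip to discover.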

Key Takeaways

The Messages API is stateless—you must send the full conversation history with each request, giving you complete control over context management.
Prefilling lets you guide Claude's responses by starting its reply, but check model compatibility as some newer models don't support it.
Multi-turn conversations require you to maintain and append to a conversation history list on your end.
Vision capabilities are built-in—send images as content blocks alongside text for multimodal analysis.
Monitor stop reasons to handle truncation, tool calls, and natural conversation endings appropriately.