Guide | Beginner | API | 2026-05-20

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities. Practical guide for developers.

Quick Answer

This guide covers the core patterns for working with Claude's Messages API, including making basic requests, managing multi-turn conversations, using prefill to shape responses, and integrating vision capabilities. You'll learn how to build stateless conversational flows and control Claude's output effectively.

Messages API, Conversational AI, Claude API, Prefill, Vision

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding the Messages API is essential. This guide walks you through the most common patterns—from basic requests to advanced techniques like prefill and vision—so you can build robust, conversational AI applications.

Understanding the Messages API

The Messages API is a stateless, RESTful API that lets you send a list of messages to Claude and receive a response. Because the API keeps no conversation state between calls, you must send the full conversation history with every request. This design gives you complete control over the context and enables sophisticated multi-turn interactions.

Anthropic offers two paths for building with Claude:

Messages API: Direct model access for custom agent loops and fine-grained control.
Claude Managed Agents: A pre-built, configurable agent harness for long-running, asynchronous tasks.

This guide focuses on the Messages API, which is ideal for developers who want full control over the conversation flow.
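Because the API is stateless and RESTful, every call is one self-contained HTTP POST. The sketch below only assembles such a request without sending it. The helper name build_request and the placeholder key are assumptions for illustration; the SDK examples that follow handle these details for you.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"
API_VERSION = "2023-06-01"  # value sent in the anthropic-version header


def build_request(api_key, messages, model="claude-opus-4-7", max_tokens=1024):
    """Return (url, headers, body) for one stateless Messages API call."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": API_VERSION,
        "content-type": "application/json",
    }
    # The body carries everything the model will see; nothing is stored server-side.
    payload = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return API_URL, headers, json.dumps(payload).encode("utf-8")


url, headers, body = build_request(
    "sk-placeholder",  # hypothetical key; load a real one from the environment
    [{"role": "user", "content": "Hello, Claude"}],
)
print(url)
print(json.loads(body)["messages"])
```

The returned tuple can be handed to any HTTP client, which is one way to see that the request body is the entire conversation state.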

Basic Request and Response

Let's start with the simplest possible interaction: sending a single message to Claude and getting a reply.

Python Example

import anthropic
# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

content: An array of content blocks. Text blocks are the most common type in responses; tool use adds tool_use blocks, and requests can also carry image blocks for vision.
stop_reason: Indicates why Claude stopped. Common values: "end_turn" (natural stop), "max_tokens" (hit token limit), "stop_sequence" (matched a stop sequence), or "tool_use" (Claude wants to call a tool).
usage: Token counts for billing and context window management.
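Since content is an array rather than a single string, it helps to collect the text blocks explicitly instead of assuming the first block is text. The following is a minimal sketch that works on the parsed JSON shown above; the helper name extract_text and the tool name in the sample are hypothetical.

```python
def extract_text(response):
    """Join the text of every text block in a parsed Messages API response."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


sample = {
    "content": [
        {"type": "text", "text": "Hello!"},
        # A non-text block, included to show that it is skipped.
        {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {}},
        {"type": "text", "text": " How can I help?"},
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(extract_text(sample))
```

With the SDK, the same fields are available as attributes (for example message.content[0].text), so the loop carries over with attribute access in place of dictionary lookups.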

Building Multi-Turn Conversations

Since the Messages API is stateless, you must send the entire conversation history with each request. This pattern allows you to build up a conversation over time.

Python Example

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ]
)
print(message.content[0].text)
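In a real application the history is usually built up turn by turn rather than written out by hand. The sketch below keeps the history on the client side and resends all of it on each turn. ChatSession and fake_send are hypothetical names, and fake_send stands in for a real API call so the example runs offline.

```python
class ChatSession:
    """Keeps the history client-side and resends all of it on every turn."""

    def __init__(self, send):
        self.send = send      # callable: list of messages -> reply text
        self.messages = []

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the sender cannot mutate the stored history.
        reply = self.send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages):
    # Stand-in for a real API call; reports how many messages it received.
    return f"(reply to message {len(messages)})"


session = ChatSession(fake_send)
session.ask("Hello, Claude")
session.ask("Can you describe LLMs to me?")
print([m["role"] for m in session.messages])
```

To use it against the API, send could wrap client.messages.create(...) and return message.content[0].text.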

Important Notes

Synthetic assistant messages: Earlier turns don't need to come from Claude. You can inject pre-written assistant messages to guide the conversation or provide context.
History management: For long conversations, be mindful of the context window. You may need to summarize or truncate older messages.
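Role alternation: Conversations should alternate between user and assistant roles, typically starting with a user message. If you send two consecutive messages with the same role, the API combines them into a single turn, so merge them yourself to keep the history predictable.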
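One simple way to handle the history-management note is to keep only the most recent messages. The sketch below is a deliberately crude, count-based approach (an assumption for illustration; a production system might count tokens or summarize older turns instead), and trim_history is a hypothetical helper name.

```python
def trim_history(messages, max_messages):
    """Keep the most recent messages, starting from a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the trimmed history still opens with the user.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
    {"role": "assistant", "content": "LLMs are models trained on text."},
    {"role": "user", "content": "Give me an example."},
]
trimmed = trim_history(history, 4)
print([m["role"] for m in trimmed])
```

Note that the result can be shorter than max_messages, because a leading assistant turn is dropped rather than kept out of context.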

Putting Words in Claude's Mouth: Prefill

Prefill allows you to start Claude's response by providing the beginning of its reply. This is powerful for:

Constraining output format (e.g., JSON, multiple choice)
Guiding the tone or direction of the response
Reducing token usage by skipping preamble, especially when paired with a small max_tokens

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Using prefill with these models returns a 400 error. Use structured outputs or system prompt instructions instead.

Best Practices for Prefill

Keep it short: Prefill works best with a few words or characters.
Match the expected format: If you want JSON, prefill with the opening brace, {.
Set max_tokens appropriately: If you only need a short completion, set max_tokens to a small value to save costs.
Prefer system prompts for complex formats: For complex formatting, use system prompt instructions instead of prefill; they also work on models that do not support prefill.
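When prefilling for JSON, keep in mind that the response contains only the continuation, not the prefilled text, so the two have to be joined before parsing. The sketch below shows that step with a simulated completion. The helper names and the sample prompt are hypothetical, and the approach applies only to models that accept prefill.

```python
import json

PREFILL = "{"


def build_prefilled_messages(prompt, prefill=PREFILL):
    """End the message list with a partial assistant turn for the model to continue."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def parse_prefilled_json(completion, prefill=PREFILL):
    """Re-attach the prefill, since the API returns only what follows it."""
    return json.loads(prefill + completion)


messages = build_prefilled_messages("Return the ant family as JSON with the key 'family'.")
simulated_completion = '"family": "Formicidae"}'  # what a model might return
parsed = parse_prefilled_json(simulated_completion)
print(parsed)
```

If parsing fails, json.loads raises json.JSONDecodeError, which is a reasonable signal to retry or fall back to system prompt instructions.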

Vision Capabilities: Working with Images

The Messages API supports images, allowing Claude to analyze visual content. This is useful for:

Document analysis (screenshots, PDFs, forms)
Image description and captioning
Visual Q&A (e.g., "What's wrong with this UI?")

Python Example

import base64
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

JPEG
PNG
GIF (static, not animated)
WebP

Images are resized and compressed by Claude to fit within the context window. For best results, use high-quality images with clear text or distinct visual elements.
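To avoid hard-coding the media type for each file, the image block can be built by a small helper that checks the extension against the formats listed above. This is a sketch under the assumption that the file extension reflects the real format; media_type_for and image_block are hypothetical names, and the demo writes placeholder bytes rather than a real image.

```python
import base64
import tempfile
from pathlib import Path

# File extensions mapped to the media types of the supported formats.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(path):
    """Return the media type for a supported extension, or None."""
    return MEDIA_TYPES.get(Path(path).suffix.lower())


def image_block(path):
    """Build a base64 image content block from a local file."""
    media_type = media_type_for(path)
    if media_type is None:
        raise ValueError(f"Unsupported image format: {path}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


# Demo with a throwaway file; the bytes are a placeholder, not a real PNG.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "screenshot.png"
    sample_path.write_bytes(b"placeholder-bytes")
    block = image_block(sample_path)

print(block["source"]["media_type"])
```

The returned dictionary can be placed directly in a user message's content list, next to a text block carrying the question.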

Handling Stop Reasons

Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field tells you what happened:

Stop Reason	Meaning	Action to Take
`end_turn`	Claude finished naturally	Continue the conversation or return the response
`max_tokens`	Hit the token limit	Increase `max_tokens` or ask Claude to continue the truncated response
`stop_sequence`	Matched a custom stop sequence	Handle as needed (e.g., stop processing)
`tool_use`	Claude wants to call a tool	Execute the tool and continue the conversation
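The table above can be turned into a small dispatch function so that each stop reason is handled in one place. This is a sketch only: the action labels are hypothetical and would map to whatever handlers your application defines.

```python
def next_action(stop_reason):
    """Map a stop_reason to a coarse follow-up step (labels are hypothetical)."""
    actions = {
        "end_turn": "return_response",
        "max_tokens": "continue_or_raise_limit",
        "stop_sequence": "post_process",
        "tool_use": "run_tool",
    }
    # Unknown values fall back to a safe default so new reasons do not crash the app.
    return actions.get(stop_reason, "inspect_manually")


for reason in ("end_turn", "max_tokens", "tool_use", "something_new"):
    print(reason, "->", next_action(reason))
```

Keeping a default branch is a design choice: it lets the application degrade gracefully if it encounters a stop reason it was not written for.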

Example: Handling `max_tokens`

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
    # Optionally, continue the conversation with a follow-up prompt
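One way to continue after a truncated response is to append the partial reply to the history and ask for the rest in a follow-up user message. The sketch below builds that message list. build_continuation and the wording of the follow-up prompt are assumptions, not a prescribed pattern.

```python
def build_continuation(messages, partial_text,
                       follow_up="Please continue exactly where you left off."):
    """Return a new message list that asks the model to resume a truncated reply."""
    # List concatenation creates a new list, so the caller's history is untouched.
    return messages + [
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": follow_up},
    ]


original = [{"role": "user", "content": "Write a long essay about ants."}]
continued = build_continuation(original, "Ants are social insects that")
print([m["role"] for m in continued])
```

The second response can then be concatenated with the partial text on the client side to reconstruct the full answer.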

Streaming Responses

For real-time applications, you can stream Claude's response token by token. This provides a better user experience by showing progress.

Python Example

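# stream=True returns an iterator of streaming events instead of one message
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)
for chunk in stream:
    # Only text deltas carry a .text attribute; other delta types are skipped
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="", flush=True)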

Streaming is especially useful for:

Chat interfaces
Long-form content generation
Real-time translation or transcription
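Besides printing deltas as they arrive, applications often need the complete text once the stream ends, for example to append it to the conversation history. The sketch below accumulates text deltas from any iterable of events. collect_text and fake_event are hypothetical names, and the events are simulated with SimpleNamespace so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_text(events):
    """Concatenate the text deltas from an iterable of streaming events."""
    parts = []
    for event in events:
        if event.type != "content_block_delta":
            continue
        # Skip deltas that are not plain text.
        if getattr(event.delta, "type", None) == "text_delta":
            parts.append(event.delta.text)
    return "".join(parts)


def fake_event(kind, **delta):
    # Builds an object shaped like a streaming event, for offline testing.
    return SimpleNamespace(type=kind, delta=SimpleNamespace(**delta))


simulated = [
    SimpleNamespace(type="message_start"),
    fake_event("content_block_delta", type="text_delta", text="Silicon "),
    fake_event("content_block_delta", type="text_delta", text="dreams"),
    SimpleNamespace(type="message_stop"),
]
full_text = collect_text(simulated)
print(full_text)
```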

Error Handling and Best Practices

Common Errors

400 Bad Request: Invalid parameters or unsupported model features (e.g., prefill on unsupported models).
401 Unauthorized: Invalid API key.
429 Too Many Requests: Rate limit exceeded. Implement exponential backoff.
500 Internal Server Error: Temporary server issue. Retry with backoff.
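The backoff advice for 429 and 500 responses can be sketched as a small retry wrapper. This is a minimal illustration, not the SDK's own retry mechanism: TransientError is a hypothetical stand-in for whatever exception your HTTP client or SDK raises on rate limits and server errors, and the delay values are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a 429 or 5xx failure."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # The delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


attempts = {"count": 0}


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated 429")
    return "ok"


delays = []
result = with_backoff(flaky, base_delay=0.1, sleep=delays.append)
print(result, len(delays))
```

Passing sleep as a parameter keeps the demo instant; in production the default time.sleep applies the real delays. Errors such as 400 and 401 are deliberately not retried, since repeating the same request would fail the same way.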

Best Practices

Always set max_tokens: Prevents runaway token usage and unexpected costs.
Validate input: Ensure messages alternate between user and assistant roles.
Handle stop_reason: Build logic around different stop reasons for robust applications.
Use streaming for UX: Stream responses for real-time feedback.
Monitor token usage: Track usage fields to manage costs and context windows.
Implement retry logic: Use exponential backoff for transient errors.
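For the token-monitoring practice, a running total across requests is often enough to start with. The sketch below accumulates the usage object from each response; UsageTracker is a hypothetical helper, and the second set of numbers is made up for the demo.

```python
class UsageTracker:
    """Accumulates input and output token counts across responses."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage):
        # usage is the "usage" object from a parsed response.
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]

    @property
    def total(self):
        return self.input_tokens + self.output_tokens


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 40, "output_tokens": 120})
print(tracker.input_tokens, tracker.output_tokens, tracker.total)
```

Because the full history is resent on every turn, input_tokens tends to grow with each request in a long conversation, which makes it a useful signal for when to trim or summarize.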

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create powerful, conversational AI applications. Remember that the API is stateless: you control the context, so what you send with each request determines everything Claude sees.

Key Takeaways

The Messages API is stateless; always send the full conversation history with each request.
Use prefill to guide Claude's responses, but check model compatibility first.
Handle stop_reason to build robust applications that respond appropriately to different completion scenarios.
Streaming provides real-time token-by-token output for better user experiences.
Vision capabilities allow Claude to analyze images, expanding your application's possibilities.
