BeClaude
Guide · Beginner · API · 2026-05-20

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities. Practical guide for developers.

Quick Answer

This guide covers the core patterns for working with Claude's Messages API, including making basic requests, managing multi-turn conversations, using prefill to shape responses, and integrating vision capabilities. You'll learn how to build stateless conversational flows and control Claude's output effectively.

Tags: Messages API, Conversational AI, Claude API, Prefill, Vision

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding the Messages API is essential. This guide walks you through the most common patterns—from basic requests to advanced techniques like prefill and vision—so you can build robust, conversational AI applications.

Understanding the Messages API

The Messages API is a stateless, RESTful API: you send a list of messages to Claude and receive a response. Because the API does not store conversation state between requests, you must send the full conversation history with every request. This design gives you complete control over the context and enables sophisticated multi-turn interactions.

Anthropic offers two paths for building with Claude:

  • Messages API: Direct model access for custom agent loops and fine-grained control.
  • Claude Managed Agents: A pre-built, configurable agent harness for long-running, asynchronous tasks.
This guide focuses on the Messages API, which is ideal for developers who want full control over the conversation flow.

Basic Request and Response

Let's start with the simplest possible interaction: sending a single message to Claude and getting a reply.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks. Most responses contain text blocks; tool use adds tool_use blocks. (Images appear only in the content you send, not in Claude's replies.)
  • stop_reason: Indicates why Claude stopped. Common values: "end_turn" (natural stop), "max_tokens" (hit token limit), "stop_sequence" (matched a stop sequence), or "tool_use" (Claude wants to call a tool).
  • usage: Token counts for billing and context window management.
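With the SDK you normally read these fields as attributes (message.content, message.usage). If you are working with the raw JSON instead, for example a decoded HTTP response body, a minimal sketch like the one below pulls out the text, stop reason, and token counts. The function name summarize_response is hypothetical; it simply joins every text block and skips other block types such as tool_use.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Collect text, stop reason, and token usage from a Messages API
    response that has been decoded into a plain dict."""
    # Keep only text blocks; other block types (e.g. tool_use) are skipped.
    text = "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    usage = response.get("usage", {})
    return {
        "text": text,
        "stop_reason": response.get("stop_reason"),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }


# The sample response shown above, expressed as a Python dict.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(summarize_response(sample))
# {'text': 'Hello!', 'stop_reason': 'end_turn', 'input_tokens': 12, 'output_tokens': 6}
```

Summing input_tokens and output_tokens across calls is a simple way to track cost and context usage over a session.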

Building Multi-Turn Conversations

Since the Messages API is stateless, you must send the entire conversation history with each request. This pattern allows you to build up a conversation over time.

Python Example

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ]
)

print(message.content[0].text)

Important Notes

  • Synthetic assistant messages: Earlier turns don't need to come from Claude. You can inject pre-written assistant messages to guide the conversation or provide context.
  • History management: For long conversations, be mindful of the context window. You may need to summarize or truncate older messages.
  • Role alternation: Claude is trained on alternating user and assistant turns. If your request contains consecutive messages with the same role, the API combines them into a single turn, so structure your history as alternating turns.
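Because the history lives entirely on your side, it helps to wrap it in a small helper. The sketch below is one possible design, not part of the SDK: the Conversation class, its add method, and its trimmed method are hypothetical names. It merges a repeated role into the previous turn so the list always alternates, and its simple truncation keeps the most recent messages while preserving the conventional user-first shape.

```python
from typing import Literal

Role = Literal["user", "assistant"]


class Conversation:
    """Hypothetical client-side history store; the Messages API itself
    keeps no state between requests."""

    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []

    def add(self, role: Role, text: str) -> None:
        # Merge into the previous turn when the role repeats, so the
        # stored list always alternates between user and assistant.
        if self.messages and self.messages[-1]["role"] == role:
            self.messages[-1]["content"] += "\n\n" + text
        else:
            self.messages.append({"role": role, "content": text})

    def trimmed(self, max_messages: int) -> list[dict[str, str]]:
        """Return copies of at most max_messages recent messages,
        dropping a leading assistant turn so the list starts with user."""
        if max_messages <= 0:
            return []
        recent = self.messages[-max_messages:]
        if recent and recent[0]["role"] == "assistant":
            recent = recent[1:]
        return [dict(m) for m in recent]


convo = Conversation()
convo.add("user", "Hello, Claude")
convo.add("assistant", "Hello!")
convo.add("user", "Can you describe LLMs to me?")
# Pass convo.trimmed(20) as the messages argument to client.messages.create().
```

Dropping old turns loses information; for long sessions you may prefer to replace them with a summary, as noted above.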

Putting Words in Claude's Mouth: Prefill

Prefill allows you to start Claude's response by providing the beginning of its reply. This is powerful for:

  • Constraining output format (e.g., JSON, multiple choice)
  • Guiding the tone or direction of the response
  • Skipping preambles so the response starts directly with the content you need

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Output: "C"

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Using prefill with these models returns a 400 error. Use structured outputs or system prompt instructions instead.

Best Practices for Prefill

  • Keep it short: Prefill works best with a few words or characters.
  • Match the expected format: If you want a JSON object, prefill with an opening brace ({).
  • Set max_tokens appropriately: If you only need a short completion, set max_tokens to a small value to save costs.
  • Prefer system prompts for complex formatting: System prompt instructions work on every model, including those that reject prefill.
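One detail that trips people up: Claude's response contains only the continuation, not your prefill, so you have to stitch the two together before using the result. The sketch below shows this for a JSON prefill. The helper names are hypothetical, and the trailing-whitespace check reflects Anthropic's prefill guidance that the final assistant turn must not end in whitespace; treat it as an assumption to verify against the current docs. The completion string is a made-up example, not real model output.

```python
import json
from typing import Any


def build_prefill_messages(prompt: str, prefill: str) -> list[dict[str, str]]:
    """Build a messages list whose final assistant turn is the prefill."""
    # Assumption: the API rejects a final assistant turn ending in
    # whitespace, so fail early instead of sending a bad request.
    if prefill != prefill.rstrip():
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def parse_prefilled_json(prefill: str, completion: str) -> Any:
    """The response holds only the continuation, so prepend the prefill."""
    return json.loads(prefill + completion)


messages = build_prefill_messages(
    "Return the name and age as a JSON object: Ada, 36", "{"
)
# Send `messages` to a model that supports prefill; suppose it returns:
completion = '"name": "Ada", "age": 36}'  # illustrative continuation only
print(parse_prefilled_json("{", completion))  # {'name': 'Ada', 'age': 36}
```

On models that reject prefill, the same goal is better served by structured outputs or a system prompt, as described above.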

Vision Capabilities: Working with Images

The Messages API supports images, allowing Claude to analyze visual content. This is useful for:

  • Document analysis (screenshots, scanned pages, forms; PDFs are sent as a separate document content block)
  • Image description and captioning
  • Visual Q&A (e.g., "What's wrong with this UI?")

Python Example

import base64

with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this image in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

Supported Image Formats

  • JPEG
  • PNG
  • GIF (static, not animated)
  • WebP
The API downscales images that exceed its size limits before Claude processes them, and every image counts toward your input tokens. For best results, use clear, high-quality images with legible text or distinct visual elements.
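The example above hardcodes "image/png". If your application accepts uploads, a declared media_type that does not match the actual bytes can cause the request to be rejected, so it can be safer to detect the type from the file header. The sketch below is an assumption-laden helper (detect_media_type and image_block are hypothetical names) that checks the standard magic bytes for the four formats listed above and builds the same base64 image block.

```python
import base64

# Standard file-header signatures for the supported formats.
_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def detect_media_type(data: bytes) -> str:
    """Guess the media type from the file header, not the extension."""
    for signature, media_type in _SIGNATURES:
        if data.startswith(signature):
            return media_type
    # WebP files are RIFF containers with "WEBP" at bytes 8-11.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format")


def image_block(data: bytes) -> dict:
    """Build a base64 image content block for a user message."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": detect_media_type(data),
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

To use it, read the file in binary mode and place image_block(f.read()) in the user message's content list, followed by your text block.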

Handling Stop Reasons

Understanding why Claude stopped generating is crucial for building robust applications. The stop_reason field tells you what happened:

| Stop reason | Meaning | Action to take |
|---|---|---|
| end_turn | Claude finished naturally | Continue the conversation or return the response |
| max_tokens | Hit the max_tokens limit | Increase max_tokens, or ask Claude to continue in a follow-up request |
| stop_sequence | Matched one of your custom stop sequences | Handle as needed (e.g., stop processing) |
| tool_use | Claude wants to call a tool | Execute the tool and send the result back in the next request |
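The table translates naturally into a small dispatch function. In this sketch the action labels are hypothetical placeholders for your own handlers. Unknown values are surfaced instead of ignored, since the API can add stop reasons beyond the four listed here, and None is handled because the message_start streaming event reports stop_reason as null.

```python
from typing import Optional

# Hypothetical action labels mirroring the stop-reason table.
ACTIONS = {
    "end_turn": "return_response",
    "max_tokens": "continue_or_raise_limit",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tools",
}


def next_action(stop_reason: Optional[str]) -> str:
    """Map a stop_reason to what the application should do next."""
    if stop_reason is None:
        # Not finished yet (e.g. the message_start streaming event).
        return "wait"
    # Surface unexpected values instead of silently ignoring them.
    return ACTIONS.get(stop_reason, "unknown:" + stop_reason)


print(next_action("tool_use"))  # run_tools
```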

Example: Handling max_tokens

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
    # Optionally, continue the conversation with a follow-up prompt

Streaming Responses

For real-time applications, you can stream Claude's response token by token. This provides a better user experience by showing progress.

Python Example

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)

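for chunk in stream:
    # Only text deltas have a .text attribute; skip other delta types
    # (for example, tool input arrives as input_json_delta).
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="", flush=True)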
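The Python SDK also offers a higher-level helper, client.messages.stream(...), whose text_stream iterator yields only the text. If you consume the raw events yourself, for example after receiving them as JSON, the sketch below rebuilds the final text and stop reason. It assumes each event is a dict shaped like the API's server-sent events; collect_stream is a hypothetical name.

```python
from typing import Any, Iterable, Optional


def collect_stream(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild the final text and stop reason from raw streaming events."""
    parts: list[str] = []
    stop_reason: Optional[str] = None
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            # Only text_delta carries "text"; tool input arrives as
            # input_json_delta with a "partial_json" field instead.
            if delta["type"] == "text_delta":
                parts.append(delta["text"])
        elif event["type"] == "message_delta":
            # The final stop_reason arrives near the end of the stream.
            stop_reason = event["delta"].get("stop_reason")
    return {"text": "".join(parts), "stop_reason": stop_reason}


# Illustrative event sequence (abridged).
events = [
    {"type": "message_start", "message": {"stop_reason": None}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
print(collect_stream(events))  # {'text': 'Hello, world', 'stop_reason': 'end_turn'}
```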

Streaming is especially useful for:

  • Chat interfaces
  • Long-form content generation
  • Real-time translation

Error Handling and Best Practices

Common Errors

  • 400 Bad Request: Invalid parameters or unsupported model features (e.g., prefill on unsupported models).
  • 401 Unauthorized: Invalid API key.
  • 429 Too Many Requests: Rate limit exceeded. Implement exponential backoff.
  • 500 Internal Server Error: Temporary server issue. Retry with backoff.

Best Practices

  • Choose max_tokens deliberately: The parameter is required; set it to fit the task to cap output length and cost.
  • Validate input: Ensure messages alternate between user and assistant roles.
  • Handle stop_reason: Build logic around different stop reasons for robust applications.
  • Use streaming for UX: Stream responses for real-time feedback.
  • Monitor token usage: Track usage fields to manage costs and context windows.
  • Implement retry logic: Use exponential backoff for transient errors.
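The official Python SDK already retries some transient failures by default (configurable with the max_retries client option) and raises errors such as anthropic.RateLimitError that carry a status_code. If you need your own policy on top of that, the sketch below shows exponential backoff with full jitter. StatusError is a hypothetical stand-in for whatever exception your HTTP layer raises, and the retryable set covers only the 429 and 500 codes listed above; the sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes treated as transient, per the error list above.
RETRYABLE_STATUS = {429, 500}


class StatusError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on retryable status codes with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except StatusError as err:
            is_last = attempt == max_attempts - 1
            if err.status_code not in RETRYABLE_STATUS or is_last:
                raise
            # Full jitter: wait a random time in [0, cap], where the cap
            # doubles each attempt up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

Jitter spreads retries from many clients over time, which avoids them all hitting the rate limit again at the same moment.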

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create powerful conversational AI applications. Remember that the API is stateless: you decide what context Claude sees, so managing that history is your responsibility.

Key Takeaways

  • The Messages API is stateless; always send the full conversation history with each request.
  • Use prefill to guide Claude's responses, but check model compatibility first.
  • Handle stop_reason to build robust applications that respond appropriately to different completion scenarios.
  • Streaming provides real-time token-by-token output for better user experiences.
  • Vision capabilities allow Claude to analyze images, expanding your application's possibilities.