Guide | Beginner | API | 2026-05-16

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with code examples in Python and TypeScript.

Tags: Messages API, Claude API, conversational AI, prefill, multimodal

Introduction

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding the Messages API is essential. This guide walks you through everything you need to know—from making your first request to handling multi-turn conversations and using advanced techniques like prefill and vision.

Anthropic offers two ways to build with Claude: the Messages API for direct model access and fine-grained control, and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, which is ideal for custom agent loops and applications requiring precise control over the conversation flow.

Basic Request and Response

Let's start with the simplest possible interaction: sending a single message to Claude and getting a response.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
        { role: 'user', content: 'Hello, Claude' }
    ]
});
console.log(message);

Understanding the Response

The API returns a structured response object containing:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

content: An array of content blocks. In a response these are typically text blocks, plus tool_use blocks when tools are involved.
stop_reason: Why the response ended (for example end_turn, max_tokens, stop_sequence, or tool_use)
usage: Token counts for billing and monitoring
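
As a rough illustration of reading these fields, the sketch below works on the raw JSON shape shown above, parsed into a plain Python dict. The helper names extract_text and was_truncated are made up for this example and are not part of any SDK.

```python
from typing import Any, Dict

def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )

def was_truncated(response: Dict[str, Any]) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return response.get("stop_reason") == "max_tokens"

# The sample response from this section, trimmed to the fields used here.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))   # Hello!
print(was_truncated(sample))  # False
```

With the Python SDK the same fields are exposed as attributes instead (message.content, message.stop_reason, message.usage).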

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.

Building a Conversation

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)

Important Patterns

Full history required: Always include all previous messages in the messages array
Synthetic assistant messages: You can insert pre-written assistant responses (e.g., for few-shot examples or guided conversations). System-level instructions belong in the system parameter, not in the messages array.
Alternating roles: Structure the conversation as alternating user and assistant turns, starting with user
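
One way to keep these patterns in one place is a small wrapper that owns the history and resends all of it on every turn. This is only a sketch: the Conversation class and the send callable are hypothetical names, and send stands in for whatever function actually calls the API and returns the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

class Conversation:
    """Stores the full history and resends it with every request."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        try:
            # Pass a copy so the sender cannot mutate the stored history.
            reply = self._send(list(self.messages))
        except Exception:
            # Roll back so a failed call does not leave two user turns in a row.
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply

def fake_send(messages: List[Message]) -> str:
    """Offline stand-in for an API call, used only for this demo."""
    return f"(reply to a history of {len(messages)} messages)"

chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
```

In real use, send would wrap client.messages.create(...) with the model and max_tokens of your choice and return message.content[0].text.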

Putting Words in Claude's Mouth (Prefill)

Prefill allows you to start Claude's response for it. This is useful for:

Guiding Claude toward a specific format
Forcing multiple-choice answers
Providing a response template

Prefill Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

For models that don't support prefill, consider:

Structured outputs: Define a JSON schema for Claude to follow
System prompt instructions: Use the system parameter to guide response format
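
As a sketch of the system prompt route, the multiple-choice example can be reworked so that the format constraint lives in the system parameter rather than in a prefilled assistant turn. The helper names, the wording of the instruction, and the small max_tokens budget are all assumptions for illustration, and the model may still add extra text, so the reply is validated rather than trusted.

```python
from typing import Any, Dict, Optional

def build_multiple_choice_request(question: str, model: str) -> Dict[str, Any]:
    """Build request parameters that ask for a single-letter answer via the system prompt."""
    return {
        "model": model,
        "max_tokens": 10,  # small budget; an assumption, tune as needed
        "system": "Answer with only the letter of the correct option and nothing else.",
        "messages": [{"role": "user", "content": question}],
    }

def parse_choice(reply_text: str, choices: str = "ABC") -> Optional[str]:
    """Return the chosen letter, or None if the reply is not a bare letter."""
    cleaned = reply_text.strip().strip("().").upper()
    if len(cleaned) == 1 and cleaned in choices:
        return cleaned
    return None

params = build_multiple_choice_request(
    "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
    model="claude-opus-4-7",
)
print(sorted(params))
print(parse_choice("(c)"))
```

The resulting dict can be passed as client.messages.create(**params). Returning None for anything that is not a bare letter lets the caller decide whether to retry or fall back.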

Vision Capabilities

Claude can process images alongside text. This enables use cases like:

Image analysis and description
Document processing (receipts, forms, etc.)
Visual question answering

Vision Request Example

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("receipt.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's the total amount on this receipt?"
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

JPEG, PNG, GIF, WebP
Maximum size: 5 MB per image when sent through the API
Images that exceed the model's maximum supported resolution are automatically downscaled before processing
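
A small helper can check the format before building the image block used in the request above. This sketch infers the media type from the file extension, which is an assumption (it does not inspect the file contents). The names image_block and MEDIA_TYPE_BY_EXTENSION are hypothetical, and the optional max_bytes check uses whatever limit the caller supplies.

```python
import base64
import os
from typing import Any, Dict, Optional

# Extension-to-media-type map for the four formats listed above.
MEDIA_TYPE_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def image_block(raw: bytes, filename: str, max_bytes: Optional[int] = None) -> Dict[str, Any]:
    """Return a base64 image content block, or raise ValueError if the input is unusable."""
    extension = os.path.splitext(filename)[1].lower()
    media_type = MEDIA_TYPE_BY_EXTENSION.get(extension)
    if media_type is None:
        raise ValueError(f"unsupported image extension: {extension!r}")
    if max_bytes is not None and len(raw) > max_bytes:
        raise ValueError(f"image is {len(raw)} bytes, limit is {max_bytes}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("utf-8"),
        },
    }

block = image_block(b"\xff\xd8\xff", "receipt.JPG")
print(block["source"]["media_type"])
```

To use it with a real file, read the bytes with open(path, "rb") and place the returned block in the content list next to a text block.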

Handling Stop Reasons

Understanding why Claude stopped generating is crucial for building robust applications:

Stop Reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue or end conversation
`max_tokens`	Hit token limit	Increase `max_tokens` or continue
`stop_sequence`	Found a stop sequence	Handle as needed
`tool_use`	Claude wants to use a tool	Execute tool and return result
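
The table can be turned into a small dispatch function so the decision is made in one place. This is a sketch with hypothetical names (NextStep, next_step). It covers only the four stop reasons listed here and maps anything else to UNKNOWN so that unfamiliar values are not silently treated as success.

```python
from enum import Enum
from typing import Optional

class NextStep(Enum):
    DONE = "done"                        # show the reply, wait for the user
    CONTINUE_GENERATION = "continue"     # raise max_tokens or ask to continue
    CHECK_STOP_SEQUENCE = "check_stop"   # application-specific handling
    RUN_TOOLS = "run_tools"              # execute tools, send results back
    UNKNOWN = "unknown"                  # log and handle defensively

_STEP_BY_STOP_REASON = {
    "end_turn": NextStep.DONE,
    "max_tokens": NextStep.CONTINUE_GENERATION,
    "stop_sequence": NextStep.CHECK_STOP_SEQUENCE,
    "tool_use": NextStep.RUN_TOOLS,
}

def next_step(stop_reason: Optional[str]) -> NextStep:
    """Map a stop_reason value to the action suggested in the table."""
    return _STEP_BY_STOP_REASON.get(stop_reason, NextStep.UNKNOWN)

print(next_step("max_tokens").name)
```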

Example: Handling Tool Use

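if message.stop_reason == "tool_use":
    # Echo the assistant turn back once, before any tool results
    messages.append({"role": "assistant", "content": message.content})
    tool_results = []
    for block in message.content:
        if block.type == "tool_use":
            # execute_tool is your own dispatcher; it is not part of the SDK
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,  # ties the result to the request
                "content": result,
            })
    # Return all results together in a single user turn
    messages.append({"role": "user", "content": tool_results})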

Best Practices

Manage context windows: Keep conversations within Claude's context limit. Use techniques like summarization or sliding windows for long conversations.
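
A sliding window can be as simple as keeping the most recent messages. The sketch below assumes plain text turns and counts messages rather than tokens, which is a simplification. It also drops any leading assistant turn so the window still starts with a user message. The name trim_history is hypothetical, and histories that contain tool calls would need extra care so that a tool_use block is never separated from its tool_result.

```python
from typing import Dict, List

def trim_history(messages: List[Dict[str, str]], max_messages: int) -> List[Dict[str, str]]:
    """Keep at most max_messages recent messages, starting on a user turn."""
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    window = messages[-max_messages:]
    # The window may begin with an assistant turn after slicing; drop it.
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window

history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
    {"role": "assistant", "content": "Sure. LLMs are..."},
    {"role": "user", "content": "And how are they trained?"},
]

print([m["role"] for m in trim_history(history, 4)])
```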

Use system prompts: For consistent behavior, use the system parameter to set Claude's persona and constraints.

Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs and optimize prompts.
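
A running tally is often enough to start with. In this sketch the UsageTracker class is hypothetical, and the per-million-token prices are parameters the caller must supply; no real prices are assumed.

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Accumulates token counts across requests."""
    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.requests += 1

    def estimated_cost(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Cost estimate from caller-supplied prices per million tokens."""
        return (
            self.input_tokens / 1_000_000 * input_price_per_mtok
            + self.output_tokens / 1_000_000 * output_price_per_mtok
        )

tracker = UsageTracker()
tracker.record(input_tokens=12, output_tokens=6)  # values from the sample response
print(tracker)
```

After each SDK call, something like tracker.record(message.usage.input_tokens, message.usage.output_tokens) keeps the totals current.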

Handle errors gracefully: Implement retry logic for transient failures and validate responses before using them.
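
The official SDK clients already retry some failures on their own (the Python client accepts a max_retries option), so a hand-written loop is mainly useful when you need different behavior. The sketch below shows generic exponential backoff with jitter. The call_with_retries function and its parameters are hypothetical, and deciding which exceptions count as transient is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(
    fn: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that will not go away.
            if attempt == max_attempts or not is_transient(exc):
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting. In production the default time.sleep is used.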

Stream responses: For better user experience, use streaming to show responses as they're generated.
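
With the Python SDK, streaming is typically done through the client.messages.stream(...) context manager and its text_stream iterator. The sketch below wraps that pattern in a hypothetical stream_reply helper that takes the client as an argument, so it can be exercised with a stand-in object. The on_text callback is an assumption about how the application wants to display chunks.

```python
from typing import Any, Callable, Dict, List

def _print_chunk(text: str) -> None:
    # Print each chunk as it arrives, without adding newlines.
    print(text, end="", flush=True)

def stream_reply(
    client: Any,
    messages: List[Dict[str, Any]],
    model: str,
    max_tokens: int = 1024,
    on_text: Callable[[str], None] = _print_chunk,
) -> str:
    """Stream a reply, forwarding each text chunk to on_text, and return the full text."""
    chunks: List[str] = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            on_text(text)
            chunks.append(text)
    return "".join(chunks)
```

A call might look like stream_reply(anthropic.Anthropic(), [{"role": "user", "content": "Hello, Claude"}], model="claude-opus-4-7"). The returned string can then be appended to the conversation history as the assistant turn.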

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create powerful conversational AI applications. Remember that the API is stateless—you manage the conversation history—and always check stop reasons to handle different response scenarios.

Key Takeaways

The Messages API is stateless: You must send the full conversation history with every request; manage state on your end.
Prefill guides responses: Use assistant messages to start Claude's response, but check model compatibility as newer models may not support it.
Vision enables multimodal use cases: Claude can analyze images alongside text for document processing, visual QA, and more.
Stop reasons dictate next steps: Always check stop_reason to determine whether to continue the conversation, increase tokens, or handle tool calls.
Streaming improves UX: For real-time applications, use streaming to show responses as they're generated rather than waiting for the complete response.