Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Build Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with Python and TypeScript examples.

Tags: Messages API, Claude API, conversational AI, prefill, vision

Introduction

The Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generator, or a complex agent system, understanding how to work with messages is essential. This guide covers everything from basic requests to advanced techniques like prefill and vision, with practical code examples you can use immediately.

Understanding the Messages API vs. Claude Managed Agents

Anthropic offers two approaches for building with Claude:

Messages API: Direct model prompting access. Best for custom agent loops and fine-grained control.
Claude Managed Agents: Pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you maximum flexibility and control.

Making Your First API Request

Let's start with the simplest possible request: sending a single message to Claude and getting a response.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
        { role: 'user', content: 'Hello, Claude' }
    ]
});
console.log(message);

Understanding the Response

The API returns a structured response:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

content: An array of content blocks generated by the model (text, tool_use, etc.)
stop_reason: Why the model stopped generating (e.g., "end_turn", "max_tokens")
usage: Token counts for billing and optimization
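
The fields above can be read with a couple of small helpers. This is a minimal sketch that assumes the response is available as a plain dictionary shaped like the JSON shown earlier; the helper names extract_text and total_tokens are hypothetical and not part of any SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every "text" block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Add input and output token counts from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# Sample copied from the response shown above.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
print(total_tokens(sample))  # 18
```

Filtering on the block type rather than indexing content[0] keeps the helper safe when a response holds more than one block.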

Building Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with each request. This gives you complete control over context.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
        { role: 'user', content: 'Hello, Claude' },
        { role: 'assistant', content: 'Hello!' },
        { role: 'user', content: 'Can you describe LLMs to me?' }
    ]
});
console.log(message.content[0].text);

Important Notes

Earlier turns don't need to originate from Claude — you can use synthetic assistant messages for context
The conversation history grows with each turn, so watch the input token count and trim or summarize older turns before the history approaches the model's context window
Use prompt caching for long conversations to reduce costs
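
Because the API is stateless, the client has to carry the history itself. The sketch below shows one possible way to do that. The Conversation class, the trim_history helper, and the send callable are all hypothetical names introduced for illustration, and fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Holds the history that a stateless API needs on every request."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the full message
        # list and returns the assistant's reply text.
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        try:
            reply = self._send(list(self.messages))
        except Exception:
            # Drop the unanswered user turn so roles keep alternating.
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def trim_history(messages: list[Message], max_messages: int) -> list[Message]:
    """Keep at most `max_messages` recent messages, starting on a user turn."""
    recent = messages[-max_messages:] if max_messages > 0 else []
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


def fake_send(messages: list[Message]) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return f"I have seen {len(messages)} message(s)."


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print(len(chat.messages))                   # 4
print(len(trim_history(chat.messages, 3)))  # 2
```

In a real application, send could wrap client.messages.create and return message.content[0].text, as in the examples above. Trimming by message count is a crude assumption; dropping old turns also drops their context, so summarizing them may be a better fit for some applications.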

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response, guiding the model toward a specific output format or direction. This is powerful for structured outputs.

Example: Multiple Choice Answer

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"

When to Use Prefill

Structured outputs: Force JSON or specific formats
Multiple choice: Get concise answers
Chain-of-thought: Start reasoning patterns
Format control: Ensure consistent response structure
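
For the structured-output case, a prefilled response contains only the continuation, so the prefill has to be joined back on before parsing. The following is a rough sketch under that assumption. The helper names with_prefill and join_prefill are hypothetical, and the completion string is invented sample text standing in for message.content[0].text. The whitespace check reflects the API's documented rejection of a final assistant message that ends in trailing whitespace.

```python
import json


def with_prefill(user_text: str, prefill: str) -> list[dict[str, str]]:
    """Build a message list whose last entry starts the assistant's reply."""
    if prefill != prefill.rstrip():
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefill},
    ]


def join_prefill(prefill: str, completion: str) -> str:
    """Rebuild the full reply: the API returns only the continuation."""
    return prefill + completion


messages = with_prefill(
    "Return the colour red as JSON with the keys name and hex.", "{"
)

# Hypothetical completion text, standing in for message.content[0].text.
completion = '"name": "red", "hex": "#ff0000"}'

data = json.loads(join_prefill("{", completion))
print(data["hex"])  # #ff0000
```

Starting the reply with an opening brace nudges the model toward emitting a JSON object directly, without a preamble.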

Limitations

Note: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

Working with Vision Capabilities

The Messages API supports image inputs, enabling visual understanding.

Python Example

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What does this chart show?"
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

PNG
JPEG
WEBP
GIF (first frame only)
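
The formats listed above can be checked before a request is sent. This sketch builds an image content block from a file path and rejects unsupported extensions. The image_block helper and the MEDIA_TYPES table are hypothetical names, and trusting the file extension is a simplifying assumption; stricter code might inspect the file's leading bytes instead.

```python
import base64
import tempfile
from pathlib import Path

# Extension-to-media-type table for the formats listed above.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(path: str) -> dict:
    """Return a base64 image content block for a supported image file."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or path}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }


# Demo with a throwaway file that holds only the PNG signature bytes.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "chart.png"
    demo.write_bytes(b"\x89PNG\r\n\x1a\n")
    block = image_block(str(demo))

print(block["source"]["media_type"])  # image/png
```

The returned dictionary can be placed in a user message's content list next to a text block, as in the Python example above.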

Handling Stop Reasons

Understanding why Claude stopped generating helps you handle different scenarios:

Stop Reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue conversation
`max_tokens`	Token limit reached	Increase max_tokens or split response
`stop_sequence`	Custom stop sequence triggered	Handle as designed
`tool_use`	Claude wants to use a tool	Execute tool and continue
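
The table above maps naturally onto a small dispatch function. This is only an illustrative sketch: the next_action helper and the action labels it returns are hypothetical, and a real application would call its own handlers in place of returning strings.

```python
def next_action(stop_reason: str) -> str:
    """Map a stop_reason value to a label for what the caller does next."""
    actions = {
        "end_turn": "continue_conversation",
        "max_tokens": "raise_limit_or_split",
        "stop_sequence": "handle_stop_sequence",
        "tool_use": "run_tool_and_continue",
    }
    # Fall back to an explicit marker so unexpected values are not ignored.
    return actions.get(stop_reason, "unhandled")


for reason in ("end_turn", "max_tokens", "tool_use", "something_new"):
    print(reason, "->", next_action(reason))
```

Returning an explicit fallback makes unrecognized values visible to the caller.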

Best Practices

1. Manage Token Usage

Monitor usage.input_tokens and usage.output_tokens in responses
Use prompt caching for repeated context
Set appropriate max_tokens to control costs
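
Monitoring usage across many requests can be as simple as summing the usage field of each response. The sketch below assumes usage arrives as a dictionary shaped like the one in the response example. UsageTracker is a hypothetical helper, and the prices passed to estimated_cost are placeholders supplied by the caller, not real rates.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of token usage across requests."""

    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage counts to the totals."""
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.requests += 1

    def estimated_cost(self, input_price: float, output_price: float) -> float:
        """Prices are per million tokens and must come from the caller."""
        return (
            self.input_tokens * input_price
            + self.output_tokens * output_price
        ) / 1_000_000


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 30, "output_tokens": 100})
print(tracker.input_tokens, tracker.output_tokens, tracker.requests)  # 42 106 2
```

Input and output totals are kept separate because the two are typically priced differently.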

2. Handle Errors Gracefully

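import anthropic
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)
except anthropic.APIError as e:
    # APIError is the SDK's base API error, covering connection and HTTP status errors
    print(f"API Error: {e}")
    # Implement retry logic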
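
One common way to fill in the retry logic is exponential backoff. The helper below is a generic sketch, not part of the SDK: call_with_retries and its parameters are hypothetical, and the demo uses a deliberately flaky function in place of a real API call. With the SDK, retry_on could plausibly be set to transient errors such as anthropic.RateLimitError and anthropic.APIConnectionError. The SDK client also has its own max_retries setting, which may already be enough for simple cases.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Demo: a function that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Passing sleep as a parameter keeps the helper testable without real waiting. Production code often adds random jitter to the delay as well.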

3. Use System Prompts for Instructions

For models that don't support prefill, use system prompts:

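import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    # The system prompt sets standing instructions that apply to the whole conversation
    system="You are a helpful assistant that always responds in JSON format.",
    messages=[
        {"role": "user", "content": "List three programming languages"}
    ]
)
print(message.content[0].text)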
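
A system prompt asks for JSON but does not guarantee it, so the reply is worth parsing defensively. This sketch assumes the reply text is either bare JSON or JSON wrapped in a Markdown code fence. parse_json_reply is a hypothetical helper, and the sample reply is invented for the demo.

```python
import json
from typing import Any

FENCE = "`" * 3  # Markdown code fence marker


def parse_json_reply(text: str) -> Any:
    """Parse a JSON reply, tolerating a surrounding Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag).
        lines = cleaned.splitlines()[1:]
        # Drop the closing fence line if present.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    # json.loads raises json.JSONDecodeError if the text is still not JSON.
    return json.loads(cleaned)


# Invented sample reply wrapped in a fence, as models sometimes produce.
reply = FENCE + 'json\n["Python", "Go", "Rust"]\n' + FENCE
print(parse_json_reply(reply))  # ['Python', 'Go', 'Rust']
```

Letting json.JSONDecodeError propagate gives the caller a clear signal to retry or fall back.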

4. Streaming for Better UX

For real-time applications, use streaming:

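import anthropic
client = anthropic.Anthropic()
stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    # Only text deltas carry a .text attribute; other delta types (such as tool input) do not
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="", flush=True)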
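
The filtering step in a streaming loop can be separated from the printing so it is easier to test. The sketch below is an assumption-laden illustration: text_deltas is a hypothetical helper, and the events are simple stand-in objects that mimic the type and delta attributes of stream events, not real SDK objects.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def text_deltas(events: Iterable) -> Iterator[str]:
    """Yield only the text fragments from a sequence of stream events."""
    for event in events:
        if event.type != "content_block_delta":
            continue  # Skip start, stop, and message-level events.
        if getattr(event.delta, "type", None) == "text_delta":
            yield event.delta.text


# Stand-in events that mimic the shape of a streamed response.
events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="Once "),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="input_json_delta", partial_json="{"),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="upon a time"),
    ),
    SimpleNamespace(type="message_stop"),
]

print("".join(text_deltas(events)))  # Once upon a time
```

Because the helper is a generator, fragments can be printed as they arrive or joined afterwards into the full reply.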

Conclusion

The Messages API is your gateway to building powerful conversational AI applications with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated interactions tailored to your specific use case.

Remember that the API is stateless: you control the conversation history. Use this to your advantage by carefully managing context, using prefill for structured outputs on models that support it, and handling stop reasons appropriately.

Key Takeaways

The Messages API is stateless — always send the full conversation history with each request
Prefill lets you guide Claude's response by starting it yourself, but check model compatibility
Vision capabilities allow Claude to analyze images alongside text in a single request
Monitor stop_reason to handle different completion scenarios (end_turn, max_tokens, tool_use)
Use streaming for real-time applications and prompt caching for long conversations to optimize performance and costs