Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude
This guide explains how to use Claude's Messages API to build conversational applications, including sending basic requests, managing multi-turn conversations, using prefill to shape responses, and leveraging vision capabilities.
Claude's Messages API is the primary interface for integrating Claude into your applications. Whether you're building a chatbot, a content generation tool, or an AI assistant, understanding how to work with messages is essential. This guide walks you through the core patterns—from basic requests to advanced techniques like prefilling and vision—with practical code examples.
Understanding the Messages API vs. Managed Agents
Anthropic offers two paths for building with Claude:
- Messages API: Direct model access with fine-grained control over every request and response. Best for custom agent loops, tool use, and when you need to manage conversation state yourself.
- Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Ideal for long-running tasks and asynchronous work.
Making Your First API Request
Let's start with the simplest possible interaction: sending a single message and receiving a response.
Python Example
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"}
]
)
print(message)
TypeScript Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
model: 'claude-opus-4-7',
max_tokens: 1024,
messages: [
{ role: 'user', content: 'Hello, Claude' }
]
});
console.log(message);
Understanding the Response
The API returns a structured JSON object:
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello!"
}
],
"model": "claude-opus-4-7",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 12,
"output_tokens": 6
}
}
Key fields to note:
- content: An array of content blocks (text, tool_use, etc.)
- stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, or tool_use)
- usage: Token counts for billing and monitoring
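Because content is an array, production code shouldn't assume a single text block. Here's a minimal sketch of collecting all text from a response, using the message object from the Python example above:

text = "".join(block.text for block in message.content if block.type == "text")
print(text)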
Building Multi-Turn Conversations
The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.
Conversation Flow
import anthropic
client = anthropic.Anthropic()
# First turn
message1 = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"}
]
)
# Second turn: include previous history
message2 = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"},
{"role": "assistant", "content": "Hello!"},
{"role": "user", "content": "Can you describe LLMs to me?"}
]
)
print(message2.content[0].text)
Important Notes
- Earlier turns don't need to originate from Claude. You can inject synthetic assistant messages to guide the conversation.
- Always maintain the correct alternating pattern: user → assistant → user → assistant.
- The API validates message order and will reject malformed sequences.
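In practice, you'll typically keep the history in a list and append each exchange as it happens. Here's a minimal sketch of that pattern (the send_message helper is our own illustration, not part of the SDK):

import anthropic

client = anthropic.Anthropic()
history = []

def send_message(user_text):
    # Record the user turn, send the full history, then record
    # Claude's reply so the next call has complete context.
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=history,
    )
    history.append({"role": "assistant", "content": response.content[0].text})
    return response.content[0].text

print(send_message("Hello, Claude"))
print(send_message("Can you describe LLMs to me?"))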
Putting Words in Claude's Mouth: The Prefill Technique
Prefilling allows you to start Claude's response by including an assistant message with partial content at the end of your input. This is incredibly useful for:
- Constraining outputs (e.g., forcing a multiple-choice answer format)
- Guiding tone or style (e.g., starting with "I'd be happy to explain...")
- Ensuring structured responses (e.g., JSON or XML)
Example: Multiple Choice Answer
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1,
messages=[
{
"role": "user",
"content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
},
{
"role": "assistant",
"content": "The answer is ("
}
]
)
print(message.content[0].text) # Output: "C"
By setting max_tokens=1 and prefilling with "The answer is (", we force Claude to complete only the letter, giving us a clean, parseable response.
Use Cases for Prefilling
| Scenario | Prefill Example | Benefit |
|---|---|---|
| JSON output | {"response": | Guarantees valid JSON start |
| Code generation | Here's the Python function:\n\ndef | Forces code block format |
| Sentiment analysis | Sentiment: | Ensures consistent labeling |
| Translation | French translation: | Locks output language |
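The JSON row above is worth expanding. Here's a minimal sketch that combines a prefill with the API's stop_sequences parameter, assuming a flat JSON object (a nested object would trip the stop sequence early):

import json
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    # Stop before the closing brace; combined with the "{" prefill,
    # Claude emits only the object's interior.
    stop_sequences=["}"],
    messages=[
        {"role": "user", "content": "Give me a JSON object with keys 'common_name' and 'latin_name' for the ant."},
        {"role": "assistant", "content": "{"},
    ],
)

# Re-attach the prefilled "{" and the "}" removed by the stop sequence.
data = json.loads("{" + message.content[0].text + "}")
print(data["latin_name"])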
Handling Streaming Responses
For real-time applications, streaming reduces perceived latency. The API supports Server-Sent Events (SSE) for streaming.
Python Streaming Example
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{"role": "user", "content": "Write a short poem about AI"}
]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
TypeScript Streaming Example
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const stream = await client.messages.create({
model: 'claude-opus-4-7',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Write a short poem about AI' }],
stream: true
});
for await (const event of stream) {
if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
process.stdout.write(event.delta.text);
}
}
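If you also need the token usage after streaming, the Python SDK's stream helper can hand back the fully assembled message once iteration finishes:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Once the stream is exhausted, the accumulated Message
    # (including token usage) is available.
    final = stream.get_final_message()

print(f"\nOutput tokens: {final.usage.output_tokens}")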
Working with Vision and Images
The Messages API supports image inputs. You can send images as base64-encoded data or via URLs.
Image Analysis Example
import anthropic
import base64
client = anthropic.Anthropic()
with open("chart.png", "rb") as f:
image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": image_data
}
},
{
"type": "text",
"text": "Describe this chart in detail."
}
]
}
]
)
print(message.content[0].text)
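For the URL variant mentioned above, you swap the base64 source for a url source. A minimal sketch, with https://example.com/chart.png standing in for a publicly reachable image:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # With a URL source the API fetches the image itself,
                    # so no encoding step is needed.
                    "type": "image",
                    "source": {"type": "url", "url": "https://example.com/chart.png"}
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)
print(message.content[0].text)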
Best Practices for Production
- Manage token budgets: Always set max_tokens to control costs and response length.
- Handle stop reasons: Check stop_reason in responses. "max_tokens" means the response was cut off; you may need to continue the conversation.
- Implement retry logic: Network issues happen. Use exponential backoff for transient failures (see the sketch below).
- Cache frequent prefixes: For common system prompts, use prompt caching to reduce latency and costs.
- Monitor usage: Track input_tokens and output_tokens to stay within your budget.
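Putting the retry and stop-reason advice together: below is a minimal sketch where create_with_backoff is our own helper name (the SDK can also retry some failures itself if you pass max_retries when constructing the client):

import time
import anthropic

client = anthropic.Anthropic()

def create_with_backoff(max_attempts=5, **kwargs):
    # Retry transient failures with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)

message = create_with_backoff(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)

# "max_tokens" as a stop reason means the reply was truncated.
if message.stop_reason == "max_tokens":
    print("Response was cut off; raise max_tokens or ask Claude to continue.")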
Key Takeaways
- The Messages API is stateless—you must send the full conversation history with each request, giving you complete control over context.
- Prefilling allows you to shape Claude's responses by providing the beginning of its reply, enabling constrained outputs and structured formats.
- Streaming responses via SSE reduce perceived latency for real-time applications.
- The API supports multi-modal inputs including text and images, making it suitable for vision tasks.
- Always handle stop_reason in your application logic to detect truncated responses and manage conversation flow properly.