Guide | Beginner | API | 2026-05-22

Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision

Learn how to use the Claude Messages API for single and multi-turn conversations, prefill techniques, vision capabilities, and streaming. Includes Python and TypeScript code examples.

Quick Answer

This guide teaches you how to send requests, manage multi-turn conversations, prefill Claude's responses, use vision with images, and stream outputs using the Messages API.

Tags: Messages API, Claude, Conversational AI, Vision, Streaming


The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, an agent, or a content generation tool, understanding how to structure requests and handle responses is essential.

This guide walks you through the most common patterns: basic requests, multi-turn conversations, prefill techniques, vision capabilities, and streaming. By the end, you'll be able to build robust, production-ready applications with Claude.

Understanding the Messages API vs. Managed Agents

Anthropic offers two paths for building with Claude:

Messages API: Direct access to the model. You control the conversation loop, manage state, and handle tool calls. Best for custom agent loops and fine-grained control.
Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Request

A basic request to the Messages API requires three things:

model: The Claude model you want to use (e.g., claude-opus-4-7)
max_tokens: The maximum number of tokens in the response
messages: An array of message objects, each with a role and content

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message);

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

content: An array of content blocks (text, tool_use, etc.)
stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, tool_use)
usage: Token counts for billing and monitoring
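Because content is a list of blocks rather than a single string, it is safer to collect the text blocks explicitly than to assume content[0] is always text. The sketch below uses a hypothetical helper, `summarize_response`, that relies only on the `type`, `text`, `stop_reason`, and `usage` fields described above. A `SimpleNamespace` stand-in shaped like the JSON response lets it run offline; with the real SDK you would pass the object returned by `client.messages.create`.

```python
from types import SimpleNamespace


def summarize_response(message) -> dict:
    """Collect text and bookkeeping fields from a Messages API response."""
    # Join every text block; tool_use and other block types are skipped.
    text = "".join(
        block.text for block in message.content if block.type == "text"
    )
    return {
        "text": text,
        "stop_reason": message.stop_reason,
        "input_tokens": message.usage.input_tokens,
        "output_tokens": message.usage.output_tokens,
    }


# Offline stand-in mirroring the example JSON response above.
example = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="Hello!")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=12, output_tokens=6),
)
print(summarize_response(example))
```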

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context.

Example: Two-Turn Conversation

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

Important: The assistant messages in the history don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context.
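Since the API keeps no state between calls, a common pattern is to hold the history in a list and append both sides of every exchange. The `Conversation` class below is a hypothetical minimal sketch: it accepts any client exposing `messages.create`, so in real code you would pass `anthropic.Anthropic()`.

```python
class Conversation:
    """Keeps the running message history and resends it on every turn."""

    def __init__(self, client, model: str = "claude-opus-4-7", max_tokens: int = 1024):
        self.client = client
        self.model = model
        self.max_tokens = max_tokens
        self.messages: list[dict] = []

    def send(self, text: str) -> str:
        # Add the new user turn, then send the entire history.
        self.messages.append({"role": "user", "content": text})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=self.messages,
        )
        reply = "".join(b.text for b in response.content if b.type == "text")
        # Store the assistant turn so the next request includes it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

For example, after `convo = Conversation(anthropic.Anthropic())`, calling `convo.send("Hello, Claude")` and then `convo.send("Can you describe LLMs to me?")` sends all three messages on the second request, matching the two-turn example above. This sketch stores only the reply text. If you use tools, you would append `response.content` instead so the tool_use blocks are preserved.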

Best Practices for Conversation History

Keep the full history for coherent multi-turn interactions
Truncate or summarize older turns to stay within context limits
Use system prompts for persistent instructions
Consider prompt caching for long conversations
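The sketch below combines three of these practices. It keeps only the most recent messages, passes persistent instructions through the top-level `system` parameter, and marks the system prompt with `cache_control` so an unchanging prefix can be reused through prompt caching. The function names and the `max_messages` cutoff are illustrative assumptions. The sketch also assumes the trimmed history should open with a user turn, so it drops any leading assistant messages.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep the newest `max_messages` entries, starting on a user turn."""
    trimmed = messages[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant turns so the history opens with the user.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


def build_request(system_prompt: str, history: list[dict], max_messages: int = 20) -> dict:
    """Assemble keyword arguments for client.messages.create(**kwargs)."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        # System text as a content block so it can carry a cache breakpoint.
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": trim_history(history, max_messages),
    }
```

Dropping old turns outright is the simplest option. Summarizing them into a single earlier message keeps more context, at the cost of an extra model call. Caching only takes effect once the cached prefix exceeds a model-specific minimum length, so short system prompts will not benefit.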

Prefilling Claude's Response

Prefilling lets you write the opening of Claude's response yourself. Claude then continues from where your text ends. This is useful for:

Forcing structured output formats
Guiding the model toward a specific answer
Skipping preamble, which shortens the response and reduces latency

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"
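The response contains only the continuation, not your prefilled text, so you need to reattach the prefill yourself. A common variation is to prefill an opening brace to push Claude straight into JSON. The sketch below assumes the chosen model accepts prefill (the example above uses claude-sonnet-4-5). `ask_for_json` is a hypothetical helper that takes the client as a parameter.

```python
import json


def ask_for_json(client, prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    """Prefill '{' so the reply starts as a JSON object, then parse it."""
    prefill = "{"  # A prefill must not end with trailing whitespace.
    response = client.messages.create(
        model=model,
        max_tokens=512,
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": prefill},
        ],
    )
    # The API returns only the text that comes after the prefill.
    continuation = response.content[0].text
    return json.loads(prefill + continuation)
```

If Claude adds prose after the closing brace, `json.loads` raises `json.JSONDecodeError`, so production code should catch that.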

Prefill Limitations

Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6
Using prefill with these models returns a 400 error
Alternative: Use structured outputs or system prompt instructions
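On models that reject prefill, one option from the list above is to move the constraint into a system prompt and validate the reply in code. The sketch below is illustrative only. The function name, the prompt wording, and the regex fallback are assumptions, and instructions alone do not guarantee the format.

```python
import re


def ask_multiple_choice(client, question: str, model: str = "claude-opus-4-7") -> str:
    """Ask for a single-letter answer without prefill, then validate it."""
    response = client.messages.create(
        model=model,
        max_tokens=16,
        system="Answer with only the letter of the correct option, nothing else.",
        messages=[{"role": "user", "content": question}],
    )
    text = "".join(b.text for b in response.content if b.type == "text")
    # Instructions are not a hard guarantee: extract the first standalone A-E.
    match = re.search(r"\b([A-E])\b", text)
    if match is None:
        raise ValueError(f"No option letter found in reply: {text!r}")
    return match.group(1)
```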

Vision: Working with Images

Claude can process images sent via the Messages API. Images can be provided as base64-encoded data or as URLs.

Example: Image Analysis

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
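Images can also be passed by URL. In that case the source block uses "type": "url" instead of base64 data, and the API fetches the image itself. The following is a minimal sketch with a hypothetical `describe_image_url` helper. The image must be publicly reachable.

```python
def describe_image_url(client, url: str, prompt: str = "Describe this chart in detail.") -> str:
    """Send an image by URL instead of base64-encoding it locally."""
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    # The API downloads the image from this URL.
                    {"type": "image", "source": {"type": "url", "url": url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    )
    return response.content[0].text
```

For example, `describe_image_url(anthropic.Anthropic(), "https://example.com/chart.png")`, where the URL is a placeholder.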

Supported Image Formats

JPEG, PNG, GIF, WebP
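Maximum size: 5 MB per image for API requests
Images above the model's working resolution are automatically downscaled first, which adds latency, so resizing them yourself can speed up responses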
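Before encoding a local file, you can derive media_type from the file extension and reject oversized files up front rather than waiting for a 400 error. In this sketch, the extension map and the 5 MB cap are assumptions based on Anthropic's vision documentation at the time of writing, so check the current limits before relying on them. `image_block_from_file` is a hypothetical helper.

```python
import base64
from pathlib import Path

# Assumed mapping and size cap; verify against the current vision docs.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def image_block_from_file(path: str) -> dict:
    """Build a base64 image content block, validating type and size first."""
    file_path = Path(path)
    media_type = MEDIA_TYPES.get(file_path.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image format: {file_path.suffix!r}")
    data = file_path.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

The returned dict can be placed directly in a user message's content list, ahead of the text block.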

Streaming Responses

For real-time applications, streaming reduces perceived latency. The API supports streaming via Server-Sent Events (SSE).

Python Streaming Example

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)
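for event in stream:
    # content_block_delta also carries tool-input and thinking deltas; print only text
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)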

TypeScript Streaming Example

const stream = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Write a short poem about AI.' }
  ],
  stream: true
});
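for await (const event of stream) {
  // Narrow to text deltas; other delta types have no `text` field
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}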
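The Python SDK also provides a higher-level helper, `client.messages.stream()`, used as a context manager. Its `text_stream` yields only text, and `get_final_message()` returns the assembled message once the stream ends. The following is a minimal sketch. `stream_reply` is a hypothetical wrapper that takes the client as a parameter.

```python
import sys


def stream_reply(client, prompt: str, model: str = "claude-opus-4-7"):
    """Print text as it arrives, then return the fully assembled message."""
    with client.messages.stream(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # text_stream yields only text deltas, so no event-type checks are needed.
        for text in stream.text_stream:
            sys.stdout.write(text)
            sys.stdout.flush()
        # After the loop, the SDK has accumulated the complete response.
        return stream.get_final_message()
```

With this helper you skip the manual event filtering shown above, and you still get `stop_reason` and `usage` from the final message.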

Handling Stop Reasons

Claude can stop generating for several reasons. Your code should handle each case:

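| `stop_reason` | Meaning | Action |
|---|---|---|
| `end_turn` | Claude finished naturally | Return the response |
| `max_tokens` | The response hit the `max_tokens` limit and was cut off | Retry with a higher `max_tokens`, or accept the truncated output |
| `stop_sequence` | One of your custom stop sequences was generated | Check the `stop_sequence` field to see which one, then handle as needed |
| `tool_use` | Claude wants to call a tool | Execute the tool and send the result back in the next request |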
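One way to act on this table is a small dispatcher that inspects stop_reason and decides the next step. In the sketch below, `next_step` and the `run_tool` callback are hypothetical names. The tool result follows the Messages API's `tool_result` content block, which references each `tool_use` block by its `id`.

```python
def next_step(message, run_tool):
    """Map a response's stop_reason to an action.

    Returns ("done", text), ("truncated", text), or ("continue", user_message),
    where user_message carries tool results to send back to Claude.
    """
    text = "".join(b.text for b in message.content if b.type == "text")
    reason = message.stop_reason
    if reason in ("end_turn", "stop_sequence"):
        return ("done", text)
    if reason == "max_tokens":
        # Output was cut off: surface it so the caller can retry with more tokens.
        return ("truncated", text)
    if reason == "tool_use":
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in message.content
            if block.type == "tool_use"
        ]
        return ("continue", {"role": "user", "content": results})
    raise ValueError(f"Unhandled stop_reason: {reason!r}")
```

On "continue", the caller appends the assistant turn (`message.content`) and the returned user message to the history, then calls the API again. Raising on unknown values makes new stop reasons visible instead of silently ignored.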

Error Handling Best Practices

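Always wrap API calls in try-except blocks, and catch the most specific exceptions first. RateLimitError and APIConnectionError are both subclasses of APIError, so an except anthropic.APIError clause listed first would catch them before their own handlers run.

import anthropic
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude"}],
    )
except anthropic.RateLimitError as e:
    # 429: wait, then retry with exponential backoff
    print(f"Rate limited: {e}")
except anthropic.APIConnectionError as e:
    # No response received (network problem); usually worth retrying
    print(f"Connection error: {e}")
except anthropic.APIStatusError as e:
    # Any other non-2xx response, e.g. 400 for an invalid request
    print(f"API error {e.status_code}: {e}")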
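By default, the Python SDK already retries connection errors, rate limits, and server errors a couple of times with a short backoff, and `anthropic.Anthropic(max_retries=...)` changes that count. If you want your own policy on top, a generic exponential backoff wrapper could look like the sketch below. `with_backoff` is a hypothetical helper, and the delay values are arbitrary illustrations.

```python
import random
import time


def with_backoff(call, retry_on, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call `call()` and retry on the given exception types with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the last error propagate.
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay,
            # with random jitter so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

For example, you could pass `lambda: client.messages.create(...)` as `call` with `retry_on=(anthropic.RateLimitError, anthropic.APIConnectionError)`, leaving non-retryable errors such as a 400 to surface immediately.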

Key Takeaways

The Messages API is stateless: always send the full conversation history with each request
Prefill is useful but not universal: it steers output on models that support it, but returns a 400 error on Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6, where structured outputs or system prompt instructions are the alternative
Vision support is built in: send images as base64 or URLs for multimodal analysis
Streaming reduces perceived latency: use SSE for real-time applications like chat interfaces
Always handle stop reasons, especially tool_use if you're building agents and max_tokens for long responses