BeClaude
Guide | Beginner | API | 2026-05-22

Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision

Learn how to use the Claude Messages API for single and multi-turn conversations, prefill techniques, vision capabilities, and streaming. Includes Python and TypeScript code examples.

Quick Answer

This guide teaches you how to send requests, manage multi-turn conversations, prefill Claude's responses, use vision with images, and stream outputs using the Messages API.

Tags: Messages API, Claude, Conversational AI, Vision, Streaming

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, an agent, or a content generation tool, understanding how to structure requests and handle responses is essential.

This guide walks you through the most common patterns: basic requests, multi-turn conversations, prefill techniques, vision capabilities, and streaming. By the end, you'll be able to build robust, production-ready applications with Claude.

Understanding the Messages API vs. Managed Agents

Anthropic offers two paths for building with Claude:

  • Messages API: Direct access to the model. You control the conversation loop, manage state, and handle tool calls. Best for custom agent loops and fine-grained control.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Request

A basic request to the Messages API requires three things:

  • model: The Claude model you want to use (e.g., claude-opus-4-7)
  • max_tokens: The maximum number of tokens in the response
  • messages: An array of message objects, each with a role and content

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);
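Both SDKs wrap a single HTTPS endpoint. To show roughly what goes over the wire, the sketch below builds (but does not send) an equivalent request using only Python's standard library. It assumes the public endpoint `https://api.anthropic.com/v1/messages`, the `anthropic-version: 2023-06-01` header, and a key in the `ANTHROPIC_API_KEY` environment variable; `build_messages_request` is a hypothetical helper name, not part of any SDK.

```python
import json
import os
import urllib.request

# Public Messages API endpoint (the SDKs call this for you)
API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(model: str, max_tokens: int, messages: list,
                           api_key: str = "") -> urllib.request.Request:
    """Build, but do not send, a raw POST request to the Messages API."""
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    headers = {
        # Fall back to the same environment variable the SDKs read
        "x-api-key": api_key or os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",  # API version header
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_messages_request(
    "claude-opus-4-7",
    1024,
    [{"role": "user", "content": "Hello, Claude"}],
    api_key="sk-ant-placeholder",  # placeholder, not a real key
)
# To actually send it: urllib.request.urlopen(req)
```

In practice the SDKs are preferable, since they add retries, typed responses, and streaming helpers on top of this same request.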

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

  • content: An array of content blocks (text, tool_use, etc.)
  • stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, tool_use)
  • usage: Token counts for billing and monitoring
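Because `content` is a list of blocks rather than a single string, it is safer to join every text block than to assume `content[0]` is text. A minimal sketch working on the JSON shape shown above; the helper names are hypothetical. With the Python SDK you would use attribute access (`block.type`, `block.text`) instead, or convert the response to a dict first with `message.model_dump()`, since SDK responses are Pydantic models.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every text block, skipping tool_use and others."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def total_tokens(response: dict) -> int:
    """Sum input and output tokens from the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


# Trimmed copy of the sample response above
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(extract_text(sample), total_tokens(sample))  # Hello! 18
```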

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context.

Example: Two-Turn Conversation

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Important: The assistant messages in the history don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context.
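Since the API keeps no state, your application owns the history list. A minimal sketch of one conversational turn, assuming a hypothetical `send` callable that takes the full message list and returns Claude's reply text (one possible `send` is shown in the comment at the end):

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}


def chat_turn(history: List[Message], user_text: str,
              send: Callable[[List[Message]], str]) -> str:
    """Add a user turn, send the whole history, then record Claude's reply."""
    history.append({"role": "user", "content": user_text})
    try:
        reply = send(history)  # the full conversation goes out on every call
    except Exception:
        history.pop()  # keep history valid: no dangling user turn on failure
        raise
    history.append({"role": "assistant", "content": reply})
    return reply


# One possible `send`, using the client from the examples above:
# def send(messages):
#     msg = client.messages.create(model="claude-opus-4-7", max_tokens=1024, messages=messages)
#     return msg.content[0].text
```

Removing the user turn when the call fails means a retry does not leave two consecutive user messages in the history.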

Best Practices for Conversation History

  • Keep the full history for coherent multi-turn interactions
  • Truncate or summarize older turns to stay within context limits
  • Use system prompts for persistent instructions
  • Consider prompt caching for long conversations
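A rough sketch of the first and third practices, under simple assumptions: truncation here just keeps the most recent N messages (summarizing would instead replace old turns with a summary), and persistent instructions go in the top-level `system` parameter rather than in a message. `trim_history` and `build_params` are hypothetical helpers, and prompt caching is not shown.

```python
from typing import Dict, List


def trim_history(messages: List[Dict], max_messages: int) -> List[Dict]:
    """Keep at most max_messages recent messages, starting on a user turn."""
    trimmed = messages[-max_messages:] if max_messages > 0 else []
    # Start on a user turn, so drop a leading assistant message left by the cut
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


def build_params(system_prompt: str, messages: List[Dict],
                 max_messages: int = 20) -> Dict:
    """Assemble keyword arguments for client.messages.create(**params)."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        # Persistent instructions use the top-level system parameter,
        # not a message with a "system" role
        "system": system_prompt,
        "messages": trim_history(messages, max_messages),
    }
```

Counting messages is a crude proxy for tokens. Also be careful when tools are involved: a cut that separates a `tool_use` block from its matching `tool_result` leaves an invalid history.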

Prefilling Claude's Response

Prefilling lets you start Claude's response for it. This is useful for:

  • Forcing structured output formats
  • Guiding the model toward a specific answer
  • Skipping preambles, so the reply starts exactly where you need it

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Outputs: "C"
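As the example shows, the response contains only what Claude writes after the prefill, not the prefilled text itself. A common variation is prefilling `{` to get JSON, which means you must stitch the prefill back on before parsing. A minimal sketch; `parse_prefilled_json` is a hypothetical helper and the request is shown only as comments:

```python
import json


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rebuild the full JSON text, since the API returns only the continuation."""
    return json.loads(prefill + continuation)


messages = [
    {"role": "user", "content": "Give the capital of France as JSON with the key 'capital'."},
    {"role": "assistant", "content": "{"},  # prefill: the reply continues a JSON object
]
# message = client.messages.create(model="claude-sonnet-4-5", max_tokens=100, messages=messages)
# data = parse_prefilled_json("{", message.content[0].text)
```

Note that the prefilled assistant content must not end with trailing whitespace, or the request is rejected.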

Prefill Limitations

  • Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6
  • Using prefill with these models returns a 400 error
  • Alternative: Use structured outputs or system prompt instructions
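One way to keep a single code path across models is to decide per model whether to prefill, and fall back to a system prompt instruction otherwise. A sketch under the assumption that your application sets the hypothetical `supports_prefill` flag from the model list above; `build_answer_request` is also hypothetical:

```python
def build_answer_request(question: str, supports_prefill: bool) -> dict:
    """Constrain the answer format with prefill when allowed, else with a system prompt."""
    params = {
        "max_tokens": 10,
        "messages": [{"role": "user", "content": question}],
    }
    if supports_prefill:
        # Start Claude's reply so it only has to supply the letter
        params["messages"].append({"role": "assistant", "content": "The answer is ("})
    else:
        # No prefill: state the format requirement as an instruction instead
        params["system"] = "Answer with only the letter of the correct option, nothing else."
    return params


question = "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
params = build_answer_request(question, supports_prefill=False)
# message = client.messages.create(model="claude-opus-4-7", **params)
```

An instruction is a softer constraint than a prefill, so validate the reply (for example, check that it is a single letter) before using it.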

Vision: Working with Images

Claude can process images sent via the Messages API. Images can be provided as base64-encoded data or as URLs.

Example: Image Analysis

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)

print(message.content[0].text)
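For images already hosted online, the source can point at a URL instead of carrying base64 data. A minimal sketch of both block shapes; the helper names and the example URL are hypothetical placeholders:

```python
import base64


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Inline image content block carrying base64-encoded data."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Image content block that lets the API fetch the image itself."""
    return {"type": "image", "source": {"type": "url", "url": url}}


content = [
    image_block_from_url("https://example.com/chart.png"),  # placeholder URL
    {"type": "text", "text": "Describe this chart in detail."},
]
# message = client.messages.create(model="claude-opus-4-7", max_tokens=1024,
#                                  messages=[{"role": "user", "content": content}])
```

Both examples put the image before the question, which is the ordering Anthropic's vision documentation generally recommends.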

Supported Image Formats

  • JPEG, PNG, GIF, WebP
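  • Maximum size: 5 MB per image when sent through the API
  • Very large images are downscaled automatically, which adds latency without improving results, so resizing them before upload is usually worthwhile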
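The declared `media_type` should match the actual bytes; hard-coding `image/png` for a file that is really a JPEG can get the request rejected. A minimal sketch that infers the media type from each format's standard file signature ("magic bytes"); `sniff_media_type` is a hypothetical helper:

```python
def sniff_media_type(data: bytes) -> str:
    """Guess the image media type from the file's leading signature bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format (expected JPEG, PNG, GIF, or WebP)")


# with open("chart.png", "rb") as f:
#     raw = f.read()
# media_type = sniff_media_type(raw)
```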

Streaming Responses

For real-time applications, streaming reduces perceived latency. The API supports streaming via Server-Sent Events (SSE).
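The SDKs parse the event stream for you, but the wire format is simple: each event is an `event:` line naming the type and a `data:` line carrying JSON, terminated by a blank line. A minimal parser over already-decoded lines, for illustration only; it is a simplified reading of the SSE format, not a full implementation, and `parse_sse` is a hypothetical name. Real streams also include events such as `ping` that clients can ignore.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line ends the current event
            if data:
                yield event or "message", json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())
    if data:  # flush a final event that lacks a trailing blank line
        yield event or "message", json.loads("\n".join(data))


sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hi"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]
events = list(parse_sse(sample))
```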

Python Streaming Example

stream = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)

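for event in stream:
    # Only text deltas carry a .text attribute; other delta types (e.g. tool input JSON) do not
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)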

TypeScript Streaming Example

const stream = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Write a short poem about AI.' }
  ],
  stream: true
});

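for await (const event of stream) {
  // Narrow to text deltas; other delta types have no .text field
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}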
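When streaming, you usually want the final text and the stop reason as well as the live output. The stop reason arrives on a `message_delta` event near the end of the stream. A minimal sketch that accumulates both from the raw events; `collect_stream` is a hypothetical helper, and the demo uses stand-in objects instead of a live stream. The Python SDK also provides a higher-level `client.messages.stream(...)` context manager with a `text_stream` iterator, shown in the comment.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_stream(events: Iterable) -> Tuple[str, Optional[str]]:
    """Accumulate streamed text and capture the final stop_reason."""
    parts = []
    stop_reason = None
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            parts.append(event.delta.text)
        elif event.type == "message_delta":
            stop_reason = event.delta.stop_reason  # set once, near the end
    return "".join(parts), stop_reason


# Stand-in events shaped like the SDK's; with a real stream, pass the
# iterator returned by client.messages.create(..., stream=True) instead.
demo_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="Hel")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(type="text_delta", text="lo")),
    SimpleNamespace(type="message_delta", delta=SimpleNamespace(stop_reason="end_turn")),
    SimpleNamespace(type="message_stop"),
]
text, stop_reason = collect_stream(demo_events)

# Higher-level SDK helper:
# with client.messages.stream(model="claude-opus-4-7", max_tokens=1024, messages=[...]) as s:
#     for chunk in s.text_stream:
#         print(chunk, end="", flush=True)
```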

Handling Stop Reasons

Claude can stop generating for several reasons. Your code should handle each case:

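stop_reason | Meaning | Action
end_turn | Claude finished naturally | Return the response
max_tokens | The response hit the max_tokens limit and was cut off | Retry with a higher max_tokens, or handle the truncated text
stop_sequence | One of your custom stop sequences was generated | Handle as needed (the matched sequence is returned in stop_sequence)
tool_use | Claude wants to call a tool | Execute the tool and send the result back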
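The `tool_use` row is the one that drives agent loops: you run each requested tool and reply with a user message containing matching `tool_result` blocks. A minimal sketch, assuming a hypothetical `run_tool(name, input)` function you supply; the demo uses a stand-in response object. Before sending the returned message, append Claude's own assistant message (with its `tool_use` blocks) to the history.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def handle_response(message, run_tool: Callable[[str, Any], str]) -> Optional[dict]:
    """Return the next user message to send, or None if the turn is complete."""
    if message.stop_reason == "tool_use":
        results = []
        for block in message.content:
            if block.type == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,  # links the result to its request
                    "content": run_tool(block.name, block.input),
                })
        return {"role": "user", "content": results}
    if message.stop_reason in ("end_turn", "stop_sequence"):
        return None  # the answer is complete
    if message.stop_reason == "max_tokens":
        raise RuntimeError("response truncated; retry with a higher max_tokens")
    raise ValueError(f"unhandled stop_reason: {message.stop_reason!r}")


# Stand-in for a response in which Claude requests a hypothetical tool
fake = SimpleNamespace(
    stop_reason="tool_use",
    content=[SimpleNamespace(type="tool_use", id="toolu_01",
                             name="get_weather", input={"city": "Paris"})],
)
reply = handle_response(fake, lambda name, args: f"{name}: sunny in {args['city']}")
```

Raising on unknown stop reasons, rather than silently returning, makes new values surface during testing instead of in production.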

Error Handling Best Practices

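Always wrap API calls in try/except blocks. Catch the specific exceptions first: RateLimitError and APIConnectionError are subclasses of APIError, so an APIError handler listed first would catch them before their own handlers run.

try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Retry the request
except anthropic.APIError as e:
    # Catch-all for any other API error
    print(f"API error: {e}")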
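The Python SDK already retries some failures (connection errors, rate limits, and server errors) a small number of times by default, configurable with `max_retries` on the client. If you need your own policy on top of that, a minimal exponential-backoff sketch with full jitter follows; `with_backoff` is a hypothetical helper, and the retryable exception types are passed in so the code does not depend on the SDK.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T],
                 retryable: Tuple[Type[BaseException], ...],
                 max_attempts: int = 5,
                 base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(random.uniform(0, delay))  # jitter spreads out client retries
    raise AssertionError("unreachable")


# message = with_backoff(
#     lambda: client.messages.create(model="claude-opus-4-7", max_tokens=1024, messages=[...]),
#     retryable=(anthropic.RateLimitError, anthropic.APIConnectionError),
# )
```

The `sleep` parameter exists so tests can record delays instead of actually waiting.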

Key Takeaways

  • The Messages API is stateless: always send the full conversation history with each request
  • Prefill is powerful but limited: it works well for steering output format, but it returns a 400 error on some newer models, where structured outputs or system prompt instructions are the alternative
  • Vision support is built in: send images as base64 or URLs for multimodal analysis
  • Streaming reduces perceived latency: use SSE for real-time applications like chat interfaces
  • Always handle stop reasons, especially tool_use if you're building agents and max_tokens for long responses