Claude Guide · Beginner · 2026-05-06

Mastering the Messages API: A Practical Guide to Building Conversational AI with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision. Includes code examples in Python and TypeScript.

Quick Answer

This guide teaches you how to build conversational AI with Claude's Messages API, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical Python and TypeScript code examples.

Tags: Messages API · Claude API · Conversational AI · Prefill · Vision

Introduction

Claude's Messages API is the primary way to interact with Claude programmatically. Whether you're building a custom chatbot, an AI assistant, or integrating Claude into your application, understanding the Messages API is essential. This guide covers the most common patterns—from simple requests to advanced techniques like prefill and vision—so you can get the most out of Claude.

Basic Request and Response

At its core, the Messages API is straightforward: you send a list of messages and receive a response. Here's a minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes the model's reply, metadata, and token usage:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
Key fields to note:
  • stop_reason: Indicates why the response ended (end_turn means Claude finished naturally).
  • usage: Tracks input and output tokens for billing and optimization.
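With the Python SDK these fields are attributes on the returned object (message.content[0].text, message.usage.input_tokens, and so on). To make the shape concrete, here is a sketch that pulls the reply text and token counts out of a response shaped like the JSON above, parsed from a string purely for illustration:

```python
import json

# A response shaped like the sample above (for illustration only)
raw = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""
response = json.loads(raw)

# The reply text lives in the first content block
reply = response["content"][0]["text"]

# Input + output tokens is what you pay for
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]

print(reply)         # Hello!
print(total_tokens)  # 18
```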

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over the conversation context.

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Important: The assistant messages don't have to come from Claude—you can inject synthetic assistant responses to guide the conversation. This is useful for:
  • Providing example responses
  • Correcting or redirecting Claude
  • Simulating multi-turn interactions
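Because the API is stateless, a common pattern is a thin wrapper that accumulates turns and replays the full history on every request. A minimal sketch (the Conversation class and its method names are illustrative, not part of the SDK):

```python
class Conversation:
    """Accumulates turns and replays the full history on each request."""

    def __init__(self, model="claude-opus-4-7", max_tokens=1024):
        self.model = model
        self.max_tokens = max_tokens
        self.history = []

    def add_user(self, text):
        self.history.append({"role": "user", "content": text})

    def add_assistant(self, text):
        # Synthetic assistant turns are allowed, e.g. for few-shot examples
        self.history.append({"role": "assistant", "content": text})

    def send(self, client, text):
        """Append the user turn, call the API, record and return the reply."""
        self.add_user(text)
        message = client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=self.history,
        )
        reply = message.content[0].text
        self.add_assistant(reply)
        return reply

# Usage (requires an API key):
# client = anthropic.Anthropic()
# convo = Conversation()
# print(convo.send(client, "Hello, Claude"))
# print(convo.send(client, "Can you describe LLMs to me?"))
```

Each call to send replays everything accumulated so far, which is exactly what the stateless API expects.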

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for controlling output format, enforcing structure, or getting concise answers.

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

Response:

{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}
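Note that the returned text contains only the continuation, never the prefill itself. If you need the complete answer, concatenate the two yourself (here the completion string stands in for message.content[0].text):

```python
prefill = "The answer is ("
completion = "C"  # stands in for message.content[0].text from the response above

# With max_tokens=1 the closing parenthesis is never generated, so add it
full_answer = prefill + completion + ")"
print(full_answer)  # The answer is (C)
```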

Prefill Limitations

Prefill is not supported on these models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

When to Use Prefill

  • Format control: Force JSON, XML, or specific output structures
  • Constrained generation: Get single-token answers (yes/no, multiple choice)
  • Role-playing: Set the tone or persona from the first word
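As a format-control example, prefilling an opening brace nudges Claude straight into JSON; remember to prepend the prefill before parsing. A sketch, with a simulated continuation standing in for the real API response:

```python
import json

prefill = "{"
messages = [
    {
        "role": "user",
        "content": "List three primary colors as a JSON object with a 'colors' array.",
    },
    # Claude continues generating from the opening brace
    {"role": "assistant", "content": prefill},
]

# Simulated continuation, standing in for message.content[0].text
completion = '"colors": ["red", "yellow", "blue"]}'

# The prefill is not part of the returned text, so re-attach it before parsing
data = json.loads(prefill + completion)
print(data["colors"])
```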

Vision: Sending Images to Claude

Claude can analyze images sent via the Messages API. This enables use cases like document analysis, image description, and visual Q&A.

Base64 Image Example (Python)

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: 100MB per image
  • Claude processes images at varying resolutions; larger images use more tokens
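A small helper can map a file extension to the right media_type and build the image content block in one step. A sketch (the extension-to-type table is an assumption derived from the supported formats above; image_block is not an SDK function):

```python
import base64
import pathlib

# Assumed mapping based on the supported formats listed above
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def image_block(path):
    """Build a base64 image content block for the Messages API."""
    p = pathlib.Path(path)
    suffix = p.suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    data = base64.b64encode(p.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

You can then drop the returned dict straight into a message's content list alongside a text block.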

Vision Best Practices

  • Combine with text: Always include a text prompt alongside images for best results
  • Use appropriate resolution: High-resolution images provide more detail but cost more tokens
  • One image per message: For complex analysis, send images one at a time

Handling Stop Reasons

The stop_reason field tells you why Claude stopped generating. Common values:

Stop Reason      Meaning
end_turn         Claude finished naturally
max_tokens       Response hit the token limit
stop_sequence    Claude encountered a stop sequence
tool_use         Claude wants to use a tool

Pro tip: If stop_reason is max_tokens, consider raising the max_tokens parameter or breaking your request into smaller chunks.
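In application code this usually becomes a small dispatch after each call. A sketch (handle_stop and its return labels are illustrative, not part of the SDK):

```python
def handle_stop(stop_reason):
    """Map a stop_reason value to a next action for the caller."""
    if stop_reason == "end_turn":
        return "done"        # Claude finished naturally
    if stop_reason == "max_tokens":
        return "truncated"   # cut off: raise max_tokens or continue the turn
    if stop_reason == "stop_sequence":
        return "stopped"     # hit a caller-supplied stop sequence
    if stop_reason == "tool_use":
        return "run_tool"    # execute the requested tool, then reply
    return "unknown"

# e.g. action = handle_stop(message.stop_reason)
```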

Streaming Responses

For real-time applications, use streaming to get Claude's response incrementally:

stream = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="", flush=True)

Streaming is ideal for:

  • Chat interfaces with real-time display
  • Long responses where users expect immediate feedback
  • Reducing perceived latency
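The same event loop can accumulate the full text while displaying it incrementally. A sketch tested against fake events (the SimpleNamespace objects merely mimic the shape of content_block_delta events; collect_text is illustrative):

```python
from types import SimpleNamespace

def collect_text(stream):
    """Print deltas as they arrive and return the accumulated text."""
    chunks = []
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
            chunks.append(event.delta.text)
    return "".join(chunks)

# Fake events standing in for a real stream (for illustration)
fake_stream = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="Once upon ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="a time...")),
]
```

With a real stream you would pass the object returned by client.messages.create(..., stream=True) in place of fake_stream.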

Error Handling

Common API errors and how to handle them:

Error              Cause               Solution
400 Bad Request    Invalid parameters  Check model name, message format
401 Unauthorized   Invalid API key     Verify your API key
429 Rate Limit     Too many requests   Implement exponential backoff
529 Overloaded     Server overload     Retry with delay

import time
from anthropic import Anthropic, APIError, RateLimitError

client = Anthropic()

for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except RateLimitError:
        time.sleep(2 ** attempt)
    except APIError as e:
        print(f"API error: {e}")
        break
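The retry loop above can be factored into a reusable helper. A sketch with exponential backoff plus jitter (with_retries is illustrative, not part of the SDK; jitter spreads out retries so many clients don't hammer the server in lockstep):

```python
import random
import time

def with_retries(fn, retries=3, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying retryable errors with exponential backoff + jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            # base_delay, 2*base_delay, 4*base_delay, ... plus jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Usage (requires an API key):
# message = with_retries(
#     lambda: client.messages.create(
#         model="claude-sonnet-4-5",
#         max_tokens=1024,
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retryable=(RateLimitError,),
# )
```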

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision, you can create sophisticated AI applications. Remember:

  • Always send the full conversation history (stateless API)
  • Use prefill for output control (but check model compatibility)
  • Stream responses for better user experience
  • Handle errors gracefully with retries

Key Takeaways

  • Stateless design: You must send the full conversation history with each request—this gives you complete control over context.
  • Prefill for precision: Use prefill to control output format and get concise answers, but avoid it on newer models (Opus 4.7, Sonnet 4.6) where it's unsupported.
  • Vision integration: Claude can analyze images via base64 encoding; always pair images with text prompts for best results.
  • Stream for speed: Streaming reduces perceived latency and is essential for real-time chat interfaces.
  • Error handling matters: Implement retry logic with exponential backoff to handle rate limits and server overloads gracefully.