BeClaude
Guide · Beginner · API · 2026-05-17

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers how to use the Claude Messages API to build conversational AI applications, including basic requests, multi-turn conversations, prefill techniques, and vision capabilities, with Python code examples.

Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

The Claude Messages API is the primary interface for building conversational AI applications with Anthropic's Claude models. Whether you're creating a simple chatbot or a complex multi-turn assistant, understanding how to work with messages effectively is essential.

This guide covers the core patterns you'll use daily: basic requests, managing conversation history, pre-filling responses, and working with images. By the end, you'll have a solid foundation for building production-ready applications with Claude.

Basic Request and Response

At its simplest, the Messages API takes a list of messages and returns Claude's response. Here's the minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes several important fields:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks (usually text, but can include tool use blocks)
  • stop_reason: Why Claude stopped generating ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
  • usage: Token counts for billing and context window management
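
Because content is an array, it is safer to filter for text blocks than to assume the first block is text. The sketch below works on a plain dictionary shaped like the JSON response above; extract_text is an illustrative helper name, not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every text block in a Messages API response body."""
    parts = []
    for block in response.get("content", []):
        # Non-text blocks (for example tool_use) have no "text" field, so skip them.
        if block.get("type") == "text":
            parts.append(block["text"])
    return "".join(parts)


# A trimmed copy of the example response shown above.
sample = {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

reply_text = extract_text(sample)
total_tokens = sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"]
print(reply_text, total_tokens)
```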

Multi-Turn Conversations

The Messages API is stateless — each request must include the full conversation history. This gives you complete control over context but requires you to manage the conversation state on your end.

Here's how to build a multi-turn conversation:

import anthropic

client = anthropic.Anthropic()

# First turn
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Extract Claude's response
assistant_response = message.content[0].text

# Second turn: include the full history
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": assistant_response},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)
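
Copying the history by hand gets tedious after a couple of turns. One possible way to manage it is a small wrapper that stores the messages and resends all of them on every call. This is only a sketch: Conversation and the send callable are hypothetical names, and fake_send stands in for a function that would call client.messages.create and return message.content[0].text.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the running history and resends all of it on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the full message
        # list and returns the assistant's reply text.
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = self._send(pending)
        # Commit both turns only after the call succeeded, so a failed
        # request does not leave a dangling user message in the history.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


def fake_send(messages: list[Message]) -> str:
    """Stand-in for a real API call; reports how many messages it received."""
    return f"reply after {len(messages)} message(s)"


chat = Conversation(fake_send)
first = chat.ask("Hello, Claude")
second = chat.ask("Can you describe LLMs to me?")
print(len(chat.messages))
```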

Synthetic Assistant Messages

You can inject synthetic assistant messages into the history. This is useful for:

  • Providing few-shot examples
  • Guiding conversation flow
  • Implementing system-like behavior without the system prompt

messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Italy?"}
]
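
For few-shot prompting, the same shape can be generated from a list of example pairs. The helper below is a minimal sketch; few_shot_messages is an illustrative name, and the example pairs are the ones from the snippet above.

```python
def few_shot_messages(
    examples: list[tuple[str, str]], question: str
) -> list[dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final question into messages."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # The assistant turn is synthetic: written by you, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


history = few_shot_messages(
    [("What's the capital of France?", "The capital of France is Paris.")],
    "What about Italy?",
)
print(history)
```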

Prefill: Putting Words in Claude's Mouth

Prefilling allows you to start Claude's response for it. This is powerful for:

  • Enforcing response format (e.g., JSON, multiple choice)
  • Guiding the tone or structure
  • Reducing output tokens for constrained tasks

Basic Prefill Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # "C"

By setting max_tokens=1 and prefilling with "The answer is (", Claude only needs to output the letter. This is perfect for multiple-choice classification tasks.
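
As the "C" output above shows, the reply continues from the prefill rather than repeating it, so your code has to validate the continuation or stitch it back onto the prefill. The sketch below shows both cases on plain strings; parse_choice and parse_prefilled_json are hypothetical helper names, and the JSON prefill of "{" is an assumed example rather than something from the request above.

```python
import json
from typing import Any, Optional

VALID_CHOICES = ("A", "B", "C")


def parse_choice(continuation: str) -> Optional[str]:
    """Return the chosen letter, or None if the reply is not a valid option."""
    letter = continuation.strip().strip("()").upper()
    return letter if letter in VALID_CHOICES else None


def parse_prefilled_json(prefill: str, continuation: str) -> dict[str, Any]:
    """Rejoin the prefill with the model's continuation before parsing."""
    return json.loads(prefill + continuation)


choice = parse_choice("C")
# Assumed example: the assistant turn was prefilled with "{" to force JSON.
record = parse_prefilled_json("{", '"answer": "C"}')
print(choice, record)
```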

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

For models that don't support prefill, consider:

  • Structured outputs: Define a JSON schema for the response
  • System prompt instructions: Use clear formatting instructions in the system prompt

Working with Images (Vision)

Claude can analyze images sent via the Messages API. This enables use cases like document analysis, screenshot interpretation, and visual question answering.

Sending an Image

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)

print(message.content[0].text)

Supported media types: image/jpeg, image/png, image/gif, image/webp.
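
To avoid hardcoding the media type, you could derive it from the file extension and reject anything outside the supported list. This is a minimal sketch: image_block is a hypothetical helper, and it trusts the extension rather than inspecting the file's bytes.

```python
import base64
from pathlib import Path
from typing import Any

# Extension-to-media-type mapping for the four supported formats.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict[str, Any]:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {suffix or 'no extension'}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dictionary can be placed in the content list alongside a text block, as in the request above.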

Image Size Limits

Claude processes images at different resolutions depending on size:

  • Images under 1,950 pixels on the longest side are processed at original resolution
  • Larger images are scaled down to fit within 1,950 pixels
  • Very large images (over 8,000 pixels) may be rejected

For optimal performance, resize images to around 1,000-2,000 pixels on the longest side before sending.
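
Before resizing, it helps to compute the target dimensions so the aspect ratio is preserved. The function below is a sketch of that arithmetic only; fit_within is a hypothetical name, the 1,950 default comes from the limits listed above, and the actual resizing would be done with an image library of your choice.

```python
def fit_within(width: int, height: int, max_side: int = 1950) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: keep the original resolution.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3900, 1950))
print(fit_within(800, 600))
```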

Handling Stop Reasons

Understanding stop_reason helps you build robust applications:

Stop Reason      Meaning                           Action
end_turn         Claude finished naturally         Continue conversation
max_tokens       Output hit token limit            Increase max_tokens or truncate
stop_sequence    Custom stop sequence triggered    Handle as needed
tool_use         Claude wants to call a tool       Execute tool and continue

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
    # Handle tool execution...
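
As the number of branches grows, a lookup table can be easier to maintain than an if/elif chain. The sketch below mirrors the stop reasons covered in this section; the action labels and the next_action helper are hypothetical, and the fallback is there so that a value not covered here gets flagged instead of silently ignored.

```python
from typing import Optional

# Action labels are illustrative placeholders for your own handlers.
ACTIONS = {
    "end_turn": "continue_conversation",
    "max_tokens": "raise_limit_or_truncate",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool_and_continue",
}


def next_action(stop_reason: Optional[str]) -> str:
    """Map a stop_reason to an action label, flagging anything unrecognized."""
    return ACTIONS.get(stop_reason, "inspect_manually")


print(next_action("max_tokens"))
print(next_action("something_else"))
```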

Best Practices

1. Manage Token Usage

Always check usage.input_tokens and usage.output_tokens to track costs. For long conversations, consider:

  • Summarizing older messages
  • Using prompt caching for repeated system instructions
  • Trimming history when approaching context limits
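
Trimming can be sketched as keeping the most recent messages that fit a token budget. The example below makes two loud assumptions: estimate_tokens uses a rough four-characters-per-token guess (the usage field of a real response is the authoritative count), and the trimmed history is expected to open with a user message. Both function names are hypothetical.

```python
def estimate_tokens(message: dict[str, str]) -> int:
    """Very rough estimate: assume about 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def trim_history(
    messages: list[dict[str, str]], budget: int
) -> list[dict[str, str]]:
    """Keep the newest messages whose estimated total fits within budget."""
    kept: list[dict[str, str]] = []
    used = 0
    # Walk backwards so the most recent turns are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Drop leading assistant turns so the history opens with a user message.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
    {"role": "assistant", "content": "d" * 40},
    {"role": "user", "content": "e" * 40},
]
trimmed = trim_history(history, budget=25)
print(len(trimmed))
```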

2. Handle Errors Gracefully

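import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError as e:
    # Catch the connection error first: it is a subclass of APIError,
    # so a broader handler placed above it would make this branch unreachable.
    print(f"Connection error: {e}")
    # Retry after delay
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Implement retry logic or fallback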
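
The retry comments above could be fleshed out with exponential backoff. The helper below is a generic sketch, not an SDK feature: with_retries is a hypothetical name, the delays are arbitrary defaults, and the demo uses Python's built-in ConnectionError with a simulated failure. With the real client you might pass lambda: client.messages.create(...) as call and (anthropic.APIConnectionError,) as retry_on.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("attempts must be at least 1")


# Demo with a simulated failure; delays are recorded instead of slept.
calls = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = with_retries(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, delays)
```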

3. Use Streaming for Responsive UIs

For chat applications, use streaming to show tokens as they're generated:

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
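
In a chat application you usually need the complete reply once streaming ends, so that it can be appended to the conversation history. One way to do that is to accumulate the chunks while displaying them. The sketch below works on any iterable of strings; collect_stream is a hypothetical helper, and in the snippet above you would pass stream.text_stream as chunks.

```python
from typing import Callable, Iterable


def collect_stream(chunks: Iterable[str], on_text: Callable[[str], None]) -> str:
    """Forward each chunk to on_text as it arrives and return the full text."""
    parts: list[str] = []
    for text in chunks:
        on_text(text)  # e.g. update the UI or print without a newline
        parts.append(text)
    return "".join(parts)


# Demo with a plain list standing in for a live stream.
shown: list[str] = []
full_reply = collect_stream(["Once", " upon", " a time"], on_text=shown.append)
print(full_reply)
```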

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create sophisticated conversational AI applications. Remember that the API is stateless — you manage the conversation history — and always check stop reasons to handle different scenarios appropriately.

Key Takeaways

  • The Messages API is stateless — always send the full conversation history with each request
  • Prefill allows you to guide Claude's responses by starting its reply, but check model compatibility
  • Vision capabilities let Claude analyze images sent as base64-encoded data
  • Always check stop_reason to understand why Claude stopped generating and handle edge cases
  • Use streaming for real-time user interfaces and track token usage to manage costs