BeClaude
Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Build Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities. Practical guide with code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, Conversational AI, Prefill, Vision

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or a complex agentic system, understanding how to craft and manage messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefill and vision.

Understanding the Messages API

Anthropic offers two main ways to build with Claude:

  • Messages API: Direct access to the model through prompts, giving you fine-grained control over the conversation flow. Best for custom agent loops and real-time interactions.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which is stateless: every request must carry the full conversation history.

Making Your First API Call

Let's start with a simple request. The following example sends a single user message and prints Claude's response.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

  • id: Unique identifier for the message
  • content: Array of content blocks (usually text)
  • stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, or tool_use)
  • usage: Token counts for input and output
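
To make the field list concrete, here is a minimal sketch that reads the text and token counts from a response shaped like the JSON above. It works on a plain dictionary so it runs without an API key. `extract_text` and `total_tokens` are hypothetical helper names, not part of the SDK; with the Python SDK you would read the same fields as attributes, for example `message.content[0].text`.

```python
# Sample response copied from the JSON above, trimmed to the fields used here.
response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}


def extract_text(resp: dict) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"] for block in resp["content"] if block.get("type") == "text"
    )


def total_tokens(resp: dict) -> int:
    """Sum the input and output token counts from the usage field."""
    usage = resp["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


print(extract_text(response))
print(total_tokens(response))
```

Filtering on the block type matters because `content` is an array: a response is not guaranteed to contain only text blocks.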

Building Multi-Turn Conversations

Since the Messages API is stateless, you must send the entire conversation history with each request. This allows you to build up context over multiple turns.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)

Important Notes

  • Synthetic assistant messages: You can include messages that didn't actually come from Claude. This is useful for providing context or simulating previous interactions.
  • Conversation history: Include the earlier turns in every request to maintain context. Messages should alternate between the user and assistant roles, starting with a user message.
  • Token costs: Each request includes the entire history, so longer conversations cost more in input tokens.
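
One way to handle the bookkeeping described above is a small wrapper that owns the history and resends it on every turn. The sketch below is an illustration, not an SDK feature: `Conversation` and the `send` callable are hypothetical names, and `fake_send` stands in for a real API call so the example runs offline.

```python
from typing import Callable


class Conversation:
    """Keeps the running message history for a stateless chat API."""

    def __init__(self, send: Callable[[list], str]):
        # `send` is a hypothetical callable: it receives the full history
        # and returns the assistant's reply text.
        self._send = send
        self.messages = []

    def ask(self, user_text: str) -> str:
        # Append the new user turn, then send the entire history.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))
        # Store the reply so the next request carries it as context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: list) -> str:
    """Stand-in for a real API call; reports how many messages it received."""
    return f"(reply to turn {len(history)})"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
```

With the SDK, `send` could wrap `client.messages.create(...)` and return `message.content[0].text`. The history grows by two messages per turn, which is why input token costs rise as the conversation gets longer.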

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. This is powerful for:

  • Constraining output format: Force Claude to start with a specific structure
  • Multiple choice questions: Get a single letter or number as the answer
  • Guiding tone or style: Start Claude's response with the desired tone

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Output: "C"
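
Two details of this example are easy to miss: the prefill travels as a final assistant message, and the response holds only the continuation ("C"), not the prefill itself. The sketch below captures both steps with two hypothetical helpers, `with_prefill` and `full_answer`. It only builds and joins data, so it runs without calling the API.

```python
def with_prefill(messages: list, prefill: str) -> list:
    """Return a copy of the history that ends in a partial assistant turn."""
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("prefill must follow a user message")
    return [*messages, {"role": "assistant", "content": prefill}]


def full_answer(prefill: str, completion: str) -> str:
    """Rejoin the prefill with the continuation returned by the model."""
    return prefill + completion


history = [{"role": "user", "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"}]
request_messages = with_prefill(history, "The answer is (")

# "C" is the continuation shown in the example above.
print(full_answer("The answer is (", "C"))
```

Returning a new list instead of mutating `history` keeps the partial assistant turn out of your stored conversation, so it is not resent on the next request by accident.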

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

For models that don't support prefill, consider:

  • Structured outputs: Define a JSON schema for the response
  • System prompt instructions: Tell Claude exactly how to format its response
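
As a sketch of the system prompt route, the helper below builds the keyword arguments for a request that asks for a single-letter answer without prefill, then validates whatever text comes back. `build_request` and `parse_choice` are hypothetical names, and the wording of the system prompt is only an example. `system` is the top-level system prompt parameter of the Messages API.

```python
import re


def build_request(question: str, model: str) -> dict:
    """Build keyword arguments for client.messages.create, without prefill."""
    return {
        "model": model,
        "max_tokens": 5,
        # Formatting instructions go in the system prompt instead of a prefill.
        "system": "Answer with a single letter (A, B, or C) and nothing else.",
        "messages": [{"role": "user", "content": question}],
    }


def parse_choice(text: str):
    """Return the first standalone choice letter, or None if there is none."""
    match = re.search(r"\b([ABC])\b", text)
    return match.group(1) if match else None


request = build_request(
    "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
    model="claude-opus-4-7",
)
print(sorted(request))
print(parse_choice("(C)"))
```

Because a system prompt guides the output but does not guarantee it, the validation step is what lets your code detect a reply that ignored the instruction. You would pass the dictionary to the SDK as `client.messages.create(**request)`.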

Vision Capabilities

Claude can process images through the Messages API. This enables use cases like:

  • Image analysis and description
  • Document processing (PDFs, screenshots)
  • Visual question answering

Python Example

import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode image
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What's in this image?"
                }
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Formats

  • JPEG
  • PNG
  • GIF
  • WebP

Images can be provided as base64-encoded data or via URL (if using the API with appropriate permissions).
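
The following sketch shows one way to build image content blocks for both delivery methods. The helper names are hypothetical, and the extension-to-media-type table simply mirrors the four formats listed above. The URL variant assumes the `url` source type and an image address the API can reach.

```python
import base64
from pathlib import Path

# Media types for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


def image_block_from_file(path: str) -> dict:
    """Pick the media type from the file extension, then encode the file."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {path}")
    return image_block_from_bytes(Path(path).read_bytes(), MEDIA_TYPES[suffix])


def image_block_from_url(url: str) -> dict:
    """Reference an image by URL instead of embedding its bytes."""
    return {"type": "image", "source": {"type": "url", "url": url}}


# example.com is a placeholder address, not a real image.
print(image_block_from_url("https://example.com/cat.png"))
```

Checking the extension before reading the file rejects unsupported formats early, which avoids a failed request later.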

Handling Stop Reasons

Understanding why Claude stopped generating is crucial for building robust applications:

| Stop Reason | Meaning | Action |
|---|---|---|
| end_turn | Claude finished naturally | Continue conversation or end |
| max_tokens | Output hit the token limit | Increase max_tokens or truncate |
| stop_sequence | Claude encountered a stop sequence | Handle based on your logic |
| tool_use | Claude wants to use a tool | Execute the tool and return results |

Python Example: Handling Stop Reasons

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[...]
)

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
    print("Claude finished naturally.")
elif message.stop_reason == "tool_use":
    print("Claude requested a tool call.")
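
Beyond printing a warning, you can act on a truncated response. The sketch below retries with a larger limit whenever the stop reason is `max_tokens`, up to a ceiling. It is one possible policy, not a recommended default: `complete_with_retries` is a hypothetical helper, and `send` stands in for a function that calls the API with a given `max_tokens` and returns the text and stop reason.

```python
from typing import Callable, Tuple


def complete_with_retries(
    send: Callable[[int], Tuple[str, str]],
    max_tokens: int = 256,
    ceiling: int = 4096,
) -> Tuple[str, str]:
    """Call `send`, doubling max_tokens while the reply is truncated."""
    while True:
        text, stop_reason = send(max_tokens)
        # Stop when the reply is complete or the limit cannot grow further.
        if stop_reason != "max_tokens" or max_tokens >= ceiling:
            return text, stop_reason
        max_tokens = min(max_tokens * 2, ceiling)


calls = []


def fake_send(limit: int) -> Tuple[str, str]:
    """Offline stand-in: pretends the full reply needs 600 tokens."""
    calls.append(limit)
    if limit < 600:
        return "partial", "max_tokens"
    return "full", "end_turn"


print(complete_with_retries(fake_send))
print(calls)
```

Each retry resends the whole request, so the ceiling also caps how much a single truncated answer can cost.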

Best Practices

1. Manage Token Usage

  • Prompt caching: For repeated system prompts or large context, use prompt caching to reduce costs and latency.
  • Token counting: Use the token counting endpoint to estimate costs before sending requests.
  • Compaction: For very long conversations, consider summarizing earlier turns to save tokens.
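
The compaction idea can be sketched as a function that splits the history into an older part to summarize and a recent part to keep verbatim. Everything here is an assumption for illustration: `compact_history` is a hypothetical helper, and `summarize` is a callable you supply (it could itself ask Claude for a summary). The assistant replies in the sample history are placeholders, not real model output.

```python
from typing import Callable, List, Tuple


def compact_history(
    messages: List[dict],
    keep_last: int,
    summarize: Callable[[List[dict]], str],
) -> Tuple[str, List[dict]]:
    """Split history into (summary of old turns, recent messages)."""
    if len(messages) <= keep_last:
        return "", list(messages)
    cut = len(messages) - keep_last
    # Move the cut forward until the kept tail starts with a user message,
    # so the trimmed history still begins with the user role.
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    return summarize(messages[:cut]), list(messages[cut:])


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "(placeholder reply 1)"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
    {"role": "assistant", "content": "(placeholder reply 2)"},
    {"role": "user", "content": "How are they trained?"},
]

summary, recent = compact_history(
    history,
    keep_last=3,
    summarize=lambda old: f"[{len(old)} earlier messages summarized]",
)
print(summary)
print([m["role"] for m in recent])
```

One option is to pass the summary in the `system` parameter and send `recent` as the `messages` list. The trade-off is that details missing from the summary are no longer available to the model.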

2. Handle Errors Gracefully

  • Implement retry logic with exponential backoff for rate limits.
  • Validate inputs before sending to avoid 400 errors.
  • Monitor for stop_reason to detect truncation or tool requests.
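
Here is a minimal sketch of retry logic with exponential backoff and jitter. `call_with_backoff` is a hypothetical helper, and `RateLimited` is a stand-in exception so the example runs offline. With the Python SDK you would pass the SDK's rate-limit exception (`anthropic.RateLimitError`) as `retry_on`. The delay values are illustrative.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for a rate-limit error so the sketch runs offline."""


def call_with_backoff(fn, retry_on=(RateLimited,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter


# Demo: fail twice, then succeed; record the delays instead of sleeping.
attempts = {"count": 0}
delays = []


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

The jitter spreads retries out so that many clients hitting a limit at once do not all retry at the same moment. Injecting `sleep` makes the helper easy to test without waiting.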

3. Use Streaming for Real-Time Applications

For chat interfaces, use streaming to show Claude's response as it's generated:

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
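
In a chat interface you usually want to display the chunks as they arrive and also keep the full text for the conversation history. The sketch below does both for any iterable of text chunks. `render_stream` is a hypothetical helper, and the list of strings stands in for `stream.text_stream` so the example runs offline.

```python
from typing import Callable, Iterable


def show(text: str) -> None:
    """Default display: print each chunk immediately, without a newline."""
    print(text, end="", flush=True)


def render_stream(chunks: Iterable[str], on_text: Callable[[str], None] = show) -> str:
    """Display each chunk as it arrives and return the assembled text."""
    parts = []
    for chunk in chunks:
        on_text(chunk)
        parts.append(chunk)
    return "".join(parts)


# Stand-in for stream.text_stream from the example above.
full_text = render_stream(["Once ", "upon ", "a time"])
print()
print(len(full_text))
```

Inside the `with client.messages.stream(...)` block, you could call `render_stream(stream.text_stream)` and append the returned text to your history as the assistant turn.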

Conclusion

The Messages API is your gateway to building powerful conversational AI applications with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create everything from simple chatbots to complex agentic systems.

Remember these key points:

  • The API is stateless — always send the full conversation history
  • Use prefill carefully and check model compatibility
  • Handle stop reasons to build robust applications
  • Leverage streaming for better user experiences

Key Takeaways

  • Messages API is stateless: You must send the full conversation history with each request to maintain context.
  • Prefill is powerful but limited: It works on most models but not on Claude Opus 4.7, Opus 4.6, Sonnet 4.6, or Mythos Preview. Use structured outputs as an alternative.
  • Handle stop reasons: Always check stop_reason to detect truncation (max_tokens) or tool requests (tool_use).
  • Vision is built-in: You can send images as base64 or URL for analysis, enabling document processing and visual QA.
  • Stream for real-time apps: Use the streaming API for chat interfaces to show responses as they're generated.