Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, response prefilling, and vision tasks. Practical guide with code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, response prefilling, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, conversational AI, prefill, vision

Claude's Messages API is the core interface for integrating Claude into your applications. Whether you're building a chatbot, a content generator, or a vision-powered assistant, understanding how to work with messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefilling and vision.

Understanding the Messages API

The Messages API is a stateless, RESTful API: you send a sequence of messages to Claude and receive a generated response. Because no conversation state is kept on the server, you send the full conversation history with each request. Claude doesn't remember previous interactions unless you provide them.
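To make "stateless" and "RESTful" concrete, the sketch below builds the underlying HTTP request by hand with only the standard library, without sending it. The helper name build_messages_request and the placeholder key are illustrative assumptions; the SDK examples later in this guide take care of these details for you.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(api_key, model, messages, max_tokens=1024):
    """Build (but do not send) the HTTP request for one Messages API call."""
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_messages_request(
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["messages"])
```

Everything Claude will see travels in the request body, which is why the whole conversation has to be included every time.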

Anthropic offers two primary ways to build with Claude:

Messages API: Direct model access for custom agent loops and fine-grained control
Claude Managed Agents: Pre-built, configurable agent harness for long-running tasks

This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Call

Let's start with a simple request. You'll need an Anthropic API key and the SDK for your preferred language. By default, the SDK clients below read the key from the ANTHROPIC_API_KEY environment variable.

Python Example

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude' }
    ]
  });
  
  console.log(message.content[0].text);
}
main();

Understanding the Response

The API returns a structured response object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

id: Unique identifier for the message
content: Array of content blocks (text, tool use, etc.)
stop_reason: Why Claude stopped generating (end_turn, max_tokens, stop_sequence, or tool_use)
usage: Token counts for billing and monitoring
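As a small illustration of reading these fields, the sketch below works on the sample response above, parsed into a plain Python dict. The helper names extract_text and was_truncated are made up for this example. With the SDK you would use attribute access instead, as in message.content[0].text.

```python
def extract_text(response):
    """Join the text of every text block in a parsed Messages API response."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def was_truncated(response):
    """True when generation stopped because max_tokens was reached."""
    return response["stop_reason"] == "max_tokens"


# The sample response from above, as a Python dict
response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(response))   # Hello!
print(was_truncated(response))  # False
```

Filtering on the block type, rather than assuming content[0] is text, keeps the code working when a response contains more than one kind of block.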

Building Multi-Turn Conversations

Because the Messages API is stateless, you must maintain conversation history yourself. Each request includes the full history of messages.

Python Example

import anthropic
client = anthropic.Anthropic()
# First turn
message1 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
# Second turn - include history
message2 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": message1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message2.content[0].text)

Important Notes

Always send the complete history: Claude has no memory between requests
Synthetic assistant messages: You can insert pre-written assistant responses for context (e.g., for few-shot examples)
Token limits: Longer histories consume more input tokens, so be mindful of context windows
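One way to follow these notes is to keep the history in a small wrapper object. The sketch below is an assumption-laden illustration, not an SDK feature: the Conversation class and the send callable are hypothetical, and fake_send stands in for a real API call so the example runs offline. The optional examples argument shows how synthetic assistant messages can seed few-shot context.

```python
class Conversation:
    """Keeps the message history that must accompany every request."""

    def __init__(self, examples=()):
        # examples: optional (user_text, assistant_text) few-shot pairs
        self.messages = []
        for user_text, assistant_text in examples:
            self.messages.append({"role": "user", "content": user_text})
            self.messages.append({"role": "assistant", "content": assistant_text})

    def ask(self, send, user_text):
        """Append a user turn, call send(messages), and record the reply."""
        self.messages.append({"role": "user", "content": user_text})
        reply = send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages):
    # Stand-in for a real API call; reports how much history it received
    return f"(reply after seeing {len(messages)} messages)"


chat = Conversation(examples=[("Translate: hello", "bonjour")])
print(chat.ask(fake_send, "Translate: thank you"))
print(chat.ask(fake_send, "Translate: goodbye"))
```

With the real SDK, send could be a function that calls client.messages.create with the given messages and returns message.content[0].text.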

Prefilling Claude's Response

Prefilling lets you "put words in Claude's mouth" by providing the beginning of the assistant's response. This is powerful for:

Enforcing response formats
Guiding Claude's reasoning
Producing structured output such as JSON (on newer models, the dedicated structured outputs feature is preferred)

Example: Multiple Choice Answer

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Outputs: "C"
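Two details are easy to miss when prefilling: the response contains only the continuation, not the prefill itself, and the API rejects a prefill that ends in trailing whitespace. The helpers below are a minimal sketch of handling both. The names with_prefill and full_reply are hypothetical, and the completion string is a stand-in for what a real call might return.

```python
import json


def with_prefill(messages, prefill):
    """Return a copy of messages ending in an assistant turn that starts the reply."""
    # Trailing whitespace is stripped because the API rejects a prefill ending in it
    return list(messages) + [{"role": "assistant", "content": prefill.rstrip()}]


def full_reply(prefill, completion):
    """The response holds only the continuation, so stitch the prefill back on."""
    return prefill.rstrip() + completion


messages = with_prefill(
    [{"role": "user", "content": "Return the user as JSON: Ada, age 36"}],
    prefill="{",
)
# Stand-in for message.content[0].text from a real call
completion = '"name": "Ada", "age": 36}'
print(json.loads(full_reply("{", completion)))
```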

Prefill Limitations

Prefilling is not supported on:

Claude Mythos Preview
Claude Opus 4.7
Claude Opus 4.6
Claude Sonnet 4.6

For these models, use structured outputs or system prompt instructions instead.

Working with Vision

The Messages API supports image inputs, enabling Claude to analyze and describe visual content.

Python Example

import anthropic
import base64
client = anthropic.Anthropic()
# Load and encode image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this diagram in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

JPEG, PNG, GIF, WebP
Maximum size: 5 MB per image when sent through the API
Claude processes images at varying resolutions; larger images use more tokens
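A small helper can catch unsupported formats before a request is sent. This is a sketch under stated assumptions: image_block and MEDIA_TYPES are names invented here, the media type is inferred from the file extension only (the bytes are not inspected), and max_bytes is an optional limit you supply yourself.

```python
import base64
import os

# Extension to media type, limited to the formats listed above
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(data, filename, max_bytes=None):
    """Build an image content block from raw bytes, validating the format."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {ext or filename}")
    if max_bytes is not None and len(data) > max_bytes:
        raise ValueError(f"image is {len(data)} bytes, limit is {max_bytes}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[ext],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


block = image_block(b"\x89PNG\r\n\x1a\n", "diagram.PNG")
print(block["source"]["media_type"])  # image/png
```

The returned dict can be placed directly in the content array next to a text block, as in the vision example above.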

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

Stop Reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue or end conversation
`max_tokens`	Hit the `max_tokens` limit	Increase `max_tokens` or request a continuation
`stop_sequence`	Found a stop sequence	Continue or process result
`tool_use`	Claude wants to use a tool	Execute tool and return result

Example: Handling Tool Use

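if message.stop_reason == "tool_use":
    tool_results = []
    for block in message.content:
        if block.type == "tool_use":
            # execute_tool is your own dispatcher, not part of the SDK
            output = execute_tool(block.name, block.input)
            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": str(output)}
            )
    # messages is the history list sent with the request: append both turns, then call the API again
    messages.append({"role": "assistant", "content": message.content})
    messages.append({"role": "user", "content": tool_results})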
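To show the whole round trip, here is a minimal sketch of a tool loop that keeps calling until the stop reason is no longer tool_use. It works on plain dicts so it runs offline. The names run_tool_loop, send, and tools, as well as the scripted responses and the id toolu_1, are assumptions made for the illustration.

```python
def run_tool_loop(send, messages, tools, max_rounds=5):
    """Call send(messages) until the reply no longer asks for a tool."""
    for _ in range(max_rounds):
        response = send(messages)
        if response["stop_reason"] != "tool_use":
            return response
        # Echo the assistant turn back, then answer every tool_use block
        messages.append({"role": "assistant", "content": response["content"]})
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = tools[block["name"]](**block["input"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": str(output),
                })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_rounds")


# Scripted stand-in for the API: one tool request, then a final answer
scripted = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "add", "input": {"a": 2, "b": 3}},
    ]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "2 + 3 = 5"}]},
])
history = [{"role": "user", "content": "What is 2 + 3?"}]
final = run_tool_loop(lambda msgs: next(scripted), history, {"add": lambda a, b: a + b})
print(final["content"][0]["text"])
```

The max_rounds cap is a design choice to stop a misbehaving loop from calling the API indefinitely.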

Best Practices

Manage context windows: Keep conversation histories within Claude's context limit. Use techniques like summarization or sliding windows for long conversations.
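A sliding window can be as simple as keeping the most recent messages. The sketch below is one hedged take: trim_history is a hypothetical helper, it counts messages rather than tokens, and it does not keep tool_use and tool_result pairs together, which a real implementation would need to do.

```python
def trim_history(messages, max_messages):
    """Keep the most recent messages, starting the window on a user turn."""
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    window = messages[-max_messages:]
    # Starting on a user turn keeps the trimmed conversation well-formed
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


history = [
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
    {"role": "user", "content": "third question"},
]
recent = trim_history(history, max_messages=4)
print([m["content"] for m in recent])
```

Here the window of four would begin with an assistant message, so that message is dropped as well and three messages remain.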

Use system prompts: For consistent behavior, define Claude's persona and constraints in the top-level system parameter. The Messages API has no "system" role inside the messages array.
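The sketch below shows where a system prompt sits in a request: alongside model and messages, not inside the messages list. The build_request helper and the nursery persona are invented for illustration. The resulting dict is meant to be passed to client.messages.create as keyword arguments.

```python
def build_request(model, system, messages, max_tokens=1024):
    """Assemble keyword arguments for client.messages.create with a system prompt."""
    for message in messages:
        if message["role"] not in ("user", "assistant"):
            # The system prompt is a top-level field, not a message role
            raise ValueError(f"unexpected role: {message['role']}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "messages": messages,
    }


kwargs = build_request(
    model="claude-opus-4-7",
    system="You are a concise assistant for a plant nursery. Answer in two sentences or fewer.",
    messages=[{"role": "user", "content": "How often should I water a fern?"}],
)
print(sorted(kwargs))
# With the SDK this would be sent as: client.messages.create(**kwargs)
```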

Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs and stay within limits.
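One possible way to track usage is to accumulate the usage object from each response. In this sketch the UsageTracker class is hypothetical and the per-million-token prices are made-up placeholders; substitute the current prices for the model you use.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts across requests and estimates spend."""
    input_price_per_mtok: float   # USD per million input tokens (you supply this)
    output_price_per_mtok: float  # USD per million output tokens (you supply this)
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage):
        # usage is the "usage" object from a response, as a dict
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]

    @property
    def cost_usd(self):
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000


# Prices below are placeholders, not real pricing
tracker = UsageTracker(input_price_per_mtok=1.0, output_price_per_mtok=2.0)
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 30, "output_tokens": 100})
print(tracker.input_tokens, tracker.output_tokens, tracker.cost_usd)
```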

Handle errors gracefully: Retry rate-limit errors (429), transient server errors (500, 529), and network failures with exponential backoff. Do not retry 400 errors; they indicate an invalid request that must be fixed first.
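Retry logic usually means exponential backoff with a little jitter. The sketch below is generic rather than SDK-specific: ApiError is a hypothetical exception type carrying an HTTP status, and the set of retryable statuses is an assumption you may want to adjust. The official SDKs also retry certain failures on their own, configurable through max_retries.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 529}  # rate limit, server error, overloaded


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying retryable failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUS or last_attempt:
                raise  # invalid requests will not succeed on retry
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


attempts = []


def flaky():
    # Fails twice with a rate-limit error, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise ApiError(429)
    return "ok"


# sleep is replaced so the demo does not actually wait
print(call_with_retry(flaky, sleep=lambda seconds: None))
```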

Stream responses: For better user experience, use streaming to show Claude's response as it's generated.
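With the Python SDK, streaming is typically done through client.messages.stream, which yields text chunks via text_stream. The sketch below wraps that call in a hypothetical stream_reply helper. So that it runs without network access, the demo uses stand-in classes (_FakeStream and _FakeClient, invented here) in place of a real anthropic.Anthropic() client.

```python
def stream_reply(client, model, messages, on_text, max_tokens=1024):
    """Stream a reply, passing each text chunk to on_text; return the full text."""
    parts = []
    with client.messages.stream(
        model=model, max_tokens=max_tokens, messages=messages
    ) as stream:
        for text in stream.text_stream:
            on_text(text)
            parts.append(text)
    return "".join(parts)


class _FakeStream:
    """Stand-in for the SDK stream so the sketch runs offline."""
    text_stream = ["Hel", "lo", "!"]

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


class _FakeClient:
    """Stand-in for anthropic.Anthropic(); only messages.stream is provided."""

    class messages:
        @staticmethod
        def stream(**kwargs):
            return _FakeStream()


chunks = []
reply = stream_reply(
    _FakeClient(),
    "claude-opus-4-7",
    [{"role": "user", "content": "Hello, Claude"}],
    chunks.append,
)
print(reply)
```

In an application, on_text might print each chunk immediately, for example with print(text, end="", flush=True), so users see the reply as it arrives.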

Next Steps

Now that you understand the Messages API basics, explore:

Streaming Messages for real-time responses
Tool Use to let Claude call external functions and APIs
Prompt Caching to reduce costs

Key Takeaways

The Messages API is stateless—always send the full conversation history with each request
Prefilling lets you guide Claude's responses but is not supported on all models (use structured outputs for newer models)
Vision support lets Claude analyze images that you send as base64-encoded blocks in the content array
Handle stop reasons (end_turn, max_tokens, stop_sequence, tool_use) to build robust conversational flows
Monitor token usage and manage context windows to control costs and maintain quality