BeClaude
Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Building Conversational AI with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, response prefilling, and vision tasks. Practical guide with code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to build conversational AI applications, covering basic requests, multi-turn conversations, response prefilling, and vision capabilities with practical code examples.

Tags: Messages API, Claude API, conversational AI, prefill, vision

Claude's Messages API is the core interface for integrating Claude into your applications. Whether you're building a chatbot, a content generator, or a vision-powered assistant, understanding how to work with messages is essential. This guide walks you through everything from basic requests to advanced techniques like prefilling and vision.

Understanding the Messages API

The Messages API is a stateless, RESTful API that lets you send a sequence of messages to Claude and receive a generated response. Unlike some other AI APIs, you always send the full conversation history with each request—Claude doesn't remember previous interactions unless you provide them.

Anthropic offers two primary ways to build with Claude:

  • Messages API: Direct model access for custom agent loops and fine-grained control
  • Claude Managed Agents: Pre-built, configurable agent harness for long-running tasks

This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Call

Let's start with a simple request. You'll need an Anthropic API key and the SDK for your preferred language.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message.content[0].text)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

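async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Hello, Claude' }],
  });

  // Content blocks are a union type, so narrow to a text block before reading .text
  const block = message.content[0];
  if (block.type === 'text') {
    console.log(block.text);
  }
}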

main();

Understanding the Response

The API returns a structured response object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

  • id: Unique identifier for the message
  • content: Array of content blocks in the reply (text, tool use, etc.)
  • stop_reason: Why Claude stopped generating (for example end_turn, max_tokens, stop_sequence, or tool_use)
  • usage: Token counts for billing and monitoring
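
As a rough illustration of reading these fields, the sketch below works on the raw JSON shape shown above, loaded into a plain dict rather than an SDK object. The helper names `extract_text` and `total_tokens` are made up for this example and are not part of any SDK.

```python
import json

# The sample response from above, as it would arrive over the wire.
RAW = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""


def extract_text(response: dict) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def total_tokens(response: dict) -> int:
    """Sum the input and output token counts reported in the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


response = json.loads(RAW)
print(extract_text(response))  # Hello!
print(total_tokens(response))  # 18
```

Filtering on the block type, instead of always reading the first block, keeps the helper safe when a reply mixes text with other block types such as tool use.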

Building Multi-Turn Conversations

Because the Messages API is stateless, you must maintain conversation history yourself. Each request includes the full history of messages.

Python Example

import anthropic

client = anthropic.Anthropic()

# First turn
message1 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

# Second turn - include history
message2 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": message1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message2.content[0].text)

Important Notes

  • Always send the complete history: Claude has no memory between requests
  • Synthetic assistant messages: You can insert pre-written assistant responses for context (e.g., for few-shot examples)
  • Token limits: Longer histories consume more input tokens, so be mindful of context windows
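
One simple way to keep a growing history in check is a sliding window over the most recent turns. The sketch below is a minimal version under two assumptions: the budget is counted in messages rather than tokens, and the trimmed history should still open with a user turn. `trim_history` is a hypothetical helper, not an SDK function.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most the last max_messages entries, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = messages[-max_messages:]
    # Drop leading assistant turns left over from the cut so the
    # conversation sent to the API still begins with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
    {"role": "assistant", "content": "Sure. An LLM is..."},
    {"role": "user", "content": "And how are they trained?"},
]

trimmed = trim_history(history, max_messages=4)
print([m["role"] for m in trimmed])  # ['user', 'assistant', 'user']
```

A message count is only a proxy for size. A token-based budget or a summary of older turns would track the context window more closely, at the cost of extra work per request.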

Prefilling Claude's Response

Prefilling lets you "put words in Claude's mouth" by providing the beginning of the assistant's response. This is powerful for:

  • Enforcing response formats
  • Guiding Claude's reasoning
  • Producing structured output such as JSON (on newer models, the dedicated structured outputs feature is preferred)

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        # The final assistant message is the prefill that Claude continues from
        {"role": "assistant", "content": "The answer is ("},
    ],
)

print(message.content[0].text) # Outputs: "C"

Prefill Limitations

Prefilling is not supported on:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6

For these models, use structured outputs or system prompt instructions instead.
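
A request builder can choose between the two approaches per model. The sketch below is one possible shape, not an official pattern: `NO_PREFILL_MODELS` and `build_choice_request` are hypothetical names, and the set holds only the `claude-opus-4-7` ID used elsewhere in this guide, so it would need the exact IDs of the other models listed above.

```python
# Illustrative only: model IDs for which this sketch avoids prefill.
# Extend the set with the exact IDs of the other models in the list.
NO_PREFILL_MODELS = {"claude-opus-4-7"}


def build_choice_request(model: str, question: str) -> dict:
    """Return keyword arguments suitable for client.messages.create()."""
    request = {
        "model": model,
        "max_tokens": 1,
        "messages": [{"role": "user", "content": question}],
    }
    if model in NO_PREFILL_MODELS:
        # No prefill available: request the format through the system prompt.
        request["system"] = "Answer with a single letter (A, B, or C) and nothing else."
        # Allow a few tokens, since the format is requested rather than forced.
        request["max_tokens"] = 5
    else:
        # Prefill: the trailing assistant message is continued by the model.
        request["messages"].append(
            {"role": "assistant", "content": "The answer is ("}
        )
    return request


question = "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
print(build_choice_request("claude-sonnet-4-5", question)["messages"][-1]["role"])  # assistant
print("system" in build_choice_request("claude-opus-4-7", question))  # True
```

The returned dict would be passed on as `client.messages.create(**build_choice_request(model, question))`.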

Working with Vision

The Messages API supports image inputs, enabling Claude to analyze and describe visual content.

Python Example

import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode image
with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this diagram in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

Supported Image Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: 5 MB per image when sent through the API
  • Claude processes images at varying resolutions; larger images use more tokens
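
Checking the format and size before sending avoids a round trip that ends in an invalid-request error. The sketch below builds the image content block from raw bytes. `image_block` is a hypothetical helper, the media type is inferred from the file extension only, and `max_bytes` is left for the caller to set from whichever size limit applies.

```python
import base64
from pathlib import PurePath

# Extension-to-media-type map for the four supported formats.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes, max_bytes: int) -> dict:
    """Build a base64 image content block, rejecting unsupported input."""
    suffix = PurePath(filename).suffix.lower()
    media_type = MEDIA_TYPES.get(suffix)
    if media_type is None:
        raise ValueError(f"Unsupported image format: {filename}")
    if len(data) > max_bytes:
        raise ValueError(f"{filename} is {len(data)} bytes, limit is {max_bytes}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# In-memory bytes stand in for a real file here.
block = image_block("diagram.png", b"\x89PNG\r\n", max_bytes=1024)
print(block["source"]["media_type"])  # image/png
```

With a file on disk, the bytes would come from something like `Path("diagram.png").read_bytes()`, and the resulting block goes into the `content` array next to a text block.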

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

| Stop Reason | Meaning | Action |
| --- | --- | --- |
| end_turn | Claude finished naturally | Continue or end conversation |
| max_tokens | Hit token limit | Increase max_tokens or truncate |
| stop_sequence | Found a stop sequence | Continue or process result |
| tool_use | Claude wants to use a tool | Execute tool and return result |
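
In application code, the table above usually turns into a small dispatch step after each response. The sketch below is one possible shape, operating on a response dict. The returned labels and the `next_step` name are invented for illustration, and unknown stop reasons raise an error so they are not silently ignored.

```python
def next_step(response: dict) -> str:
    """Map a response's stop_reason to a label for what the app does next."""
    reason = response.get("stop_reason")
    if reason == "end_turn":
        return "reply_complete"       # show the reply, wait for the user
    if reason == "max_tokens":
        return "reply_truncated"      # raise max_tokens or flag the cut-off
    if reason == "stop_sequence":
        return "stopped_on_sequence"  # post-process the text produced so far
    if reason == "tool_use":
        return "run_tool"             # execute the tool, send back the result
    raise ValueError(f"Unhandled stop_reason: {reason!r}")


print(next_step({"stop_reason": "end_turn"}))  # reply_complete
print(next_step({"stop_reason": "tool_use"}))  # run_tool
```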

Example: Handling Tool Use

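if message.stop_reason == "tool_use":
    tool_results = []
    for block in message.content:
        if block.type == "tool_use":
            # execute_tool is a placeholder for your own tool dispatcher
            output = execute_tool(block.name, block.input)
            # Each result must reference the id of the tool_use block it answers
            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": str(output)}
            )
    # Next: append message.content as an assistant turn, then tool_results
    # as a user turn, and call the API again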

Best Practices

  • Manage context windows: Keep conversation histories within Claude's context limit. Use techniques like summarization or sliding windows for long conversations.
  • Use system prompts: For consistent behavior, define Claude's persona and constraints in the top-level system parameter of the request.
  • Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs and stay within limits.
  • Handle errors gracefully: Implement retry logic with backoff for rate limits (HTTP 429) and transient network issues. Treat 400 errors as invalid requests to fix rather than retry.
  • Stream responses: For better user experience, use streaming to show Claude's response as it's generated.
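
The retry advice above can be sketched as a generic wrapper with exponential backoff and jitter. This is a minimal illustration, not SDK code: `call_with_retries` and its parameters are hypothetical, and the caller decides which exceptions count as retryable. The SDK may already retry some failures on its own, so a wrapper like this mainly matters when you want control over the policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


# Demo with a stand-in function that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("transient failure")
    return "ok"


result = call_with_retries(
    flaky,
    is_retryable=lambda exc: isinstance(exc, TimeoutError),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result, attempts["count"])  # ok 3
```

With the Python SDK, `fn` would wrap the `client.messages.create(...)` call, and `is_retryable` might check for exception types such as `anthropic.RateLimitError` or `anthropic.APIConnectionError`.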

Key Takeaways

  • The Messages API is stateless—always send the full conversation history with each request
  • Prefilling lets you guide Claude's responses but is not supported on all models (use structured outputs for newer models)
  • Vision support enables image analysis by sending base64-encoded images in the content array
  • Handle stop reasons (end_turn, max_tokens, tool_use) to build robust conversational flows
  • Monitor token usage and manage context windows to control costs and maintain quality