Guide | Beginner | API | 2026-05-22

Mastering the Messages API: Build Conversational AI with Claude

Learn how to use the Claude Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques to shape responses, and vision capabilities, with code examples in Python and TypeScript.

Tags: Messages API, Claude API, conversational AI, prefill, vision

Introduction

The Claude Messages API is the primary interface for building conversational AI applications with Anthropic's Claude models. Whether you're creating a chatbot, a content generation tool, or an intelligent assistant, understanding how to work with messages is essential.

This guide covers the core patterns for using the Messages API effectively: from simple single-turn requests to complex multi-turn conversations, prefill techniques for controlling responses, and leveraging vision capabilities. By the end, you'll have a solid foundation for building production-ready applications with Claude.

Basic Request and Response

At its simplest, the Messages API accepts a list of messages and returns a response. Here's a minimal example in Python:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content[0].text)

And the equivalent in TypeScript:

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});
console.log(message.content[0].text);

The API response includes:

id: Unique identifier for the message
role: Always "assistant" for responses
content: Array of content blocks (typically text)
model: The model used
stop_reason: Why generation stopped (e.g., "end_turn", "max_tokens")
usage: Token counts for input and output
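
Because content is an array of blocks rather than a single string, indexing content[0] assumes the first block is text. One possible, more defensive way to read the reply is sketched below. The helper name collect_text is hypothetical, and the SimpleNamespace objects are stand-ins that only mimic the .type and .text attributes of real content blocks so the example runs offline.

```python
from types import SimpleNamespace


def collect_text(content_blocks):
    """Join the text of every block whose type is "text", skipping the rest."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in blocks for illustration; a real response would supply message.content.
fake_content = [
    SimpleNamespace(type="text", text="Hello! "),
    SimpleNamespace(type="tool_use", name="lookup", input={}),
    SimpleNamespace(type="text", text="How can I help?"),
]
print(collect_text(fake_content))
```

With a real response, the call would be collect_text(message.content).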

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context but requires you to manage state on your end.

Building a Conversation

To continue a conversation, simply append new messages to the history:

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello! How can I help you today?"},
        {"role": "user", "content": "Can you explain how transformers work?"}
    ]
)
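
Since the API keeps no state, the bookkeeping has to live in your application. The sketch below shows one way to wrap it. The Conversation class and the send callable are hypothetical names: send is assumed to take the message list and return the assistant's reply text, for example by wrapping client.messages.create. A fake sender is used here so the example runs without an API key.

```python
class Conversation:
    """Keeps the message history that a stateless API needs on every call."""

    def __init__(self, send):
        # `send` is a hypothetical callable: message list in, reply text out.
        self._send = send
        self.messages = []

    def ask(self, user_text):
        # Record the user turn, send the full history, then record the reply.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Offline stand-in for the API call.
def fake_send(messages):
    return f"(reply to message {len(messages)})"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you explain how transformers work?")
print([m["role"] for m in chat.messages])
```

Each call to ask sends everything accumulated so far, which is the behavior the stateless design requires.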

Synthetic Assistant Messages

You can insert synthetic assistant messages—responses that didn't actually come from Claude. This is useful for:

Providing examples: Show Claude the format you want
Correcting behavior: Guide Claude toward a specific response style
Simulating scenarios: Test how Claude handles different situations

messages = [
    {"role": "user", "content": "Summarize this article"},
    {"role": "assistant", "content": "I'll provide a concise summary with bullet points."},
    {"role": "user", "content": "Here's the article: ..."}
]
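
The "providing examples" use case can be made systematic by generating the synthetic turns from a list of input/output pairs. This is only a sketch: build_few_shot is a hypothetical helper, and the example texts are invented for illustration.

```python
def build_few_shot(examples, query):
    """Turn (user_text, assistant_text) pairs plus a final query into messages."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by you, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Summarize: The cat sat on the mat.", "- A cat sat on a mat."),
    ("Summarize: Rain fell all day in Oslo.", "- It rained all day in Oslo."),
]
messages = build_few_shot(examples, "Summarize: The train was late again.")
for m in messages:
    print(m["role"], "|", m["content"])
```

The resulting list can be passed as the messages argument of a request.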

Managing Conversation History

For long conversations, be mindful of token limits. Strategies include:

Summarization: Periodically summarize older turns
Sliding window: Keep only the most recent N messages
Selective retention: Keep the system prompt and recent exchanges, drop older ones
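
As an illustration of the sliding-window strategy, the sketch below trims a history to its most recent messages. The function name sliding_window is hypothetical. Dropping leading assistant messages is an assumption made here so that the trimmed conversation still opens with a user turn.

```python
def sliding_window(messages, max_messages):
    """Keep at most max_messages of the most recent messages.

    Leading assistant messages are removed from the kept window so that
    the trimmed history starts with a user turn (an assumption of this sketch).
    """
    if max_messages <= 0:
        return []
    window = messages[-max_messages:]
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
    {"role": "assistant", "content": "a3"},
]
trimmed = sliding_window(history, 3)
print([m["content"] for m in trimmed])
```

A window counted in messages is the simplest variant; counting tokens instead would track the actual limit more closely.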

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This powerful technique lets you shape the response format, enforce structure, or guide Claude toward specific outputs.

Basic Prefill Example

Here's how to use prefill to get a single-letter answer to a multiple-choice question:

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is the Latin name for ant? (A) Apoidea (B) Rhopalocera (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"

Practical Applications

JSON extraction: Prefill with {"result": to get structured JSON output
Format enforcement: Start with Here's your summary: to ensure a summary format
Chain-of-thought: Prefill with Let me think step by step: to encourage reasoning
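
For the JSON extraction case, keep in mind that the response contains only the continuation, so the prefill has to be re-attached before parsing. The sketch below shows that step. The helper names and the sample completion string are hypothetical; the completion stands in for message.content[0].text.

```python
import json

PREFILL = "{"


def build_prefilled_messages(prompt, prefill=PREFILL):
    """Return a user prompt followed by a partial assistant turn."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def parse_prefilled_json(completion, prefill=PREFILL):
    """Re-attach the prefill, then parse the combined text as JSON."""
    return json.loads(prefill + completion)


messages = build_prefilled_messages(
    "Return the city and country of the Eiffel Tower as JSON."
)
# Hypothetical continuation text, standing in for message.content[0].text.
completion = '"city": "Paris", "country": "France"}'
print(parse_prefilled_json(completion))
```

In practice json.loads can still fail if the model's continuation is not valid JSON, so wrapping the parse in error handling is advisable.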

Important Notes

Prefill is not supported on Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, or Claude Mythos Preview
For these models, use structured outputs or system prompt instructions instead
When using prefill, set max_tokens appropriately to leave room for completion

Vision Capabilities

Claude can process images through the Messages API. This enables use cases like image analysis, document processing, and visual question answering.

Sending an Image

import base64
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail"
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Formats

PNG, JPEG, WEBP, GIF (non-animated)
Maximum size: 5MB per image when sent through the API (smaller is better for performance)
Claude can extract text from images, analyze diagrams, and describe visual content
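
The image block structure shown earlier can be produced by a small helper that also rejects formats outside the supported list. This is a sketch under assumptions: image_block and the extension table are hypothetical, and the media type is inferred from the file extension only, not from the file contents.

```python
import base64
import os

# Extension-to-media-type table for the formats listed above.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(filename, data):
    """Build a base64 image content block from raw bytes."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {filename!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[extension],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# Placeholder bytes for illustration; real code would read the file contents.
block = image_block("chart.png", b"\x89PNG")
print(block["source"]["media_type"])
```

The returned dictionary can be placed in a user message's content list next to a text block.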

Handling Stop Reasons

The stop_reason field tells you why Claude stopped generating. Understanding these helps you handle different scenarios:

stop_reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue conversation
`max_tokens`	Hit token limit	Increase `max_tokens` or split response
`stop_sequence`	Found a stop sequence	Handle as needed
`tool_use`	Claude wants to use a tool	Execute tool and continue

Example: Handling max_tokens

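if message.stop_reason == "max_tokens":
    # `messages` is the list that was sent in the request that produced `message`.
    # Append the truncated reply, then ask Claude to pick up where it left off.
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue"})
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=messages,
    )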
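
The continuation pattern above can be repeated in a loop until the reply ends for some reason other than max_tokens. The sketch below assumes a hypothetical send callable that takes a message list and returns a (text, stop_reason) pair, for example a thin wrapper around client.messages.create. A fake sender keeps the example runnable offline, and max_rounds is an illustrative safety cap.

```python
def generate_until_done(send, messages, max_rounds=3):
    """Call send() repeatedly while the reply is cut off by max_tokens."""
    history = list(messages)  # Work on a copy so the caller's list is untouched.
    parts = []
    stop_reason = None
    for _ in range(max_rounds):
        text, stop_reason = send(history)
        parts.append(text)
        if stop_reason != "max_tokens":
            break
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Please continue"})
    return "".join(parts), stop_reason


# Offline stand-in: the first reply is truncated, the second one finishes.
replies = iter([("Once upon a", "max_tokens"), (" time.", "end_turn")])


def fake_send(history):
    return next(replies)


text, reason = generate_until_done(
    fake_send, [{"role": "user", "content": "Tell a story"}]
)
print(text, reason)
```

Joining the parts directly assumes the model resumes exactly where it stopped, which is not guaranteed; a real application may need to check the seam between parts.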

Best Practices

1. Manage Token Usage

Monitor usage.input_tokens and usage.output_tokens in responses
Use shorter conversation histories when possible
Consider prompt caching for repeated system prompts
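
To make the monitoring concrete, the sketch below accumulates token counts across responses. UsageTracker is a hypothetical helper, the SimpleNamespace objects stand in for the usage field of real responses, and the prices passed to estimated_cost are made-up numbers, not actual rates.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage):
        """Add one response's usage (needs input_tokens and output_tokens)."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    def estimated_cost(self, input_price_per_mtok, output_price_per_mtok):
        """Estimate cost from caller-supplied prices per million tokens."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
# Stand-ins for message.usage from two responses.
tracker.record(SimpleNamespace(input_tokens=1000, output_tokens=500))
tracker.record(SimpleNamespace(input_tokens=2000, output_tokens=1500))
print(tracker.input_tokens, tracker.output_tokens)
# Placeholder prices for illustration only.
print(tracker.estimated_cost(1.0, 2.0))
```

With a real response, the call would be tracker.record(message.usage).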

2. Handle Errors Gracefully

try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    # Catch specific subclasses first; APIError is their common base class,
    # so placing it first would make the handlers below unreachable.
    print(f"Rate limited: {e}")
    # Wait and retry
except anthropic.APIConnectionError as e:
    print(f"Connection error: {e}")
    # Retry with backoff
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Implement retry logic or fallback
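
The "retry with backoff" comments can be turned into a small generic helper. The sketch below is one possible shape, not the SDK's own retry mechanism. retry_with_backoff, FakeRateLimit, and flaky_call are hypothetical names; with the real client you would pass exception types such as anthropic.RateLimitError and anthropic.APIConnectionError as retryable. The sleep function is injectable so the example runs instantly.

```python
import time


def retry_with_backoff(call, retryable, max_attempts=4, base_delay=1.0,
                       sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)


# Offline stand-in that fails twice before succeeding.
class FakeRateLimit(Exception):
    pass


attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit("slow down")
    return "ok"


delays = []
result = retry_with_backoff(flaky_call, (FakeRateLimit,), sleep=delays.append)
print(result, delays)
```

Adding random jitter to each delay is a common refinement when many clients may retry at the same moment.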

3. Use System Messages for Instructions

For persistent instructions, use the system parameter instead of repeating instructions in every user message:

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant that always responds in French.",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

4. Streaming for Better UX

For long responses, use streaming to show output incrementally:

stream = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True
)
for chunk in stream:
    # Only text deltas carry a .text attribute; other delta types are skipped.
    if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
        print(chunk.delta.text, end="", flush=True)
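
Applications usually need the complete text once streaming ends, for example to append it to the conversation history. The sketch below collects text deltas while optionally forwarding each piece to a callback. collect_stream_text is a hypothetical helper, and the SimpleNamespace events are stand-ins that mimic only the attributes read here.

```python
from types import SimpleNamespace


def collect_stream_text(events, on_text=None):
    """Accumulate text from content_block_delta events that carry text deltas."""
    parts = []
    for event in events:
        if event.type != "content_block_delta":
            continue
        if getattr(event.delta, "type", None) != "text_delta":
            continue
        parts.append(event.delta.text)
        if on_text is not None:
            on_text(event.delta.text)  # e.g. print the piece as it arrives
    return "".join(parts)


# Stand-in events for illustration; a real stream would be iterated instead.
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="Once "),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="upon a time"),
    ),
    SimpleNamespace(type="message_stop"),
]
print(collect_stream_text(fake_events))
```

Passing on_text=lambda piece: print(piece, end="", flush=True) would reproduce the incremental display while still returning the full text.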

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated AI applications that leverage Claude's full potential.

Remember that the API is stateless—you manage conversation history. Use prefill judiciously for response shaping, and always handle stop reasons and errors appropriately. With these patterns, you're ready to build production-ready conversational AI.

Key Takeaways

The Messages API is stateless: You must send the full conversation history with each request, giving you complete control over context.
Prefill shapes responses: Starting Claude's response lets you enforce formats, extract structured data, and guide behavior—but check model compatibility.
Vision capabilities are powerful: Send images as base64-encoded data for analysis, description, and text extraction.
Handle stop reasons: Different stop reasons (end_turn, max_tokens, tool_use) require different handling strategies.
Stream for better UX: Use streaming to show responses incrementally and improve user experience.