
Mastering the Messages API: Building Conversational Apps with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers the core Messages API patterns: basic requests, multi-turn conversations (with the full history sent on every request), prefilling Claude's responses to shape output, and vision. You'll get Python and TypeScript examples for each pattern.

Messages API · Claude API · Conversational AI · Prompt Engineering · Multi-turn Chat


Claude’s Messages API is the primary way to interact with the model programmatically. Whether you’re building a simple chatbot, a multi-turn assistant, or a vision-powered app, understanding how to structure your API calls is essential. This guide walks you through the core patterns—basic requests, multi-turn conversations, prefill techniques, and vision capabilities—with practical code examples in Python and TypeScript.

Understanding the Messages API vs. Managed Agents

Anthropic offers two paths for building with Claude:

  • Messages API: Direct model access. You control every aspect of the conversation loop. Best for custom agents, fine-grained control, and real-time interactions.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs on managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you maximum flexibility.

Basic Request and Response

At its simplest, you send a list of messages and receive a response. Each message has a role (user or assistant) and content.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);

Response Structure

The API returns a structured response:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

  • content: An array of content blocks (text, tool_use, etc.).
  • stop_reason: Why the model stopped (end_turn, max_tokens, stop_sequence, or tool_use).
  • usage: Token counts for billing and monitoring.
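
In the Python SDK these fields are attributes on the returned message object. A minimal sketch of reading them, reusing the message object from the Python example above:

# Print each text block, then the stop reason and token usage
for block in message.content:
    if block.type == "text":
        print(block.text)

print("Stop reason:", message.stop_reason)
print("Tokens:", message.usage.input_tokens, "in /", message.usage.output_tokens, "out")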

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage the history yourself.

Building a Conversation Over Time

import anthropic

client = anthropic.Anthropic()

# First turn
response1 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

# Second turn: include the previous exchange
response2 = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": response1.content[0].text},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(response2.content[0].text)
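
In practice you usually keep one history list and append each exchange to it. A minimal sketch (the send helper is defined here purely for illustration):

import anthropic

client = anthropic.Anthropic()
history = []

def send(user_text):
    # Append the new user turn, call the API with the full history,
    # then record Claude's reply so the next call has complete context.
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=history
    )
    history.append({"role": "assistant", "content": response.content[0].text})
    return response.content[0].text

print(send("Hello, Claude"))
print(send("Can you describe LLMs to me?"))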

Synthetic Assistant Messages

You can inject synthetic assistant messages—they don’t have to come from Claude. This is useful for:

  • Providing example responses (few-shot prompting)
  • Guiding conversation flow
  • Simulating a persona

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Italy?"}
]

Putting Words in Claude’s Mouth (Prefill)

Prefilling lets you start Claude’s response by including an assistant message with partial content at the end of your messages array. This is powerful for:

  • Constraining output format (e.g., JSON, multiple choice)
  • Guiding tone or style
  • Reducing token usage on predictable responses

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {"role": "assistant", "content": "The answer is ("}
    ]
)

print(message.content[0].text) # Outputs: C

By setting max_tokens=1 and prefilling "The answer is (", Claude only needs to output a single character. This is efficient and keeps the output tightly constrained.

Prefill Best Practices

  • Match the tone: If you prefill with formal language, Claude continues formally.
  • Don’t contradict: Prefilling something the model wouldn’t naturally say can cause confusion.
  • Use for structure: Prefill JSON keys or XML tags to enforce output format, as in the sketch below.
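
For example, prefilling an opening brace pushes Claude to emit raw JSON with no preamble. A minimal sketch (the extraction task and fields here are made up for illustration):

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from 'Maria is 31 years old.' Respond with JSON only."
        },
        # Prefill the opening brace so the response starts mid-JSON
        {"role": "assistant", "content": "{"}
    ]
)

# Reattach the prefilled brace before parsing
print("{" + message.content[0].text)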

Vision Capabilities

The Messages API supports image inputs. You can send images as base64-encoded data or via URLs (if hosted publicly).

Sending an Image (Python)

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f: image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create( model="claude-opus-4-7", max_tokens=1024, messages=[ { "role": "user", "content": [ { "type": "image", "source": { "type": "base64", "media_type": "image/png", "data": image_data } }, { "type": "text", "text": "Describe this chart in detail." } ] } ] )

print(message.content[0].text)
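
If the image is hosted publicly, you can pass a URL source instead of base64 data. A minimal sketch (the URL below is a placeholder):

# client is the anthropic.Anthropic() instance created above
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "url", "url": "https://example.com/chart.png"}
                },
                {"type": "text", "text": "Describe this chart in detail."}
            ]
        }
    ]
)
print(message.content[0].text)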

Supported Media Types

  • image/jpeg
  • image/png
  • image/gif
  • image/webp

Vision Tips

  • Image size matters: Larger images consume more tokens. Resize to 1024x1024 or less if possible.
  • Combine with text: Always include a text prompt to guide Claude’s analysis.
  • Multiple images: You can include multiple images in a single message, as shown in the sketch below.
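
A minimal sketch of a multi-image request, assuming image_data_1 and image_data_2 are base64 strings prepared the same way as chart.png above:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Image 1:"},
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/png", "data": image_data_1}
                },
                {"type": "text", "text": "Image 2:"},
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/png", "data": image_data_2}
                },
                {"type": "text", "text": "What changed between these two charts?"}
            ]
        }
    ]
)
print(message.content[0].text)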

Handling Stop Reasons

Understanding stop_reason helps you build robust applications:

  • end_turn: Claude finished naturally. Continue or end the conversation.
  • max_tokens: Output was truncated. Increase max_tokens or split the response.
  • stop_sequence: A custom stop sequence was hit. Handle as needed.
  • tool_use: Claude wants to call a tool. Execute the tool and return the result.

Example: Handling max_tokens

if message.stop_reason == "max_tokens":
    # Continue the conversation to get more output
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})
    # Call API again
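
A more complete version of that pattern keeps calling until Claude stops on its own. A minimal sketch (the five-call cap is an arbitrary safety limit, not an API requirement):

messages = [{"role": "user", "content": "Write a detailed history of the transistor."}]
full_text = ""

for _ in range(5):  # arbitrary cap on continuation calls
    message = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages
    )
    full_text += message.content[0].text
    if message.stop_reason != "max_tokens":
        break
    # Truncated: append the partial answer and ask Claude to continue
    messages.append({"role": "assistant", "content": message.content[0].text})
    messages.append({"role": "user", "content": "Please continue."})

print(full_text)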

Streaming for Real-Time Responses

For a better user experience, use streaming. The API sends chunks as they’re generated.

Python Streaming

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
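
If you also want the stop reason or token usage once the stream finishes, the Python SDK can assemble the final message for you. A minimal sketch, assuming the current SDK's get_final_message() helper:

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Assemble the complete message after streaming ends
    final = stream.get_final_message()

print()
print("Stop reason:", final.stop_reason, "| output tokens:", final.usage.output_tokens)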

Key Takeaways

  • The Messages API is stateless—you must send the full conversation history with every request. Manage context on your side.
  • Prefilling lets you shape Claude’s responses by providing a partial assistant message. Use it for format control and efficiency.
  • Vision is supported via base64-encoded images or URLs. Combine images with text prompts for best results.
  • Streaming improves user experience by delivering tokens in real time. Use it for chat applications.
  • Handle stop reasons appropriately—especially max_tokens and tool_use—to build robust, production-ready applications.