Guide | Beginner | API | 2026-05-22

Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers how to send basic requests, build multi-turn conversations, prefill Claude's responses, and use vision capabilities with the Claude Messages API, with Python code examples.

Messages API, Claude API, Conversational AI, Vision, Prefill

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to work with messages is essential. This guide walks you through the core patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Basic Request and Response

At its simplest, the Messages API accepts a list of messages and returns Claude's response. Here's a minimal example in Python:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes:

id: Unique message identifier
role: Always "assistant"
content: Array of content blocks (usually text)
model: The model used
stop_reason: Why generation stopped (for example end_turn, max_tokens, stop_sequence, or tool_use)
usage: Token counts for input and output

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
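
Because content is an array of blocks rather than a single string, code that only wants the text has to walk that array. The following is a minimal sketch that works on the JSON form shown above; extract_text is an illustrative helper name, not part of the SDK.

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block in a Messages API response dict."""
    parts = []
    for block in response.get("content", []):
        # Skip non-text blocks such as tool_use
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
print(extract_text(sample))  # Hello!
```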

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context.

Building a Conversation

To continue a conversation, append both the assistant's previous response and the new user message:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
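
Since the API keeps no state, the bookkeeping falls to your code. One way to organize it is a small wrapper that owns the history, sketched below. The Conversation class and the send callable are hypothetical names introduced for illustration; fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full history, since the API does not store it between calls."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # send is a hypothetical callable: takes the history, returns reply text
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))
        # Store the reply so the next request carries it as context
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call so the sketch runs offline
def fake_send(history: List[Message]) -> str:
    return f"reply to message {len(history)}"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
# ['user', 'assistant', 'user', 'assistant']
```

In a real application, send would wrap client.messages.create(...) and return message.content[0].text.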

Synthetic Assistant Messages

You can inject synthetic assistant messages — they don't need to have come from Claude. This is useful for:

Few-shot prompting: Show Claude examples of desired behavior
Guiding tone: Set the style of responses
Context injection: Provide information as if Claude already said it

messages = [
    {"role": "user", "content": "Explain quantum computing"},
    {"role": "assistant", "content": "Quantum computing uses qubits..."},  # synthetic
    {"role": "user", "content": "Give me a simple analogy"}
]
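
For few-shot prompting, the synthetic turns usually come from a list of example pairs. The sketch below shows one way to expand such pairs into an alternating message list; few_shot_messages and the classification examples are made up for illustration.

```python
from typing import Dict, List, Tuple


def few_shot_messages(
    examples: List[Tuple[str, str]], query: str
) -> List[Dict[str, str]]:
    """Turn (input, desired output) pairs into alternating user/assistant turns."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic turn: written by you, presented as a prior reply
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question always goes last, as a user turn
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Classify: 'I love this product'", "positive"),
    ("Classify: 'Shipping took forever'", "negative"),
]
messages = few_shot_messages(examples, "Classify: 'Works as described'")
print(len(messages))  # 5
```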

Managing Context Windows

Be mindful of the context window. Each turn adds tokens. For long conversations:

Use prompt caching to reduce costs on repeated system prompts
Implement context compaction to summarize earlier turns
Consider sliding window approaches for very long histories
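
A sliding window can be as simple as keeping the last few messages. The sketch below is one possible version, assuming the trimmed history should still begin with a user turn; sliding_window is an illustrative name, and counting messages is only a rough proxy for counting tokens.

```python
from typing import Dict, List


def sliding_window(
    messages: List[Dict[str, str]], max_messages: int
) -> List[Dict[str, str]]:
    """Keep only the most recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the history starts with a user turn
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
print([m["content"] for m in sliding_window(history, 4)])
# ['u2', 'a2', 'u3']
```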

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for:

Constraining output format (e.g., JSON, multiple choice)
Guiding reasoning (e.g., "Let me think step by step")
Ensuring specific phrasing

Basic Prefill Example

Here's how to get a single letter answer from a multiple-choice question:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # "C"

Prefill for Structured Output

You can use prefill to force JSON output:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: 'John is 30 years old'"
        },
        {
            "role": "assistant",
            "content": "Here is the JSON: {\"name\": \""
        }
    ]
)
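
The response text contains only Claude's continuation, not the prefill, so the two have to be joined before parsing. The sketch below shows one way to do that; parse_prefilled_json is an illustrative helper, and the continuation string is an example of what a reply might look like, not real output.

```python
import json


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rebuild the full reply from prefill + continuation and parse its JSON object."""
    full_text = prefill + continuation
    start = full_text.index("{")  # raises ValueError if no object is present
    # raw_decode parses one JSON value and ignores any trailing text
    obj, _end = json.JSONDecoder().raw_decode(full_text, start)
    return obj


prefill = 'Here is the JSON: {"name": "'
continuation = 'John", "age": 30}'  # example of what the reply might contain
print(parse_prefilled_json(prefill, continuation))
# {'name': 'John', 'age': 30}
```

With the SDK, continuation would be message.content[0].text from the request above.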

Important Limitations

Not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6
These models return a 400 error for prefill requests
Use structured outputs or system prompt instructions instead
See the migration guide for alternatives
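
If your code targets several models, one option is to decide per request whether to use prefill. The following is a sketch under stated assumptions: build_request and format_hint are hypothetical names, and apart from "claude-opus-4-7" the model IDs in the set are guesses based on the model names above (no ID is given here for Claude Mythos Preview, so it is left out).

```python
from typing import Any, Dict, Optional

# Assumed model IDs for the models listed above; only "claude-opus-4-7"
# appears in this guide, the others are guesses to verify before use.
NO_PREFILL_MODELS = {"claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"}


def build_request(
    model: str,
    prompt: str,
    prefill: Optional[str] = None,
    format_hint: str = "Respond with JSON only.",
) -> Dict[str, Any]:
    """Build messages.create keyword arguments, using prefill only where allowed."""
    request: Dict[str, Any] = {
        "model": model,
        "max_tokens": 200,
        "messages": [{"role": "user", "content": prompt}],
    }
    if prefill is None:
        return request
    if model in NO_PREFILL_MODELS:
        # Fall back to a system prompt instruction instead of prefill
        request["system"] = format_hint
    else:
        request["messages"].append({"role": "assistant", "content": prefill})
    return request


newer = build_request("claude-opus-4-7", "Extract the name and age.", prefill="{")
older = build_request("claude-sonnet-4-5", "Extract the name and age.", prefill="{")
print("system" in newer, len(older["messages"]))  # True 2
```

The resulting dictionary can be passed as client.messages.create(**request).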

Vision Capabilities

The Messages API supports images. You can send images as base64-encoded data or via URL.

Sending an Image

import anthropic
import base64
client = anthropic.Anthropic()
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail"
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Image Types

JPEG, PNG, GIF, WebP
Maximum size: 100 MB
Optimal resolution: 1568x1568 pixels or less (larger images are resized)
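
Both delivery options mentioned earlier, base64 data and URL, use the same image block with a different source. The sketch below builds either form and rejects file extensions outside the four supported formats; the helper names and the extension map are illustrative choices, not part of the SDK.

```python
import base64
from pathlib import Path

# Extension-to-media-type map for the four supported formats
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_file(path: str) -> dict:
    """Build a base64 image content block, rejecting unsupported extensions."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {suffix or 'none'}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that references an image by URL."""
    return {"type": "image", "source": {"type": "url", "url": url}}
```

Either block can be placed in the content list of a user message, next to a text block, as in the example above.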

Handling Stop Reasons

Understanding why Claude stopped helps you build robust applications:

stop_reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue conversation
`max_tokens`	Hit token limit	Increase `max_tokens` or truncate
`stop_sequence`	Custom stop sequence triggered	Handle as designed
`tool_use`	Claude wants to use a tool	Execute tool and return result

Example handling:

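if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    # Run each requested tool; execute_tool is a placeholder for your own dispatcher
    tool_results = []
    for block in message.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": str(result)}
            )
    # Add the tool call and its results to the history (messages is the list you sent)
    messages.append({"role": "assistant", "content": message.content})
    messages.append({"role": "user", "content": tool_results})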

Best Practices

1. Manage Token Usage

Use max_tokens to control response length
Monitor usage.input_tokens and usage.output_tokens for cost tracking
Implement prompt caching for repeated system prompts
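
For cost tracking, the usage object from each response can be accumulated over a session. The following is a rough sketch: UsageTracker is a hypothetical helper, and the prices are parameters you supply yourself because this guide does not list any.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts across responses for cost tracking."""

    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage: dict) -> None:
        # usage is the "usage" object from a response, in dict form
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    def cost(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Estimate cost from caller-supplied prices per million tokens."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 30, "output_tokens": 120})
print(tracker.input_tokens, tracker.output_tokens)  # 42 126
```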

2. Handle Errors Gracefully

Rate limits: Implement exponential backoff
400 errors: Check model compatibility (especially with prefill)
Timeouts: Set appropriate timeouts for long generations
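
Exponential backoff for rate limits can be sketched as a small generic wrapper. Everything named here (with_backoff, FakeRateLimit, flaky) is illustrative; with the Python SDK you would likely pass anthropic.RateLimitError as the exception to retry on, and the client also has its own retry setting that may already cover simple cases.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on the given exceptions, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # Exponential delay plus jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


class FakeRateLimit(Exception):
    """Stand-in for a rate limit error so the sketch runs offline."""


attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit()
    return "ok"


delays = []  # record delays instead of actually sleeping
print(with_backoff(flaky, (FakeRateLimit,), sleep=delays.append))  # ok
```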

3. Optimize for Your Use Case

Chatbots: Use multi-turn patterns with history management
Content generation: Use prefill for consistent formatting
Data extraction: Combine prefill with low max_tokens
Vision tasks: Resize images to optimal resolution before sending
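
To resize images before sending, you first need target dimensions that preserve the aspect ratio. The sketch below computes them for the 1568-pixel figure given earlier; fit_within is an illustrative helper, and the actual resizing would be done with an image library of your choice.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_edge: int = 1568) -> Tuple[int, int]:
    """Scale dimensions down so the longer edge is at most max_edge pixels."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already small enough, never upscale
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # (1568, 1176)
print(fit_within(800, 600))    # (800, 600)
```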

4. Security Considerations

The Messages API is eligible for Zero Data Retention (ZDR)
When ZDR is enabled, data is not stored after the API response
Never send sensitive information in prompts unless you have appropriate agreements

Conclusion

The Claude Messages API provides a flexible foundation for building AI-powered applications. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated interactions that leverage Claude's full potential.

Key Takeaways

The Messages API is stateless — always send the full conversation history with each request
Prefill gives you control over Claude's response format and content, but check model compatibility
Vision capabilities allow you to send images alongside text for multimodal analysis
Handle stop reasons appropriately to build robust applications that respond to truncation, tool use, and natural endings
Monitor token usage to manage costs and optimize context window utilization