BeClaude
Guide · Beginner · API · 2026-05-22

Mastering the Claude Messages API: A Practical Guide to Conversations, Prefill, and Vision

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers how to send basic requests, build multi-turn conversations, prefill Claude's responses, and use vision capabilities with the Claude Messages API, with Python code examples throughout.

Tags: Messages API, Claude API, Conversational AI, Vision, Prefill

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to work with messages is essential. This guide walks you through the core patterns: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Basic Request and Response

At its simplest, the Messages API accepts a list of messages and returns Claude's response. Here's a minimal example in Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes:

  • id: Unique message identifier
  • role: Always "assistant"
  • content: Array of content blocks (usually text)
  • model: The model used
  • stop_reason: Why generation stopped (for example end_turn, max_tokens, stop_sequence, or tool_use)
  • usage: Token counts for input and output

A typical response looks like this:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
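
Because content is a list of blocks rather than a single string, it can help to collect the text blocks explicitly instead of assuming the first block is text. The sketch below works on the raw JSON form of the response shown above. The helper name extract_text is hypothetical and not part of the SDK.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join the text of every "text" block in a Messages API response dict."""
    parts = [
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    ]
    return "".join(parts)


# The sample response from above, as a plain dict.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}
greeting = extract_text(sample)
```

With the Python SDK's response object, the equivalent is to read block.text for each block whose block.type is "text".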

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context.

Building a Conversation

To continue a conversation, append both the assistant's previous response and the new user message:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
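
Since every request must carry the whole history, a small wrapper that accumulates turns can keep the bookkeeping in one place. This is a minimal sketch, not an SDK feature: the Conversation class and its method names are assumptions made for illustration, and it only builds the request arguments without calling the API.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Accumulates turns so the full history can be resent on every request."""

    model: str
    max_tokens: int = 1024
    messages: list[dict[str, str]] = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def request_kwargs(self) -> dict:
        """Keyword arguments intended for client.messages.create(**kwargs)."""
        return {
            "model": self.model,
            "max_tokens": self.max_tokens,
            # Copy so later turns do not mutate a request already built.
            "messages": list(self.messages),
        }


chat = Conversation(model="claude-opus-4-7")
chat.add_user("Hello, Claude")
chat.add_assistant("Hello!")  # in practice, the text of the previous response
chat.add_user("Can you describe LLMs to me?")
kwargs = chat.request_kwargs()
```

Passing kwargs to client.messages.create(**kwargs) would reproduce the request above. After each response, call add_assistant with the returned text before adding the next user turn.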

Synthetic Assistant Messages

You can inject synthetic assistant messages — they don't need to have come from Claude. This is useful for:

  • Few-shot prompting: Show Claude examples of desired behavior
  • Guiding tone: Set the style of responses
  • Context injection: Provide information as if Claude already said it

For example:

messages = [
    {"role": "user", "content": "Explain quantum computing"},
    {"role": "assistant", "content": "Quantum computing uses qubits..."},  # synthetic
    {"role": "user", "content": "Give me a simple analogy"}
]
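
For few-shot prompting, the same idea extends to several example pairs. The sketch below turns a list of (input, ideal output) pairs into alternating turns and puts the real question last. The function name and the sentiment examples are hypothetical.

```python
def few_shot_messages(
    examples: list[tuple[str, str]], query: str
) -> list[dict[str, str]]:
    """Turn (input, ideal_output) pairs into alternating user/assistant turns,
    followed by the real query as the final user turn."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic: written by you, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


# Hypothetical sentiment-labelling examples.
shots = [
    ("Review: 'Loved it, would buy again.'", "positive"),
    ("Review: 'Broke after two days.'", "negative"),
]
messages = few_shot_messages(shots, "Review: 'Does what it says, nothing more.'")
```

The resulting list can be passed as the messages argument of a normal request.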

Managing Context Windows

Be mindful of the context window. Each turn adds tokens. For long conversations:

  • Use prompt caching to reduce costs on repeated system messages
  • Implement context compaction to summarize earlier turns
  • Consider sliding window approaches for very long histories
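
A sliding window can be as simple as keeping the last few messages. The sketch below trims by message count rather than by tokens, and it assumes the trimmed history should still begin with a user turn. The function name is hypothetical.

```python
def sliding_window(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most max_messages of the most recent turns.

    The result is adjusted to start with a user turn, so it may contain
    fewer than max_messages entries. The input list is not modified.
    """
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    window = messages[-max_messages:]
    # Drop leading assistant turns left over from the cut.
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window
```

The trade-off is that older turns are dropped entirely. If earlier context matters, summarizing those turns into a single message preserves more information at the cost of an extra request.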

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for:

  • Constraining output format (e.g., JSON, multiple choice)
  • Guiding reasoning (e.g., "Let me think step by step")
  • Ensuring specific phrasing

Basic Prefill Example

Here's how to get a single letter answer from a multiple-choice question:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # "C"

Prefill for Structured Output

You can use prefill to force JSON output:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=[
        {
            "role": "user",
            "content": "Extract the name and age from: 'John is 30 years old'"
        },
        {
            "role": "assistant",
            "content": "Here is the JSON: {\"name\": \""
        }
    ]
)
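
The response contains only the text generated after the prefill, so the JSON has to be stitched back together before parsing. The following is a minimal sketch using the standard library. The function name is hypothetical, and the continuation string is an invented stand-in for what the model might return.

```python
import json


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rebuild and parse a JSON object that was started in the prefill.

    Parsing begins at the first "{" of the combined text and stops at the
    end of the first complete JSON value, ignoring any trailing text.
    """
    full_text = prefill + continuation
    start = full_text.index("{")  # raises ValueError if no object was started
    obj, _end = json.JSONDecoder().raw_decode(full_text, start)
    return obj


prefill = 'Here is the JSON: {"name": "'
# Hypothetical continuation; real output may differ.
continuation = 'John", "age": 30}'
record = parse_prefilled_json(prefill, continuation)
```

Using raw_decode rather than json.loads tolerates extra prose after the closing brace. Malformed JSON still raises json.JSONDecodeError, which the caller should handle.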

Important Limitations

  • Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6
  • These models return a 400 error for requests that include a prefill
  • On these models, use structured outputs or system prompt instructions instead
  • See the migration guide for alternatives
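
One way to avoid the 400 error is to check for prefill before sending. The sketch below treats a request as prefilled when its final message has the assistant role. The function names are hypothetical, and the set of model IDs is an assumption you would need to fill in yourself: only "claude-opus-4-7" is taken from the examples in this guide.

```python
def uses_prefill(messages: list[dict]) -> bool:
    """A request uses prefill when its final message has the assistant role."""
    return bool(messages) and messages[-1]["role"] == "assistant"


def check_prefill_allowed(
    model: str, messages: list[dict], no_prefill_models: set[str]
) -> None:
    """Raise locally instead of sending a request the API would reject."""
    if model in no_prefill_models and uses_prefill(messages):
        raise ValueError(
            f"{model} does not accept prefill; end the messages with a user "
            "turn and use structured outputs or system prompt instructions."
        )


# Assumption: replace with the exact model IDs that apply to your account.
NO_PREFILL_MODELS = {"claude-opus-4-7"}
```

Calling check_prefill_allowed just before client.messages.create turns a remote 400 into a local error with a clearer message.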

Vision Capabilities

The Messages API supports images. You can send images as base64-encoded data or via URL.

Sending an Image

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail"
                }
            ]
        }
    ]
)
print(message.content[0].text)
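
The two ways of supplying an image differ only in the source object. The helpers below are a sketch of both forms. Their names are hypothetical, and the URL in the usage line is a placeholder.

```python
import base64

SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Build a base64 image content block from raw image bytes."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that points at a hosted image."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_question(image_block: dict, question: str) -> list[dict]:
    """A single user turn containing the image followed by the question."""
    return [
        {
            "role": "user",
            "content": [image_block, {"type": "text", "text": question}],
        }
    ]


# Placeholder URL for illustration only.
messages = image_question(
    image_block_from_url("https://example.com/chart.png"),
    "Describe this chart in detail",
)
```

The URL form keeps the request body small, while the base64 form works for images that are not publicly reachable.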

Supported Image Types

  • JPEG, PNG, GIF, WebP
  • Maximum size: 100 MB (but larger images are resized)
  • Optimal resolution: 1568x1568 pixels or less
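
To stay within the optimal resolution, you can compute the target dimensions before resizing. This is a small arithmetic sketch. The function name is hypothetical, and the 1568-pixel default comes from the figure quoted above.

```python
def fit_within(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Scale (width, height) down so neither side exceeds max_edge,
    preserving aspect ratio. Images already small enough are unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # Round to whole pixels, never dropping below 1.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The returned pair can be handed to whichever image library you use for the actual resize.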

Handling Stop Reasons

Understanding why Claude stopped helps you build robust applications:

stop_reason | Meaning | Action
end_turn | Claude finished naturally | Continue conversation
max_tokens | Hit token limit | Increase max_tokens or truncate
stop_sequence | Custom stop sequence triggered | Handle as designed
tool_use | Claude wants to use a tool | Execute tool and return result

Example handling:

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "tool_use":
    # Execute each tool call Claude requested
    for block in message.content:
        if block.type == "tool_use":
            # execute_tool is a placeholder for your own tool dispatcher
            result = execute_tool(block.name, block.input)
            # Return the result in the next user message as a tool_result
            # block that references block.id
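
Returning tool results means sending a user turn whose content is a list of tool_result blocks, each referencing the id of the tool_use block it answers. The sketch below works on the raw dict form of the content blocks. The function name and the tools registry (a plain dict mapping tool names to Python callables) are assumptions made for illustration.

```python
from typing import Any, Callable


def build_tool_result_message(
    content_blocks: list[dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
) -> dict[str, Any]:
    """Run every tool_use block and package the outputs as one user turn."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and other block types
        handler = tools.get(block["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": str(output),
        })
    return {"role": "user", "content": results}
```

Before the next request, append the assistant message that contained the tool_use blocks to the history, then append the message returned here.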

Best Practices

1. Manage Token Usage

  • Use max_tokens to control response length
  • Monitor usage.input_tokens and usage.output_tokens for cost tracking
  • Implement prompt caching for repeated system messages
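
For cost tracking, you can sum the usage object across responses. The sketch below operates on usage dicts like the one in the sample response. The function names are hypothetical, and the per-million-token prices are parameters you must supply from current pricing: no real prices are assumed here.

```python
def total_usage(usages: list[dict[str, int]]) -> dict[str, int]:
    """Sum input and output token counts over several responses."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for usage in usages:
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals


def estimate_cost(
    totals: dict[str, int],
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimated cost, given caller-supplied prices per million tokens."""
    return (
        totals["input_tokens"] * input_price_per_mtok
        + totals["output_tokens"] * output_price_per_mtok
    ) / 1_000_000
```

Input and output tokens are priced separately, which is why the two counts are kept apart rather than merged into one total.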

2. Handle Errors Gracefully

  • Rate limits: Implement exponential backoff
  • 400 errors: Check model compatibility (especially with prefill)
  • Timeouts: Set appropriate timeouts for long generations
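
Exponential backoff doubles the wait after each failed attempt, usually with some randomness so that many clients do not retry in lockstep. The helper below is a generic sketch, not part of the SDK. Its name and parameters are assumptions, and you pass in whichever exception types you want to retry on.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call call() and retry on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x, ... the base delay, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            if jitter:
                delay = random.uniform(0, delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

A call might look like with_backoff(lambda: client.messages.create(...), retry_on=(anthropic.RateLimitError,)). The official SDK also retries some failures on its own, so check its retry settings before layering another loop on top.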

3. Optimize for Your Use Case

  • Chatbots: Use multi-turn patterns with history management
  • Content generation: Use prefill for consistent formatting
  • Data extraction: Combine prefill with low max_tokens
  • Vision tasks: Resize images to optimal resolution before sending

4. Security Considerations

  • The Messages API is eligible for Zero Data Retention (ZDR)
  • When ZDR is enabled, data is not stored after the API response
  • Never send sensitive information in prompts unless you have appropriate agreements

Conclusion

The Claude Messages API provides a flexible foundation for building AI-powered applications. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create sophisticated interactions that leverage Claude's full potential.

Key Takeaways

  • The Messages API is stateless — always send the full conversation history with each request
  • Prefill gives you control over Claude's response format and content, but check model compatibility
  • Vision capabilities allow you to send images alongside text for multimodal analysis
  • Handle stop reasons appropriately to build robust applications that respond to truncation, tool use, and natural endings
  • Monitor token usage to manage costs and optimize context window utilization