Guide | Beginner | API | 2026-05-15

Mastering the Claude Messages API: From Basic Requests to Advanced Patterns

Learn how to use the Claude Messages API effectively with practical examples covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Quick Answer

This guide teaches you how to work with the Claude Messages API, including making basic requests, building multi-turn conversations, using prefill to shape responses, and sending images for vision tasks.

Tags: Messages API, Claude API, multi-turn conversations, prefill, vision

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to structure your API calls is essential. This guide walks you through the core patterns for working with the Messages API, from simple requests to advanced techniques like prefill and vision.

Basic Request and Response

At its simplest, the Messages API accepts a list of messages and returns a response. Here's a minimal example using Python:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

The response includes:

id: A unique identifier for the message
role: Always "assistant" for responses
content: An array of content blocks (usually text)
model: The model used
stop_reason: Why the generation stopped (e.g., "end_turn", "max_tokens")
usage: Token counts for input and output

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
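
Because content is an array of blocks rather than a single string, it can help to join the text blocks explicitly instead of assuming there is only one. The following is a minimal sketch that treats the response as a plain dict (for example, the JSON above after parsing). The helper names extract_text and total_tokens are illustrative and not part of the SDK.

```python
import json


def extract_text(response: dict) -> str:
    """Concatenate the text of every text block in a Messages API response."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block.get("type") == "text"
    )


def total_tokens(response: dict) -> int:
    """Sum the input and output token counts from the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


# The sample response shown above, parsed from JSON.
sample = json.loads("""
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
""")

print(extract_text(sample))  # Hello!
print(total_tokens(sample))  # 18
```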

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context and allows you to build dynamic conversations over time.

Building a Conversation

To continue a conversation, simply append new messages to the history:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
print(message.content[0].text)
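
Since the API keeps no state, the history usually lives in your own code. The sketch below shows one possible way to manage it: a small Conversation class (a hypothetical helper, not part of the SDK) that appends each user turn and each reply, and resends the whole list every time. The send callable is an assumption that stands in for the real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the history and
        # returns the assistant's reply text.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        try:
            reply = self._send(list(self.messages))  # pass a copy of the history
        except Exception:
            self.messages.pop()  # keep the history consistent if the call fails
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    """Stand-in for a real API call."""
    return f"(reply to message {len(history)})"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
```

With the real client, send could be a small function that calls client.messages.create with the history as the messages argument and returns message.content[0].text, as in the example above.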

Synthetic Assistant Messages

You are not limited to real Claude responses. You can inject synthetic assistant messages to guide the conversation or simulate context. For example, you might pre-populate a conversation with an assistant turn that you wrote yourself:

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

This is useful for:

Providing context from previous sessions
Simulating a specific assistant persona
Building few-shot examples into the conversation
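
The few-shot case can be sketched as a small builder that turns example pairs into alternating user and assistant messages, followed by the real query. The function name few_shot_messages and the sample reviews are made up for illustration.

```python
from typing import Dict, List, Tuple


def few_shot_messages(
    examples: List[Tuple[str, str]], query: str
) -> List[Dict[str, str]]:
    """Turn (user, assistant) example pairs plus a final query into messages."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic assistant turn: written by you, not generated by Claude.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


messages = few_shot_messages(
    [
        ("Review: 'Loved it.'", "positive"),
        ("Review: 'Broke after a day.'", "negative"),
    ],
    "Review: 'Works as described.'",
)
print(len(messages))  # 5
```

The resulting list can be passed as the messages argument of a request; the final user message is the one Claude actually answers.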

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for:

Constraining responses to specific formats
Guiding the model toward a particular structure
Getting single-word or single-token answers

Basic Prefill Example

Here's how to use prefill to get a multiple-choice answer:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"

Important Prefill Limitations

Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6
These models return a 400 error if you attempt prefill
Alternative: Use structured outputs or system prompt instructions instead
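
One way to avoid the 400 error is to decide at request-building time whether to add the prefill or fall back to a system prompt. The sketch below is only an illustration: the model ID prefixes are assumptions (only "claude-opus-4-7" appears in this guide), and supports_prefill and build_request are hypothetical helpers, so check the current model list before relying on them.

```python
from typing import Any, Dict

# Assumption: ID prefixes for the models listed above. Only "claude-opus-4-7"
# appears in this guide; the other prefixes are illustrative guesses.
NO_PREFILL_PREFIXES = (
    "claude-mythos",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
)


def supports_prefill(model: str) -> bool:
    """Return False for models assumed to reject a trailing assistant message."""
    return not model.startswith(NO_PREFILL_PREFIXES)


def build_request(
    model: str, question: str, prefill: str, fallback_system: str, max_tokens: int
) -> Dict[str, Any]:
    """Build request arguments, using prefill only where it is assumed to work."""
    request: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": question}],
    }
    if supports_prefill(model):
        request["messages"].append({"role": "assistant", "content": prefill})
    else:
        # Fall back to a system prompt instruction instead of prefill.
        request["system"] = fallback_system
    return request


request = build_request(
    "claude-opus-4-7",
    "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
    prefill="The answer is (",
    fallback_system="Reply with only the letter of the correct option.",
    max_tokens=16,
)
print("system" in request, len(request["messages"]))  # True 1
```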

When to Use Prefill

Classification tasks: Force Claude to output a specific label
JSON extraction: Start with { to steer Claude toward JSON output, then parse the result to confirm it is valid
Format control: Begin a list or table structure
Single-token answers: Combine with max_tokens=1 for constrained responses
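
When you prefill, the response contains only the continuation, not the text you supplied, so the two pieces have to be rejoined before parsing. The sketch below illustrates this for the JSON case; parse_prefilled_json is a hypothetical helper, and the continuation string is a stand-in for message.content[0].text.

```python
import json


def parse_prefilled_json(prefill: str, continuation: str) -> dict:
    """Rejoin the prefill and the model's continuation, then parse as JSON."""
    combined = prefill + continuation
    try:
        return json.loads(combined)
    except json.JSONDecodeError as exc:
        # Prefill steers the format but does not guarantee valid JSON.
        raise ValueError(f"Model output was not valid JSON: {combined!r}") from exc


prefill = "{"
continuation = '"name": "Ada", "role": "engineer"}'  # stand-in for the API reply
print(parse_prefilled_json(prefill, continuation))
```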

Vision: Sending Images to Claude

Claude can analyze images sent via the Messages API. This is useful for:

Document analysis
Image description
Visual question answering

Image Request Format

Images are sent as content blocks with a source object:

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Supported Media Types

image/jpeg
image/png
image/gif (first frame only)
image/webp
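
A small helper can reject unsupported formats before a request is sent. The sketch below maps file extensions to the media types listed above and builds the image content block. Choosing the media type from the file extension is a simplifying assumption (it does not inspect the file's bytes), and the names media_type_for and image_block are illustrative.

```python
import base64
import os

# Extension-to-media-type map for the supported formats listed above.
SUPPORTED_MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(filename: str) -> str:
    """Return the media type for a filename, or raise if it is unsupported."""
    suffix = os.path.splitext(filename)[1].lower()
    if suffix not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {suffix or filename!r}")
    return SUPPORTED_MEDIA_TYPES[suffix]


def image_block(data: bytes, filename: str) -> dict:
    """Build a base64 image content block for the Messages API."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type_for(filename),
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


print(media_type_for("chart.PNG"))  # image/png
```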

Tips for Vision Requests

Combine with text: Always include a text prompt alongside the image for best results
Image size: Larger images consume more tokens; resize if needed
Multiple images: You can send multiple images in a single message
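
As a sketch of the multiple-image tip, the content array can simply hold several image blocks followed by one text block. The helper name build_vision_content is hypothetical, and the byte strings in the example are placeholders rather than real image data.

```python
import base64
from typing import List, Tuple


def build_vision_content(images: List[Tuple[bytes, str]], prompt: str) -> List[dict]:
    """Build a content array: one block per image, then the text prompt."""
    if not prompt.strip():
        raise ValueError("Include a text prompt alongside the images")
    content: List[dict] = []
    for data, media_type in images:
        content.append(
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.b64encode(data).decode("utf-8"),
                },
            }
        )
    content.append({"type": "text", "text": prompt})
    return content


# Placeholder bytes stand in for real image files.
content = build_vision_content(
    [(b"first-image-bytes", "image/png"), (b"second-image-bytes", "image/jpeg")],
    "What differs between these two charts?",
)
print([block["type"] for block in content])  # ['image', 'image', 'text']
```

The returned list is used as the content value of a single user message.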

Handling Stop Reasons

Every response includes a stop_reason field that tells you why generation stopped:

Stop Reason	Meaning
`end_turn`	Claude finished naturally
`max_tokens`	Hit the token limit; response may be truncated
`stop_sequence`	A custom stop sequence was encountered
`tool_use`	Claude wants to use a tool (for agent workflows)

When stop_reason is max_tokens, the response was cut off. To get the rest, append the partial response to the history as an assistant message, then add a user message asking Claude to continue.
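
The continuation step for a truncated response can be sketched as a function that inspects stop_reason and, when needed, returns the history for a follow-up request. It works on the response as a plain dict; the name continuation_messages and the wording of the follow-up prompt are assumptions for illustration.

```python
from typing import Dict, List, Optional


def continuation_messages(
    messages: List[Dict[str, str]],
    response: dict,
    continue_prompt: str = "Please continue from where you stopped.",
) -> Optional[List[Dict[str, str]]]:
    """Return the history for a follow-up request, or None if not truncated."""
    if response["stop_reason"] != "max_tokens":
        return None
    partial = "".join(
        block["text"] for block in response["content"] if block.get("type") == "text"
    )
    # Send the partial answer back, then ask for the rest in a new user turn.
    return messages + [
        {"role": "assistant", "content": partial},
        {"role": "user", "content": continue_prompt},
    ]


history = [{"role": "user", "content": "List the planets."}]
truncated = {
    "stop_reason": "max_tokens",
    "content": [{"type": "text", "text": "Mercury, Venus,"}],
}
follow_up = continuation_messages(history, truncated)
print(len(follow_up))  # 3
```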

Best Practices

1. Manage Token Usage

Monitor usage.input_tokens and usage.output_tokens to control costs
Use max_tokens to limit response length
Consider prompt caching for repeated system prompts
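
To make the monitoring point concrete, here is a small sketch of a running total over the usage field of each response. UsageTracker is a hypothetical helper, and the per-million-token rates are passed in by the caller because this guide does not state any prices.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class UsageTracker:
    """Accumulates token counts across requests."""

    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def record(self, usage: Mapping[str, int]) -> None:
        """Add one response's usage dict to the running totals."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.requests += 1

    def estimated_cost(self, input_per_mtok: float, output_per_mtok: float) -> float:
        """Estimate cost from caller-supplied rates per million tokens."""
        return (
            self.input_tokens * input_per_mtok
            + self.output_tokens * output_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
tracker.record({"input_tokens": 12, "output_tokens": 6})
tracker.record({"input_tokens": 12, "output_tokens": 6})
print(tracker.input_tokens, tracker.output_tokens, tracker.requests)  # 24 12 2
```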

2. Handle Errors Gracefully

Implement retry logic with exponential backoff
Check for 400 errors (invalid requests) and 429 errors (rate limits)
Validate your message structure before sending
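
The retry advice can be sketched as a generic wrapper with exponential backoff and a little jitter. It retries only the exception types you pass in, so a 400-style error that will never succeed is raised immediately. The names with_backoff and FakeRateLimit are illustrative; with the Python SDK you would pass its rate-limit exception class (anthropic.RateLimitError) instead of the stand-in, and the SDK client also has its own max_retries setting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except retryable:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter


class FakeRateLimit(Exception):
    """Stand-in for a 429 error so the sketch runs without the SDK."""


attempts = {"count": 0}


def flaky() -> str:
    """Fails twice with a rate-limit error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit()
    return "ok"


waits: list = []
print(with_backoff(flaky, (FakeRateLimit,), sleep=waits.append))  # ok
```

Passing sleep as a parameter keeps the example fast to run and makes the delays easy to inspect in tests.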

3. Optimize for Your Use Case

Chatbots: Use multi-turn patterns with full history
Classification: Use prefill with max_tokens=1 (on models that support prefill)
Content generation: Use system prompts and longer max_tokens
Vision tasks: Combine images with clear text instructions
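
For the content generation case, a request might pair a system prompt with a larger max_tokens value. The sketch below only builds the argument dict; generation_request is a hypothetical helper, and the default of 4096 tokens is an arbitrary illustrative value rather than a recommendation.

```python
from typing import Any, Dict


def generation_request(
    model: str, system_prompt: str, user_text: str, max_tokens: int = 4096
) -> Dict[str, Any]:
    """Build Messages API arguments for a longer, system-prompted generation."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is a top-level parameter, not a message.
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_text}],
    }


params = generation_request(
    "claude-opus-4-7",
    "You are a concise technical writer.",
    "Draft a short release note for version 2.0.",
)
print(sorted(params))  # ['max_tokens', 'messages', 'model', 'system']
```

The dict can then be unpacked into the call, as in client.messages.create(**params).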

4. Security Considerations

Never expose API keys in client-side code
Validate and sanitize user input before sending to the API
Be aware of data retention policies (zero data retention, or ZDR, is available for eligible organizations)
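
The first two points can be sketched on the server side as follows. The key is read from the ANTHROPIC_API_KEY environment variable (the variable the Python SDK reads by default) rather than being hard-coded, and user text gets a basic cleanup before use. The helpers load_api_key and clean_user_input, and the 10,000-character cap, are assumptions for illustration; real input validation depends on your application.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return key


def clean_user_input(text: str, max_chars: int = 10_000) -> str:
    """Drop control characters (except newline and tab) and cap the length."""
    cleaned = "".join(ch for ch in text if ch in "\n\t" or ch.isprintable())
    return cleaned.strip()[:max_chars]


print(repr(clean_user_input("hi\x00 there\n")))  # 'hi there'
```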

Conclusion

The Claude Messages API is flexible and powerful. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated applications that leverage Claude's capabilities. Remember that the API is stateless — you control the context by managing the conversation history yourself.

Key Takeaways

The Messages API is stateless — always send the full conversation history with each request
Use prefill to guide Claude's responses, but check model compatibility (not supported on Claude Mythos Preview, Opus 4.7, Opus 4.6, or Sonnet 4.6)
Vision capabilities allow you to send images alongside text prompts for analysis
Monitor stop_reason to handle truncated responses and tool use scenarios
Synthetic assistant messages give you full control over conversation context and few-shot examples