Guide | Beginner | API | 2026-05-22

Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use Claude's Messages API for single-turn queries, multi-turn conversations, prefill techniques, and vision tasks with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to send requests, manage multi-turn conversations, prefill responses, and handle images, with code examples in Python.

Messages API, Claude API, Conversational AI, Prefill, Vision

Introduction

Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generator, or a tool-using agent, understanding the Messages API is essential. This guide covers the core patterns you'll use every day: basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Note: Anthropic offers two ways to build with Claude: the Messages API (direct model access, best for custom agent loops) and Claude Managed Agents (pre-built harness for long-running tasks). This guide focuses on the Messages API.

Basic Request and Response

At its simplest, a Messages API call sends a list of messages and returns Claude's response. Here's a minimal example in Python:

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message)

Understanding the Response

The response is a JSON object with several key fields:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

content: An array of content blocks (usually text, but can include tool use calls).
stop_reason: Why Claude stopped generating. Common values: "end_turn" (normal completion), "max_tokens" (hit token limit), "tool_use" (Claude wants to call a tool).
usage: Token counts for billing and context management.
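
As a minimal sketch of reading these fields, the helpers below work on a plain dict shaped like the JSON above. The function names are hypothetical, and the sketch assumes you have already converted the SDK response object to a dict; with the SDK object itself you would use attribute access such as message.content[0].text, as in the later examples.

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict) -> int:
    """Add input and output token counts from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# Sample data copied from the response shown above
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
print(total_tokens(sample))  # 18
```

Filtering on the block type matters because content can hold non-text blocks (such as tool use calls), which have no text field.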

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context but means you need to manage state on your end.

Building a Conversation

To continue a conversation, append new messages to the messages array:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
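
Because the API is stateless, it often helps to keep the history in one place. The sketch below is one possible way to do that, not an official pattern: Conversation and fake_send are hypothetical names, and send stands in for whatever function actually calls the API and returns the reply text.

```python
from typing import Callable


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send: Callable[[list], str]):
        self.send = send          # assumed: takes the history, returns reply text
        self.messages: list = []  # alternating user/assistant messages

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the sender cannot mutate the stored history
        reply = self.send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: list) -> str:
    """Stand-in for a real API call, so the sketch runs offline."""
    return f"(reply to {len(messages)} message(s))"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
# ['user', 'assistant', 'user', 'assistant']
```

In a real application, send could wrap client.messages.create(...) and return message.content[0].text.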

Synthetic Assistant Messages

You can also inject synthetic assistant messages — responses that didn't actually come from Claude. This is useful for:

Providing examples of the response format you want
Correcting context (e.g., "Actually, I already know X")
Simulating multi-step workflows

messages = [
    {"role": "user", "content": "Summarize this article: ..."},
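    # Pre-seed a good summary format. Avoid trailing whitespace here:
    # the API rejects a final assistant message that ends in whitespace.
    {"role": "assistant", "content": "Here is a summary in bullet points:\n-"}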
]
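
For the first use case, providing format examples, a small helper can turn example pairs into synthetic turns. This is a sketch with a hypothetical function name and made-up example data. The list it builds ends with a user turn, so Claude writes the whole final reply rather than continuing a partial one.

```python
def few_shot_messages(examples: list, question: str) -> list:
    """Build a history of synthetic user/assistant pairs, then the real question."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # Synthetic reply: written by you, not generated by Claude
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


# Illustrative examples only
examples = [
    ("Summarize: The cat sat on the mat.", "- A cat sat on a mat."),
    ("Summarize: It rained all day in Paris.", "- Paris had a full day of rain."),
]
messages = few_shot_messages(examples, "Summarize: The meeting moved to Friday.")
print(len(messages))          # 5
print(messages[-1]["role"])   # user
```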

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its output. This is a powerful technique for:

Constraining output format (e.g., JSON, multiple choice)
Guiding tone or style
Reducing token usage by steering the response early

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)
print(message.content[0].text)  # Output: "C"

By setting max_tokens=1 and prefilling with "The answer is (", Claude only needs to output the letter. This is efficient and predictable.
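
The same idea applies to JSON output. As the example above shows, the response contains only the continuation, not the prefill, so you need to put the prefill back before parsing. The sketch below illustrates that step offline; PREFILL, build_messages, and parse_prefilled_json are hypothetical names, and the continuation string is simulated rather than real model output.

```python
import json

PREFILL = "{"  # start of a JSON object, with no trailing whitespace


def build_messages(prompt: str) -> list:
    """User prompt followed by a prefilled assistant turn."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": PREFILL},
    ]


def parse_prefilled_json(continuation: str) -> dict:
    """Re-attach the prefill to the returned continuation, then parse."""
    return json.loads(PREFILL + continuation)


# Simulated continuation, standing in for message.content[0].text
simulated = '"name": "Formicidae", "rank": "family"}'
print(parse_prefilled_json(simulated)["name"])  # Formicidae
```

Parsing can still fail if the reply is cut off by max_tokens, so real code would likely wrap json.loads in error handling.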

Important Limitations

Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. A request that ends with a prefilled assistant turn on one of these models returns a 400 error.
For those models, use structured outputs or system prompt instructions instead.

Vision: Working with Images

Claude can analyze images sent through the Messages API. Images are sent as base64-encoded data or via URL.

Sending an Image

import anthropic
import base64
client = anthropic.Anthropic()
# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)
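
For the URL option, the image block carries a link instead of encoded data. The sketch below builds such a message as plain data. The source shape ("type": "url" with a "url" field) reflects my understanding of the API and is worth checking against the current reference; the helper names and the example.com address are hypothetical.

```python
def image_url_block(url: str) -> dict:
    """Image content block that points at a hosted image (assumed shape)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_question(url: str, question: str) -> list:
    """One user message containing an image followed by a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                image_url_block(url),
                {"type": "text", "text": question},
            ],
        }
    ]


# Placeholder URL for illustration only
messages = image_question("https://example.com/chart.png", "Describe this chart in detail.")
print(messages[0]["content"][0]["source"]["type"])  # url
```

The resulting list can be passed as the messages argument of client.messages.create, just like the base64 version.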

Supported Image Types

image/jpeg
image/png
image/gif (first frame only)
image/webp
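
A small check before upload can catch unsupported files early. The sketch below maps file extensions to the media types listed above; the helper names are hypothetical, and it trusts the file extension rather than inspecting the file contents.

```python
import base64
from pathlib import Path

# Extension to media type, limited to the types listed above
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(path: str) -> str:
    """Return the media type for a path, or raise if it is not supported."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {path!r}")
    return MEDIA_TYPES[suffix]


def image_block(path: str) -> dict:
    """Read a file and build a base64 image content block for it."""
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type_for(path), "data": data},
    }


print(media_type_for("chart.PNG"))   # image/png
print(media_type_for("photo.jpeg"))  # image/jpeg
```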

Tips for Vision

Keep images under 20MB for optimal performance.
Combine with text prompts for best results (e.g., "What's the trend in this graph?")
Use high-resolution images when fine details matter (Claude supports up to 8K resolution).

Handling Stop Reasons

Understanding stop_reason helps you build robust applications:

Stop Reason	Meaning	Action
`end_turn`	Claude finished naturally	Continue or end conversation
`max_tokens`	Hit token limit	Increase `max_tokens` or continue
`tool_use`	Claude wants to call a tool	Execute the tool and return result
`stop_sequence`	Found a custom stop sequence	Handle as needed
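
One way to apply this table in code is a simple lookup from stop reason to next step. The action labels below are hypothetical placeholders for your own application logic, and unknown values fall through to a safe default so that stop reasons not listed here don't crash the app.

```python
# Maps each stop reason from the table to a hypothetical action label
ACTIONS = {
    "end_turn": "finish",
    "max_tokens": "continue_or_raise_limit",
    "tool_use": "run_tool",
    "stop_sequence": "handle_stop_sequence",
}


def next_action(stop_reason: str) -> str:
    """Pick the next step for a response, defaulting to manual review."""
    return ACTIONS.get(stop_reason, "review_manually")


print(next_action("tool_use"))    # run_tool
print(next_action("max_tokens"))  # continue_or_raise_limit
```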

Example: Handling max_tokens

# Keep the history in a list so it can be extended and resent
messages = [{"role": "user", "content": "Write a long story"}]
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=messages
)
if response.stop_reason == "max_tokens":
    # The reply was cut off: record it, then ask Claude to carry on
    messages.append({"role": "assistant", "content": response.content[0].text})
    messages.append({"role": "user", "content": "Continue"})
    # Send again with the updated messages list...
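
The continuation pattern above can be wrapped in a loop with an upper bound on rounds. This is a sketch under assumptions: generate_until_done and fake_send are hypothetical names, and send stands in for a function that calls the API and returns a (text, stop_reason) pair.

```python
def generate_until_done(send, prompt: str, max_rounds: int = 3) -> str:
    """Keep asking Claude to continue while replies are cut off by max_tokens."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, stop_reason = send(messages)
        parts.append(text)
        if stop_reason != "max_tokens":
            break  # finished for any other reason
        # Record the partial reply and request the next part
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue"})
    return "".join(parts)


# Simulated replies so the sketch runs offline
chunks = iter([("Once upon ", "max_tokens"), ("a time.", "end_turn")])


def fake_send(messages: list):
    return next(chunks)


print(generate_until_done(fake_send, "Write a long story"))  # Once upon a time.
```

The max_rounds cap keeps a runaway generation from looping indefinitely and consuming tokens.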

Best Practices

Manage context windows: Keep conversation history within Claude's context limit (varies by model). Use prompt caching for long histories.
Use system prompts: For persistent instructions, use the system parameter instead of repeating in every user message.
Handle errors gracefully: The API may return errors for invalid requests, rate limits, or server issues. Implement retry logic with exponential backoff.
Monitor token usage: Track usage.input_tokens and usage.output_tokens to optimize costs and avoid surprises.
Stream for responsiveness: For long responses, use streaming to show output incrementally.
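
To illustrate the retry advice, here is a generic sketch of exponential backoff with jitter. The function name and parameters are hypothetical, and the official SDKs may already retry some errors on their own, so treat this as an illustration of the idea rather than something you necessarily need to add.

```python
import random
import time


def with_backoff(call, retryable=(Exception,), max_attempts=5,
                 base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on retryable errors with doubling delays plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x ... the base delay, plus random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Demo: a call that fails twice, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record delays instead of actually sleeping
print(with_backoff(flaky, retryable=(ConnectionError,), sleep=delays.append))  # ok
print(len(delays))  # 2
```

In practice you would pass only the error types worth retrying (rate limits, timeouts, server errors) and let invalid-request errors fail immediately.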

Conclusion

The Messages API is the foundation for all Claude integrations. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated applications that leverage Claude's full capabilities. Start with simple patterns and gradually add complexity as your use case demands.

Key Takeaways

The Messages API is stateless — always send the full conversation history with each request.
Prefill lets you guide Claude's response by providing the beginning of its output, but is not supported on all models.
Vision capabilities allow Claude to analyze images sent as base64 or URLs.
Always check stop_reason to determine the next action in your application logic.
Use synthetic assistant messages to provide examples or correct context without real Claude responses.