BeClaude
Guide · Beginner · API · 2026-05-21

Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use the Claude Messages API to send basic requests, manage multi-turn conversations, prefill Claude's responses, and handle images. You'll get practical code examples in Python and TypeScript.

Messages API · Claude API · Multi-turn conversations · Prefill · Vision

If you're building applications with Claude, the Messages API is your primary interface. It's the direct, programmatic way to send prompts and receive responses from Claude's models. This guide covers the essentials, from your first request to techniques like multi-turn conversations, prefill, and vision.

Whether you're building a chatbot, a content generator, or a tool-using agent, understanding the Messages API is essential. Let's dive in.

What is the Messages API?

The Messages API is Anthropic's primary API for direct model access. You send a list of messages (with roles like user and assistant), and Claude returns a response. It's stateless, meaning you always send the full conversation history with each request.

Note: Anthropic also offers Claude Managed Agents, a pre-built agent harness for long-running tasks. The Messages API is for when you need fine-grained control and custom agent loops.

Your First API Request

Let's start with the simplest possible request: sending a single message and getting a response.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);

Understanding the Response

The API returns a JSON object with key fields:

  • id: Unique identifier for the message
  • role: Always "assistant"
  • content: Array of content blocks (usually text)
  • model: The model used
  • stop_reason: Why the response ended ("end_turn", "max_tokens", "stop_sequence", or "tool_use")
  • usage: Token counts for input and output

Example output:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
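
As a rough illustration of reading these fields, the sketch below parses the example output above as plain JSON and pulls out the reply text and token counts. The helper names `extract_text` and `total_tokens` are hypothetical, not part of any SDK, and the code assumes the response is available as a dictionary shaped like the example.

```python
import json

# The example output shown above, as a JSON string.
raw = """{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}"""


def extract_text(response: dict) -> str:
    """Join the text of every text block, skipping any other block types."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block.get("type") == "text"
    )


def total_tokens(response: dict) -> int:
    """Add input and output token counts from the usage field."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]


response = json.loads(raw)
print(extract_text(response))   # Hello!
print(total_tokens(response))   # 18
```

Filtering on the block type rather than indexing `content[0]` is a defensive choice: `content` is an array, and the first block is not guaranteed to be text.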

Building Multi-Turn Conversations

Since the Messages API is stateless, you must send the entire conversation history with each request. This gives you full control over the context.

Python Example

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)

Key Points

  • You control the history: Earlier turns don't need to come from Claude. You can inject synthetic assistant messages (e.g., from a database or previous session).
  • Roles matter: Use "user" for human messages and "assistant" for Claude's responses.
  • No hidden state: Each request is independent. If you want Claude to remember something, you must include it in the messages array.
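
The points above can be sketched with a small history builder. `append_turn` is a hypothetical helper, and the checks it performs (known role, non-empty content) are assumptions about what is worth validating locally, not a description of the API's own validation.

```python
VALID_ROLES = {"user", "assistant"}


def append_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Return a new history with one more turn; the input list is not modified."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if not content:
        raise ValueError("content must not be empty")
    return history + [{"role": role, "content": content}]


history: list[dict] = []
history = append_turn(history, "user", "Hello, Claude")
# A synthetic assistant turn, e.g. loaded from a database or previous session.
history = append_turn(history, "assistant", "Hello!")
history = append_turn(history, "user", "Can you describe LLMs to me?")

print(len(history))  # 3
```

The resulting list can be passed as the `messages` argument of a request. Returning a new list instead of mutating makes it easier to keep several conversation branches side by side.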

Putting Words in Claude's Mouth: Prefill

Prefill lets you start Claude's response for it. You include an assistant message at the end of your input, and Claude continues from there. This is powerful for:

  • Constraining output format (e.g., forcing JSON or multiple choice answers)
  • Guiding tone or style
  • Reducing tokens by starting the response yourself

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Outputs: "C"

Important Limitations

  • Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. These models return a 400 error.
  • Alternative: Use structured outputs or system prompt instructions for those models.
  • Use case: Best for simple constraints like multiple choice or short completions.
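
One detail that is easy to miss when using prefill for JSON: the reply contains only the continuation, so the prefill has to be glued back on before parsing. The sketch below shows that step in isolation. `build_prefilled_messages` and `parse_prefilled_json` are hypothetical helpers, and the trailing-whitespace check is an assumption based on the API, as far as I know, rejecting a prefill that ends in whitespace.

```python
import json

PREFILL = "{"


def build_prefilled_messages(prompt: str, prefill: str = PREFILL) -> list[dict]:
    """Build a messages list that ends with an assistant prefill."""
    # Assumption: a prefill ending in whitespace is rejected, so fail early.
    if prefill != prefill.rstrip():
        raise ValueError("prefill should not end with whitespace")
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def parse_prefilled_json(continuation: str, prefill: str = PREFILL) -> dict:
    """Prepend the prefill to the model's continuation, then parse as JSON."""
    return json.loads(prefill + continuation)


messages = build_prefilled_messages("List three fruits as a JSON object.")
# A made-up continuation standing in for the text of a real reply.
sample_continuation = '"fruits": ["apple", "pear", "plum"]}'
print(parse_prefilled_json(sample_continuation))
```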

Working with Images (Vision)

Claude can analyze images sent through the Messages API. You include image content blocks in the user message.

Python Example

import base64

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "What does this chart show?"}
            ]
        }
    ]
)

print(message.content[0].text)

Supported Formats

  • JPEG, PNG, GIF, WebP
  • Maximum size: ~100MB (but larger images use more tokens)
  • Best practice: Resize images to under 1MB for faster processing
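
To avoid hard-coding `media_type`, a small helper can derive it from the file extension and reject formats outside the list above. This is a minimal sketch: `image_block` and the `MEDIA_TYPES` mapping are hypothetical names, and the extension-to-type mapping is an assumption limited to the four formats listed.

```python
import base64
from pathlib import PurePath

# Extension-to-media-type mapping for the four supported formats.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, inferring media_type from the name."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


# Placeholder bytes stand in for a real file's contents.
block = image_block("chart.PNG", b"\x89PNG placeholder bytes")
print(block["source"]["media_type"])  # image/png
```

The returned dictionary can be placed in a user message's `content` list next to a text block, as in the example above.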

Handling Stop Reasons

Claude can stop generating for several reasons. Understanding these helps you handle responses correctly:

stop_reason      | Meaning                     | What to do
-----------------|-----------------------------|-------------------------------------------------
"end_turn"       | Claude finished naturally   | Return the response
"max_tokens"     | Hit token limit             | Increase max_tokens or continue the conversation
"stop_sequence"  | Hit a custom stop sequence  | Handle based on your logic
"tool_use"       | Claude wants to call a tool | Execute the tool and return results

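One way to act on these stop reasons is a small dispatch function. In the sketch below the action labels are hypothetical placeholders for whatever your application does in each case, and the fallback for unrecognized values is a design assumption rather than documented behavior.

```python
def next_action(response: dict) -> str:
    """Map a response's stop_reason to a hypothetical action label."""
    actions = {
        "end_turn": "return_response",
        "max_tokens": "raise_limit_or_continue",
        "stop_sequence": "apply_custom_logic",
        "tool_use": "run_tool",
    }
    # Unknown or missing values fall back to a safe default instead of raising,
    # in case the API reports a stop reason that is not handled here.
    return actions.get(response.get("stop_reason"), "inspect_manually")


print(next_action({"stop_reason": "max_tokens"}))  # raise_limit_or_continue
```
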
Best Practices

1. Manage Token Usage

  • Use max_tokens to control response length
  • Monitor usage.input_tokens and usage.output_tokens for cost tracking
  • Consider prompt caching for repeated system prompts

2. Handle Errors Gracefully

  • Always wrap API calls in try/except (Python) or try/catch (TypeScript) blocks
  • Handle rate limits with exponential backoff
  • Check for 400 errors (invalid requests) and 429 errors (rate limited)
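
Exponential backoff can be sketched independently of any SDK. In the example below, `RateLimited` is a hypothetical stand-in exception; in real code you would pass the SDK's own rate-limit exception class (I believe the Python SDK exposes `anthropic.RateLimitError`, but check your installed version). The `sleep` parameter is injectable so the function can be tested without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 / rate-limit error."""


def with_backoff(call, retryable=(RateLimited,), max_attempts=5,
                 base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1x, 2x, 4x, ...) plus random jitter
            # so that many clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

A call would then look like `with_backoff(lambda: client.messages.create(...), retryable=(SomeRateLimitError,))`, where `SomeRateLimitError` is a placeholder for the real exception class. Errors that are not in `retryable`, such as a 400 for an invalid request, propagate immediately, since retrying them would not help.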

3. Optimize for Latency

  • Use streaming for real-time applications
  • Keep conversation history concise (trim old messages if needed)
  • Use the smallest model that meets your needs
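
Trimming old messages can be as simple as keeping the most recent turns. The sketch below is one possible policy, not a recommended default: `trim_history` is a hypothetical helper that counts messages rather than tokens, and it drops any leading assistant turns so the trimmed history still opens with a user message.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most the last max_messages turns, starting on a user turn."""
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    trimmed = messages[-max_messages:]
    # Drop leading assistant turns so the history opens with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
    {"role": "assistant", "content": "Sure. LLMs are..."},
    {"role": "user", "content": "Shorter, please."},
]

print(len(trim_history(history, 3)))  # 3
print(len(trim_history(history, 2)))  # 1
```

Because trimmed turns are gone from Claude's view entirely, anything that must persist (for example a user's name) would need to live in the system prompt or be re-injected.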

4. Security Considerations

  • The Messages API supports Zero Data Retention (ZDR). When enabled, Anthropic doesn't store your data after the response is returned.
  • Never send sensitive information in prompts unless you have a ZDR agreement.

Common Patterns

Pattern 1: Chatbot with Memory

Store conversation history in a database and send it with each request:

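def chat_with_claude(history, user_message):
    """Send one user turn with the stored history and record Claude's reply."""
    # Note: this mutates the caller's history list in place.
    history.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=history
    )

    # Store only the first text block of the reply as the assistant turn.
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply, history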
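
For the "store in a database" part, here is a minimal sketch using Python's built-in `sqlite3`. The table name `turns`, the `conversation_id` column, and the helper names are all assumptions made for illustration; a production schema would likely add timestamps and support for non-text content.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite store with one row per conversation turn."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " conversation_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def save_turn(conn: sqlite3.Connection, conversation_id: str,
              role: str, content: str) -> None:
    """Append one turn to a conversation."""
    conn.execute(
        "INSERT INTO turns (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, conversation_id: str) -> list[dict]:
    """Load a conversation's turns in insertion order, as a messages list."""
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]


conn = open_store()
save_turn(conn, "conv-1", "user", "Hello, Claude")
save_turn(conn, "conv-1", "assistant", "Hello!")
print(load_history(conn, "conv-1"))
```

The list returned by `load_history` has the same shape as the `history` argument in the pattern above, so it can be loaded before each request and the new turns saved afterwards.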

Pattern 2: Structured Output with System Prompt

For models that don't support prefill, use system prompts:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You are a helpful assistant. Always respond in JSON format.",
    messages=[
        {"role": "user", "content": "List three fruits."}
    ]
)
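
A system prompt asks for JSON but does not guarantee it, so the reply is worth parsing defensively. The sketch below assumes one common failure mode, the JSON arriving wrapped in a code fence, and strips it before parsing. `parse_json_reply` is a hypothetical helper, and it still raises if the text is not valid JSON.

```python
import json

# Built at runtime so this example contains no literal fence marker.
FENCE = "`" * 3


def parse_json_reply(text: str):
    """Parse a reply as JSON, tolerating a surrounding code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid input.
    return json.loads(cleaned)


# Made-up reply text standing in for response.content[0].text.
fenced = FENCE + "json\n" + '{"fruits": ["apple", "pear", "plum"]}' + "\n" + FENCE
print(parse_json_reply(fenced))
```

Callers can catch `ValueError` and decide whether to retry the request or surface the raw text.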

Next Steps

Now that you understand the Messages API basics, explore these advanced topics:

  • Streaming: Get responses token-by-token for real-time UX
  • Tool Use: Let Claude call functions and APIs
  • Prompt Caching: Reduce costs for repeated prompts
  • Batch Processing: Send multiple requests efficiently

Key Takeaways

  • The Messages API is stateless: you must send the full conversation history with each request. This gives you complete control over context.
  • Multi-turn conversations are built by appending user and assistant messages to an array. You can inject synthetic assistant messages.
  • Prefill lets you start Claude's response, but it's not supported on all models. Use structured outputs or system prompts as alternatives.
  • Vision capabilities allow Claude to analyze images sent as base64-encoded content blocks.
  • Always handle stop reasons (end_turn, max_tokens, stop_sequence, tool_use) to build robust applications.