BeClaude
Guide | Beginner | API | 2026-05-22

Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API to send basic requests, build multi-turn conversations, prefill Claude's responses, and work with images. You'll get practical Python and TypeScript examples for each pattern.

Tags: Messages API, Claude API, multi-turn conversations, prefill, vision


Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent, understanding the Messages API is essential. This guide walks you through the most common patterns—from basic requests to advanced techniques like prefill and vision—with practical code examples you can use today.

What Is the Messages API?

The Messages API gives you direct access to Claude's language model. You send a list of messages (your conversation history) and receive Claude's response. It's stateless, meaning you manage the conversation context yourself by sending the full history with each request.

Anthropic offers two ways to build with Claude:

  • Messages API: Direct model access, best for custom agent loops and fine-grained control.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure, ideal for long-running tasks.
This guide focuses on the Messages API.

Basic Request and Response

Let's start with the simplest possible interaction: sending a single message and getting a reply.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);

Understanding the Response

The API returns a structured response object. Here's what you get:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields:

  • content: An array of content blocks (usually text, but can include tool use blocks).
  • stop_reason: Why Claude stopped generating. Common values are "end_turn" (Claude finished naturally) and "max_tokens" (hit the token limit).
  • usage: Token counts for billing and monitoring.
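
Because content is a list of blocks rather than a single string, a small helper makes it safer to pull out the text. The sketch below works on the raw JSON shape shown above, parsed into plain dicts. The names extract_text and total_tokens are hypothetical helpers for illustration, not part of any SDK.

```python
import json

# Raw response body in the shape shown above (abbreviated).
raw = """
{
  "content": [{"type": "text", "text": "Hello!"}],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
"""


def extract_text(response: dict) -> str:
    """Join the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict) -> int:
    """Sum input and output tokens from the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


response = json.loads(raw)
print(extract_text(response))  # Hello!
print(total_tokens(response))  # 18
```

With the Python SDK the blocks are objects rather than dicts, so the same idea would read block.type and block.text instead of dictionary keys.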

Building Multi-Turn Conversations

Because the Messages API is stateless, you must send the entire conversation history with each request. This gives you full control over context.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message.content[0].text)

Important Notes

  • You control the history: Earlier turns don't need to come from Claude. You can inject synthetic assistant messages (e.g., from a database or previous session).
  • Order matters: Messages must alternate between user and assistant roles, starting with user.
  • Context window: Be mindful of the total token count. Claude's context window varies by model (typically 200K tokens).
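
The ordering rule above is easy to break when history is assembled from several sources, so it can help to check it locally before sending a request. This is a minimal sketch that checks only the rule as stated in this guide. The function name validate_messages is hypothetical, and the API's own validation may differ in detail.

```python
def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError unless roles alternate user/assistant, starting with user."""
    if not messages:
        raise ValueError("messages must not be empty")
    for index, message in enumerate(messages):
        # Even positions must be user turns, odd positions assistant turns.
        expected = "user" if index % 2 == 0 else "assistant"
        role = message.get("role")
        if role != expected:
            raise ValueError(
                f"message {index} has role {role!r}, expected {expected!r}"
            )


history = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"},
]
validate_messages(history)  # passes silently for a well-formed history
```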

Practical Tip: Managing Conversation State

In a real application, you'll store messages in a list and append new ones as the conversation progresses:

import anthropic

client = anthropic.Anthropic()

conversation = [
    {"role": "user", "content": "Hello, Claude"}
]

# First turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)

# Add Claude's response to history
conversation.append({"role": "assistant", "content": response.content[0].text})

# Add user's next message
conversation.append({"role": "user", "content": "Tell me more about yourself."})

# Second turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation,
)
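
The append-and-resend pattern above can be wrapped in a small class so each turn is a single call. This is one possible sketch, not an SDK feature: the Conversation class and its ask method are hypothetical. It assumes the client exposes messages.create as in the examples above, and it takes the client as a parameter so it can be swapped for a stub in tests.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    client: Any  # assumed to behave like anthropic.Anthropic()
    model: str
    max_tokens: int = 1024
    messages: list = field(default_factory=list)

    def ask(self, user_text: str) -> str:
        # Append the user's turn, then send the full history.
        self.messages.append({"role": "user", "content": user_text})
        try:
            response = self.client.messages.create(
                model=self.model,
                max_tokens=self.max_tokens,
                messages=self.messages,
            )
        except Exception:
            # Drop the unanswered user turn so history stays alternating.
            self.messages.pop()
            raise
        # Collect text blocks only; other block types are ignored here.
        reply = "".join(
            block.text for block in response.content if block.type == "text"
        )
        # Store Claude's reply so the next turn includes it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

A typical use would be to create Conversation(client=anthropic.Anthropic(), model="claude-opus-4-7") once and call ask for each user message. Removing the user turn on failure is a design choice that keeps the history valid for a later retry.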

Prefilling Claude's Response

Prefilling lets you start Claude's response for it. You place an assistant message with partial content at the end of your messages array, and Claude continues from there.

Use Case: Multiple Choice Questions

This pattern is great for getting structured, constrained outputs:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {
            "role": "assistant",
            "content": "The answer is (",
        },
    ],
)

print(message.content[0].text) # Outputs: "C"

Important Limitations

  • Not supported on all models: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. A request that ends with a prefilled assistant message on one of these models returns a 400 error.
  • Alternatives: For unsupported models, use structured outputs or system prompt instructions instead.
  • Token limit: Set max_tokens appropriately. In the example above, max_tokens=1 ensures Claude only outputs the letter.

Other Prefill Patterns

  • JSON completion: Prefill with an opening fragment such as {"response": so that Claude continues the JSON object from there.
  • Sentence completion: Prefill with "In summary," to guide Claude toward a conclusion.
  • Role playing: Prefill with "As a helpful assistant, I would say:" to reinforce persona.
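
As the multiple-choice example shows, the response contains only Claude's continuation, not the prefilled text. For the JSON pattern that means the prefill has to be joined back on before parsing. The sketch below illustrates that step. The helper name complete_json is hypothetical, and the continuation string is a made-up stand-in for message.content[0].text.

```python
import json

PREFILL = '{"response":'


def complete_json(prefill: str, continuation: str) -> dict:
    """Rejoin the prefilled text with Claude's continuation and parse it."""
    return json.loads(prefill + continuation)


# Hypothetical continuation, standing in for message.content[0].text.
continuation = ' "Formicidae"}'
print(complete_json(PREFILL, continuation))  # {'response': 'Formicidae'}
```

If the response was cut off by max_tokens, the joined string will not be valid JSON and json.loads raises json.JSONDecodeError, so it is worth checking stop_reason before parsing.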

Working with Images (Vision)

Claude can analyze images. You include image content blocks in your user messages.

Python Example

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

Supported Image Formats

  • JPEG
  • PNG
  • GIF (first frame only)
  • WebP

Tips for Vision Requests

  • Use base64 encoding: The API accepts base64-encoded image data.
  • Combine with text: Always include a text prompt alongside your image to tell Claude what to do.
  • Resolution matters: Higher resolution images use more tokens. For simple tasks, consider resizing images.
  • Token cost: Images are tokenized based on size and resolution. Check the usage field in the response to monitor costs.
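
The tips above can be folded into a pair of helpers that build the user message for you. This is a sketch under a few assumptions: the function names are hypothetical, and the set of media types is simply derived from the formats listed in this guide.

```python
import base64

# Media types for the formats listed above.
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(data: bytes, media_type: str) -> dict:
    """Build a base64 image content block from raw image bytes."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_message(data: bytes, media_type: str, prompt: str) -> dict:
    """Pair an image block with a text prompt in one user message."""
    return {
        "role": "user",
        "content": [
            image_block(data, media_type),
            {"type": "text", "text": prompt},
        ],
    }
```

The returned dict can be passed as an element of the messages list in client.messages.create. Rejecting unknown media types locally gives a clearer error than waiting for the API to refuse the request.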

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications.

stop_reason    | Meaning                    | Action
end_turn       | Claude finished naturally  | Continue conversation or end
max_tokens     | Hit the token limit        | Increase max_tokens or split response
stop_sequence  | Found a stop sequence      | Handle based on your logic
tool_use       | Claude wants to use a tool | Execute the tool and return results

Example: Handling max_tokens

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[{"role": "user", "content": "Write a long essay on AI."}]
)

if response.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
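
To cover every row of the table rather than just max_tokens, one option is a lookup that turns stop_reason into a label your application can branch on. This is an illustrative sketch: the action labels and the next_action function are hypothetical, and SimpleNamespace stands in for a real response object.

```python
from types import SimpleNamespace

# Suggested actions, taken from the table above.
ACTIONS = {
    "end_turn": "done",
    "max_tokens": "retry_with_more_tokens",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool",
}


def next_action(response) -> str:
    """Map a response's stop_reason to a label the caller can branch on."""
    return ACTIONS.get(response.stop_reason, "unknown")


# Stand-in for an SDK response object; only stop_reason is read.
truncated = SimpleNamespace(stop_reason="max_tokens")
print(next_action(truncated))  # retry_with_more_tokens
```

Returning "unknown" for unrecognized values keeps the code from failing if a stop reason appears that this table does not list.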

Best Practices

  • Manage context window: Keep conversation history within the model's context limit. Use techniques like summarization for long conversations.
  • Use system prompts: For persistent instructions, use the system parameter instead of repeating instructions in every user message.
  • Monitor token usage: Track usage.input_tokens and usage.output_tokens to control costs.
  • Handle errors gracefully: Implement retry logic for transient errors and check for 400 errors on invalid requests.
  • Stream responses: For real-time applications, use streaming to get tokens as they're generated.
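
For the retry advice above, a generic exponential backoff wrapper is one common approach. The sketch below is deliberately independent of any SDK. The with_retries helper and its parameters are hypothetical, and deciding which exceptions count as transient is left to the caller. The official SDKs may already retry some errors for you, so check their documentation before layering your own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on non-transient errors.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Delay doubles each attempt, with jitter to spread out retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

In practice, call would be a lambda wrapping client.messages.create, and is_transient would return True for errors such as rate limits or dropped connections and False for invalid requests (400), which will not succeed on retry.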

Key Takeaways

  • The Messages API is stateless—you must send the full conversation history with each request.
  • Multi-turn conversations are built by maintaining a list of alternating user and assistant messages.
  • Prefilling lets you guide Claude's response by providing a partial assistant message, but check model compatibility.
  • Vision capabilities allow Claude to analyze images by including base64-encoded image content blocks.
  • Always check stop_reason and usage in the response to handle truncation and monitor costs effectively.