BeClaude
Guide | Beginner | API | 2026-05-20

Mastering the Messages API: A Practical Guide to Building with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities, with practical Python and TypeScript code examples.

Messages API, Claude API, Multi-turn conversations, Prefill, Vision

Claude's Messages API is the primary way to interact with Anthropic's language models programmatically. Whether you're building a chatbot, an agent, or an automation tool, understanding how to structure requests and handle responses is essential. This guide covers everything from basic API calls to advanced patterns like multi-turn conversations, prefill techniques, and vision capabilities.

Understanding the Messages API vs. Managed Agents

Before diving into code, it's important to understand the two main ways to build with Claude:

  • Messages API: Direct, prompt-level access to the model. Best for custom agent loops and fine-grained control. You manage the conversation state and logic yourself.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.
This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Request

Let's start with a simple request. The Messages API expects a model, max_tokens, and an array of messages with alternating user and assistant roles.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);
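Both SDK calls above wrap a single HTTPS POST request. The sketch below builds that request with only the Python standard library, so you can see what goes over the wire. It assumes the public endpoint https://api.anthropic.com/v1/messages and the anthropic-version header value 2023-06-01; build_messages_request is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

# Assumed public endpoint for the Messages API
API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(api_key: str, model: str, max_tokens: int,
                           messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Messages API."""
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,               # your API key
            "anthropic-version": "2023-06-01",  # API version header
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_messages_request(
    "YOUR_API_KEY", "claude-opus-4-7", 1024,
    [{"role": "user", "content": "Hello, Claude"}],
)
```

To send it, you would pass req to urllib.request.urlopen and parse the body with json.load. In practice the SDKs add conveniences such as automatic retries on top of this.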

Understanding the Response

The API returns a structured response:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks. These are usually text blocks, but can also include tool_use blocks when Claude calls a tool.
  • stop_reason: Why the response ended. Common values are end_turn, max_tokens, stop_sequence, and tool_use.
  • usage: Token counts for billing and context management.
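Because content is a list of blocks, code that reads only the first block breaks when that block is not text. The sketch below works on the raw JSON shape shown above, parsed into a Python dict; summarize_response is a hypothetical helper, and the sample dict simply copies the example response.

```python
def summarize_response(response: dict) -> dict:
    """Collect the fields most applications need from a Messages API response."""
    # Join every text block; other block types (such as tool_use) are skipped
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "truncated": response["stop_reason"] == "max_tokens",
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# The example response from above, as a Python dict
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(summarize_response(sample))
# {'text': 'Hello!', 'stop_reason': 'end_turn', 'truncated': False, 'total_tokens': 18}
```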

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage state on your end.

Example: Two-Turn Conversation

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)
Important: The conversation history doesn't need to come from real exchanges. You can insert synthetic assistant messages to steer Claude's behavior or to supply context from external systems.
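One common use of synthetic history is few-shot prompting: you write example exchanges yourself and present them as earlier turns. A minimal sketch, with a hypothetical helper name and made-up sentiment examples:

```python
def build_few_shot_messages(examples: list, question: str) -> list:
    """Turn (user_text, assistant_text) pairs into alternating turns plus a final question."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # These assistant turns are written by you, not generated by Claude
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


examples = [
    ("Classify the sentiment: 'I love this!'", "positive"),
    ("Classify the sentiment: 'This is awful.'", "negative"),
]
messages = build_few_shot_messages(examples, "Classify the sentiment: 'Not bad at all.'")
```

You would pass messages to client.messages.create as usual. Claude sees the examples as if it had answered them, which tends to nudge it toward the same one-word format.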

Managing Conversation State

In production, you'll want to store conversation history in a database or cache:

conversation = [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"}
]

# Later, when the user sends a new message:
conversation.append({"role": "user", "content": "What's the weather?"})

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=conversation
)

# Store Claude's reply so the next request includes it
conversation.append({"role": "assistant", "content": response.content[0].text})
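Since the history is just a list of plain dicts, it serializes cleanly to JSON for a database or cache. The class below is a hypothetical wrapper, not part of the SDK; how you key and store the JSON string (for example, by session ID) is up to you.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical conversation store whose messages can be sent as-is."""
    messages: list = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def to_json(self) -> str:
        # Persist this string between requests
        return json.dumps(self.messages)

    @classmethod
    def from_json(cls, raw: str) -> "Conversation":
        return cls(messages=json.loads(raw))


convo = Conversation()
convo.add_user("Hello, Claude")
convo.add_assistant("Hello!")
saved = convo.to_json()

# Later, possibly in another process
restored = Conversation.from_json(saved)
restored.add_user("What's the weather?")
# restored.messages can now be passed as messages=... in the next request
```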

Prefilling Claude's Response

One powerful technique is prefilling: putting words in Claude's mouth by ending your input with a partial assistant message. Claude continues from that text, which lets you shape the response and enforce specific formats.

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Output: "C"

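Setting max_tokens=1 limits Claude to a single output token, here the letter "C". The response contains only this continuation, not the prefilled text, which makes the pattern well suited to classification tasks and other short, structured outputs. This example uses claude-sonnet-4-5 because prefill is not supported on the models listed below.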
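Two details are easy to miss when building prefill requests programmatically. First, because the response holds only the continuation, you prepend the prefill yourself to recover the full text. Second, the API is understood to reject a final assistant message that ends in trailing whitespace, so this sketch strips it (treat that as an assumption to verify). Both helper names are hypothetical.

```python
def build_prefill_messages(question: str, prefill: str) -> list:
    """End the conversation with a partial assistant turn for Claude to continue."""
    return [
        {"role": "user", "content": question},
        # Trailing whitespace is stripped (assumed to be rejected in a final assistant turn)
        {"role": "assistant", "content": prefill.rstrip()},
    ]


def full_answer(prefill: str, completion: str) -> str:
    """Rebuild the full assistant text: the prefill plus Claude's continuation."""
    return prefill.rstrip() + completion


messages = build_prefill_messages(
    "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
    "The answer is (",
)
print(full_answer("The answer is (", "C"))  # The answer is (C
```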

Prefill Limitations

Prefill is not supported on the following models:

  • Claude Mythos Preview
  • Claude Opus 4.7
  • Claude Opus 4.6
  • Claude Sonnet 4.6
For these models, use structured outputs or system prompt instructions instead. The API returns a 400 error if you attempt prefill with unsupported models.
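If one code path must serve both kinds of models, a small guard can fall back to a system-prompt instruction instead of triggering the 400 error. This is a sketch under loud assumptions: the model IDs in NO_PREFILL_MODELS are hypothetical, guessed from the claude-opus-4-7 naming used in this guide (Claude Mythos Preview is left out because its ID isn't given here), and prepare_request is not an SDK function. With the fallback, Claude's reply includes the requested opening text, whereas with a real prefill it does not.

```python
from typing import Optional

# Hypothetical model IDs inferred from this guide's naming; verify against
# Anthropic's model documentation before relying on them.
NO_PREFILL_MODELS = {"claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"}


def prepare_request(model: str, messages: list, prefill: Optional[str] = None,
                    system: Optional[str] = None, max_tokens: int = 1024) -> dict:
    """Build request parameters, using prefill only where the model supports it."""
    # Copy the list so the caller's history is never mutated
    request = {"model": model, "max_tokens": max_tokens, "messages": list(messages)}
    if prefill:
        if model in NO_PREFILL_MODELS:
            # Fallback: ask for the opening text via the system prompt instead
            instruction = f'Begin your reply with exactly: "{prefill}"'
            system = f"{system}\n\n{instruction}" if system else instruction
        else:
            request["messages"].append({"role": "assistant", "content": prefill})
    if system:
        request["system"] = system
    return request
```

The resulting dict could then be unpacked into the SDK call, as in client.messages.create(**params).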

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

Stop Reason     Meaning
end_turn        Claude finished naturally
max_tokens      Response was cut off due to token limit
stop_sequence   A custom stop sequence was encountered
tool_use        Claude wants to call a tool
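In code, the table above usually becomes a small dispatch on stop_reason. The sketch below works on the raw response dict; next_action and its action labels are hypothetical, and any stop reason not in the table goes to a catch-all rather than being assumed away. For tool_use, it also collects the tool_use blocks (each carries an id, a name, and an input) that your code would execute.

```python
def next_action(response: dict):
    """Decide what to do next based on why Claude stopped generating."""
    reason = response["stop_reason"]
    if reason in ("end_turn", "stop_sequence"):
        return "done", []
    if reason == "max_tokens":
        # Output was truncated; consider continuing or raising max_tokens
        return "continue", []
    if reason == "tool_use":
        calls = [b for b in response["content"] if b["type"] == "tool_use"]
        return "run_tools", calls
    # Any other value: log it and inspect the response manually
    return "inspect", []


# Hypothetical tool call response (the tool name and id are made up)
tool_response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
         "input": {"city": "Paris"}},
    ],
}
action, calls = next_action(tool_response)
print(action, [c["name"] for c in calls])  # run_tools ['get_weather']
```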

Example: Handling max_tokens

If Claude stops due to max_tokens, you can continue the conversation by sending the partial response back:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[
        {"role": "user", "content": "Write a long story"}
    ]
)

if response.stop_reason == "max_tokens":
    # Continue from where Claude left off: replay the truncated text as an
    # assistant turn, then ask for more
    continuation = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=100,
        messages=[
            {"role": "user", "content": "Write a long story"},
            {"role": "assistant", "content": response.content[0].text},
            {"role": "user", "content": "Please continue"}
        ]
    )
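For output that needs more than one continuation, the same pattern can run in a loop until stop_reason is no longer max_tokens. In this sketch, create is a hypothetical callable you supply that sends a messages list and returns the response as a dict (for example, a thin wrapper around client.messages.create); taking it as a parameter also makes the loop easy to try offline with a fake. The max_rounds cap is an assumed safeguard to bound cost.

```python
from typing import Callable


def generate_with_continuation(create: Callable[[list], dict], prompt: str,
                               max_rounds: int = 5) -> str:
    """Request more output while responses keep stopping on max_tokens."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        response = create(messages)
        text = "".join(
            b["text"] for b in response["content"] if b["type"] == "text"
        )
        parts.append(text)
        if response["stop_reason"] != "max_tokens":
            break
        # Replay the truncated text as an assistant turn, then ask for more
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Please continue"},
        ]
    return "".join(parts)


# A fake `create` that returns canned responses, for trying the loop offline
canned = iter([
    {"content": [{"type": "text", "text": "Once upon a time"}], "stop_reason": "max_tokens"},
    {"content": [{"type": "text", "text": ", the end."}], "stop_reason": "end_turn"},
])
story = generate_with_continuation(lambda messages: next(canned), "Write a long story")
print(story)  # Once upon a time, the end.
```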

Working with Images (Vision)

The Messages API supports image inputs. You can send images as base64-encoded data or URLs.

Sending an Image

import base64

with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {"type": "text", "text": "What's in this image?"}
            ]
        }
    ]
)

print(message.content[0].text)

Supported media types: image/jpeg, image/png, image/gif, image/webp.
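Hard-coding media_type causes errors when a file isn't actually a JPEG. The sketch below picks the media type from the file's leading signature bytes for the four supported formats, and also shows the URL form of an image source mentioned earlier. The helper names are hypothetical; the signature bytes are the standard ones for each format.

```python
import base64

# Standard leading signature bytes for the supported formats
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def detect_media_type(data: bytes) -> str:
    """Return the media type for JPEG, PNG, GIF, or WebP bytes."""
    for prefix, media_type in SIGNATURES:
        if data.startswith(prefix):
            return media_type
    # WebP is a RIFF container with "WEBP" at bytes 8-11
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("not a supported image format")


def image_block_from_bytes(data: bytes) -> dict:
    """Build a base64 image content block with a detected media type."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": detect_media_type(data),
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that points at a URL."""
    return {"type": "image", "source": {"type": "url", "url": url}}
```

Either block goes into the user message's content list, followed by a text block with your question, just as in the example above.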

Best Practices

  • Manage context windows: Keep conversation history within Claude's context window. Use techniques like summarization or sliding windows for long conversations.
  • Use system prompts: For persistent instructions, use the system parameter instead of repeating instructions in every user message.
  • Handle errors gracefully: Implement retry logic for rate limits and network errors.
  • Monitor token usage: Track usage.input_tokens and usage.output_tokens to optimize costs.
  • Stream responses: For real-time applications, use streaming to get partial responses as they're generated.
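The first two practices can be combined in a few lines. This sketch trims history to the most recent messages and keeps persistent instructions in the system parameter. Counting messages is a crude stand-in for counting tokens (a fuller version might budget using usage.input_tokens from earlier responses), and sliding_window and build_request are hypothetical helpers.

```python
def sliding_window(messages: list, max_messages: int) -> list:
    """Keep the most recent messages, making sure the window opens on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    window = messages[-max_messages:]
    # Drop leading assistant turns so the trimmed history still starts with the user
    while window and window[0]["role"] != "user":
        window = window[1:]
    return window


def build_request(model: str, system: str, messages: list,
                  max_messages: int = 20) -> dict:
    """Request parameters with instructions in `system` and a trimmed history."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": system,
        "messages": sliding_window(messages, max_messages),
    }


history = [
    {"role": "user", "content": "u0"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a3"},
    {"role": "user", "content": "u4"},
]
params = build_request("claude-opus-4-7", "You are a concise assistant.", history, max_messages=4)
print([m["content"] for m in params["messages"]])  # ['u2', 'a3', 'u4']
```

Dropping old turns loses their context entirely; summarizing them into the system prompt is the usual alternative when earlier details still matter.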

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill techniques, and vision capabilities, you can create powerful applications that leverage Claude's intelligence. Remember that the API is stateless—you control the conversation flow, which gives you maximum flexibility but also requires careful state management.

Key Takeaways

  • The Messages API is stateless—you must send the full conversation history with every request.
  • Prefill allows you to shape Claude's responses by providing partial assistant messages, but it's not supported on all models.
  • Use stop_reason to handle different response endings, especially max_tokens for truncated responses.
  • The API supports multimodal inputs, including images (base64 or URL) alongside text.
  • Always monitor token usage and manage context windows to optimize performance and costs.