BeClaude
Guide · Beginner · Best Practices · 2026-05-22

Mastering the Messages API: Build Conversational AI with Claude

Learn how to use Claude's Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities. Practical guide with code examples.

Quick Answer

This guide covers everything you need to build with Claude's Messages API: making basic requests, managing multi-turn conversations, using prefill to shape responses, and working with images. You'll get practical code examples in Python and TypeScript.

Messages API · Claude API · Conversational AI · Prompt Engineering · Multimodal

Introduction

The Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to structure your API calls is essential. This guide walks you through the core patterns—from simple requests to advanced techniques like prefill and vision—so you can build robust conversational applications.

Understanding the Messages API vs. Managed Agents

Anthropic offers two paths for building with Claude:

  • Messages API: Direct model access. You control the entire conversation loop, manage state, and handle tool calls yourself. Best for custom agent loops and fine-grained control.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs in managed infrastructure. Best for long-running tasks and asynchronous work.

This guide focuses on the Messages API, which gives you maximum flexibility.

Making Your First API Request

Let's start with the simplest possible request: sending a single message and getting a response.

Python Example

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

TypeScript Example

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude' }
  ]
});

console.log(message);

Understanding the Response

The API returns a structured JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks (text, tool_use, etc.)
  • stop_reason: Why the response ended (end_turn, max_tokens, stop_sequence, tool_use)
  • usage: Token counts for billing and context management
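
To make the field list concrete, here is a minimal sketch that pulls the text, stop reason, and token counts out of a response shaped like the JSON above. It works on a plain dict so it runs without an API key. The helper name summarize_response is hypothetical and not part of the SDK.

```python
def summarize_response(response: dict) -> dict:
    """Collect the text blocks and token usage from a Messages API response dict."""
    # content is a list of blocks; keep only the text ones and join them.
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


# Sample payload copied from the response shown above.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

summary = summarize_response(sample)
print(summary)
```

Filtering on the block type rather than indexing content[0] keeps the helper safe when a response mixes text with other block types such as tool_use.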

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context.

Python Example

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)

Important Notes

  • You don't need to use only real assistant responses. You can synthesize assistant messages to guide the conversation.
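  • Alternate between user and assistant roles. The API merges consecutive messages that share a role into a single turn rather than treating them as separate turns, so explicit alternation keeps turn boundaries unambiguous.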
  • The conversation history counts toward your input token limit, so be mindful of context window constraints.
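
Because the API is stateless, your application has to carry the history forward itself. The sketch below shows one way to do that. The Conversation class and the send callable are hypothetical names introduced for illustration, and fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message history for a stateless chat API."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # send is a hypothetical callable: it receives the full history and
        # returns the assistant's reply text.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        # Append the user turn, send the whole history, then record the reply
        # so the next call includes it.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    # Stand-in for a real API call; reports how many messages it received.
    return f"received {len(history)} message(s)"


chat = Conversation(fake_send)
first = chat.ask("Hello, Claude")
second = chat.ask("Can you describe LLMs to me?")
print(first)
print(second)
```

In a real application, send would wrap client.messages.create(...) with messages=history and return message.content[0].text, as in the example above.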

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response for it. This is useful for:

  • Forcing structured outputs (e.g., JSON, multiple choice)
  • Steering the tone or format of the response
  • Reducing token usage by constraining the output

Example: Multiple Choice Answer

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text) # Output: "C"

Prefill Limitations

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. Requests using prefill with these models return a 400 error. Use structured outputs or system prompt instructions instead.

For models that don't support prefill, consider:

  • Using the system parameter with formatting instructions
  • Implementing structured outputs (JSON mode)
  • Post-processing the response
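
As an illustration of the post-processing option, the following sketch extracts the first JSON value from a reply that wraps it in extra prose. The function name extract_first_json and the sample reply are made up for this example. It uses only the standard library's json module.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index and ignores
            # whatever follows it.
            value, _end = decoder.raw_decode(text, index)
            return value
        except json.JSONDecodeError:
            # Not valid JSON from this position; keep scanning.
            continue
    return None


reply = (
    "Sure, here are the planets:\n"
    '{"planets": ["Mercury", "Venus", "Earth"]}\n'
    "Anything else?"
)
print(extract_first_json(reply))
```

Returning None instead of raising lets the caller decide whether to retry the request or fall back to a default.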

Working with Images (Vision)

The Messages API supports image inputs for multimodal understanding. You can pass images as base64-encoded data or as URLs.

Python Example

import anthropic
import base64

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)

print(message.content[0].text)

Supported Image Types

  • JPEG, PNG, GIF, WebP
  • Maximum size: 5 MB per image (API requests)
  • Optimal resolution: 1568x1568 pixels (larger images are downscaled)
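
A small validation step before upload can catch unsupported formats and oversized files early. The sketch below builds image content blocks in the same shape as the base64 example above, plus a URL variant. The helper names are hypothetical. The size limit is passed in as max_bytes rather than hard-coded, so set it to the limit that applies to your account.

```python
import base64
from pathlib import Path

# Media types for the four formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_bytes(data: bytes, filename: str, max_bytes: int) -> dict:
    """Build a base64 image content block, validating format and size first."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or filename}")
    if len(data) > max_bytes:
        raise ValueError(f"image is {len(data)} bytes, limit is {max_bytes}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }


def image_block_from_url(url: str) -> dict:
    """Build an image content block that points at a hosted image."""
    return {"type": "image", "source": {"type": "url", "url": url}}


# Placeholder bytes; a real call would pass the contents of an image file.
block = image_block_from_bytes(b"\x89PNG fake bytes", "chart.PNG", max_bytes=1024)
print(block["source"]["media_type"])
```

Either block can be placed in a user message's content list next to a text block, as in the Python example above.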

Handling Stop Reasons

Understanding why Claude stopped generating helps you build robust applications:

Stop Reason   | Meaning                      | Action
end_turn      | Claude finished naturally    | Continue or end conversation
max_tokens    | Hit the token limit          | Increase max_tokens or continue
stop_sequence | Found a custom stop sequence | Handle as needed
tool_use      | Claude wants to call a tool  | Execute tool and return result

Example: Handling Tool Calls

if message.stop_reason == "tool_use":
    for block in message.content:
        if block.type == "tool_use":
            tool_name = block.name
            tool_input = block.input
            # Execute your tool logic here
            print(f"Claude wants to call {tool_name} with {tool_input}")
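
The snippet above stops at printing the request. To complete the loop, the tool output goes back to Claude as a tool_result block in a user message. The sketch below shows that step on plain dicts so it runs offline. The build_tool_results helper and the get_weather tool are hypothetical, and with the SDK you would read block.id, block.name, and block.input as attributes instead.

```python
from typing import Callable, Dict, List


def build_tool_results(content: List[dict], tools: Dict[str, Callable[..., str]]) -> dict:
    """Run each requested tool and wrap the outputs in a user message."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue
        handler = tools.get(block["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # Ties the result to the request.
            "content": handler(**block["input"]),
        })
    return {"role": "user", "content": results}


def get_weather(city: str) -> str:
    # Hypothetical tool used only for this example.
    return f"Sunny in {city}"


assistant_content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "Paris"}},
]
follow_up = build_tool_results(assistant_content, {"get_weather": get_weather})
print(follow_up)
```

For the next request, append the assistant message containing the tool_use blocks to the history, then append follow_up, and call the API again.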

Best Practices

1. Manage Context Window

  • Keep conversation history concise. Summarize or prune old messages when approaching token limits.
  • Use prompt caching for repeated system instructions (see Prompt Caching docs).
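
One simple pruning strategy is to drop the oldest turns once the history exceeds a budget. The sketch below does this with a character count as a rough stand-in for tokens. That is an assumption made to keep the example dependency-free, and it is not how tokens are actually counted. The prune_history name is hypothetical, and the function assumes string message content.

```python
from typing import Dict, List

Message = Dict[str, str]


def prune_history(messages: List[Message], max_chars: int) -> List[Message]:
    """Drop the oldest turns until the remaining text fits within max_chars."""
    kept = list(messages)  # Work on a copy so the caller's list is untouched.
    # Character count is only a rough proxy for token count.
    while len(kept) > 1 and sum(len(m["content"]) for m in kept) > max_chars:
        kept.pop(0)
    # This sketch keeps a user turn first so the pruned history still opens
    # with the user's side of the conversation.
    while len(kept) > 1 and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 50},
    {"role": "assistant", "content": "b" * 50},
    {"role": "user", "content": "c" * 10},
    {"role": "assistant", "content": "d" * 10},
    {"role": "user", "content": "e" * 10},
]
pruned = prune_history(history, max_chars=40)
print([m["role"] for m in pruned])
```

Dropping turns loses information, so a summary of the removed messages may be worth adding back in when earlier context still matters.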

2. Handle Errors Gracefully

  • Always catch API errors (rate limits, authentication, invalid requests).
  • Implement exponential backoff for retries.
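
Exponential backoff can be sketched as a small wrapper around any callable. The example below is generic and uses only the standard library. The name retry_with_backoff and its parameters are assumptions for illustration, and the flaky function simulates a transient failure so the code runs without network access.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on `retryable` errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # The delay doubles each attempt, capped at max_delay, with jitter
            # so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}
delays = []


def flaky() -> str:
    # Fails twice, then succeeds, to simulate a transient error.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, (ConnectionError,), sleep=delays.append)
print(result, len(delays))
```

With the Python SDK, the retryable tuple would likely hold transient error classes such as anthropic.RateLimitError and anthropic.APIConnectionError. Authentication and invalid-request errors are better surfaced immediately, since retrying will not fix them.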

3. Use System Messages

For persistent instructions, use the system parameter instead of repeating instructions in user messages:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a helpful assistant that always responds in JSON format.",
    messages=[
        {"role": "user", "content": "List three planets."}
    ]
)

4. Streaming for Responsiveness

For real-time applications, use streaming to show partial responses as they're generated:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Conclusion

The Messages API is the foundation for building with Claude. By mastering basic requests, multi-turn conversations, prefill, and vision capabilities, you can create sophisticated AI applications. Remember that the API is stateless—you control the context—and always handle stop reasons appropriately for robust applications.

Key Takeaways

  • Stateless design: You must send the full conversation history with every request, giving you complete control over context.
  • Prefill shapes responses: Start Claude's response to enforce structure, but check model compatibility.
  • Vision is built-in: Pass images as base64 or URLs for multimodal understanding.
  • Handle stop reasons: end_turn, max_tokens, and tool_use each require different handling logic.
  • Stream for UX: Use streaming to improve perceived responsiveness in user-facing applications.