Claude Guide
2026-04-30

Mastering the Messages API: Build Multi-Turn Conversations with Claude

Learn how to use the Claude Messages API for basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide covers the Claude Messages API, including how to send basic requests, build multi-turn conversations, prefill Claude's responses, and use vision capabilities with practical Python and TypeScript examples.

Tags: Messages API, Claude API, Multi-Turn Conversations, Prefill, Vision


Claude's Messages API is the primary way to interact with the model programmatically. Whether you're building a chatbot, an agent, or a content generation tool, understanding how to structure requests and handle responses is essential. This guide walks you through the core patterns—from simple requests to advanced techniques like prefilling and vision.

Understanding the Messages API vs. Managed Agents

Anthropic offers two paths for building with Claude:

  • Messages API: Direct access to the model. You control every aspect of the conversation loop. Best for custom agents, fine-grained control, and real-time interactions.
  • Claude Managed Agents: A pre-built, configurable agent harness that runs on managed infrastructure. Ideal for long-running, asynchronous tasks.

This guide focuses on the Messages API, which gives you the most flexibility.

Basic Request and Response

At its simplest, you send a list of messages to Claude and receive a response. Here's a minimal example using the Python SDK:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

print(message)

Response:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks. Each block has a type (e.g., text) and the actual content.
  • stop_reason: Why the response ended. Common values: "end_turn" (Claude finished naturally), "max_tokens" (hit the token limit), "stop_sequence" (encountered a custom stop sequence).
  • usage: Token counts for billing and debugging.
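As a quick sketch of reading these fields, here is the same response as a plain Python dict (standing in for the Message object the SDK returns, purely for illustration):

```python
# A response-shaped dict standing in for the parsed API response.
response = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

# Concatenate all text blocks; a response can contain more than one.
text = "".join(
    block["text"] for block in response["content"] if block["type"] == "text"
)

print(text)                                # Hello!
print(response["stop_reason"])             # end_turn
print(response["usage"]["output_tokens"])  # 6
```

With the SDK, the equivalent accessors are attributes (message.content, message.stop_reason, message.usage) rather than dict keys.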

Building Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with each request. This gives you complete control over context.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)

print(message.content[0].text)

Important: Earlier turns don't have to come from Claude. You can inject synthetic assistant messages to guide the conversation or provide context. This is useful for:
  • Simulating a persona: Pre-fill a character's backstory.
  • Providing examples: Show Claude how you want it to respond.
  • Correcting course: Insert a corrected assistant message to steer the conversation.
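Because the API is stateless, most applications keep a running history list client-side and append each turn before the next request. A minimal sketch (the ConversationHistory helper is illustrative, not part of the SDK):

```python
# A minimal client-side history manager (illustrative helper, not SDK code).
class ConversationHistory:
    def __init__(self):
        self.messages = []

    def add(self, role, content):
        """Append one turn; role is 'user' or 'assistant'."""
        self.messages.append({"role": role, "content": content})
        return self


history = ConversationHistory()
history.add("user", "Hello, Claude")
history.add("assistant", "Hello!")  # can be a real or synthetic turn
history.add("user", "Can you describe LLMs to me?")

# history.messages is now ready to pass as the `messages` parameter.
print(len(history.messages))  # 3
```

After each API call, append the assistant's reply to the same history before sending the next user turn.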

Putting Words in Claude's Mouth (Prefilling)

One of the most powerful techniques is prefilling—you supply the beginning of Claude's response as a final assistant message in the messages array, and Claude continues from there. This lets you shape the output, enforce structure, or constrain answers.

Example: Multiple Choice Answer

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        {"role": "assistant", "content": "The answer is ("},
    ],
)

print(message.content[0].text) # Output: "C"

By setting max_tokens=1 and prefilling "The answer is (", you force Claude to complete with a single character—perfect for classification tasks.

Use Cases for Prefilling

  • Structured output: Start with {"name": " to get JSON-like responses.
  • Roleplay: Begin with a character's dialogue to set tone.
  • Code generation: Prefill with def to get a function definition.
  • Chain-of-thought: Start with "Let's think step by step:" to encourage reasoning.
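One detail to remember with structured-output prefills: the response contains only Claude's continuation, not the prefill itself, so you must prepend the prefill before parsing. A sketch with a simulated completion (no API call; the completion string is invented for illustration):

```python
import json

prefill = '{"name": "'
# Simulated model completion for illustration; a real call would
# return this string from message.content[0].text.
completion = 'Ada Lovelace", "born": 1815}'

# Reassemble the full JSON document before parsing.
full_output = prefill + completion
record = json.loads(full_output)

print(record["name"])  # Ada Lovelace
```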

Working with Vision (Image Input)

Claude can analyze images when you include them in the content array. You provide images as base64-encoded data or via a URL.

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image

with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)

print(message.content[0].text)

Supported media types: image/jpeg, image/png, image/gif, image/webp.

Tips for vision:
  • Keep images under 20MB.
  • Use clear, high-resolution images for best results.
  • Combine with text instructions for precise analysis.
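Because the media_type must match the actual file format, it can be handy to derive it from the file extension when building the image block. A small sketch (the image_block helper is illustrative, not part of the SDK):

```python
import base64
import pathlib

# Map file extensions to the media types the API accepts.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path):
    """Build a base64 image content block for the given file path."""
    suffix = pathlib.Path(path).suffix.lower()
    media_type = MEDIA_TYPES[suffix]  # KeyError means an unsupported format
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dict drops straight into the content array alongside text blocks, as in the example above.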

Handling Stop Reasons

Always check the stop_reason field to understand why Claude stopped:

Stop Reason   | Meaning                          | Action
end_turn      | Claude finished naturally        | Continue or end conversation
max_tokens    | Hit token limit                  | Increase max_tokens or truncate history
stop_sequence | Encountered custom stop sequence | Handle as needed
tool_use      | Claude wants to call a tool      | Process tool call and continue

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")
elif message.stop_reason == "end_turn":
    print("Claude finished naturally.")
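When a response is truncated at max_tokens, one common pattern is to send the partial text back as an assistant turn so the next request continues where the model stopped (the same mechanism as prefilling). A sketch of the message construction only, with no API call (the helper name is illustrative):

```python
def continuation_messages(history, partial_text):
    """Build a messages list that resumes a truncated response.

    The truncated text goes back as an assistant turn, so the next request
    continues from where the model stopped. Trailing whitespace is stripped
    because a final assistant message may not end with whitespace.
    """
    return history + [{"role": "assistant", "content": partial_text.rstrip()}]


history = [{"role": "user", "content": "Tell me a story"}]
msgs = continuation_messages(history, "Once upon a time ")
print(msgs[-1]["role"])  # assistant
```

Passing msgs to a fresh messages.create call (with a larger max_tokens) would yield the continuation; you then concatenate the two text fragments yourself.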

Streaming Responses

For real-time applications, use streaming to receive tokens as they're generated:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming is essential for chatbots and any UI where you want to show progress.

Best Practices

  • Manage context windows: Keep conversation history within Claude's context limit (200K tokens for most models). Use prompt caching for repetitive prefixes.
  • Use system prompts: For persistent instructions, use the system parameter instead of repeating in every user message.
  • Handle errors gracefully: Implement retries with exponential backoff for rate limits and server errors.
  • Monitor token usage: Track usage.input_tokens and usage.output_tokens to optimize costs.
  • Prefill strategically: Use prefilling to enforce output format, but avoid over-constraining Claude's creativity.
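The retry advice above can be sketched as a small wrapper. Exception types are left generic here to keep the sketch self-contained; in real code you would typically catch your client's rate-limit and server-error exceptions instead:

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0, retriable=(Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter.

    `retriable` is a placeholder; with the Anthropic SDK you would
    usually pass its rate-limit / server-error exception classes.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Exponential backoff: base, 2x, 4x, ... plus random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

You would wrap each API call as `with_retries(lambda: client.messages.create(...))`.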

Key Takeaways

  • The Messages API is stateless—send full conversation history with each request for multi-turn interactions.
  • Prefilling lets you shape Claude's response by starting its output, useful for structured data and constrained tasks.
  • Vision support allows Claude to analyze images via base64 or URL, enabling multimodal applications.
  • Always check stop_reason to handle truncation, tool calls, or natural endings appropriately.
  • Streaming is crucial for responsive UIs; use the SDK's streaming methods for real-time token delivery.

With these patterns, you can build anything from simple Q&A bots to complex, multi-step agents using Claude's Messages API.