BeClaude
Guide · Beginner · API · 2026-05-14

Mastering the Claude Messages API: From Basic Requests to Advanced Patterns

Learn how to use the Claude Messages API effectively—covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities with practical code examples.

Quick Answer

This guide teaches you how to work with the Claude Messages API, including stateless multi-turn conversations, prefill techniques to shape responses, and vision capabilities for image analysis.

Tags: Messages API, Claude, API patterns, prefill, vision

Introduction

Anthropic offers two primary ways to build with Claude: the Messages API for direct model access and Claude Managed Agents for pre-built, configurable agent harnesses. This guide focuses on the Messages API, giving you fine-grained control over your interactions with Claude.

Whether you're building a chatbot, an analysis tool, or a complex agent loop, understanding the Messages API patterns is essential. Let's dive into the core concepts and practical patterns you'll use every day.

Basic Request and Response

At its simplest, a Messages API call sends a user message and receives Claude's response. Here's the canonical example in Python:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
print(message)

Response (shown as JSON; the SDK returns an equivalent Message object):
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}

Key fields to note:

  • content: An array of content blocks (usually text).
  • stop_reason: Why Claude stopped—"end_turn" means the model finished naturally.
  • usage: Token counts for billing and optimization.
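
As a rough illustration of reading these fields, the sketch below works on the response body as a plain Python dict, copied from the sample above. The helper names `extract_text` and `total_tokens` are hypothetical and not part of the SDK. With the Python SDK the same fields are exposed as attributes (for example `message.content[0].text`).

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block in a response body."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Sum the input and output token counts reported in the usage field."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# The sample response from above, trimmed to the fields used here.
sample = {
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
print(total_tokens(sample))  # 18
```

Joining all text blocks, rather than reading only the first one, keeps the helper correct when a response contains more than one block.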

Multi-Turn Conversations

The Messages API is stateless—you must send the full conversation history with every request. This gives you complete control over context but requires you to manage the history yourself.

Building a Conversation

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        # Earlier turns are resent on every request.
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        # The new user turn goes last.
        {"role": "user", "content": "Can you describe LLMs to me?"},
    ],
)
print(message)
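
Because the history has to be resent each time, it can help to keep it in one place. The following is a minimal sketch of that bookkeeping, not an SDK feature. The `Conversation` class and the `send` callable are hypothetical, and `fake_send` stands in for a real call to `client.messages.create` so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the running history that a stateless API needs on every call."""

    def __init__(self, send: Callable[[list[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the full history and
        # returns the assistant's reply text.
        self._send = send
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))  # pass a copy of the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: list[Message]) -> str:
    """Offline stand-in for an API call; reports how many messages it saw."""
    return f"(reply to message {len(history)})"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print([m["role"] for m in chat.messages])
# ['user', 'assistant', 'user', 'assistant']
```

In real use, `send` would wrap the API call and return the text of the response.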

Synthetic Assistant Messages

You can inject synthetic assistant messages—they don't have to come from Claude. This is useful for:

  • Providing examples (few-shot prompting)
  • Guiding conversation flow
  • Simulating multi-step reasoning

messages = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},  # synthetic example
    {"role": "user", "content": "What is 5+3?"}
]
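
For more than one or two examples, the alternating user/assistant structure can be generated from a list of pairs. This is only a sketch. `few_shot_messages` is a hypothetical helper name, and the example pairs are the ones used above.

```python
def few_shot_messages(
    examples: list[tuple[str, str]], question: str
) -> list[dict[str, str]]:
    """Turn (question, answer) pairs into alternating synthetic turns,
    followed by the real question as the final user message."""
    messages: list[dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = few_shot_messages([("What is 2+2?", "4")], "What is 5+3?")
for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list can be passed as the `messages` argument in the same way as the hand-written one.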

Prefill: Putting Words in Claude's Mouth

Prefilling lets you start Claude's response by providing the beginning of its answer. This is powerful for:

  • Constraining output format (e.g., JSON, multiple choice)
  • Setting tone or style
  • Reducing token waste

Example: Multiple Choice with Single Token

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,  # a single token is enough for the answer letter
    messages=[
        {
            "role": "user",
            "content": "What is Latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
        },
        # Prefill: Claude continues from this partial assistant turn.
        {"role": "assistant", "content": "The answer is ("},
    ],
)
print(message)

Response:
{
  "content": [{"type": "text", "text": "C"}],
  "stop_reason": "max_tokens"
}

Important: Prefill is not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6. Use structured outputs or system prompt instructions instead.

Prefill for JSON Output

messages = [
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": "{\"name\": \""}
]

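Because Claude continues from the prefilled text, its response picks up inside the JSON object, which makes the output easier to parse. The returned text contains only the continuation, so prepend the prefill before parsing.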
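
The response to a prefilled request holds only the text that follows the prefill, so the two pieces have to be joined before parsing. The sketch below shows one way to do that. The `completion` string is a hypothetical continuation written by hand, not real model output, and `parse_prefilled_json` is an illustrative helper name.

```python
import json

PREFILL = '{"name": "'


def parse_prefilled_json(prefill: str, completion: str) -> dict:
    """Rejoin the prefill with the model's continuation and parse the
    first JSON value, ignoring any text that trails after it."""
    obj, _end = json.JSONDecoder().raw_decode(prefill + completion)
    return obj


# Hypothetical continuation, written by hand for illustration.
completion = 'John", "age": 30}'

record = parse_prefilled_json(PREFILL, completion)
print(record)  # {'name': 'John', 'age': 30}
```

`raw_decode` is used instead of `json.loads` so that stray text after the closing brace does not cause a parse error. Malformed JSON still raises `json.JSONDecodeError`.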

Vision Capabilities

The Messages API supports image inputs, enabling Claude to analyze visual content. You can pass images as base64-encoded data or as URLs.

Example: Image Analysis

import anthropic
import base64

client = anthropic.Anthropic()

# Read the image and encode it as a base64 string.
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this chart in detail."},
            ],
        }
    ],
)
print(message.content[0].text)

Supported media types: image/jpeg, image/png, image/gif, image/webp.
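
A small helper can build the image block and reject unsupported types before any request is sent. This is a sketch under stated assumptions. `image_block` and `image_url_block` are hypothetical names, the media type is guessed from the file extension rather than from the file contents, and the URL source shape (`"type": "url"`) reflects my understanding of the API and should be checked against the official reference.

```python
import base64
from pathlib import Path

# Extension-to-media-type mapping for the four supported types listed above.
SUPPORTED_MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> dict:
    """Build a base64 image content block, guessing the type from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported image type: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": SUPPORTED_MEDIA_TYPES[suffix],
            "data": base64.b64encode(data).decode(),
        },
    }


def image_url_block(url: str) -> dict:
    """Build a URL image content block (assumed shape; verify against the API reference)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


block = image_block("chart.PNG", b"\x89PNG")
print(block["source"]["media_type"])  # image/png
```

Either block can take the place of the hand-written image dictionary in the `content` list.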

Handling Stop Reasons

Understanding stop_reason helps you build robust applications:

Stop Reason      Meaning                           Action
end_turn         Claude finished naturally         Continue or end conversation
max_tokens       Hit token limit                   Increase max_tokens or continue
stop_sequence    Custom stop sequence triggered    Handle as designed
tool_use         Claude wants to call a tool       Execute tool and return result
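
In application code, the table above often becomes a small dispatch step. The sketch below maps each stop reason to an action label. The labels and the `next_action` name are hypothetical placeholders for whatever your application does at each step.

```python
# Hypothetical action labels for the four stop reasons in the table above.
ACTIONS = {
    "end_turn": "finish",
    "max_tokens": "continue_or_raise_limit",
    "stop_sequence": "handle_stop_sequence",
    "tool_use": "run_tool",
}


def next_action(stop_reason: str) -> str:
    """Return the action label for a stop reason, failing loudly on unknown values."""
    try:
        return ACTIONS[stop_reason]
    except KeyError:
        raise ValueError(f"Unhandled stop_reason: {stop_reason!r}") from None


print(next_action("max_tokens"))  # continue_or_raise_limit
```

Raising on unknown values, rather than silently treating them as `end_turn`, makes it obvious when a response contains a stop reason the code was not written to handle.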

Streaming Responses

For real-time applications, use streaming to receive tokens as they're generated:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    # Print each text chunk as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)
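
Applications usually need the complete text as well as the live output, for example to append the reply to the conversation history. The sketch below accumulates chunks while printing them. `print_and_collect` is a hypothetical helper that accepts any iterable of strings, so a plain list stands in for `stream.text_stream` here.

```python
from typing import Iterable


def print_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full concatenated text."""
    parts: list[str] = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


# A list of strings stands in for stream.text_stream in this offline example.
story = print_and_collect(["Once ", "upon ", "a time."])
```

Inside the `with` block, the same helper could be called as `print_and_collect(stream.text_stream)`.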

Best Practices

  • Manage context windows: Keep conversation history within Claude's context limit (200K tokens for most models).
  • Use prompt caching: For repeated system prompts or large contexts, enable prompt caching to reduce costs and latency.
  • Handle errors gracefully: Implement retry logic for rate limits and network issues.
  • Monitor token usage: Track usage.input_tokens and usage.output_tokens to optimize your prompts.
  • Use structured outputs: For reliable JSON parsing, prefer structured outputs over prefill when possible.
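
The retry advice above can be sketched as exponential backoff with jitter. This is a generic illustration, not the SDK's own retry mechanism. `with_retries` and its parameters are hypothetical, and the built-in `ConnectionError` and `TimeoutError` are placeholders for whichever exception classes you treat as retryable (with the Python SDK, to my understanding, these would include `anthropic.RateLimitError` and `anthropic.APIConnectionError`).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retryable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on retryable errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise RuntimeError("max_attempts must be at least 1")


# Offline demonstration: a call that fails twice, then succeeds.
attempts = 0
delays: list[float] = []


def flaky() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, attempts)  # ok 3
```

Passing `sleep` as a parameter keeps the helper easy to test. In production the default `time.sleep` applies.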

Key Takeaways

  • The Messages API is stateless—always send the full conversation history with each request.
  • Prefill lets you shape Claude's responses by providing the beginning of its answer, but check model compatibility.
  • Vision capabilities allow image analysis via base64 or URL inputs.
  • Streaming enables real-time token-by-token output for better user experience.
  • Monitor stop reasons and token usage to build robust, cost-effective applications.