BeClaude
Guide · Beginner · API · 2026-05-15

Mastering the Claude Messages API: From Basic Requests to Advanced Patterns

Learn how to use the Claude Messages API effectively with practical examples covering basic requests, multi-turn conversations, prefill techniques, and vision capabilities.

Quick Answer

This guide teaches you how to work with the Claude Messages API, including making basic requests, building multi-turn conversations, using prefill to shape responses, and sending images for vision tasks.

Tags: Messages API, Claude API, multi-turn conversations, prefill, vision

Introduction

The Claude Messages API is the primary way to interact with Claude programmatically. Whether you're building a chatbot, a content generation tool, or a complex agent system, understanding how to structure your API calls is essential. This guide walks you through the core patterns for working with the Messages API, from simple requests to advanced techniques like prefill and vision.

Basic Request and Response

At its simplest, the Messages API accepts a list of messages and returns a response. Here's a minimal example using Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message)

The response includes:

  • id: A unique identifier for the message
  • role: Always "assistant" for responses
  • content: An array of content blocks (usually text)
  • model: The model used
  • stop_reason: Why the generation stopped (e.g., "end_turn", "max_tokens")
  • usage: Token counts for input and output

A response looks like this:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 6}
}
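
Because content is a list of blocks, it can be safer to collect every text block than to assume the first block is text. The following is a minimal sketch that works on the JSON form of the response shown above; extract_text is a hypothetical helper name, not part of the SDK.

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block in a Messages API response dict."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# The sample response from above, written as a Python dict.
sample = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-opus-4-7",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 6},
}

print(extract_text(sample))  # Hello!
```

Non-text blocks are skipped, so the helper returns an empty string when a response contains no text at all.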

Multi-Turn Conversations

The Messages API is stateless — you must send the full conversation history with every request. This gives you complete control over context and allows you to build dynamic conversations over time.

Building a Conversation

To continue a conversation, simply append new messages to the history:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ]
)

print(message.content[0].text)
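
Since the API keeps no state, the caller has to store the history and resend it on each turn. One way to sketch that bookkeeping is a small wrapper class. Conversation, ask, and fake_send are hypothetical names introduced for illustration, and fake_send stands in for a real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the full history and resends it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # `send` receives the whole message list and returns the assistant's text.
        self._send = send
        self.history: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        reply = self._send(list(self.history))  # pass a copy of the history
        self.history.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call. With the SDK, `send` could wrap
# client.messages.create(...) and return message.content[0].text.
def fake_send(messages: List[Message]) -> str:
    return f"(reply to {len(messages)} message(s))"


chat = Conversation(fake_send)
chat.ask("Hello, Claude")
chat.ask("Can you describe LLMs to me?")
print(len(chat.history))  # 4
```

The second call sends three messages (user, assistant, user), which mirrors the hand-written request above.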

Synthetic Assistant Messages

You are not limited to responses that Claude actually produced. Because you supply the history yourself, you can insert assistant messages you wrote to guide the conversation or simulate earlier context. For example, you might pre-populate a conversation with an answer Claude never generated:

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
]

This is useful for:

  • Providing context from previous sessions
  • Simulating a specific assistant persona
  • Building few-shot examples into the conversation
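
The few-shot case can be sketched as a helper that turns example pairs into alternating user and assistant turns. few_shot_messages is a hypothetical name, and the review texts are made-up illustration data.

```python
from typing import Dict, List, Tuple


def few_shot_messages(
    examples: List[Tuple[str, str]], query: str
) -> List[Dict[str, str]]:
    """Turn (input, ideal_output) pairs into alternating user/assistant turns,
    then append the real query as the final user turn."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


# Made-up examples for a sentiment-labeling task.
examples = [
    ("Review: 'Loved it, works perfectly.'", "positive"),
    ("Review: 'Broke after two days.'", "negative"),
]
messages = few_shot_messages(examples, "Review: 'Does the job, nothing special.'")
print([m["role"] for m in messages])
# ['user', 'assistant', 'user', 'assistant', 'user']
```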

Prefill: Putting Words in Claude's Mouth

Prefill allows you to start Claude's response by providing the beginning of its answer. This is powerful for:

  • Constraining responses to specific formats
  • Guiding the model toward a particular structure
  • Getting single-word or single-token answers

Basic Prefill Example

Here's how to use prefill to get a multiple-choice answer:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1,
    messages=[
        {
            "role": "user",
            "content": "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
        },
        {
            "role": "assistant",
            "content": "The answer is ("
        }
    ]
)

print(message.content[0].text)  # Output: C
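
As the output above suggests, the response contains only the continuation, not the prefill itself. A small sketch of the bookkeeping follows. with_prefill and full_answer are hypothetical helper names, and the trailing-whitespace check reflects an assumption that the API rejects a final assistant turn ending in whitespace.

```python
from typing import Dict, List


def with_prefill(messages: List[Dict[str, str]], prefill: str) -> List[Dict[str, str]]:
    """Return a new message list ending in an assistant turn that holds the prefill."""
    if prefill != prefill.rstrip():
        # Assumption: a final assistant turn with trailing whitespace is rejected.
        raise ValueError("prefill must not end with whitespace")
    return messages + [{"role": "assistant", "content": prefill}]


def full_answer(prefill: str, completion: str) -> str:
    """The response holds only the continuation, so prepend the prefill."""
    return prefill + completion


question = "What is Latin for ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae"
messages = with_prefill([{"role": "user", "content": question}], "The answer is (")
print(messages[-1])
print(full_answer("The answer is (", "C"))  # The answer is (C
```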

Important Prefill Limitations

  • Not supported on: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6
  • These models return a 400 error if you attempt prefill
  • Alternative: Use structured outputs or system prompt instructions instead
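
One way to handle both cases in the same code path is to branch on the model and fall back to a system prompt instruction. This is only a sketch: request_parts and no_prefill_models are hypothetical names, the instruction wording is illustrative, and a system prompt hint is a softer constraint than prefill.

```python
from typing import Dict, Set


def request_parts(
    model: str, prompt: str, prefill: str, no_prefill_models: Set[str]
) -> Dict:
    """Use prefill when the model allows it; otherwise move the hint to `system`."""
    if model in no_prefill_models:
        return {
            "system": f"Begin your reply with exactly: {prefill}",
            "messages": [{"role": "user", "content": prompt}],
        }
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": prefill},
        ]
    }


# Only this ID appears in this guide; add the other listed models using
# their exact API identifiers.
NO_PREFILL = {"claude-opus-4-7"}

parts = request_parts("claude-opus-4-7", "Is 7 prime? Answer yes or no.", "Answer:", NO_PREFILL)
print(sorted(parts))  # ['messages', 'system']
```

The returned dict can be unpacked into the request, for example client.messages.create(model=model, max_tokens=1024, **parts).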

When to Use Prefill

  • Classification tasks: Force Claude to output a specific label
  • JSON extraction: Start the response with { to steer Claude toward JSON output; this encourages valid JSON but does not guarantee it, so parse defensively
  • Format control: Begin a list or table structure
  • Single-token answers: Combine with max_tokens=1 for constrained responses
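
For the JSON case, the prefill has to be glued back onto the continuation before parsing. A minimal sketch, with parse_prefilled_json as a hypothetical helper and a made-up continuation standing in for message.content[0].text:

```python
import json


def parse_prefilled_json(prefill: str, completion: str) -> dict:
    """Rebuild the JSON text from prefill + continuation and parse it."""
    text = prefill + completion
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Prefill steers the format but does not guarantee valid JSON.
        raise ValueError(f"model output was not valid JSON: {text!r}") from exc


# Made-up continuation, as if the request had been prefilled with "{".
completion = '"name": "Ada", "role": "engineer"}'
print(parse_prefilled_json("{", completion))
# {'name': 'Ada', 'role': 'engineer'}
```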

Vision: Sending Images to Claude

Claude can analyze images sent via the Messages API. This is useful for:

  • Document analysis
  • Image description
  • Visual question answering

Image Request Format

Images are sent as content blocks with a source object:

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this chart in detail."
                }
            ]
        }
    ]
)

print(message.content[0].text)

Supported Media Types

  • image/jpeg
  • image/png
  • image/gif (first frame only)
  • image/webp
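
To avoid hard-coding media_type for every file, the content block can be built from the file path. The sketch below infers the type from the file extension only, which is an assumption that the extension matches the actual file contents; image_block and MEDIA_TYPES are hypothetical names.

```python
import base64
from pathlib import Path

# Extension -> media type, limited to the types listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block, inferring media type from the extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or '(none)'}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

Calling image_block("chart.png") would produce the same image block as the hand-written request above.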

Tips for Vision Requests

  • Combine with text: Always include a text prompt alongside the image for best results
  • Image size: Larger images consume more tokens; resize if needed
  • Multiple images: You can send multiple images in a single message
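
When sending several images, one convention is to put a short text label before each image so the prompt can refer to them by number. A sketch, assuming the image blocks have already been built in the format shown earlier; multi_image_content is a hypothetical helper and the data values are placeholders, not real image data.

```python
from typing import Dict, List


def multi_image_content(image_blocks: List[Dict], prompt: str) -> List[Dict]:
    """Put a short text label before each image, then add the prompt last."""
    content: List[Dict] = []
    for index, block in enumerate(image_blocks, start=1):
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append(block)
    content.append({"type": "text", "text": prompt})
    return content


# Placeholder blocks; real ones carry base64-encoded image bytes in "data".
blocks = [
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}},
    {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}},
]
content = multi_image_content(blocks, "How do Image 1 and Image 2 differ?")
print([part["type"] for part in content])
# ['text', 'image', 'text', 'image', 'text']
```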

Handling Stop Reasons

Every response includes a stop_reason field that tells you why generation stopped:

Stop Reason      Meaning
end_turn         Claude finished naturally
max_tokens       Hit the token limit; response may be truncated
stop_sequence    A custom stop sequence was encountered
tool_use         Claude wants to use a tool (for agent workflows)

When stop_reason is max_tokens, continue the conversation by sending the partial response back as an assistant message and asking Claude to continue.
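
The continuation step can be sketched as follows. continuation_messages and next_action are hypothetical helpers, and the wording of the follow-up user turn is only an example.

```python
from typing import Dict, List


def continuation_messages(
    history: List[Dict[str, str]],
    partial_text: str,
    nudge: str = "Please continue exactly where you left off.",
) -> List[Dict[str, str]]:
    """After a max_tokens stop, append the partial reply and a user turn asking for more."""
    return history + [
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": nudge},
    ]


def next_action(stop_reason: str) -> str:
    """Map a stop_reason to the step a caller would typically take next."""
    actions = {
        "end_turn": "done",
        "max_tokens": "continue",
        "stop_sequence": "done",
        "tool_use": "run_tool",
    }
    return actions.get(stop_reason, "unknown")


history = [{"role": "user", "content": "Write a long essay about rivers."}]
if next_action("max_tokens") == "continue":
    history = continuation_messages(history, "Rivers shape the land by")
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```

The caller then joins the partial text and the next response to recover the full answer.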

Best Practices

1. Manage Token Usage

  • Monitor usage.input_tokens and usage.output_tokens to control costs
  • Use max_tokens to limit response length
  • Consider prompt caching for repeated system prompts
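
A rough cost estimate can be derived from the usage field. In this sketch estimate_cost is a hypothetical helper and the prices are placeholders for illustration only; look up the actual per-token rates for the model you use.

```python
def estimate_cost(
    usage: dict, input_price_per_mtok: float, output_price_per_mtok: float
) -> float:
    """Estimate request cost from a usage dict and per-million-token prices."""
    input_cost = usage["input_tokens"] / 1_000_000 * input_price_per_mtok
    output_cost = usage["output_tokens"] / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Usage from the sample response earlier; prices are placeholders, not real rates.
usage = {"input_tokens": 12, "output_tokens": 6}
print(f"${estimate_cost(usage, 3.0, 15.0):.6f}")  # $0.000126
```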

2. Handle Errors Gracefully

  • Implement retry logic with exponential backoff
  • Check for 400 errors (invalid requests) and 429 errors (rate limits)
  • Validate your message structure before sending
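
Retry with exponential backoff can be sketched generically, without tying it to a particular client. with_retries and its parameters are hypothetical names. The caller decides which exceptions are worth retrying: rate limits usually are, while invalid-request errors are not, since resending the same request will fail again.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each attempt (base, 2*base, 4*base, ...),
            # plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the Python SDK, is_retryable could be something like lambda e: isinstance(e, anthropic.RateLimitError), and call would wrap client.messages.create(...).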

3. Optimize for Your Use Case

  • Chatbots: Use multi-turn patterns with full history
  • Classification: Use prefill with max_tokens=1
  • Content generation: Use system prompts and longer max_tokens
  • Vision tasks: Combine images with clear text instructions
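
For the content generation case, the system prompt goes in the top-level system parameter rather than in the messages list. A minimal sketch that assembles the request arguments; generation_request is a hypothetical helper, and the default max_tokens value is an arbitrary example.

```python
def generation_request(
    system: str, prompt: str, model: str, max_tokens: int = 4096
) -> dict:
    """Assemble keyword arguments for a content-generation request."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = generation_request(
    system="You are a concise technical writer.",
    prompt="Write a short introduction to stateless APIs.",
    model="claude-opus-4-7",
)
print(sorted(kwargs))  # ['max_tokens', 'messages', 'model', 'system']
```

The dict can then be passed as client.messages.create(**kwargs).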

4. Security Considerations

  • Never expose API keys in client-side code
  • Validate and sanitize user input before sending to the API
  • Be aware of data retention policies (ZDR available for eligible organizations)
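
The first two points can be sketched in a few lines of server-side code. load_api_key and clean_user_input are hypothetical helpers, and the 4000-character cap is an arbitrary example limit. ANTHROPIC_API_KEY is the environment variable the Python SDK reads by default when no key is passed.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def clean_user_input(text: str, max_chars: int = 4000) -> str:
    """Drop control characters (keeping newlines and tabs) and cap the length."""
    kept = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return kept.strip()[:max_chars]


print(clean_user_input("hi\x00 there\n"))  # hi there
```

This kind of cleanup limits malformed or oversized input; it is not a defense against prompt injection on its own.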

Conclusion

The Claude Messages API is flexible and powerful. By mastering basic requests, multi-turn conversations, prefill, and vision, you can build sophisticated applications that leverage Claude's capabilities. Remember that the API is stateless — you control the context by managing the conversation history yourself.

Key Takeaways

  • The Messages API is stateless — always send the full conversation history with each request
  • Use prefill to guide Claude's responses, but check model compatibility (not supported on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6)
  • Vision capabilities allow you to send images alongside text prompts for analysis
  • Monitor stop_reason to handle truncated responses and tool use scenarios
  • Synthetic assistant messages give you full control over conversation context and few-shot examples