BeClaude
Guide · Intermediate · API · 2026-05-15

Claude Vision API: Complete Guide to Image and Multimodal Input

Learn how to use Claude's Vision capabilities to analyze images, extract data from PDFs, process screenshots, and build multimodal AI applications with the Claude API.

Quick Answer

Claude Vision lets you send images (PNG, JPEG, WEBP, GIF) alongside text prompts via the API. Images are transmitted as base64-encoded data or as URL references. Claude can analyze charts, extract text from documents, describe photos, and answer questions about visual content. Token usage scales with image size, up to approximately 1,600 tokens per image for Sonnet 4 and Opus 4.6.

Tags: vision, multimodal, image-analysis, api, pdf

What is Claude Vision?

Claude Vision is Claude's multimodal capability for processing and analyzing images alongside text. Unlike text-only models, Claude can look at photographs, diagrams, charts, screenshots, PDFs, and handwritten notes, then answer questions about them, extract information, or take actions based on what it sees.

This capability transforms Claude from a language model into a multimodal assistant that can help with tasks ranging from document analysis to UI testing to scientific figure interpretation.

Supported Image Formats

Claude supports these image formats as input:

| Format | MIME Type | Max Resolution | Use Case |
| --- | --- | --- | --- |
| PNG | image/png | 8,000 x 8,000 px | Screenshots, diagrams, documents |
| JPEG | image/jpeg | 8,000 x 8,000 px | Photos, scanned documents |
| WEBP | image/webp | 8,000 x 8,000 px | Web images, optimized photos |
| GIF | image/gif | 8,000 x 8,000 px | Simple animations (static frame) |

Important limits:
  • Maximum file size: 100 MB per image (about 133 MB after base64 encoding, which adds roughly one third)
  • For best results, keep images under 20 MB
  • Very large images are automatically resized; Claude processes them at 1,600 x 1,600 px internally
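
The base64 overhead mentioned above follows from the encoding itself: every 3 input bytes become 4 output characters. The following is a minimal sketch for estimating the encoded payload size before sending a request; the helper name `base64_size` is illustrative and not part of any SDK.

```python
import base64
import math


def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)


# Sanity check against the standard library encoder on a small payload.
sample = b"x" * 10
assert base64_size(len(sample)) == len(base64.b64encode(sample))

# Encoded size, in MB, of an image at the recommended 20 MB ceiling.
twenty_mb = 20 * 1024 * 1024
print(round(base64_size(twenty_mb) / (1024 * 1024), 1))  # 26.7
```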

Getting Started with Image Analysis

Using the Claude API

import anthropic
import base64

client = anthropic.Anthropic()

# Read the image and base64-encode it for the request body
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data,
                },
            },
            {
                "type": "text",
                "text": "Describe this chart in detail. What are the key trends?"
            }
        ],
    }],
)

print(response.content[0].text)
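
If you send images from many files, it may help to wrap the encoding step in a small helper. The sketch below builds the same base64 image block shown above from a file path. The function name `image_block` and the extension-to-MIME mapping are illustrative assumptions, not part of the `anthropic` SDK.

```python
import base64
from pathlib import Path

# Assumed mapping from file extension to the MIME types listed in this guide.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block from a local image file."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        # Fail before touching the file system if the format is unsupported.
        raise ValueError(f"Unsupported image format: {suffix or path!r}")
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dictionary can be placed directly in the `content` list of a user message, in place of the hand-written image block.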

Using Image URLs

You can also reference images by URL:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/dashboard-screenshot.png"
                },
            },
            {
                "type": "text",
                "text": "What metrics are shown on this dashboard?"
            }
        ],
    }],
)

URL requirements:
  • Must be publicly accessible (no authentication)
  • Must use HTTPS
  • Response must include Content-Type header with the image MIME type
  • Image must be served over a stable connection with reasonable latency
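
Of the requirements above, only the HTTPS rule can be checked without fetching the image. A minimal sketch that enforces it while building the URL image block; `url_image_block` is a hypothetical helper name, and the check says nothing about public accessibility or the Content-Type header.

```python
from urllib.parse import urlparse


def url_image_block(url: str) -> dict:
    """Build a URL image content block, rejecting anything that is not absolute HTTPS."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Image URL must be an absolute HTTPS URL, got {url!r}")
    return {"type": "image", "source": {"type": "url", "url": url}}


block = url_image_block("https://example.com/dashboard-screenshot.png")
```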

Practical Vision Use Cases

1. Document and PDF Analysis

Claude Vision excels at extracting structured data from documents:

# invoice_image: base64-encoded PNG data, prepared as in the first example
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": invoice_image}},
            {"type": "text", "text": """Extract the following fields from this invoice:
- Invoice number
- Date
- Vendor name
- Line items (description, quantity, unit price, total)
- Subtotal, tax, grand total
- Payment terms
Format as JSON."""}
        ],
    }],
)
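
A reply to a "Format as JSON" prompt may arrive wrapped in a code fence or preceded by a sentence of prose. The sketch below, which assumes the reply contains exactly one top-level JSON object, pulls that object out before parsing. `parse_json_reply` is a hypothetical helper, and the sample reply is made up for illustration.

```python
import json


def parse_json_reply(text: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' in a reply."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in reply")
    return json.loads(text[start:end + 1])


# Hypothetical reply text; real replies come from response.content[0].text.
sample_reply = 'Here is the extracted data:\n{"invoice_number": "INV-001", "grand_total": 42.5}'
invoice = parse_json_reply(sample_reply)
print(invoice["invoice_number"])  # INV-001
```

This deliberately stays simple: a reply containing several separate JSON objects, or invalid JSON, still raises an error that your code should handle.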

2. Chart and Data Visualization Analysis

Vision works well for analyzing business dashboards, scientific figures, and financial charts:

Extract the key data points from this line chart:
  • What is the trend for each series?
  • Identify any anomalies or outliers
  • What is the approximate value at each labeled point?

3. UI/UX Review and Testing

Claude can review screenshots of your application:

Review this UI screenshot for:
  • Visual alignment issues
  • Missing or inconsistent elements
  • Accessibility concerns (color contrast, font sizes)
  • Layout problems at this viewport size

4. Handwriting Recognition

Claude can read handwritten notes and forms:

Transcribe the handwritten text in this image.
Preserve the original formatting and layout where possible.
Note any words you're uncertain about with [brackets].
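
If the transcription follows the bracket convention requested in the prompt above, the uncertain words can be collected for human review. A minimal sketch, assuming brackets appear only around uncertain words; `uncertain_words` is an illustrative name and the sample transcript is made up.

```python
import re


def uncertain_words(transcript: str) -> list[str]:
    """Return the text of every [bracketed] span in a transcript, in order."""
    return re.findall(r"\[([^\[\]]+)\]", transcript)


# Hypothetical transcript text for illustration.
flagged = uncertain_words("Meet at [noon] near the [libary?] entrance")
print(flagged)  # ['noon', 'libary?']
```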

Image Processing Costs

Vision requests are priced by image size. Each image consumes input tokens that grow with its pixel dimensions, up to a cap of about 1,600 tokens:

| Image Size | Approximate Token Cost (Sonnet 4 / Opus 4.6) |
| --- | --- |
| Small (< 500x500) | ~400 tokens |
| Medium (1000x1000) | ~1,000 tokens |
| Large (2000x2000+) | ~1,600 tokens |
| Max (8000x8000) | ~1,600 tokens (auto-resized) |

Pricing example with Sonnet 4 ($3/M input tokens):
  • One medium image (1,000 tokens) + 500 text tokens = 1,500 input tokens
  • Cost per request: ~$0.0045
Compare model pricing on our Claude API Pricing Guide and use the Pricing Calculator to estimate your costs.
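
The pricing arithmetic above can be reproduced with a one-line formula: input tokens times the price per million tokens, divided by one million. A minimal sketch using the figures from this section; `request_cost` is an illustrative helper, and it covers input tokens only.

```python
def request_cost(image_tokens: int, text_tokens: int, usd_per_million: float) -> float:
    """Input-side cost in USD for one request."""
    return (image_tokens + text_tokens) * usd_per_million / 1_000_000


# One medium image (~1,000 tokens) plus 500 text tokens at $3 per million.
cost = request_cost(image_tokens=1_000, text_tokens=500, usd_per_million=3.0)
print(f"${cost:.4f}")  # $0.0045
```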

Best Practices for Vision Prompts

1. Be Specific About What to Look At

Instead of: "What's in this image?"
Try: "Look at the table in the bottom-right section of this dashboard. What are the top 3 rows by revenue?"

2. Provide Context

Give Claude context about what the image represents:

This is a screenshot of a customer support dashboard taken at 3:00 PM on a Monday.
The data shown is for the past 24 hours.

3. Request Structured Output

For extraction tasks, always specify the output format:

Extract the data from this table and format it as a markdown table.
Then provide a summary of the key insights in bullet points.
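
The first three practices can be combined into one text prompt. The sketch below assembles context, a specific task, and an output format into a single string; `build_vision_prompt` and its section labels are illustrative choices, not a required structure.

```python
def build_vision_prompt(task: str, context: str = "", output_format: str = "") -> str:
    """Join optional context, the task, and an optional format request into one prompt."""
    sections = []
    if context.strip():
        sections.append(f"Context: {context.strip()}")
    sections.append(task.strip())
    if output_format.strip():
        sections.append(f"Output format: {output_format.strip()}")
    return "\n\n".join(sections)


prompt = build_vision_prompt(
    task="Look at the table in the bottom-right section. What are the top 3 rows by revenue?",
    context="This is a screenshot of a customer support dashboard covering the past 24 hours.",
    output_format="A markdown table followed by bullet-point insights.",
)
```

The resulting string goes in the `text` block that accompanies the image.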

4. Use Multiple Images

You can send multiple images in a single request to compare or combine information:

messages=[{
    "role": "user",
    "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": before_image}},
        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": after_image}},
        {"type": "text", "text": "Compare these two screenshots and list all the changes you notice."}
    ]
}]
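
When a request carries several images, a short text label before each one can make it easier to refer to them in the question. The sketch below builds such a content list. The helper name `comparison_content` and the "Image N:" labeling are assumptions for illustration, and all images are assumed to share one media type.

```python
def comparison_content(images_b64: list[str], question: str,
                       media_type: str = "image/png") -> list[dict]:
    """Build a content list with a text label before each base64 image, then the question."""
    content = []
    for index, data in enumerate(images_b64, start=1):
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        })
    content.append({"type": "text", "text": question})
    return content
```

Usage: `messages=[{"role": "user", "content": comparison_content([before_image, after_image], "Compare Image 1 and Image 2 and list all the changes you notice.")}]`.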

Vision with Claude Code

Claude Code also supports image input. You can drag and drop images into your terminal session, or reference an image by its file path in the prompt:

# Claude Code reads the image from the path given in the prompt
claude "Analyze the UI mockup in mockup.png and generate React code for it"

This is particularly useful for:

  • Generating code from design mockups
  • Debugging UI issues by sharing screenshots
  • Converting diagrams into code implementations

Model Comparison for Vision Tasks

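| Capability | Opus 4.6 | Sonnet 4 | Haiku |
| --- | --- | --- | --- |
| Image understanding | Best | Excellent | Good |
| Text extraction | Excellent | Excellent | Good |
| Chart analysis | Best | Excellent | Fair |
| Handwriting | Excellent | Very Good | Fair |
| Speed | Slowest | Fast | Fastest |
| Cost per image | ~$0.024 | ~$0.0048 | ~$0.0004 |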

Common Issues and Troubleshooting

Image Quality Issues

  • Blurry text: Ensure minimum 300 DPI for scanned documents
  • Small text: Claude works best with text at least 10px tall in screenshots
  • Low contrast: High contrast images produce better results

Error Handling

try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[...]
    )
except anthropic.BadRequestError as e:
    if "image" in str(e).lower():
        print("Check image format, size, or encoding")
    raise

Key Takeaways

  • Claude Vision supports PNG, JPEG, WEBP, and GIF formats through base64 encoding or URL references
  • Be specific about what you want Claude to analyze in the image; don't rely on vague instructions
  • Each image costs at most ~1,600 tokens, making vision economical for most use cases
  • Multiple images can be sent in a single request for comparison tasks
  • Best results come from high-contrast, well-lit images with readable text

For more API patterns and best practices, see our Getting Started with Claude API guide. For combining vision with tool use, see the Building AI Agents with Claude tutorial.