GuideAdvancedAgents2026-05-15

Claude Computer Use: Complete Guide to Automated Desktop Control

Learn how to use Claude's Computer Use capability to automate desktop tasks — controlling cursors, clicking buttons, typing text, and navigating interfaces through the Claude API.

Quick Answer

Claude Computer Use lets Claude control a desktop interface directly — moving cursors, clicking, typing, and scrolling. Enable it by passing the computer use tool definition to the API with `anthropic.beta.messages.create()`. It requires a screenshot-based loop: capture screen, let Claude decide actions, execute them, and repeat. Best used with a sandboxed environment for safety.

computer-useagentsautomationtool-usebeta

What is Claude Computer Use?

Claude Computer Use is a beta capability that allows Claude to interact with computer interfaces the same way a human would — by looking at screenshots, moving cursors, clicking buttons, typing text, scrolling, and navigating through applications. This transforms Claude from a text-only assistant into an agent that can operate real software.

Unlike traditional automation tools that rely on API integrations or fixed scripts, Computer Use allows Claude to adapt to any interface dynamically, making it capable of working with legacy systems, web applications, and desktop software that have no API access.

How Computer Use Works

Computer Use operates through a simple but powerful loop:

Capture: Take a screenshot of the current desktop state
Analyze: Claude processes the screenshot to understand the interface
Decide: Claude chooses an action (move mouse, click, type, scroll, etc.)
Execute: The action is performed on the desktop
Repeat: A new screenshot captures the result, and the loop continues

This is an extension of Claude's tool use capabilities, where "computer" is simply another tool that Claude can call.

Setting Up Computer Use

API Requirements

Computer Use requires the Anthropic API with the beta endpoint:

import anthropic
client = anthropic.Anthropic()
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[{
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 1,
    }],
    messages=[{
        "role": "user",
        "content": "Open the system settings and change the wallpaper"
    }]
)

Environment Setup

For safe Computer Use, always run in a sandboxed environment:

Docker sandbox (recommended):

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    xvfb x11vnc fluxbox firefox \
    python3 python3-pip
ENV DISPLAY=:99
CMD Xvfb :99 -screen 0 1920x1080x24 +extension RANDR && \
    fluxbox & \
    x11vnc -forever -nopw &

macOS (local setup):

# Grant accessibility permissions in System Settings > Privacy & Security
Then install required tools
pip install pyautogui pillow

Available Actions

Claude can perform these computer actions:

Action	Description	Parameters
`mouse_move`	Move cursor to coordinates	`x`, `y`
`left_click`	Click left mouse button	`x`, `y` (optional)
`right_click`	Click right mouse button	`x`, `y` (optional)
`double_click`	Double-click left button	`x`, `y` (optional)
`type`	Type a string of text	`text`
`key`	Press a keyboard key	`key_combination` (e.g., "Ctrl+C")
`scroll`	Scroll vertically	`x`, `y`, `scroll_direction` (up/down)
`screenshot`	Capture current screen	None

Building a Computer Use Agent

Here's a complete example of a Computer Use loop:

import anthropic
import pyautogui
import base64
from io import BytesIO
from PIL import Image
client = anthropic.Anthropic()
screen_width, screen_height = pyautogui.size()
def take_screenshot() -> str:
    """Capture screen and return base64-encoded PNG."""
    img = pyautogui.screenshot()
    buffer = BytesIO()
    img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode()
def execute_action(action: dict) -> None:
    """Execute a computer action from Claude."""
    name = action["name"]
    params = action.get("input", {})
    
    if name == "mouse_move":
        pyautogui.moveTo(params["x"], params["y"])
    elif name == "left_click":
        if "x" in params:
            pyautogui.click(params["x"], params["y"])
        else:
            pyautogui.click()
    elif name == "type":
        pyautogui.write(params["text"])
    elif name == "key":
        pyautogui.hotkey(*params["key_combination"].split("+"))
    elif name == "screenshot":
        pass  # Handled below
Main agent loop
def run_computer_agent(task: str, max_steps: int = 20):
    messages = [{"role": "user", "content": task}]
    
    for step in range(max_steps):
        screenshot = take_screenshot()
        
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=[{
                "type": "computer_20250124",
                "name": "computer",
                "display_width_px": screen_width,
                "display_height_px": screen_height,
                "display_number": 1,
            }],
            messages=messages + [{
                "role": "user",
                "content": [
                    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot}},
                    {"type": "text", "text": f"Current step: {step + 1}. Continue the task: {task}"}
                ]
            }]
        )
        
        for content in response.content:
            if content.type == "tool_use" and content.name == "computer":
                execute_action(content.input)
                messages.append({
                    "role": "assistant",
                    "content": [content]
                })
            elif content.type == "text":
                print(f"Claude: {content.text}")
                
        # Check for completion
        if any(c.type == "text" and c.text.strip() for c in response.content):
            break

Use Cases for Computer Use

1. Automated Web Testing

Navigate through web applications, fill forms, and verify UI behavior without writing brittle selectors or test scripts.

2. Data Entry and Migration

Extract data from legacy systems and enter it into modern applications — perfect for migration projects where APIs don't exist.

3. GUI Application Testing

Test desktop applications that lack instrumentation or built-in testing frameworks.

4. Workflow Automation

Automate repetitive multi-step workflows that span multiple applications — copying data from a spreadsheet into a web form, then generating a report.

Safety and Best Practices

Critical Safety Rules

Always use a sandbox — Never give Computer Use access to your primary desktop. Use Docker or a VM.
Limit permissions — Run in an isolated environment with no access to sensitive data
Set timeouts — Always implement a maximum step limit and overall timeout
Human approval for destructive actions — Require confirmation before file deletions, form submissions, or system changes
Monitor actively — Watch the agent's actions in real-time

Performance Tips

Screenshot quality: Use PNG format for screenshots to ensure Claude can read text clearly
Resolution: 1920x1080 is optimal. Very high DPI screens may cause issues.
Consistent environment: A clean desktop with minimal clutter helps Claude identify targets
Explicit instructions: Be specific about what to click — "Click the blue 'Save' button in the top-right corner" works better than "Save the file"

Limitations and Known Issues

Beta status: Computer Use is currently in beta — behavior may change
Speed: Screenshot capture and image processing adds latency (1-3 seconds per step)
Visual ambiguity: Claude may occasionally click the wrong element if the interface is cluttered
Scrolling: Complex scroll interactions (scrollable divs within pages) can be challenging
No drag-and-drop: The current tool set doesn't include drag-and-drop actions

Key Takeaways

Computer Use lets Claude control desktop interfaces dynamically through a screenshot-analysis-action loop
Always run in a sandboxed environment with restricted permissions
The beta endpoint requires client.beta.messages.create() with the computer_20250124 tool type
Best for: web testing, data migration, GUI testing, and multi-application workflows
Monitor actively and set hard limits on steps and execution time

For a deeper understanding of Claude's agent capabilities, read our Building AI Agents with Claude guide. Compare Claude models on our Model Comparison page to find the best model for your Computer Use workload.