BeClaude
GuideAdvancedAgents2026-05-15

Claude Computer Use: Complete Guide to Automated Desktop Control

Learn how to use Claude's Computer Use capability to automate desktop tasks — controlling cursors, clicking buttons, typing text, and navigating interfaces through the Claude API.

Quick Answer

Claude Computer Use lets Claude control a desktop interface directly — moving cursors, clicking, typing, and scrolling. Enable it by passing the computer use tool definition to the API with `anthropic.beta.messages.create()`. It requires a screenshot-based loop: capture screen, let Claude decide actions, execute them, and repeat. Best used with a sandboxed environment for safety.

computer-useagentsautomationtool-usebeta

What is Claude Computer Use?

Claude Computer Use is a beta capability that allows Claude to interact with computer interfaces the same way a human would — by looking at screenshots, moving cursors, clicking buttons, typing text, scrolling, and navigating through applications. This transforms Claude from a text-only assistant into an agent that can operate real software.

Unlike traditional automation tools that rely on API integrations or fixed scripts, Computer Use allows Claude to adapt to any interface dynamically, making it capable of working with legacy systems, web applications, and desktop software that have no API access.

How Computer Use Works

Computer Use operates through a simple but powerful loop:

  • Capture: Take a screenshot of the current desktop state
  • Analyze: Claude processes the screenshot to understand the interface
  • Decide: Claude chooses an action (move mouse, click, type, scroll, etc.)
  • Execute: The action is performed on the desktop
  • Repeat: A new screenshot captures the result, and the loop continues
This is an extension of Claude's tool use capabilities, where "computer" is simply another tool that Claude can call.

Setting Up Computer Use

API Requirements

Computer Use requires the Anthropic API with the beta endpoint:

import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create( model="claude-sonnet-4-20250514", max_tokens=4096, tools=[{ "type": "computer_20250124", "name": "computer", "display_width_px": 1920, "display_height_px": 1080, "display_number": 1, }], messages=[{ "role": "user", "content": "Open the system settings and change the wallpaper" }] )

Environment Setup

For safe Computer Use, always run in a sandboxed environment:

Docker sandbox (recommended):
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    xvfb x11vnc fluxbox firefox \
    python3 python3-pip
ENV DISPLAY=:99
CMD Xvfb :99 -screen 0 1920x1080x24 +extension RANDR && \
    fluxbox & \
    x11vnc -forever -nopw &
macOS (local setup):
# Grant accessibility permissions in System Settings > Privacy & Security

Then install required tools

pip install pyautogui pillow

Available Actions

Claude can perform these computer actions:

ActionDescriptionParameters
mouse_moveMove cursor to coordinatesx, y
left_clickClick left mouse buttonx, y (optional)
right_clickClick right mouse buttonx, y (optional)
double_clickDouble-click left buttonx, y (optional)
typeType a string of texttext
keyPress a keyboard keykey_combination (e.g., "Ctrl+C")
scrollScroll verticallyx, y, scroll_direction (up/down)
screenshotCapture current screenNone

Building a Computer Use Agent

Here's a complete example of a Computer Use loop:

import anthropic
import pyautogui
import base64
from io import BytesIO
from PIL import Image

client = anthropic.Anthropic() screen_width, screen_height = pyautogui.size()

def take_screenshot() -> str: """Capture screen and return base64-encoded PNG.""" img = pyautogui.screenshot() buffer = BytesIO() img.save(buffer, format="PNG") return base64.b64encode(buffer.getvalue()).decode()

def execute_action(action: dict) -> None: """Execute a computer action from Claude.""" name = action["name"] params = action.get("input", {}) if name == "mouse_move": pyautogui.moveTo(params["x"], params["y"]) elif name == "left_click": if "x" in params: pyautogui.click(params["x"], params["y"]) else: pyautogui.click() elif name == "type": pyautogui.write(params["text"]) elif name == "key": pyautogui.hotkey(*params["key_combination"].split("+")) elif name == "screenshot": pass # Handled below

Main agent loop

def run_computer_agent(task: str, max_steps: int = 20): messages = [{"role": "user", "content": task}] for step in range(max_steps): screenshot = take_screenshot() response = client.beta.messages.create( model="claude-sonnet-4-20250514", max_tokens=4096, tools=[{ "type": "computer_20250124", "name": "computer", "display_width_px": screen_width, "display_height_px": screen_height, "display_number": 1, }], messages=messages + [{ "role": "user", "content": [ {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot}}, {"type": "text", "text": f"Current step: {step + 1}. Continue the task: {task}"} ] }] ) for content in response.content: if content.type == "tool_use" and content.name == "computer": execute_action(content.input) messages.append({ "role": "assistant", "content": [content] }) elif content.type == "text": print(f"Claude: {content.text}") # Check for completion if any(c.type == "text" and c.text.strip() for c in response.content): break

Use Cases for Computer Use

1. Automated Web Testing

Navigate through web applications, fill forms, and verify UI behavior without writing brittle selectors or test scripts.

2. Data Entry and Migration

Extract data from legacy systems and enter it into modern applications — perfect for migration projects where APIs don't exist.

3. GUI Application Testing

Test desktop applications that lack instrumentation or built-in testing frameworks.

4. Workflow Automation

Automate repetitive multi-step workflows that span multiple applications — copying data from a spreadsheet into a web form, then generating a report.

Safety and Best Practices

Critical Safety Rules

  • Always use a sandbox — Never give Computer Use access to your primary desktop. Use Docker or a VM.
  • Limit permissions — Run in an isolated environment with no access to sensitive data
  • Set timeouts — Always implement a maximum step limit and overall timeout
  • Human approval for destructive actions — Require confirmation before file deletions, form submissions, or system changes
  • Monitor actively — Watch the agent's actions in real-time

Performance Tips

  • Screenshot quality: Use PNG format for screenshots to ensure Claude can read text clearly
  • Resolution: 1920x1080 is optimal. Very high DPI screens may cause issues.
  • Consistent environment: A clean desktop with minimal clutter helps Claude identify targets
  • Explicit instructions: Be specific about what to click — "Click the blue 'Save' button in the top-right corner" works better than "Save the file"

Limitations and Known Issues

  • Beta status: Computer Use is currently in beta — behavior may change
  • Speed: Screenshot capture and image processing adds latency (1-3 seconds per step)
  • Visual ambiguity: Claude may occasionally click the wrong element if the interface is cluttered
  • Scrolling: Complex scroll interactions (scrollable divs within pages) can be challenging
  • No drag-and-drop: The current tool set doesn't include drag-and-drop actions

Key Takeaways

  • Computer Use lets Claude control desktop interfaces dynamically through a screenshot-analysis-action loop
  • Always run in a sandboxed environment with restricted permissions
  • The beta endpoint requires client.beta.messages.create() with the computer_20250124 tool type
  • Best for: web testing, data migration, GUI testing, and multi-application workflows
  • Monitor actively and set hard limits on steps and execution time
For a deeper understanding of Claude's agent capabilities, read our Building AI Agents with Claude guide. Compare Claude models on our Model Comparison page to find the best model for your Computer Use workload.