Anthropic’s AI Model Can Now Control Computers Visually

Anthropic has released a new public beta feature for its Claude 3.5 Sonnet AI model that lets it control a computer much as a human would: by viewing the screen and interacting with what it sees. The “computer use” capability is available now via the Anthropic API and lets developers direct Claude to carry out tasks on a computer, as shown in a video demonstration.

Similar capabilities already exist in Microsoft’s Copilot Vision feature, OpenAI’s desktop app for ChatGPT, and Google’s Gemini app for Android phones, which can also respond to what is on a screen. However, Anthropic claims its new tool goes further, letting the model click around and carry out tasks without manual intervention.

Anthropic notes that the “computer use” feature is still experimental and may be “cumbersome and error-prone.” The company plans to ship improved versions of the capability as it gathers feedback from developers.

Claude’s visual processing also has limitations: it works from a “flipbook” of screenshots rather than a continuous video feed, so it can miss short-lived actions or notifications. Additionally, Claude is programmed not to engage with social media, election-related activities, and other sensitive topics.
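To see why that matters, here is a minimal, purely conceptual sketch of the sample-and-act loop the “flipbook” description implies; every helper name is a hypothetical stand-in, not part of Anthropic's SDK.

```python
# Conceptual sketch (not from the article) of a "flipbook"-style agent loop:
# the agent only observes discrete screenshots, so anything that flashes on
# screen between captures is never seen. All helper names are hypothetical.
import time


def capture_screenshot() -> bytes:
    """Stub: grab one frame of the controlled machine's screen."""
    return b"<png bytes>"


def decide_next_action(frame: bytes) -> str:
    """Stub: a model call that maps the latest frame to a single action."""
    return "click 512,384"


def perform(action: str) -> None:
    """Stub: execute a mouse or keyboard action on the controlled machine."""
    print(f"executing: {action}")


for step in range(5):
    frame = capture_screenshot()        # one page of the "flipbook"
    action = decide_next_action(frame)
    perform(action)
    time.sleep(1.0)                     # a notification shown only in this gap is missed
```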

The updated Claude 3.5 Sonnet model has also shown significant improvements on industry benchmarks, particularly in agentic coding and tool-use tasks. It outperforms publicly available models on coding, with notable gains on the SWE-bench Verified and TAU-bench benchmarks.
Source: https://www.theverge.com/2024/10/22/24276822/anthopic-claude-3-5-sonnet-computer-use-ai