📋 Computer Use Goes General-Purpose

Anthropic released Claude 4 on May 12, 2026, with a significantly expanded computer-use capability that moves beyond browser-based tasks to full desktop application control. The model can now view a user's screen, interpret UI elements across any application, and generate mouse clicks, keyboard inputs, and scroll actions. In live demonstrations, Claude 4 built a discounted cash flow model in Excel from a natural-language description, generated a 3D-printable bracket in Autodesk Fusion 360 from a photo of a broken part, and debugged a Rust async runtime issue in VS Code by reading compiler errors and suggesting code fixes.
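To make the observe-then-act cycle described above concrete, here is a minimal sketch in Python. It is not Anthropic's API: the `Click`, `TypeText`, and `Scroll` types, the `describe` helper, and the `run_loop` function are all hypothetical names chosen for illustration, and the "executor" only records what it would have done rather than touching a real desktop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


# Hypothetical action types mirroring the three input kinds named in the article.
@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Scroll:
    dy: int


Action = Union[Click, TypeText, Scroll]


def describe(action: Action) -> str:
    """Render an action as text; a real executor would drive the OS instead."""
    if isinstance(action, Click):
        return f"click at ({action.x}, {action.y})"
    if isinstance(action, TypeText):
        return f"type {action.text!r}"
    if isinstance(action, Scroll):
        return f"scroll by {action.dy}"
    raise TypeError(f"unknown action: {action!r}")


def run_loop(
    observe: Callable[[], bytes],
    policy: Callable[[bytes], Optional[Action]],
    max_steps: int = 10,
) -> List[str]:
    """Repeat: capture the screen, ask the policy for one action, record it.

    The policy returns None to signal that the task is finished.
    """
    log: List[str] = []
    for _ in range(max_steps):
        screenshot = observe()
        action = policy(screenshot)
        if action is None:
            break
        log.append(describe(action))
    return log


if __name__ == "__main__":
    # Scripted stand-in for the model: click a cell, type a formula, stop.
    script = iter([Click(120, 48), TypeText("=NPV(0.1, B2:B6)"), None])
    for line in run_loop(lambda: b"", lambda _shot: next(script)):
        print(line)
```

The `max_steps` cap is a design choice in this sketch, not something the article describes: it keeps a misbehaving policy from looping forever.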

Anthropic achieved this through a two-phase approach: first training a screen-parsing vision model on millions of desktop screenshots annotated with UI element trees, then applying reinforcement learning to thousands of real desktop task completions graded by human evaluators. The model scores 71.2% on SWE-bench Verified, close to the 73.4% posted by Devin AI, a specialized coding agent, while operating through a general-purpose interface.

A key architectural improvement is the native 200K-token context window, which lets Claude 4 keep an application's full state in context across multi-hour sessions.

📋 AppScript: Automating Productivity Suites

Alongside computer use, Anthropic introduced AppScript, a capability that generates and executes automation scripts for Office 365 and Google Workspace. AppScript can create pivot tables, generate charts, send calendar invites, and update slide decks across documents, spreadsheets, and presentations. Early enterprise adopters include Bridgewater Associates, which uses Claude 4 for portfolio rebalancing models, and Autodesk, which uses it to assist with CAD design iteration.

PwC signed a 75,000-seat enterprise agreement to deploy Claude 4 across its audit and tax practices, projecting a 30% reduction in associate-hours for financial statement preparation. Anthropic's enterprise API pricing for Claude 4 is $8 per million input tokens and $32 per million output tokens, with computer-use actions billed at $0.05 per screenshot analyzed.
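A rough way to see how those rates combine is the Python sketch below. It assumes the per-screenshot fee is simply added on top of the token charges; the article does not say whether screenshots also count toward input tokens, so treat that as an assumption. The `SessionUsage` type, the `estimate_cost_usd` function, and the sample workload are hypothetical.

```python
from dataclasses import dataclass

# Rates quoted in the article (USD). Everything else here is illustrative.
INPUT_USD_PER_MILLION = 8.00
OUTPUT_USD_PER_MILLION = 32.00
USD_PER_SCREENSHOT = 0.05


@dataclass
class SessionUsage:
    """Hypothetical usage counters for one computer-use session."""
    input_tokens: int
    output_tokens: int
    screenshots: int


def estimate_cost_usd(usage: SessionUsage) -> float:
    """Sum token charges and per-screenshot charges, rounded to cents.

    Assumption: screenshot fees are additive and not also billed as tokens.
    """
    token_cost = (
        usage.input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + usage.output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )
    screenshot_cost = usage.screenshots * USD_PER_SCREENSHOT
    return round(token_cost + screenshot_cost, 2)


if __name__ == "__main__":
    # Made-up workload: 2M input tokens, 500K output tokens, 400 screenshots.
    usage = SessionUsage(input_tokens=2_000_000, output_tokens=500_000, screenshots=400)
    print(f"${estimate_cost_usd(usage):.2f}")
```

For this made-up workload the three components are $16 of input tokens, $16 of output tokens, and $20 of screenshots, so a screenshot-heavy session can cost as much in observations as in tokens.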

🛡️ Safety and Refusal Classifiers

Anthropic emphasized its Constitutional AI safety framework in the Claude 4 release, particularly around computer use. The model includes refusal classifiers trained to decline actions involving financial transactions, credential entry, deletion of files outside specified directories, and modification of system settings. Any action the classifiers flag with a confidence above a set threshold requires human-in-the-loop confirmation before it executes.
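The gating behavior described above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not Anthropic's implementation: the `ProposedAction` type, the action labels, the `risk_score` field, the `gate` function, and the 0.5 default threshold are all hypothetical, since the article gives no threshold value or classifier interface.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional, Sequence


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask a human before executing
    REFUSE = "refuse"


# Hypothetical labels for the categories the article says are declined outright.
REFUSED_KINDS = {"financial_transaction", "credential_entry", "system_setting_change"}


@dataclass
class ProposedAction:
    kind: str                          # hypothetical label, e.g. "click", "delete_file"
    risk_score: float                  # hypothetical classifier confidence in [0, 1]
    target_path: Optional[str] = None  # only meaningful for file operations


def gate(
    action: ProposedAction,
    allowed_dirs: Sequence[str],
    confirm_threshold: float = 0.5,    # assumed value; the article gives none
) -> Decision:
    """Decide whether an action runs, needs confirmation, or is refused."""
    if action.kind in REFUSED_KINDS:
        return Decision.REFUSE

    if action.kind == "delete_file":
        if action.target_path is None:
            return Decision.REFUSE
        # Collapse ".." segments so "/work/../etc" cannot pass as "/work".
        # This does not follow symlinks; a real system would need to.
        target = PurePosixPath(posixpath.normpath(action.target_path))
        if not any(target.is_relative_to(d) for d in allowed_dirs):
            return Decision.REFUSE

    if action.risk_score > confirm_threshold:
        return Decision.CONFIRM
    return Decision.ALLOW


if __name__ == "__main__":
    sandbox = ["/work"]
    print(gate(ProposedAction("credential_entry", 0.0), sandbox))
    print(gate(ProposedAction("delete_file", 0.1, "/work/tmp/a.txt"), sandbox))
    print(gate(ProposedAction("click", 0.9), sandbox))
```

Hard refusals are checked before the threshold in this sketch so that a low classifier score can never unlock a prohibited category; that ordering is a design choice here rather than something the article confirms.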

The Claude 4 system card documents red-teaming against 42 misuse categories, including spear-phishing automation, credential harvesting, and unauthorized financial transfers. Anthropic reported a 99.4% refusal rate on harmful computer-use prompts in internal testing.