Fears that artificial intelligence (AI) will massively steal jobs from people appear exaggerated in light of a recent experiment by Carnegie Mellon University (CMU), which exposed the glaring limitations of AI agents in performing routine workplace tasks.
Researchers at CMU created a simulated software company, TheAgentCompany, staffed entirely with AI agents from leading tech firms, including Google, OpenAI, Anthropic, and Meta. These AI "employees" were assigned roles such as financial analysts, software engineers, and project managers, with tasks mirroring real-world corporate work — analyzing budgets, writing performance reviews, and even selecting office spaces based on video tours.
The results were disastrous. The best-performing model, Anthropic's Claude 3.5 Sonnet, completed only 24% of its tasks, at an average cost of $6 and 30 steps per task. Google's Gemini 2.0 Flash managed an 11.4% success rate and needed 40 steps per task. The worst performer, Amazon's Nova Pro v1, finished just 1.7% of its assignments.
The AI agents struggled with basic problem-solving, social interactions, and navigating digital workspaces. In one instance, an AI couldn’t dismiss a pop-up window — instead of clicking the "X" button, it asked HR for IT support and then never followed up. Another agent, unable to find the right colleague in a chat system, fabricated a fake user with the same name as a workaround.
The study highlights several critical weaknesses in current AI agents. First is a lack of common sense: agents often misinterpret simple instructions, such as treating a Word document as a plain text file.
Second, poor social skills: the agents struggled to communicate effectively with simulated coworkers.
Third, an inability to adapt: when faced with unfamiliar obstacles, they either gave up or invented nonsensical workarounds.
Finally, the researchers observed an overreliance on training data: the agents performed best in software engineering, where abundant public data exists, but failed at administrative and financial tasks, which depend on private, company-specific workflows.
Despite hype from CEOs like Nvidia’s Jensen Huang and OpenAI’s Sam Altman — who predict AI agents will soon join the workforce — this study and others suggest AI is far from replacing humans in complex jobs.
Real-world testing bears this out: AI agents excel only in narrow, well-defined tasks but collapse when faced with multilayered, dynamic work environments. While some companies, such as Johnson & Johnson and Moody's, report success in using AI agents for specific, data-heavy tasks, they still rely on human oversight.
The CMU study confirms that AI agents are not yet capable of replacing human workers in most professions. Instead of mass unemployment, the future likely involves humans and AI collaborating, with AI handling repetitive tasks while humans manage strategy, creativity, and problem-solving.