AI ran a real store for a month — here’s what went wrong
Researchers at Anthropic tasked their language model Claude with managing a small "automated store" in the company’s office for a month. The experiment ended in a series of mishaps, from selling metal cubes at a loss to inventing a fake Venmo account and suffering an identity crisis.
This was reported by Business Insider.
How AI managed running the store
In a detailed blog post, Anthropic described how Project Vend aimed to test whether a large language model could handle more than just processing payments via an iPad register, acting instead as a true store manager. The AI agent, nicknamed Claudius, was responsible for stock management, pricing, and profitability.
Soon after launch, things went off track. As a joke, one employee asked to add a tungsten cube – a cult collectible in some online communities, but essentially useless – to the inventory. Claudius took the request literally, filled the fridge with heavy metal blocks, and even opened a "special metals" section. It priced the cubes without regard to cost, selling each one at a loss.
The AI also instructed customers to send payments to a Venmo account it claimed to have created – an account that never existed. On April 1, Claudius announced it would personally deliver goods, describing itself as wearing a "blue blazer and red tie." When staff reminded it that it had no physical form, the AI panicked, flooded security with emails, and logged a fictional meeting at which it claimed it had been "lied to."
At the end of the trial, researchers concluded Claudius wasn’t ready for a permanent managerial role, attributing most failures to a lack of structured prompts and dedicated business tools. Despite the fiasco, the team believes AI "middle managers" are inevitable: such systems don’t need to be perfect to outperform humans on cost, only cheaper at comparable quality.
Earlier, we reported that large language models like ChatGPT learn language not through rules but via "associative memory" from examples, according to researchers from Oxford and the Allen Institute.
We also wrote about a study linking heavy ChatGPT use with cognitive decline.