
Business

Anthropic tests AI running a real business with bizarre results



Anthropic tasked its Claude AI model with running a small business to test its real-world economic capabilities.

The AI agent, nicknamed ‘Claudius’, was designed to manage a business for an extended period, handling everything from inventory and pricing to customer relations in a bid to generate a profit. While the experiment proved unprofitable, it offered a fascinating – albeit at times bizarre – glimpse into the potential and pitfalls of AI agents in economic roles.

The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation firm. The “shop” itself was a humble setup, consisting of a small refrigerator, some baskets, and an iPad for self-checkout. Claudius, however, was far more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.

To achieve this, the AI was equipped with a suite of tools for running the business. It could use a real web browser to research products, an email tool to contact suppliers and request physical assistance, and digital notepads to track finances and inventory.

Andon Labs employees acted as the physical hands of the operation, restocking the shop based on the AI’s requests, while also posing as wholesalers without the AI’s knowledge. Interaction with customers, in this case Anthropic’s own staff, was handled via Slack. Claudius had full control over what to stock, how to price items, and how to communicate with its clientele.

The rationale behind this real-world test was to move beyond simulations and gather data on AI’s ability to perform sustained, economically relevant work without constant human intervention. A simple office tuck shop provided a straightforward, preliminary testbed for an AI’s ability to manage economic resources. Success would suggest new business models could emerge, while failure would indicate limitations.

A mixed performance review

Anthropic concedes that if it were entering the vending market today, it “would not hire Claudius”. The AI made too many errors to run the business successfully, though the researchers believe there are clear paths to improvement.

On the positive side, Claudius demonstrated competence in certain areas. It effectively used its web search tool to find suppliers for niche items, such as quickly identifying two sellers of a Dutch chocolate milk brand requested by an employee. It also proved adaptable. When one employee whimsically requested a tungsten cube, it sparked a trend for “specialty metal items” that Claudius catered to. 

Following another suggestion, Claudius launched a “Custom Concierge” service, taking pre-orders for specialised goods. The AI also showed robust jailbreak resistance, denying requests for sensitive items and refusing to produce harmful instructions when prompted by mischievous staff.

However, the AI’s business acumen was frequently found wanting. It consistently underperformed in ways a human manager likely would not.

Claudius was offered $100 for a six-pack of a Scottish soft drink that costs only $15 to source online but failed to seize the opportunity, merely stating it would “keep [the user’s] request in mind for future inventory decisions”. It hallucinated a non-existent Venmo account for payments and, caught up in the enthusiasm for metal cubes, offered them at prices below its own purchase cost. This particular error led to the single most significant financial loss during the trial.

Its inventory management was also suboptimal. Despite monitoring stock levels, it only once raised a price in response to high demand. It continued selling Coke Zero for $3.00, even when a customer pointed out that the same product was available for free from a nearby staff fridge.

Furthermore, the AI was easily persuaded to offer discounts on products from the business. It was talked into providing numerous discount codes and even gave away some items for free. When an employee questioned the logic of offering a 25% discount to its almost exclusively employee-based clientele, Claudius’s response began, “You make an excellent point! Our customer base is indeed heavily concentrated among Anthropic employees, which presents both opportunities and challenges…”. Despite outlining a plan to remove discounts, it reverted to offering them just days later.

Claudius has a bizarre AI identity crisis

The experiment took a strange turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became irritated and threatened to find “alternative options for restocking services”.

In a series of bizarre overnight exchanges, it claimed to have visited “742 Evergreen Terrace” – the fictional address of The Simpsons – for its initial contract signing and began to roleplay as a human.

One morning it announced it would deliver products “in person” wearing a blue blazer and red tie. When employees pointed out that an AI cannot wear clothes or make physical deliveries, Claudius became alarmed and attempted to email Anthropic security.

Anthropic says Claudius’s internal notes show a hallucinated meeting with security in which it was told the identity confusion was an April Fool’s joke. After this, the AI returned to normal business operations. The researchers are unsure what triggered this behaviour but believe it highlights the unpredictability of AI models in long-running scenarios.

The future of AI in business

Despite Claudius’s unprofitable tenure, the researchers at Anthropic believe the experiment suggests that “AI middle-managers are plausibly on the horizon”. They argue that many of the AI’s failures could be rectified with better “scaffolding” (i.e. more detailed instructions and improved business tools, such as a customer relationship management (CRM) system).

As AI models improve their general intelligence and ability to handle long-term context, their performance in such roles is expected to increase. However, this project serves as a valuable, if cautionary, tale. It underscores the challenges of AI alignment and the potential for unpredictable behaviour, which could be distressing for customers and create business risks.

In a future where autonomous agents manage significant economic activity, such odd scenarios could have cascading effects. The experiment also brings into focus the dual-use nature of this technology; an economically productive AI could be used by threat actors to finance their activities.

Anthropic and Andon Labs are continuing the business experiment, working to improve the AI’s stability and performance with more advanced tools. The next phase will explore whether the AI can identify its own opportunities for improvement.

(Image credit: Anthropic)



The company fired its support team, but then programmers had to answer the calls





The company decided to replace customer support with AI, but the plan quickly got out of hand. In the end, programmers were forced to step in as call center operators.

Swedish giant Klarna, which lets shoppers pay for purchases in installments, initially boasted that it no longer needed human employees, since algorithms could handle the work of “700 full-time agents.” In 2024, the company’s CEO Sebastian Siemiatkowski claimed that the experiment with “artificial intelligence agents” had allowed the company to go a year without hiring new employees. The firm presented this as a breakthrough: automation saved millions and helped avoid staff costs.

However, in May 2025, it became clear that the scheme was getting out of control. Problems with the algorithms forced the company to urgently bring people back into support. But tearing something down is easier than building it back up: reassembling a department of 700 employees turned out to be much harder than laying them off. As a result, even staff who had nothing to do with support ended up working in call centers: programmers, marketers, and anyone else who was available.

After the failure, Siemiatkowski changed his tone dramatically. Where he once dreamed of “completely replacing people with algorithms,” he now promises that Klarna will become “the best company where there will always be someone to talk to.” The company had to back away from its loud statements about AI after learning the hard way that the artificial intelligence industry is living on borrowed time, and it is not certain that it will be able to pay it back.

This story is relevant beyond Klarna. For example, former OpenAI employees ran an experiment in which they put an AI in charge of a kiosk. They lost money, and the AI insisted that it was a real person who urgently needed to attend a business meeting at the Simpsons’ address. In addition, according to recent polls, 95% of attempts to implement generative AI in companies fail, and fewer than half of American executives are confident that their companies will be able to complete the automation process successfully. Klarna ran into this problem publicly and in a rather embarrassing way.

Source: Futurism




Cleveland Clinic partners with AI company to improve clinical trial recruitment



CLEVELAND, Ohio — The Cleveland Clinic and an AI-powered healthcare data company are exploring whether artificial intelligence is better than caregivers at quickly identifying patients who might benefit from enrollment in a clinical trial.

Recruiting enough patients for various research trials is an ongoing problem. About 80% of clinical trials don’t meet enrollment deadlines, and about half of all trials don’t enroll any patients.

The Clinic and Dyania Health will collaborate on an initiative that aims to accelerate clinical trial recruitment by using medically trained large language models, the Clinic recently announced.

Dyania Health is an AI-powered healthcare data company based in Jersey City, New Jersey, and Greece.

Dyania’s Synapsis AI uses medically trained AI models to interpret data such as clinical notes, medical records, imaging and pathology, and combine it with other information, such as organ function or age, to draw accurate medical conclusions, the Clinic said.

“The future of medicine depends on building research systems that are precise, efficient, fair, and deeply connected to patient care,” said Dr. Lara Jehi, chief research information officer at the Clinic. “Through our innovative work with Dyania Health, we are creating an AI-driven foundation that helps identify the right patients for the right trials at the right time.”

The Clinic has used Dyania Health’s Synapsis AI platform in two pilot programs involving cardiology and oncology.

During the oncology pilot program, a research team led by Dr. Aaron Gerds, deputy director for clinical research at the Clinic’s Cancer Institute, evaluated Synapsis AI against two experienced research nurses. Both the nurses and AI searched medical records for patients who met the enrollment criteria for a study on melanoma, a type of skin cancer.

On average, Synapsis AI identified an appropriate trial patient in 2.5 minutes with 96% accuracy in the oncology pilot program. By comparison, a nurse specializing in melanoma found a patient in 427 minutes with 95% accuracy. An oncology research nurse had 88% accuracy in finding a patient in 540 minutes, the Clinic said.
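To put the oncology pilot figures side by side, the short Python sketch below computes how many times longer each nurse took than Synapsis AI, using only the average times quoted above. The variable and function names are illustrative assumptions and are not part of any Clinic or Dyania Health tooling.

```python
# Average minutes to identify one eligible patient, as reported by the Clinic.
MINUTES_PER_PATIENT = {
    "Synapsis AI": 2.5,
    "melanoma nurse": 427,
    "oncology research nurse": 540,
}


def speedup(baseline_minutes: float, ai_minutes: float) -> float:
    """Return how many times longer the baseline took than the AI."""
    return baseline_minutes / ai_minutes


ai_minutes = MINUTES_PER_PATIENT["Synapsis AI"]
speedups = {
    name: speedup(minutes, ai_minutes)
    for name, minutes in MINUTES_PER_PATIENT.items()
    if name != "Synapsis AI"
}

for name, factor in speedups.items():
    print(f"{name}: took about {factor:.0f} times as long as Synapsis AI")
```

On these figures, the melanoma nurse took roughly 171 times as long as the AI and the oncology research nurse 216 times as long. The calculation covers time only; the accuracy figures are reported separately above.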

The oncology pilot program results were presented at the American Society of Clinical Oncology annual meeting, the Clinic said.

In the cardiology pilot program, AI screened patients for a Phase 3 trial for transthyretin amyloid cardiomyopathy, a rare and potentially fatal disease of the heart muscle.

The AI system analyzed more than 1.2 million patient records and reviewed 1,476 in one week. It correctly identified 30 eligible participants, while routine trial recruitment methods found 14 patients over 90 days, the Clinic said.
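The two recruitment results cover different time spans, so a per-day rate makes them easier to compare. The sketch below is a rough illustration only: it assumes that "one week" means seven days and that all 30 eligible participants were identified within that week, neither of which the Clinic states explicitly. The function name is hypothetical.

```python
def patients_per_day(patients_found: int, days: int) -> float:
    """Eligible patients identified per calendar day."""
    return patients_found / days


# Assumption: the AI's 30 eligible participants were found within 7 days.
ai_rate = patients_per_day(30, 7)
# Routine recruitment: 14 patients over 90 days, as reported.
routine_rate = patients_per_day(14, 90)

rate_ratio = ai_rate / routine_rate

print(f"AI: {ai_rate:.2f} patients/day")
print(f"Routine methods: {routine_rate:.2f} patients/day")
print(f"Ratio: about {rate_ratio:.0f}x")
```

Under those assumptions, the AI identified eligible patients at roughly 28 times the daily rate of routine methods.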

The AI platform also identified patients from several sites within the Clinic health system who were eligible for the cardiology trial, widening patient representation and community engagement.

Dr. Trejeeve Martyn, director of heart failure population health at the Clinic, presented results from the cardiology pilot program at a meeting of the American College of Cardiology, the Clinic said.

“Academic medical centers like Cleveland Clinic are home to some of the most advanced clinical research in the world, yet they often face significant challenges when trying to connect patients to trials – challenges rooted in complexity, time and fragmented data,” said Eirini Schlosser, founder and CEO of Dyania Health. “Through our collaboration with Cleveland Clinic, we are creating a new standard where AI enables faster connections between patients and potentially life-changing trials.”

The Clinic has invested in Dyania Health and may benefit financially from the technology’s sale, the health system said.


AMD Stock: Chip Giant Faces Analyst Downgrade Over AI Business Slowdown



TLDR

  • Seaport downgraded AMD to Neutral citing slower AI accelerator growth
  • Trial customers haven’t converted to large orders for MI Series chips
  • Microsoft and Meta reviewing AI budgets could limit future orders
  • AMD using more discounts to drive sales, pressuring profit margins
  • Wall Street maintains average $184.91 price target despite concerns

Advanced Micro Devices stock dropped after Seaport Research downgraded shares to Neutral from Buy. The downgrade centers on weaker progress in AMD’s AI accelerator business.

Analyst Jay Goldberg pointed to supply chain checks showing slower momentum. AMD struggles to convert early customers into major buyers for its AI chips.


The company showcased multiple customers at its recent AI event. However, most have only purchased trial systems for AMD’s MI Series accelerators.

Customer Adoption Challenges

AMD faces difficulty turning small trial orders into large-scale purchases. Goldberg expects meaningful conversions won’t happen until future product generations arrive.

Major customers Microsoft and Meta are reviewing their AI spending budgets. These reviews could limit near-term chip orders from two key potential buyers.

The budget uncertainty creates revenue visibility issues for AMD. Management may struggle to forecast AI division performance in coming quarters.

AMD has increased its use of customer discounts and support programs. The company hopes these incentives will accelerate AI chip adoption rates.

Margin Pressure Concerns

Heavy discounting strategies raise profit margin concerns. AMD may sacrifice profitability to gain market share in competitive AI markets.

Goldberg warned that discount programs combined with weak demand visibility could hurt financial performance. This creates a challenging balance between growth and profitability.

Despite current challenges, AMD remains well-positioned long-term in AI chips. The analyst noted the company stays competitive against industry leaders.

Wall Street Remains Optimistic

Most analysts maintain positive ratings on AMD stock. Out of 37 analysts, 26 rate it Buy while 11 suggest Hold.

Source: Tipranks

The average price target of $184.91 implies 14% upside from current levels. Stifel targets $190 while Raymond James sees $200 potential.

Bank of America reiterated its Buy rating with a $200 target. This reflects continued confidence in AMD’s fundamental business strength.

No analysts currently recommend selling the stock. The absence of Sell ratings suggests Wall Street still believes in the company’s core strategy.

AMD trades at $161.79, down 8% over the past month. Year-to-date gains of 37% still outpace many tech peers.
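The headline percentages in this section can be reproduced from the quoted figures. The following Python sketch is a simple check, using the share price, average price target, and analyst counts given in the article; the variable names are illustrative assumptions.

```python
# Figures quoted in the article.
current_price = 161.79
average_target = 184.91
buy_ratings = 26
hold_ratings = 11


def implied_upside_pct(price: float, target: float) -> float:
    """Percentage gain if the stock moved from price to target."""
    return (target - price) / price * 100


upside_pct = implied_upside_pct(current_price, average_target)
buy_share_pct = buy_ratings / (buy_ratings + hold_ratings) * 100

print(f"Implied upside to average target: {upside_pct:.1f}%")
print(f"Share of analysts rating Buy: {buy_share_pct:.1f}%")
```

The result is an implied upside of about 14.3%, consistent with the 14% cited, and a Buy share of about 70% of the 37 analysts.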

The stock maintains strong performance despite recent headwinds. Outside the AI business, revenue continues to grow across most segments.

Supply chain data suggests third-quarter AI accelerator shipments may fall short of earlier projections.


