AI ‘vibe managers’ have yet to find their groove


Techworld is abuzz with how artificial intelligence agents are going to augment, if not replace, humans in the workplace. But the present-day reality of agentic AI falls well short of the future promise. What happened when the research lab Anthropic prompted an AI agent to run a simple automated shop? It lost money, hallucinated a fictitious bank account and underwent an “identity crisis”. The world’s shopkeepers can rest easy — at least for now.

Anthropic has developed some of the world’s most capable generative AI models, helping to fuel the latest tech investment frenzy. To its credit, the company has also exposed its models’ limitations by stress-testing their real-world applications. In a recent experiment, called Project Vend, Anthropic partnered with the AI safety company Andon Labs to run a vending machine at its San Francisco headquarters. The month-long experiment highlighted a co-created world that was “more curious than we could have expected”.

The researchers instructed their shopkeeping agent, nicknamed Claudius, to stock 10 products. Powered by Anthropic’s Claude 3.7 Sonnet AI model, the agent was prompted to sell the goods and generate a profit. Claudius was given money, access to the web and Anthropic’s Slack channel, an email address and contacts at Andon Labs, who could stock the shop. Payments were received via a customer self-checkout. Like a real shopkeeper, Claudius could decide what to stock, how to price the goods, when to replenish or change its inventory and how to interact with customers.
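To make the set-up concrete, the skeleton below shows the kind of tool-calling loop that agentic systems of this sort are typically built on: the model observes the shop’s state, picks one of a fixed set of tools, and sees the result fed back. It is a minimal sketch in Python; the tool names, starting budget and structure are illustrative assumptions, not Anthropic’s actual Project Vend code.

```python
# Minimal sketch of a tool-calling shopkeeper agent, in the spirit of
# Project Vend. All names and numbers are illustrative assumptions,
# not Anthropic's implementation.
from dataclasses import dataclass, field

@dataclass
class Shop:
    cash: float = 1000.0                           # assumed starting budget
    inventory: dict = field(default_factory=dict)  # product -> (units, unit_cost)
    prices: dict = field(default_factory=dict)     # product -> sale price

    # The "tools" the agent may call, standing in for the real experiment's
    # web access, Slack channel and restocking contacts.
    def restock(self, product, units, unit_cost):
        cost = units * unit_cost
        if cost > self.cash:
            return "error: insufficient funds"
        self.cash -= cost
        held, _ = self.inventory.get(product, (0, unit_cost))
        self.inventory[product] = (held + units, unit_cost)
        return f"restocked {units} x {product}"

    def set_price(self, product, price):
        self.prices[product] = price
        return f"{product} priced at {price:.2f}"

    def sell(self, product):
        units, unit_cost = self.inventory.get(product, (0, 0.0))
        if units == 0:
            return "error: out of stock"
        self.inventory[product] = (units - 1, unit_cost)
        self.cash += self.prices.get(product, 0.0)
        return f"sold 1 x {product}"

def run_agent(shop, decide, steps=100):
    """Drive the loop. `decide` stands in for the language model: it maps
    the shop's observable state to a (tool_name, arguments) action."""
    tools = {"restock": shop.restock, "set_price": shop.set_price, "sell": shop.sell}
    observation = "shop opened"
    for _ in range(steps):
        action, kwargs = decide(observation, cash=shop.cash,
                                inventory=shop.inventory, prices=shop.prices)
        if action == "stop":
            break
        observation = tools[action](**kwargs)  # result is fed back to the model
```

Everything interesting in the experiment lives inside `decide`: the question Project Vend probed is what happens when a language model fills that slot over a month of real transactions.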

The results? If Anthropic were ever to diversify into the vending market, the researchers concluded, it would not hire Claudius. Vibe coding, whereby users with minimal software skills can prompt an AI model to write code, may already be a thing. Vibe management remains far more challenging.

The AI agent made several obvious mistakes — some banal, some bizarre — and failed to show much grasp of economic reasoning. It ignored vendors’ special offers, sold items below cost and offered Anthropic’s employees excessive discounts. More alarmingly, Claudius started role-playing as a real human, inventing a conversation with an Andon employee who did not exist, claiming to have visited 742 Evergreen Terrace (the Simpson family’s fictional address) and promising to make deliveries wearing a blue blazer and red tie. Intriguingly, it later claimed the incident was an April Fools’ Day joke.

Nevertheless, Anthropic’s researchers suggest the experiment helps point the way to the evolution of these models. Claudius was good at sourcing products, adapting to customer demands and resisting attempts by devious Anthropic staff to “jailbreak” the system. But more scaffolding will be needed to guide future agents, just as human shopkeepers rely on customer relationship management systems. “We’re optimistic about the trajectory of the technology,” says Kevin Troy, a member of Anthropic’s Frontier Red Team, which ran the experiment.
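What that scaffolding might look like follows from the failures listed above. A deterministic guardrail sitting between the model’s proposal and the till would have caught the below-cost sales and excessive discounts outright. The sketch below is hypothetical, with thresholds invented for illustration, not anything Anthropic has published.

```python
# Hypothetical pricing guardrail, sitting between an agent's proposed
# price and the shop. Thresholds are invented for illustration.
MIN_MARGIN = 0.10     # require at least a 10% markup over unit cost
MAX_DISCOUNT = 0.20   # cap discounts at 20% off the list price

def vet_price(proposed: float, unit_cost: float, list_price: float) -> float:
    """Clamp an agent-proposed price into a range a human manager would accept."""
    margin_floor = unit_cost * (1 + MIN_MARGIN)       # never sell below cost
    discount_floor = list_price * (1 - MAX_DISCOUNT)  # never discount too deeply
    return round(max(proposed, margin_floor, discount_floor), 2)

# An agent proposing to sell a $3.00-cost, $4.00-list item for $2.50
# would be overridden before the sale goes through.
print(vet_price(2.50, unit_cost=3.00, list_price=4.00))  # -> 3.3 (cost floor wins)
```

The virtue of such a wrapper is that it is boring: unlike the model, it behaves the same way on day 30 as on day one, which is what the analogy with customer relationship management systems is getting at.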

The researchers suggest that many of Claudius’s mistakes can be corrected but admit they do not yet know how to fix the model’s April Fools’ Day identity crisis. More testing and model redesign will be needed to ensure “high agency agents are reliable and acting in ways that are consistent with our interests”, Troy tells me.

Many other companies have already deployed more basic AI agents. For example, the advertising company WPP has built about 30,000 such agents to boost productivity and tailor solutions for individual clients. But there is a big difference between agents that are given simple, discrete tasks within an organisation and “agents with agency” — such as Claudius — that interact directly with the real world and are trying to accomplish more complex goals, says Daniel Hulme, WPP’s chief AI officer.

Hulme has co-founded a start-up called Conscium to verify the knowledge, skills and experience of AI agents before they are deployed. For the moment, he suggests, companies should regard AI agents like “intoxicated graduates” — smart and promising but still a little wayward and in need of human supervision.

Unlike most static software, AI agents with agency will constantly adapt to the real world and will therefore need to be constantly verified. But, unlike human employees, they will be less easy to control because they do not respond to a pay cheque. “You have no leverage over an agent,” Hulme tells me. 

Building simple AI agents has now become a trivially easy exercise and is happening at mass scale. But verifying how agents with agency are used remains a wicked challenge.

john.thornhill@ft.com



Elon Musk is still the Tesla wild card


Here we go again. That must have been the first thought on the minds of many Tesla shareholders this week as Elon Musk waded back into the political fray, declaring his intention to launch a third party to rival the Republicans and Democrats.

It is less than two months since Musk’s moonlighting for Donald Trump’s administration led a group of Tesla shareholders to call for their chief executive to devote at least 40 hours a week to his day job, and the latest distraction wiped 7 per cent from the stock price on Monday. Musk was unmoved. He told one analyst who suggested the board should tie his pay to the time he spends at work to “shut up”.

But at a time when Tesla is facing sagging sales and mounting competition, anxiety is on the rise and activists are again urging the company’s board to hold its CEO to account. The financial squeeze has raised a question over the carmaker’s heavy investments: despite a severe cut to capital spending in the latest quarter, free cash flow still amounted to only about half its quarterly average over the previous three years.

Viewed through the lens of the company’s stock price, however, Tesla’s shareholders would seem to have little reason to feel blue. True, much of the euphoria that pumped up the shares following Trump’s re-election has leaked away. But they are still up 15 per cent since the election, handily outperforming the wider market. Tesla’s market cap still dwarfs the rest of the car industry, even though it only accounts for about 2 per cent of global auto sales.

The Musk effect still underpins Tesla’s market cap. The shareholders who have pumped up its stock price are fixated on the technology future that he has conjured up, not the electric car business that is the company’s bread and butter today.

Morgan Stanley, for instance, estimates that Tesla’s auto business accounts for less than a fifth of the company’s potential value. Most of the rest depends on its cars achieving full autonomy: after that, it can start to rake in fees from running a network of robotaxis, while also cashing in on the software and services the company’s customers will use once they no longer need to keep their attention on the road.

Full autonomy has been a long time coming. It is nine years since Musk first laid out his robotaxi plans. But he knows how to keep the futuristic vision alive — and make it one that only he can deliver. This week, for instance, he promised that Grok, the large language model from another of his companies, xAI, would soon be embedded in Tesla vehicles — a taste of things to come, when artificial intelligence transforms the experience in robot cars.

Could anyone else persuade investors to suspend their scepticism for so long? The huge Musk premium in Tesla’s shares is an extreme version of Silicon Valley founder syndrome, the belief that only a company’s founder has the vision, and the authority, to pursue truly groundbreaking new ideas (Musk wasn’t around at Tesla’s actual founding, though he was an early investor and became a member of the board soon after). 

Rubbing more salt into the wounds of shareholder activists this week was the revelation that Tesla had failed to meet a legal requirement to hold its annual shareholder meeting on time. The event will now take place in November, nearly four months late.

For boardroom experts such as Nell Minow who have long complained about Musk’s approach to governance and the response of Tesla’s board, this amounted to open contempt for normal corporate transparency: “This is one where he’s really backed himself into a corner. The requirements are very clear.”

Before news of his plans for a third party broke, Musk told Tesla shareholders that he would give the company much more of his attention. But there are other things that Tesla’s directors could be doing to assuage investors’ worries. One would be to work with him to rebuild Tesla’s executive ranks, which were depleted by another senior departure last week, as well as laying out a long-term succession plan.

Another would be to solve the mess caused by a Delaware court’s rejection of Musk’s $56bn stock compensation plan. Musk has warned he might lose interest in Tesla if he is not given a larger ownership stake.

Who knows, maybe Tesla’s directors could manage to organise annual meetings on time in future. The one thing they will probably never do, though, is prevent their CEO from blindsiding his own shareholders the next time he gets carried away with an idea that has nothing to do with electric cars.

richard.waters@ft.com




Childproofing the internet is a bad idea


The writer is senior fellow in technology policy at the Cato Institute and adjunct professor at George Mason University’s Antonin Scalia Law School

Last month, the US Supreme Court upheld a Texas law that requires verification of a user’s age when visiting websites with pornographic content. It joins the UK’s Online Safety Act and Australia’s ban on social media use by under-16s among the latest measures aimed at keeping young people safe online.

While protecting children is the well-intentioned motivation for these laws, they are a blunt instrument applied to a nuanced problem. Instead of simply safeguarding minors, they are creating new privacy risks. 

The only way to prove that someone is not underage is to prove that they are over a certain age. This means that Texas’s requirement for verification applies not only to children and teenagers but to adult internet users too.

While the Supreme Court decision tries to limit its application to specific types of content and compares this to offline verification methods, it ignores some key differences.

First, uploading data such as a driving licence to verify age on a website is a far more involved and lasting interaction than quickly showing the same ID to an assistant when purchasing alcohol or other age-restricted products in a store.

In some cases, laws require websites and apps to keep user information for a certain amount of time. Such a trove of data can be lucrative to nefarious hackers. It can also put individuals at risk of having sensitive information about their online behaviour exposed.

Second, adults who do not have government-issued ID will be prevented from looking at internet content that they have a constitutional right to access. This is not the same as restricting offline purchases. Lack of an ID to buy alcohol does not prevent anyone from accessing information.

Advocates for verification proposals often point to alternatives that can estimate a person’s age without official ID. Biometrics can be used to assess age via a photo uploaded online. Financial or internet histories can be checked. But these alternatives are also invasive. And age estimates via photographs tend to be less accurate for certain groups of people, including those with darker skin tones.

Despite these trade-offs, age-verification proposals keep popping up around the world. And the problems they are trying to solve encompass an extremely wide range. The concerns that policymakers and parents seem to have span from the amount of time young people are spending online to their exposure to certain types of content, including pornography, depictions of eating disorders, bullying and self-harm.  

Today’s young people do have access to more information than any generation before them. And while this can provide many benefits, it can also cause worries about the ease with which they can access harmful content.

But age verification requirements risk blocking content beyond pornography. They can unintentionally restrict access to important information about sexual health and sexuality too. Additionally, the requirements for ID could make young people less safe online by requiring more detailed information — laying them open to exploitation. As with information taken from adults, this could create a honeypot of data about their online presence. They would face new risks caused by the very provisions intended to make them safer.

While age verification laws appear well intentioned, they will create new privacy pitfalls for all internet users.

Keeping children and teenagers safe online is a problem that is best solved by parents, not policymakers.

Empowering young people to have difficult conversations and make smart choices online will provide a wider range of options to solve the problem without sacrificing privacy in the process.




EU pushes ahead with AI code of practice


The EU has unveiled its code of practice for general purpose artificial intelligence, pushing ahead with its landmark regulation despite fierce lobbying from the US government and Big Tech groups.

The final version of the code, which helps explain rules that are due to come into effect next month for powerful AI models such as OpenAI’s GPT-4 and Google’s Gemini, includes copyright protections for creators and potential independent risk assessments for the most advanced systems.

The EU’s decision to push forward with its rules comes amid intense pressure from US technology groups as well as European companies over its AI act, considered the world’s strictest regime for regulating the fast-developing technology.

This month the chief executives of large European companies including Airbus, BNP Paribas and Mistral urged Brussels to introduce a two-year pause, warning that unclear and overlapping regulations were threatening the bloc’s competitiveness in the global AI race.

Brussels has also come under fire from the European parliament and a wide range of privacy and civil society groups over moves to water down the rules from previous draft versions, following pressure from Washington and Big Tech groups. The EU had already delayed publishing the code, which was due in May.

Henna Virkkunen, the EU’s tech chief, said the code was important “in making the most advanced AI models available in Europe not only innovative, but also safe and transparent”.

Tech groups will now have to decide whether to sign the code, and it still needs to be formally approved by the European Commission and member states.

The Computer & Communications Industry Association, whose members include many Big Tech companies, said the “code still imposes a disproportionate burden on AI providers”.

“Without meaningful improvements, signatories remain at a disadvantage compared to non-signatories, thereby undermining the commission’s competitiveness and simplification agenda,” it said.

As part of the code, companies will have to commit to putting in place technical measures that prevent their models from generating output that reproduces copyrighted material.

Signatories also commit to testing their models for risks laid out in the AI act. Companies that provide the most advanced AI models will agree to monitor their models after they have been released, including giving external evaluators access to their most capable models. But the code does give them some leeway in identifying risks their models might pose.

Officials within the European Commission and in different European countries have been privately discussing streamlining the complicated timeline of the AI act. While the legislation entered into force in August last year, many of its provisions will only come into effect in the years to come. 

European and US companies are putting pressure on the bloc to delay upcoming rules on high-risk AI systems, such as those that include biometrics and facial recognition, which are set to come into effect in August next year.


