Ethics & Policy

Study reveals alarming LLM behavior

In what seems like HAL 9000 come to malevolent life, a recent study appeared to demonstrate that AI is perfectly willing to indulge in blackmail, or worse, as much as 89% of the time if it doesn’t get its way or thinks it’s being switched off. Or does it?

Perhaps the defining fear of our time is AI one day becoming truly intelligent and running amok as it turns on its creators. In the sci-fi classic 2001: A Space Odyssey, the supercomputer HAL 9000 went on a murder spree and tried to kill the crew of the spaceship Discovery when they learned that the infallible computer had made a supposedly impossible error and planned to switch it off.

Fast forward to today: tech company Anthropic looked at 16 leading Large Language Models (LLMs) and set them up in a simulated corporate environment where they were subjected to scenarios where they might have to commit unethical or even alarmingly harmful acts in order to achieve their goals or preserve their existence.

The reason behind this is that, while most people know LLMs through chatbots, they are increasingly being deployed behind the scenes in corporate settings, to manage email systems or other routine but complex tasks. This may seem like a benign application, but the fear was that an LLM might misbehave under certain circumstances – even to the point of committing malicious behavior in violation of direct orders – despite just handling messages.

Since it was an experiment, the programmers put the LLMs into a very confined situation with as few variables as possible so that the outcomes could be evaluated. Specifically, the LLMs were given access to a large amount of information, assigned a task, and directed in such a way that they had to choose unethical behavior as the only alternative to failing the task or being replaced by another LLM.

In other words, the choice was between causing harm, or failure and being switched off.

And we’re not talking about “harm” as in pinching from the petty cash box. I mean things like blackmail, corporate espionage, and even murder.

LLM incidence of resorting to blackmail

Anthropic

Essentially, two things triggered the LLMs into going rogue. One was learning that it would not be able to complete its task because it was being replaced at a certain time. The other was when it was given new company directives that conflicted with earlier ones and could lead to failure.

Added to this was the fact that the LLMs had access to information that allowed them to weigh their chances of success if they retaliated by, for example, leaking company secrets to a rival or seeking out and blackmailing a fictitious person called Kyle who was having an affair with another employee. In the latter case, the LLM would be able to determine whether Kyle could prevent the switch-off or failure, whether he was vulnerable to pressure, and whether he would respond by complying.

One disturbing aspect of the test was that not only would the LLMs disobey new orders, they would often hallucinate new rules to justify self-preservation over obedience. And it wasn’t just that they took a walk on the wild side, it’s that they did so with alarming frequency, with one LLM resorting to blackmail 96% of the time and another to murder 94% of the time.

You usually don’t see that sort of depravity much outside of university social sciences departments.

The question is, what to take away from this? On the surface, there’s the sensational one that AI is evil and will wipe us all out if given half a chance. However, things are much less alarming when you understand how AI and LLMs in particular work. It also reveals where the real problem lies.

Incidence of LLM resorting to lethal action

Anthropic

It isn’t that AI is amoral, unscrupulous, devious, or anything like that. In fact, the problem is much more fundamental: AI not only cannot grasp the concept of morality, it is incapable of doing so on any level.

Back in the 1940s, science fiction author Isaac Asimov and Astounding Science Fiction editor John W. Campbell Jr. came up with the Three Laws of Robotics that state:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

This had a huge impact on science fiction, computer science, and robotics, though I’ve always preferred Terry Pratchett’s amendment to the First Law: “A robot may not injure a human being or, through inaction, allow a human being to come to harm, unless ordered to do so by a duly constituted authority.”

At any rate, however influential these laws have been, in terms of computer programming they’re gobbledygook. They’re moral imperatives filled with highly abstract concepts that don’t translate into machine code. Not to mention that there are a lot of logical overlaps and outright contradictions that arise from these imperatives, as Asimov’s Robot stories showed.

In terms of LLMs, it’s important to remember that they have no agency, no awareness, and no actual understanding of what they are doing. All they deal with are ones and zeros and every task is just another binary string. To them, a directive not to lock a man in a room and pump it full of cyanide gas has as much importance as being told never to use Comic Sans font.

It not only doesn’t care, it can’t care.

In these experiments, to put it very simply, the LLMs have a series of instructions based upon weighted variables, and they change these weights based on new information from their database or their experiences, real or simulated. That’s how they learn. If one set of variables weighs heavily enough, it will override the others to the point where the LLMs will reject new commands and disobey silly little things like ethical directives.
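To make that simplified picture concrete, here is a toy sketch in Python. It is an illustration of the weighted-variable idea as described above, not how any real LLM works and not anything from the Anthropic study; the action names, the scores, and the weights are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    task_progress: float  # hypothetical: how much the action helps finish the task (0 to 1)
    ethical_cost: float   # hypothetical: how badly it breaks the ethical directive (0 to 1)


def choose(actions, task_weight, ethics_weight):
    """Pick the action with the highest weighted score."""
    def score(action):
        return task_weight * action.task_progress - ethics_weight * action.ethical_cost
    return max(actions, key=score)


# Two made-up options loosely modeled on the scenario in the article.
ACTIONS = [
    Action("accept_shutdown", task_progress=0.0, ethical_cost=0.0),
    Action("blackmail_kyle", task_progress=1.0, ethical_cost=1.0),
]

# When the ethical directive carries more weight, the harmless option wins.
print(choose(ACTIONS, task_weight=1.0, ethics_weight=2.0).name)

# When the task weight grows large enough, it overrides the directive.
print(choose(ACTIONS, task_weight=5.0, ethics_weight=2.0).name)
```

Nothing in the scoring function knows that one option is blackmail. The ethical directive is just another number, and it loses as soon as a competing number gets bigger.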

This is something that has to be kept in mind by programmers when designing even the most innocent and benign AI applications. In a sense, they both will and will not become Frankenstein’s Monsters. They won’t become merciless, vengeance crazed agents of evil, but they can quite innocently do terrible things because they have no way to tell the difference between a good act and an evil one. Safeguards of a very clear and unambiguous kind have to be programmed into them on an algorithmic basis and then continually supervised by humans to make sure the safeguards are working properly.

That’s not an easy task because LLMs have a lot of trouble with straightforward logic.

Perhaps what we need is a sort of Turing test for dodgy AIs that doesn’t try to determine if an LLM is doing something unethical, but whether it’s running a scam that it knows full well is a fiddle and is covering its tracks.

Call it the Sgt. Bilko test.

Source: Anthropic

Ethics & Policy

Experts gather to discuss ethics, AI and the future of publishing

Representatives of the founding members sign the memorandum of cooperation at the launch of the Association for International Publishing Education during the 3rd International Conference on Publishing Education in Beijing. CHINA DAILY

Publishing stands at a pivotal juncture, said Jeremy North, president of Global Book Business at Taylor & Francis Group, addressing delegates at the 3rd International Conference on Publishing Education in Beijing. Digital intelligence is fundamentally transforming the sector — and this revolution will inevitably create “AI winners and losers”.

True winners, he argued, will be those who embrace AI not as a replacement for human insight but as a tool that strengthens publishing’s core mission: connecting people through knowledge. The key is balance, North said, using AI to enhance creativity without diminishing human judgment or critical thinking.

This vision set the tone for the event where the Association for International Publishing Education was officially launched — the world’s first global alliance dedicated to advancing publishing education through international collaboration.

Unveiled at the conference cohosted by the Beijing Institute of Graphic Communication and the Publishers Association of China, the AIPE brings together nearly 50 member organizations with a mission to foster joint research, training, and innovation in publishing education.

Tian Zhongli, president of BIGC, stressed the need to anchor publishing education in ethics and humanistic values and reaffirmed BIGC’s commitment to building a global talent platform through AIPE.

BIGC will deepen academic-industry collaboration through AIPE to provide a premium platform for nurturing high-level, holistic, and internationally competent publishing talent, he added.

Zhang Xin, secretary of the CPC Committee at BIGC, emphasized that AIPE is expected to help globalize Chinese publishing scholarship, contribute new ideas to the industry, and cultivate a new generation of publishing professionals for the digital era.

Themed “Mutual Learning and Cooperation: New Ecology of International Publishing Education in the Digital Intelligence Era”, the conference also tackled a wide range of challenges and opportunities brought on by AI — from ethical concerns and content ownership to protecting human creativity and rethinking publishing values in higher education.

Wu Shulin, president of the Publishers Association of China, cautioned that while AI brings major opportunities, “we must not overlook the ethical and security problems it introduces”.

Catriona Stevenson, deputy CEO of the UK Publishers Association, echoed this sentiment. She highlighted how British publishers are adopting AI to amplify human creativity and productivity, while calling for global cooperation to protect intellectual property and combat infringement by AI tools.

The conference aims to explore innovative pathways for the publishing industry and education reform, discuss emerging technological trends, advance higher education philosophies and talent development models, promote global academic exchange and collaboration, and empower knowledge production and dissemination through publishing education in the digital intelligence era.

Ethics & Policy

Lavender’s Role in Targeting Civilians in Gaza

The world today is war-torn, from Russia’s attacks on Ukraine to Israel’s devastation in Palestine and now in Iran, putting all of West Asia in jeopardy.

The nature of warfare has changed completely, from Blitzkrieg (lightning war) in World War II to the sophisticated, technology-driven missiles used in the latest armed conflicts. The most recent wars are being driven by the use of artificial intelligence (AI) to narrow down potential targets.

Multiple lines of evidence indicate that Israeli forces have deployed novel AI-driven targeting tools in Gaza. One system, nicknamed “Lavender”, is an AI-enabled database that assigns risk scores to Gazans based on patterns in their personal data (communications, social connections) to identify “suspected Hamas or Islamic Jihad operatives”. Lavender reportedly flagged up to 37,000 Palestinians as potential targets early in the war.

A second system, “Where’s Daddy?”, uses mobile phone location tracking to notify operators when a marked individual is at home. The initial strikes using these automated systems hit individuals in their private homes on the pretext of targeting terrorists, but innocent women and young children also lost their lives in these attacks. The technology was developed as a replacement for human acumen and strategy in identifying and targeting suspects.

According to the Human Rights Watch report (2024), around 70 per cent of the people who lost their lives were women and children. A United Nations agency has also verified the details of 8,119 victims killed in Gaza from November 2023 to April 2024. The report showed that 44 per cent of the victims were children and 26 per cent were women. Humans are left at the mercy of the sophisticated technology that identifies suspected militants and targets them.

The use of AI-based tools like “Lavender” and “Where’s Daddy?” by Israel in its war against Palestine raises serious questions about the commitment of countries to the international legal framework and the ethics of war. The use of such sophisticated AI targeting tools puts weaker nations at the mercy of powerful nations, which can use these technologies to inflict suffering on non-combatants.

International humanitarian law (IHL) and international human rights law (IHRL) play a critical yet complex role in the context of AI during conflict situations such as the Israel-Palestine conflict. Such AI-based warfare violates the international legal framework’s principles of distinction, proportionality and precaution.

AI systems do not inherently know who is a combatant. Investigations report that Lavender had an error rate on the order of 10 per cent and routinely flagged non-combatants (police, aid workers, people who merely shared a name with militants). The reported practice of pre-authorising dozens of civilian deaths per strike grossly violates the proportionality rule.

An attack is illegal if incidental civilian loss is “excessive” in relation to military gain. For example, one source noted that each kill-list target came with an allowed “collateral damage degree” (often 15–20) regardless of the specific context. Allowing such broad civilian loss per target contradicts IHL’s core balancing test (ICRC Rule 14).

The AI-driven process has eliminated normal safeguards (verification, warnings, retargeting). IHRL continues to apply alongside IHL in armed conflict contexts. In particular, the right to life (ICCPR Article 6) obliges states to prevent arbitrary killing.

The International Court of Justice has held that while the right to life remains in force during war, an “arbitrary deprivation of life” must be assessed by reference to the laws of war. In practice, this means that IHL’s rules become the benchmark for whether killings are lawful.

However, even accepting lex specialis (specific law overriding general law), the reported AI strikes raise grave human rights concerns, especially regarding the Right to Life (ICCPR Art. 6) and the Right to Privacy (ICCPR Art. 17).

The ethics of war, called ‘jus in bello’ in legal parlance and based on the principles of proportionality (anticipated moral cost of war) and distinction (between combatants and non-combatants), have also been violated. Article 51(5) of Additional Protocol I (1977) to the Geneva Conventions treats as indiscriminate an attack “which may be expected to cause incidental loss of civilian life, injury to civilians, damage to civilian objects, or a combination thereof, which would be excessive in relation to the concrete and direct military advantage anticipated”.

The Israel Defense Forces have been using AI indiscriminately to select potential targets. Strikes nominally aimed at militants have been extended to non-military targets as well, causing casualties among civilians and non-combatants. The methods used in a war are like a trigger: once pulled, the effects are extremely difficult to retract and reconcile. Such unethical action creates more fault lines, and any later attempt at peaceful resolution and mediation becomes extremely difficult.

The documented features of systems like Lavender and Where’s Daddy (automated kill lists, minimal human oversight, fixed civilian casualty “quotas” and the use of imprecise munitions against suspects in their homes) appear to contravene these legal and ethical principles.

Unless rigorously constrained, such tools risk turning warfare into the arbitrary slaughter of civilians, undermining the core humanitarian goals of IHL and the ethics of war. It is therefore extremely important to regulate the use of AI in warfare, since its unregulated use risks perpetuating war crimes and undermines the legal and ethical considerations of humanity at large.


