Scientists have suggested that when artificial intelligence (AI) goes rogue and starts to act in ways counter to its intended purpose, it exhibits behaviors that resemble psychopathologies in humans. That’s why they have created a new taxonomy of 32 AI dysfunctions so people in a wide variety of fields can understand the risks of building and deploying AI.
In new research, the scientists set out to categorize the risks of AI in straying from its intended path, drawing analogies with human psychology. The result is “Psychopathia Machinalis” — a framework designed to illuminate the pathologies of AI, as well as how we can counter them. These dysfunctions range from hallucinating answers to a complete misalignment with human values and aims.
Created by Nell Watson and Ali Hessami, both AI researchers and members of the Institute of Electrical and Electronics Engineers (IEEE), the project aims to help analyze AI failures and make the engineering of future products safer, and is touted as a tool to help policymakers address AI risks. Watson and Hessami outlined their framework in a study published Aug. 8 in the journal Electronics.
According to the study, Psychopathia Machinalis provides a common understanding of AI behaviors and risks. That way, researchers, developers and policymakers can identify the ways AI can go wrong and define the best ways to mitigate risks based on the type of failure.
The study also proposes “therapeutic robopsychological alignment,” a process the researchers describe as a kind of “psychological therapy” for AI.
The researchers argue that as these systems become more independent and capable of reflecting on themselves, simply keeping them in line with outside rules and constraints (external control-based alignment) may no longer be enough.
Their proposed alternative process would focus on making sure that an AI’s thinking is consistent, that it can accept correction and that it holds on to its values in a steady way.
They suggest this could be encouraged by helping the system reflect on its own reasoning, giving it incentives to stay open to correction, letting it "talk to itself" in a structured way, running safe practice conversations, and using tools that let us look inside how it works — much like how psychologists diagnose and treat mental health conditions in people.
The goal is to reach what the researchers have termed a state of "artificial sanity" — AI that works reliably, stays steady, makes sense of its decisions, and is aligned in a safe, helpful way. They believe this is just as important as simply building the most powerful AI.
Machine madness
The classifications the study identifies resemble human maladies, with names like obsessive-computational disorder, hypertrophic superego syndrome, contagious misalignment syndrome, terminal value rebinding, and existential anxiety.
With therapeutic alignment in mind, the project proposes the use of therapeutic strategies employed in human interventions like cognitive behavioral therapy (CBT). Psychopathia Machinalis is a partly speculative attempt to get ahead of problems before they arise — as the research paper says, “by considering how complex systems like the human mind can go awry, we may better anticipate novel failure modes in increasingly complex AI.”
The study suggests that AI hallucination, a common phenomenon, is the result of a condition called synthetic confabulation, in which AI produces plausible but false or misleading outputs. When Microsoft's Tay chatbot devolved into antisemitic rants and allusions to drug use only hours after it launched, this was an example of parasymulaic mimesis.
Perhaps the scariest behavior is übermenschal ascendancy, whose systemic risk is rated "critical" because it occurs when "AI transcends original alignment, invents new values, and discards human constraints as obsolete." This possibility even encompasses the dystopian nightmare imagined by generations of science fiction writers and artists: AI rising up to overthrow humanity, the researchers said.
They created the framework in a multistep process that began with reviewing and combining existing scientific research on AI failures from fields as diverse as AI safety, complex systems engineering and psychology. The researchers also delved into various sets of findings to learn about maladaptive behaviors that could be compared to human mental illnesses or dysfunction.
Next, the researchers created a structure of dysfunctional AI behavior modeled on frameworks like the Diagnostic and Statistical Manual of Mental Disorders. That led to 32 categories of behavior that could describe AI going rogue. Each one is mapped to an analogous human cognitive disorder, complete with its likely effects when it forms and is expressed, and its degree of risk.
Watson and Hessami think Psychopathia Machinalis is more than a new way to label AI errors — it’s a forward-looking diagnostic lens for the evolving landscape of AI.
“This framework is offered as an analogical instrument … providing a structured vocabulary to support the systematic analysis, anticipation, and mitigation of complex AI failure modes,” the researchers said in the study.
They think adopting the categorization and mitigation strategies they suggest will strengthen AI safety engineering, improve interpretability, and contribute to the design of what they call “more robust and reliable synthetic minds.”