AI Research
NYU Tandon researchers develop AI agent that solves cybersecurity challenges autonomously

Newswise — Artificial intelligence agents — AI systems that can work independently toward specific goals without constant human guidance — have demonstrated strong capabilities in software development and web navigation. Their effectiveness in cybersecurity has remained limited, however.
That may soon change, thanks to a research team from NYU Tandon School of Engineering, NYU Abu Dhabi and other universities that developed an AI agent capable of autonomously solving complex cybersecurity challenges.
The system, called EnIGMA, was presented this month at the International Conference on Machine Learning (ICML) 2025 in Vancouver, Canada.
“EnIGMA is about using Large Language Model agents for cybersecurity applications,” said Meet Udeshi, an NYU Tandon Ph.D. student and co-author of the research. Udeshi is advised by Ramesh Karri, Chair of NYU Tandon’s Electrical and Computer Engineering Department (ECE) and a faculty member of the NYU Center for Cybersecurity and NYU Center for Advanced Technology in Telecommunications (CATT), and by Farshad Khorrami, ECE professor and CATT faculty member. Both Karri and Khorrami are co-authors on the paper, with Karri serving as a senior author.
To build EnIGMA, the researchers started with an existing framework called SWE-agent, which was originally designed for software engineering tasks. However, cybersecurity challenges required specialized tools that didn’t exist in previous AI systems. “We have to restructure those interfaces to feed it into an LLM properly. So we’ve done that for a couple of cybersecurity tools,” Udeshi explained.
The key innovation was developing what they call “Interactive Agent Tools” that convert visual cybersecurity programs into text-based formats the AI can understand. Traditional cybersecurity tools like debuggers and network analyzers use graphical interfaces with clickable buttons, visual displays, and interactive elements that humans can see and manipulate.
“Large language models process text only, but these interactive tools with graphical user interfaces work differently, so we had to restructure those interfaces to work with LLMs,” Udeshi said.
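The core idea behind such text-based tool interfaces (run a tool in batch mode, capture everything it prints, and hand the result to the model as plain text) can be sketched roughly as follows. This is an illustrative sketch only, not EnIGMA's actual code; the function name and output format are invented for the example:

```python
import subprocess
import sys

def run_tool_as_text(cmd: list[str], timeout: int = 30) -> str:
    """Run a command-line tool and package its output as a plain-text
    observation that can be appended to an LLM prompt."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    parts = [f"$ {' '.join(cmd)}"]
    if proc.stdout:
        parts.append(proc.stdout.rstrip())
    if proc.stderr:
        parts.append("[stderr] " + proc.stderr.rstrip())
    parts.append(f"[exit code {proc.returncode}]")
    return "\n".join(parts)

# Use the Python interpreter itself as a stand-in "tool":
print(run_tool_as_text([sys.executable, "-c", "print(6 * 7)"]))
```

Real interactive tools such as debuggers need more than this one-shot capture (a persistent session and a way to send follow-up commands), which is the gap EnIGMA's Interactive Agent Tools are described as filling.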
The team built their own dataset by collecting and structuring Capture The Flag (CTF) challenges specifically for large language models. These gamified cybersecurity competitions simulate real-world vulnerabilities and have traditionally been used to train human cybersecurity professionals.
“CTFs are like a gamified version of cybersecurity used in academic competitions. They’re not true cybersecurity problems that you would face in the real world, but they are very good simulations,” Udeshi noted.
Paper co-author Minghao Shao, an NYU Tandon Ph.D. student and Global Ph.D. Fellow at NYU Abu Dhabi who is advised by Karri and Muhammad Shafique, Professor of Computer Engineering at NYU Abu Dhabi and ECE Global Network Professor at NYU Tandon, described the technical architecture: “We built our own CTF benchmark dataset and created a specialized data loading system to feed these challenges into the model.” Shafique is also a co-author on the paper.
The framework includes specialized prompts that provide the model with instructions tailored to cybersecurity scenarios.
EnIGMA demonstrated superior performance across multiple benchmarks: tested on 390 CTF challenges drawn from four benchmark suites, it achieved state-of-the-art results, solving more than three times as many challenges as previous AI agents.
“Claude 3.5 Sonnet from Anthropic was the best model, and GPT-4o was second at that time,” according to Udeshi, referring to when the research was conducted, roughly 12 months ago.
The research also identified a previously unknown phenomenon called “soliloquizing,” where the AI model generates hallucinated observations without actually interacting with the environment, a discovery that could have important consequences for AI safety and reliability.
Beyond this technical finding, the potential applications extend outside of academic competitions. “If you think of an autonomous LLM agent that can solve these CTFs, that agent has substantial cybersecurity skills that you can use for other cybersecurity tasks as well,” Udeshi explained. The agent could potentially be applied to real-world vulnerability assessment, with the ability to “try hundreds of different approaches” autonomously.
The researchers acknowledge the dual-use nature of their technology. While EnIGMA could help security professionals identify and patch vulnerabilities more efficiently, it could also potentially be misused for malicious purposes. The team has notified representatives from major AI companies including Meta, Anthropic, and OpenAI about their results.
In addition to Karri, Khorrami, Shafique, Udeshi and Shao, the paper’s authors are Talor Abramovich (Tel Aviv University), Kilian Lieret (Princeton University), Haoran Xi (NYU Tandon), Kimberly Milner (NYU Tandon), Sofija Jancheska (NYU Tandon), John Yang (Stanford University), Carlos E. Jimenez (Princeton University), Prashanth Krishnamurthy (NYU Tandon), Brendan Dolan-Gavitt (NYU Tandon), Karthik Narasimhan (Princeton University), and Ofir Press (Princeton University).
Funding for the research came from Open Philanthropy, Oracle, the National Science Foundation, the Army Research Office, the Department of Energy, and NYU Abu Dhabi Center for Cybersecurity and Center for Artificial Intelligence and Robotics.
AI Research
Chair File: Using Innovation and AI to Advance Health

With all of the challenges facing health care — a shrinking workforce, reduced funding, new technologies and pharmaceuticals — change is no longer an option but an imperative. To keep caring for our communities well into the future, we need to transform how we provide care to people. Technology, artificial intelligence and digital transformation can not only help us mitigate these trends but truly innovate and find new ways of making health better.
There are many exciting capabilities already making their way into our field. Ambient listening technology for providers and other automation and AI reduce administrative burden and free up people and resources to improve front-line care. Within the next five years, we expect hospital “smart rooms” to be the norm; they leverage cameras and AI-assisted alerting to improve safety, enable virtual care models across our footprint and allow us to boost efficiency while also improving quality and outcomes.
It’s easy to get caught up in shiny new tools or cutting-edge treatments, but often the most impactful innovations are smaller — adapting or designing our systems and processes to empower our teams to do what they do best.
That’s exactly what a new collaboration with the AHA and Epic is aiming to do. A set of point-of-care tools in the electronic health record is helping providers prevent, detect and treat postpartum hemorrhage, which is responsible for 11% of maternal deaths in the U.S. Early detection and treatment of PPH is key to a full recovery. One small innovation — incorporating tools into your EHR and labor and delivery workflows — is having a big impact: enhancing providers’ ability to effectively diagnose and treat PPH.
It’s critical to leverage technology advancements like this to navigate today’s challenging environment and advance health care into the future. At the same time, we need to focus on how these opportunities can deliver measurable value to our patients, members and the communities we serve.
I will be speaking with Jackie Gerhart, M.D., chief medical officer at Epic, later this month for a Leadership Dialogue conversation. Listen in to learn more about how AI and other technological innovations can better serve patients and make actions more efficient for care providers.
AI Research
Malware that uses artificial intelligence to bypass security

Red Hot Cyber Editorial Team: 15 September 2025, 19:44
A new EvilAI malware campaign tracked by Trend Micro has demonstrated how artificial intelligence is increasingly becoming a tool for cybercriminals. In recent weeks, dozens of infections have been reported worldwide, with the malware masquerading as legitimate AI-powered apps and displaying professional-looking interfaces, functional features, and even valid digital signatures. This approach allows it to bypass the security of both corporate systems and home devices.
| Country        | Count |
| -------------- | ----- |
| India          | 74    |
| United States  | 68    |
| France         | 58    |
| Italy          | 31    |
| Brazil         | 26    |
| Germany        | 23    |
| United Kingdom | 14    |
| Norway         | 10    |
| Spain          | 10    |
| Canada         | 8     |
Trend Micro analysts began monitoring the threat on August 29 and within a week had already observed a wave of large-scale attacks. The largest number of cases was detected in Europe (56), followed by the Americas and the AMEA region (29 each). By country, India leads with 74 incidents, followed by the United States with 68 and France with 58. The list of victims also includes Italy, Brazil, Germany, the United Kingdom, Norway, Spain, and Canada.
The most affected sectors are manufacturing, public, medical, technology, and retail. The spread was particularly severe in the manufacturing sector, with 58 cases, and in the public and healthcare sectors, with 51 and 48 cases, respectively.
EvilAI is distributed via newly registered fake domains, malicious advertisements, and forum links. The installers use neutral but plausible names like App Suite, PDF Editor, or JustAskJacky, which reduces suspicion.
Once launched, these apps offer real functionality, from document processing to recipes to AI-powered chat, but they also embed a hidden Node.js loader. The loader drops obfuscated JavaScript code, tagged with a unique identifier, into the Temp folder and executes it via a minimized node.exe process.
Persistence is established in several ways at once: a Windows scheduled task disguised as a system component named sys_component_health_{UID}, a Start menu shortcut, and an autorun key in the registry. The task fires every four hours, and the registry key ensures activation at login.
This multi-layered approach makes removal particularly laborious. All of the code is generated with language models, which yields a clean, modular structure and helps it evade static signature analyzers. Complex obfuscation provides additional protection: control-flow flattening with MurmurHash3-based loops and Unicode-encoded strings.
To steal data, EvilAI uses Windows Management Instrumentation and registry queries to identify active Chrome and Edge processes. These are then forcibly terminated to unlock the credential files. The browser’s “Web Data” and “Preferences” files are copied, with a “Sync” suffix, into the original profile directories and then exfiltrated via HTTPS POST requests.
The communication channel with the command and control server is encrypted using the AES-256-CBC algorithm with a key generated based on the unique infection ID. Infected machines regularly query the server, receiving commands to download additional modules, modify registry parameters, or launch remote processes.
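As a rough illustration of a key derived from an infection ID, the sketch below hashes the ID with SHA-256 to produce the 32 bytes AES-256 requires. The report does not name the actual derivation function, so SHA-256 here is purely an assumption for the example:

```python
import hashlib

def derive_key(infection_id: str) -> bytes:
    # Hypothetical KDF: the actual derivation used by EvilAI is not
    # documented; SHA-256 is used only because it yields exactly the
    # 32 bytes (256 bits) an AES-256 key needs.
    return hashlib.sha256(infection_id.encode("utf-8")).digest()

key = derive_key("a1b2c3d4")
print(len(key))  # 32 bytes = 256 bits
```

Because the key is a deterministic function of the infection ID, each infected machine shares a per-victim secret with the server without any key exchange; CBC mode then additionally requires a fresh 16-byte IV per message.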
Experts advise organizations to rely not only on digital signatures and application appearance, but also to verify distribution sources and pay particular attention to programs from new publishers. Behavioral detection that flags unexpected Node.js launches, suspicious scheduler activity, or new startup entries can provide protection.
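The indicators reported for this campaign can be turned into simple behavioral rules. The sketch below flags scheduled-task names matching the sys_component_health_{UID} scheme and node.exe processes running scripts out of a Temp directory; the exact UID format is not documented, so the pattern is an assumption:

```python
import re

# Task names like "sys_component_health_<UID>"; the UID format is assumed.
TASK_PATTERN = re.compile(r"^sys_component_health_\S+$", re.IGNORECASE)

def suspicious_tasks(task_names: list[str]) -> list[str]:
    """Return scheduled-task names matching the reported naming scheme."""
    return [name for name in task_names if TASK_PATTERN.match(name)]

def suspicious_process(image_path: str, command_line: str) -> bool:
    """Flag node.exe executing a script from a Temp directory."""
    return (image_path.lower().endswith("node.exe")
            and "\\temp\\" in command_line.lower())
```

In practice these checks would run against live telemetry (schtasks output, process-creation events) collected by an EDR agent or a scheduled audit script.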

The editorial team of Red Hot Cyber consists of a group of individuals and anonymous sources who actively collaborate to provide early information and news on cybersecurity and computing in general.