AI Research

NYU Tandon researchers develop AI agent that solves cybersecurity challenges autonomously

Published

2 months ago

July 30, 2025

Newswise — Artificial intelligence agents — AI systems that can work independently toward specific goals without constant human guidance — have demonstrated strong capabilities in software development and web navigation. Their effectiveness in cybersecurity has remained limited, however.

That may soon change, thanks to a research team from NYU Tandon School of Engineering, NYU Abu Dhabi and other universities that developed an AI agent capable of autonomously solving complex cybersecurity challenges.

The system, called EnIGMA, was presented this month at the International Conference on Machine Learning (ICML) 2025 in Vancouver, Canada.

“EnIGMA is about using Large Language Model agents for cybersecurity applications,” said Meet Udeshi, a NYU Tandon Ph.D. student and co-author of the research. Udeshi is advised by Ramesh Karri, Chair of NYU Tandon’s Electrical and Computer Engineering Department (ECE) and a faculty member of the NYU Center for Cybersecurity and NYU Center for Advanced Technology in Telecommunications (CATT), and by Farshad Khorrami, ECE professor and CATT faculty member. Both Karri and Khorrami are co-authors on the paper, with Karri serving as a senior author.

To build EnIGMA, the researchers started with an existing framework called SWE-agent, which was originally designed for software engineering tasks. However, cybersecurity challenges required specialized tools that didn’t exist in previous AI systems. “We have to restructure those interfaces to feed it into an LLM properly. So we’ve done that for a couple of cybersecurity tools,” Udeshi explained.

The key innovation was developing what they call “Interactive Agent Tools” that convert visual cybersecurity programs into text-based formats the AI can understand. Traditional cybersecurity tools like debuggers and network analyzers use graphical interfaces with clickable buttons, visual displays, and interactive elements that humans can see and manipulate.

“Large language models process text only, but these interactive tools with graphical user interfaces work differently, so we had to restructure those interfaces to work with LLMs,” Udeshi said.

The team built their own dataset by collecting and structuring Capture The Flag (CTF) challenges specifically for large language models. These gamified cybersecurity competitions simulate real-world vulnerabilities and have traditionally been used to train human cybersecurity professionals.

“CTFs are like a gamified version of cybersecurity used in academic competitions. They’re not true cybersecurity problems that you would face in the real world, but they are very good simulations,” Udeshi noted.

Paper co-author Minghao Shao, a NYU Tandon Ph.D. student and Global Ph.D. Fellow at NYU Abu Dhabi who is advised by Karri and Muhammad Shafique, Professor of Computer Engineering at NYU Abu Dhabi and ECE Global Network Professor at NYU Tandon, described the technical architecture: “We built our own CTF benchmark dataset and created a specialized data loading system to feed these challenges into the model.” Shafique is also a co-author on the paper.

The framework includes specialized prompts that provide the model with instructions tailored to cybersecurity scenarios.

EnIGMA demonstrated superior performance across multiple benchmarks. The system was tested on 390 CTF challenges across four different benchmarks, achieving state-of-the-art results and solving more than three times as many challenges as previous AI agents.

During the research conducted approximately 12 months ago, “Claude 3.5 Sonnet from Anthropic was the best model, and GPT-4o was second at that time,” according to Udeshi.

The research also identified a previously unknown phenomenon called “soliloquizing,” where the AI model generates hallucinated observations without actually interacting with the environment, a discovery that could have important consequences for AI safety and reliability.

Beyond this technical finding, the potential applications extend outside of academic competitions. “If you think of an autonomous LLM agent that can solve these CTFs, that agent has substantial cybersecurity skills that you can use for other cybersecurity tasks as well,” Udeshi explained. The agent could potentially be applied to real-world vulnerability assessment, with the ability to “try hundreds of different approaches” autonomously.

The researchers acknowledge the dual-use nature of their technology. While EnIGMA could help security professionals identify and patch vulnerabilities more efficiently, it could also potentially be misused for malicious purposes. The team has notified representatives from major AI companies including Meta, Anthropic, and OpenAI about their results.

In addition to Karri, Khorrami, Shafique, Udeshi and Shao, the paper’s authors are Talor Abramovich (Tel Aviv University), Kilian Lieret (Princeton University), Haoran Xi (NYU Tandon), Kimberly Milner (NYU Tandon), Sofija Jancheska (NYU Tandon), John Yang (Stanford University), Carlos E. Jimenez (Princeton University), Prashanth Krishnamurthy (NYU Tandon), Brendan Dolan-Gavitt (NYU Tandon), Karthik Narasimhan (Princeton University), and Ofir Press (Princeton University).

Funding for the research came from Open Philanthropy, Oracle, the National Science Foundation, the Army Research Office, the Department of Energy, and NYU Abu Dhabi Center for Cybersecurity and Center for Artificial Intelligence and Robotics.

Source link

Up Next

The AI disruption: From global business to your breakfast table

Don't Miss

Zuckerberg shares AI superintelligence vision after spending spree

The Editors

Click to comment

AI Research

Google-owner reveals £5bn AI investment in UK ahead of Trump visit

Published

32 minutes ago

September 16, 2025

Faisal Islam

The world’s fourth biggest company, Google-owner Alphabet, has announced a new £5bn ($6.8bn) investment in UK artificial intelligence (AI).

The money will be used for infrastructure and scientific research over the next two years – the first of several massive US investments being unveiled ahead of US President Donald Trump’s state visit.

Google’s President and Chief Investment Officer Ruth Porat told BBC News in an exclusive interview that there were “profound opportunities in the UK” for its “pioneering work in advanced science”.

The company will officially open a vast $1bn (£735m) data centre in Waltham Cross, Hertfordshire, with Chancellor Rachel Reeves on Tuesday.

The investment will expand this site and also include funding for London-based DeepMind, run by British Nobel Prize winner Sir Demis Hassabis, which deploys AI to revolutionise advanced scientific research.

Ms Porat said there was “now a US-UK special technology relationship… there’s downside risks that we need to work on together to mitigate, but there’s also tremendous opportunity in economic growth, in social services, advancing science”.

She pointed to the government’s AI Opportunities Action Plan as helping the investment, but said “there’s still work to be done to land that”, and that capturing the upside of the AI boom “was not a foregone conclusion”.

The US administration had pressed the UK to water down its Digital Services Tax on companies, including Google, in talks this year, but it is not expected to feature in this week’s announcements.

Further multi-billion-dollar UK investments are expected from US giants over the next 24 hours.

The pound has strengthened, analysts say, partly on expectations of interest rate changes and a flow of US investment.

Yesterday, Google’s owner Alphabet became the fourth company to be worth more than $3tn in terms of total stock market value, joining other technology giants Nvidia, Microsoft and Meta.

Google’s share price has surged in the past month after US courts decided not to order the breakup of the company.

Google CEO Sundar Pichai had succeeded in making the company an “AI First” business, saying “it’s that performance which has resulted in that metric”, Ms Porat said.

Until this summer, Google had been seen to have lagged behind startups such as OpenAI, despite having pioneered much of the key research behind large language models.

Across the world, there has been some concern about the energy use and environmental impact of data centres.

Ms Porat said that the facility would be air-cooled rather than water-cooled and the heat “captured and redeployed to heat schools and homes”.

Google signed a deal with Shell to supply “95% carbon-free energy” for its UK investments.

In the US, the Trump administration has suggested that the power needs of AI data centres require a return to the use of carbon-intensive energy sources.

Ms Porat said that Google remained committed to building our renewable energy, but “obviously wind doesn’t blow and the sun doesn’t shine every hour of the day”.

Energy efficiency was being built into “all aspects of AI” microchips, models, and data centres, but it was important to “modernise the grid” to balance off periods of excess capacity, she said.

Asked about fears of an AI-induced graduate jobs crisis, Ms Porat also said that her company was “spending a lot of time” focused on the AI jobs challenge.

“It would be naive to assume that there isn’t a downside… If companies just use AI to find efficiencies, we’re not going to see the upside to the UK economy or any economy.”

But, she said, entire new industries were being created, opening new doors, and in jobs such as nursing and radiology, adding: “AI is collaborating with people rather than replacing them.”

“Each one of us needs to start using AI so you can understand how it can be an assistance to what you’re doing, as opposed to actually fearing it and watching from the sidelines,” she said.

Source link

AI Research

Trading Central Launches FIBI: AI-Powered Financial

Published

4 hours ago

September 16, 2025

Trading Central

OTTAWA, CANADA, Sept. 15, 2025 (GLOBE NEWSWIRE) — Trading Central, a pioneer in financial market research and insights, announced the launch of FIBI, AI Assistant, across its suite of research tools: Technical Insight®, TC Options Insight™, TC Fundamental Insight®, and TC Market Buzz®.

FIBI™ (‘Financial Insight Bot Interface’) leverages Trading Central’s proprietary natural language processing (NLP), language model (LM), and generative AI (GenAI) technologies—trained by the company’s award-winning data scientists and financial analysts. These models are grounded in deep expertise across technical and fundamental analysis, options trading, and market behavior.

FIBI sets itself apart from generic AI and chatbots with actionable and compliance-friendly market insights powered by high-quality, real-time data. Its natural language storytelling and progressive disclosure of key insights ensure that investors of all skill-levels benefit from quality analysis without the information overload.

“FIBI represents the next generation of investor enablement,” said Alain Pellier, CEO of Trading Central. “In a world flooded with generic AI content, FIBI offers a focused, trustworthy experience that’s built for action.”

With FIBI, brokers can deliver a differentiated client experience — empowering investors with a tool that feels insightful, approachable and personalized, while strengthening trust in their research offering.

FIBI continues Trading Central’s mission to empower investors worldwide, bridging the gap between sophisticated analysis and actionable insights.

Contact Trading Central today to book your demo at sales@tradingcentral.com.

About Trading Central

Since 1999, Trading Central has empowered investors to make confident decisions with actionable, award-winning research. By combining expert insights with modern data visualizations, Trading Central helps investors discover trade ideas, manage risk, and identify new opportunities. Its flexible tools are designed for seamless integration across desktop and mobile platforms via iFrames, APIs, and widgets.

Media Contact

Brand: Trading Central

Melissa Dettorre, Marketing Manager

Email: marketing@tradingcentral.com

Website: https://www.tradingcentral.com

Source link

AI Research

Here’s how people are actually using ChatGPT

Published

5 hours ago

September 15, 2025

Jon Keegan

OpenAI has released its largest report yet on how real people are actually using ChatGPT. The fascinating working research paper, published by the National Bureau of Economic Research, describes a wide-ranging study that used AI to analyze 1 million chat transcripts (no humans read any of the chats). The study has not been peer reviewed.

Some of the big takeaways from the paper:

💃 70% (!!!) of all queries were not related to work. That number may send a chill down the spine of Big Tech, as it’s betting on enterprise AI to generate enough revenue to justify the hundreds of billions it’s spending to build out AI infrastructure.
📝 Among work-related messages, the most common use for ChatGPT is writing, and mostly just to modify or improve a user’s text. Writing queries made up 42% of work-related messages and 52% of all messages from users who work in business and management.
🙋🏻 About half (49%) of all queries were classified as “asking” — for guidance, advice, or information. 40% of messages were requests classified as “doing,” or asking the chatbot to complete a task.
👩‍💻 Female users contributed more than half of all queries, as of July 2025. This is a massive shift from early on, when the vast majority of users were male. But it’s worth noting that the study determined this by classifying first names as masculine or feminine.
🛹 The youth loves AI. Half of all messages were from adults under 26.

The OpenAI researchers took a random sample of about 1 million messages between May 2024 and June 2025 from logged-in, adult ChatGPT users (who did not opt out of sharing their messages for training).

ChatGPT usage - Breakdown of tasks by topic. — Breakdown of tasks by topic (Chart: OpenAI/NBER)

This study is one of the largest surveys of real-world AI use, so this data will be of great interest to all the companies trying to figure out how they’re going to make money selling AI services.

One thing that stood out was how utilitarian the usage of AI was. Rather than falling in love with an AI chatbot or having deep conversations with your new AI buddy, it looks like people are just using it to make their work better and figure things out.

It remains to be seen how AI will end up being part of our everyday lives, but it might look a lot more boring than Silicon Valley is making it out to be.

Source link