AI Research

Project Ire autonomously identifies malware at scale

Published

July 28, 2025

Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting)

Today, we are excited to introduce an autonomous AI agent that can analyze and classify software without assistance, a step forward in cybersecurity and malware detection. The prototype, Project Ire, automates what is considered the gold standard in malware classification: fully reverse engineering a software file without any clues about its origin or purpose. It uses decompilers and other tools, reviews their output, and determines whether the software is malicious or benign.

Project Ire emerged from a collaboration between Microsoft Research, Microsoft Defender Research, and Microsoft Discovery & Quantum, bringing together security expertise, operational knowledge, data from global malware telemetry, and AI research. It is built on the same collaborative and agentic foundation behind GraphRAG and Microsoft Discovery. The system uses advanced language models and a suite of callable reverse engineering and binary analysis tools to drive investigation and adjudication.

As of this writing, Project Ire has achieved a precision of 0.98 and a recall of 0.83 using public datasets of Windows drivers. It was the first reverse engineer at Microsoft, human or machine, to author a conviction case (a detection strong enough to justify automatic blocking) for a specific advanced persistent threat (APT) malware sample, which has since been identified and blocked by Microsoft Defender.

Malware classification at a global scale

Microsoft’s Defender suite of products scans more than one billion monthly active devices, a workload that routinely requires manual review of software by experts.

This kind of work is challenging. Analysts often face error and alert fatigue, and there’s no easy way to compare and standardize how different people review and classify threats over time. For both of these reasons, today’s overloaded experts are vulnerable to burnout, a well-documented issue in the field.

Unlike other AI applications in security, malware classification lacks a computable validator. The AI must make judgment calls without definitive validation beyond expert review. Many behaviors found in software, like reverse engineering protections, don’t clearly indicate whether a sample is malicious or benign.

This ambiguity requires analysts to investigate each sample incrementally, building enough evidence to determine whether it’s malicious or benign despite opposition from adaptive, active adversaries. This has long made it difficult to automate and scale what is inherently a complex and expensive process.

Technical foundation

Project Ire attempts to address these challenges by acting as an autonomous system that uses specialized tools to reverse engineer software. The system’s architecture allows for reasoning at multiple levels, from low-level binary analysis to control flow reconstruction and high-level interpretation of code behavior.

Its tool-use API enables the system to update its understanding of a file using a wide range of reverse engineering tools, including Microsoft memory analysis sandboxes based on Project Freta, custom and open-source tools, documentation search, and multiple decompilers.

Reaching a verdict

The evaluation process begins with a triage, where automated reverse engineering tools identify the file type, its structure, and potential areas of interest. From there, the system reconstructs the software’s control flow graph using frameworks such as angr and Ghidra, building a graph that forms the backbone of Project Ire’s memory model and guides the rest of the analysis.
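To make the idea of a control flow graph concrete, here is a minimal sketch in plain Python. It does not use angr or Ghidra, and it is not Project Ire’s memory model: the block names and the `reachable_blocks` helper are hypothetical, chosen only to show how a graph of basic blocks can guide which code is examined.

```python
from collections import deque

# Hypothetical control flow graph: each basic block maps to its successors.
CFG = {
    "entry": ["check"],
    "check": ["hook", "exit"],
    "hook": ["exit"],
    "exit": [],
    "dead_code": ["exit"],  # no path from "entry" leads here
}


def reachable_blocks(cfg, start):
    """Return the blocks reachable from `start`, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for successor in cfg.get(block, []):
            if successor not in seen:
                seen.append(successor)
                queue.append(successor)
    return seen


order = reachable_blocks(CFG, "entry")
```

In this toy graph, `dead_code` never appears in `order`, which is the kind of structural fact an analysis could use to decide where to spend its effort.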

Through iterative function analysis, the LLM calls specialized tools through an API to identify and summarize key functions. Each result feeds into a “chain of evidence,” a detailed, auditable trail that shows how the system reached its conclusion. This traceable evidence log supports secondary review by security teams and helps refine the system in cases of misclassification.

To verify its findings, Project Ire can invoke a validator tool that cross-checks claims in the report against the chain of evidence. This tool draws on expert statements from malware reverse engineers on the Project Ire team. Drawing on this evidence and its internal model, the system creates a final report and classifies the sample as malicious or benign.
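The cross-check described above can be illustrated with a deliberately simplified sketch. The `Evidence`, `Claim`, and `validate` names are hypothetical and are not Project Ire’s API. The real validator also draws on expert statements, which this sketch does not model. It assumes only that a claim is supported when every function it cites appears in the chain of evidence. The function name in the example is borrowed from the Figure 2 report.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    function: str  # name of the analyzed function
    summary: str   # what the tool output showed for it


@dataclass
class Claim:
    text: str
    cites: list = field(default_factory=list)  # function names offered as support


def validate(claims, chain):
    """Split claim texts into (supported, unsupported).

    A claim counts as supported only if it cites at least one function
    and every cited function appears in the chain of evidence.
    """
    known = {item.function for item in chain}
    supported, unsupported = [], []
    for claim in claims:
        ok = bool(claim.cites) and all(name in known for name in claim.cites)
        (supported if ok else unsupported).append(claim.text)
    return supported, unsupported


chain = [
    Evidence(
        "TerminateProcessesByNameSubstring_1400010f4",
        "enumerates processes and terminates those matching given substrings",
    ),
]
claims = [
    Claim("terminates security software", ["TerminateProcessesByNameSubstring_1400010f4"]),
    Claim("uses anti-debugging interrupts"),  # cites nothing, so it is flagged
]
supported, unsupported = validate(claims, chain)
```

Keeping unsupported claims in a separate list, rather than dropping them, mirrors the auditable trail the article describes: a reviewer can see what the system asserted and why it was not accepted.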

Preliminary testing shows promise

Two early evaluations tested Project Ire’s effectiveness as an autonomous malware classifier. In the first, we assessed Project Ire on a dataset of publicly accessible Windows drivers, some known to be malicious, others benign. Malicious samples came from the Living off the Land Drivers database, which includes a collection of Windows drivers used by attackers to bypass security controls, while known benign drivers were sourced from Windows Update.

This classifier performed well, correctly identifying 90% of all files and flagging only 2% of benign files as threats. It achieved a precision of 0.98 and a recall of 0.83. This low false-positive rate suggests clear potential for deployment in security operations, alongside expert reverse engineering reviews.
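For readers less familiar with these metrics, the sketch below shows how precision, recall, false-positive rate, and accuracy follow from a confusion matrix. The counts are hypothetical: the article does not give the dataset sizes, and these numbers were picked only so the ratios land near the reported figures.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from raw counts."""
    return {
        "precision": tp / (tp + fp),            # flagged files that were truly malicious
        "recall": tp / (tp + fn),               # malicious files that were flagged
        "false_positive_rate": fp / (fp + tn),  # benign files wrongly flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, NOT the actual evaluation data.
metrics = classification_metrics(tp=83, fp=2, tn=98, fn=17)
```

With these illustrative counts, precision is 83/85, recall is 83/100, the false-positive rate is 2/100, and accuracy is 181/200. High precision alongside lower recall means the system rarely raises a false alarm but still misses some malware.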

For each file it analyzes, Project Ire generates a report that includes an evidence section, summaries of all examined code functions, and other technical artifacts.

Figures 1 and 2 present reports for two successful malware classification cases generated during testing. The first involves a kernel-level rootkit, Trojan:Win64/Rootkit.EH!MTB. The system identified several key features, including jump-hooking, process termination, and web-based command and control. It then correctly flagged the sample as malicious.

Figure 1 Analysis

The binary contains a function named ‘MonitorAndTerminateExplorerThread_16f64’ that runs an infinite loop waiting on synchronization objects and terminates system threads upon certain conditions. It queries system or process information, iterates over processes comparing their names case-insensitively to ‘Explorer.exe’, and manipulates registry values related to ‘Explorer.exe’. This function appears to monitor and potentially terminate or manipulate the ‘Explorer.exe’ process, a critical Windows shell process. Such behavior is suspicious and consistent with malware that aims to disrupt or control system processes.

Another function, ‘HttpGetRequestAndResponse_174a4’, performs HTTP GET requests by parsing URLs, resolving hostnames, opening sockets, sending requests, and reading responses. This network communication capability could be leveraged for command and control or data exfiltration, common in malware.

The binary also includes a function ‘PatchProcessEntryPointWithHook_12b5c’ that patches the entry point of a process by writing a hook or trampoline that redirects execution to a specified address. This technique is commonly used for process injection or hooking, allowing malware to alter process behavior or inject malicious code.

Other functions related to sending IOCTL requests to device drivers were identified, but their maliciousness could not be conclusively determined without additional context.

Overall, the binary exhibits multiple indicators of malicious behavior, including process manipulation, network communication, and code injection techniques, suggesting it is likely malware designed to interfere with system processes and communicate with remote servers.

Figure 1. Project Ire report, sample with SHA256: 86047bb1969d1db455493955fd450d18c62a3f36294d0a6c3732c88dfbcc4f62

The second sample, HackTool:Win64/KillAV!MTB, was designed to disable antivirus software. Project Ire correctly identified the code that locates and disables antivirus programs, providing evidence that the file was malicious.

In one section of the code, however, the system misidentified a function as anti-debugging behavior. To maintain accuracy, the system used the validator tool to flag the claim as unsupported. The issue was later resolved by updating decompiler rules, but this example illustrates how Project Ire navigates uncertainty during analysis. Figure 2 shows the corresponding report.

Figure 2 Analysis

The binary contains several functions indicative of malicious intent. The function register_and_log_known_processes_140001000 logs and registers process names associated with antivirus and security software, such as ‘avp.exe’, ‘avpui.exe’, and ‘360Tray.exe’. It calls another function, TerminateProcessesByNameSubstring_1400010f4, which enumerates system processes and terminates those whose names contain specified substrings. This behavior is typical of malware attempting to disable or evade security software by killing their processes.

Another function, check_and_handle_special_state_14000502c, performs checks on a global variable and triggers software interrupts if certain conditions are not met. While the exact purpose of these interrupts (int 0x29 and int 0x3) is unclear, they could represent an anti-debug or anti-analysis mechanism to detect or interfere with debugging or tampering attempts. However, this assumption could not be fully validated against expert statements.

Other functions include initialization routines and simple logging wrappers, but the core malicious behavior centers on process termination targeting security software. This indicates the binary is designed to compromise system security by disabling protective processes, a hallmark of malware such as trojans or rootkits.

Figure 2. Project Ire report, sample with SHA256: b6cb163089f665c05d607a465f1b6272cdd5c949772ab9ce7227120cf61f971a

Real-world evaluation with Microsoft Defender

The more demanding test involved nearly 4,000 “hard-target” files not classified by automated systems and slated for manual review by expert reverse engineers.

In this real-world scenario, Project Ire operated fully autonomously on files created after the language models’ training cutoff, files that no other automated tools at Microsoft could classify at the time.

The system achieved a high precision score of 0.89, meaning nearly 9 out of 10 files flagged malicious were correctly identified as malicious. Recall was 0.26, indicating that under these challenging conditions, the system detected roughly a quarter of all actual malware.

The system correctly identified many of the malicious files, with few false alarms, just a 4% false positive rate. While overall performance was moderate, this combination of accuracy and a low error rate suggests real potential for future deployment.

Looking ahead

Based on these early successes, the Project Ire prototype will be leveraged inside Microsoft’s Defender organization as Binary Analyzer for threat detection and software classification.

Our goal is to scale the system’s speed and accuracy so that it can correctly classify files from any source, even on first encounter. Ultimately, our vision is to detect novel malware directly in memory, at scale.

Acknowledgements

Project Ire acknowledges the following additional developers who contributed to the results in this publication: Dayenne de Souza, Raghav Pande, Ryan Terry, Shauharda Khadka, and Bob Fleck for their independent review of the system.

The system incorporates multiple tools, including the angr framework developed by Emotion Labs. Microsoft has collaborated extensively with Emotion Labs, a pioneer in cyber autonomy, throughout the development of Project Ire, and thanks them for the innovations and insights that contributed to the successes reported here.

Should You Forget Nvidia and Buy These 2 Artificial Intelligence (AI) Stocks Instead?

Published

September 12, 2025

Geoffrey Seiler

Both AMD and Broadcom have an opportunity to outperform in the coming years.

Nvidia is the king of artificial intelligence (AI) infrastructure, and for good reason. Its graphics processing units (GPUs) have become the main chips for training large language models (LLMs), and its CUDA software platform and NVLink interconnect system, which helps its GPUs act like a single chip, have helped create a wide moat.

Nvidia has grown to become the largest company in the world, with a market cap of over $4 trillion. In Q2, it held a whopping 94% market share for GPUs and saw its data center revenue soar 56% to $41.1 billion. That’s impressive, but those large numbers may be why there could be some better opportunities in the space.

Two stocks to take a closer look at are Advanced Micro Devices (AMD) and Broadcom (AVGO). Both are smaller players in AI chips, and as the market shifts from training toward inference, they’re both well positioned. The reality is that while large cloud computing companies and other hyperscalers (companies with large data centers) love Nvidia’s GPUs, they would prefer more alternatives to help reduce costs and diversify their supply chains.

1. AMD

AMD is a distant second to Nvidia in the GPU market, but the shift to inference should help it. Training is Nvidia’s stronghold, and where its CUDA moat is strongest. However, inference is where demand is accelerating, and AMD has already started to win customers.

AMD management has said one of the largest AI model operators in the world is using its GPUs for a sizable portion of daily inference workloads and that seven of the 10 largest AI model companies use its GPUs. That’s important because inference isn’t a one-time event like training. Every time someone asks a model a question or gets a recommendation, GPUs are providing the power for these models to get the answer. That’s why cost efficiency matters more than raw peak performance.

That’s exactly where AMD has a shot to take some market share. Inference doesn’t need the same libraries and tools as training, and AMD’s ROCm software platform is more than capable of handling inference workloads. And once performance is comparable, price becomes more of a deciding factor.

AMD doesn’t need to take a big bite out of Nvidia’s share to move the needle. Nvidia just posted $41.1 billion in data center revenue last quarter, while AMD came in at $3.2 billion. Even small wins can have an outsize impact when you start from a base that is a fraction of the size of the market leader.
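A quick back-of-the-envelope calculation shows why the smaller base matters. The sketch below uses the two revenue figures quoted above. The 1% shift is an illustrative assumption, and the model ignores market growth and pricing differences, treating revenue as if it simply moved from one company to the other.

```python
NVIDIA_DC_REVENUE_B = 41.1  # quarterly data center revenue in $ billions, from the article
AMD_DC_REVENUE_B = 3.2      # AMD's comparable figure, from the article


def amd_growth_from_share_shift(shift_fraction):
    """Percent growth in AMD's revenue if `shift_fraction` of Nvidia's revenue moved to AMD."""
    shifted = NVIDIA_DC_REVENUE_B * shift_fraction
    return 100 * shifted / AMD_DC_REVENUE_B


# Hypothetical scenario: 1% of Nvidia's data center revenue moves to AMD.
growth = amd_growth_from_share_shift(0.01)
```

Under these assumptions, a shift equal to 1% of Nvidia’s data center revenue is about $0.41 billion, which would be close to a 13% increase on AMD’s $3.2 billion base.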

On top of that, AMD helped launch the UALink Consortium, which includes Broadcom and Intel, to create an open interconnect standard that competes with Nvidia’s proprietary NVLink. If successful, that would break down one of Nvidia’s big advantages and allow customers to build data center clusters with chips from multiple vendors. That’s a long-term effort, but it could help improve the playing field.

With inference expected to become larger than training over time, AMD doesn’t need to beat Nvidia to deliver strong returns; it just needs a little bigger share.

2. Broadcom

Broadcom is attacking the AI opportunity from another angle, but the upside may be even more compelling. Instead of designing off-the-shelf GPUs, Broadcom is helping customers make their own custom AI chips.

Broadcom is a leader in helping design application-specific integrated circuits, or ASICs, and it has taken that expertise and applied it to making custom AI chips. Its first customer was Alphabet, for which it helped design the highly successful Tensor Processing Units (TPUs) that now help power Google Cloud. This success led to other design wins, including with Meta Platforms and TikTok owner ByteDance. Combined, Broadcom has said these three customers represent a $60 billion to $90 billion serviceable addressable market by its fiscal 2027 (ending October 2027).

However, the news got even better when the company revealed that a fourth customer, widely believed to be OpenAI, placed a $10 billion order for next year. Designing ASICs is typically not a quick process. Alphabet’s TPUs took about 18 months from start to finish, which at the time was considered quick. But this newest deal shows Broadcom can keep up this fast pace. This also bodes well for future deals, as late last year it was revealed that Apple will be a fifth customer.

Custom chips have clear advantages for inference. They’re designed for specific workloads, so they deliver better power efficiency and lower costs than off-the-shelf GPUs. As inference demand grows larger than training, Broadcom’s role as the go-to design partner becomes more valuable.

Now, custom chips have large upfront costs to design and aren’t for everyone, but this is a huge potential opportunity for Broadcom moving forward.

The bottom line

Nvidia is still the dominant player in AI infrastructure, and I don’t see that changing anytime soon. However, both AMD and Broadcom have huge opportunities in front of them and are starting at much smaller bases. That could help them outperform in the coming years.

Geoffrey Seiler has positions in Alphabet. The Motley Fool has positions in and recommends Advanced Micro Devices, Alphabet, Apple, Meta Platforms, and Nvidia. The Motley Fool recommends Broadcom. The Motley Fool has a disclosure policy.

Google’s top AI scientist says ‘learning how to learn’ will be next generation’s most needed skill

Published

September 12, 2025

Derek Gatopoulos, Associated Press

ATHENS – A top Google scientist and 2024 Nobel laureate said Friday that the most important skill for the next generation will be “learning how to learn” to keep pace with change as artificial intelligence transforms education and the workplace.

Speaking at an ancient Roman theater at the foot of the Acropolis in Athens, Demis Hassabis, CEO of Google’s DeepMind, said rapid technological change demands a new approach to learning and skill development.

“It’s very hard to predict the future, like 10 years from now, in normal cases. It’s even harder today, given how fast AI is changing, even week by week,” Hassabis told the audience. “The only thing you can say for certain is that huge change is coming.”

The neuroscientist and former chess prodigy said artificial general intelligence — a futuristic vision of machines that are as broadly smart as humans or at least can do many things as well as people can — could arrive within a decade. This, he said, will bring dramatic advances and a possible future of “radical abundance” despite acknowledged risks.

Hassabis emphasized the need for “meta-skills,” such as understanding how to learn and optimizing one’s approach to new subjects, alongside traditional disciplines like math, science and humanities.

“One thing we’ll know for sure is you’re going to have to continually learn … throughout your career,” he said.

The DeepMind co-founder, who established the London-based research lab in 2010 before Google acquired it four years later, shared the 2024 Nobel Prize in chemistry for developing AI systems that accurately predict protein folding — a breakthrough for medicine and drug discovery.

Greek Prime Minister Kyriakos Mitsotakis joined Hassabis at the Athens event after discussing ways to expand AI use in government services. Mitsotakis warned that the continued growth of huge tech companies could create great global financial inequality.

“Unless people actually see benefits, personal benefits, to this (AI) revolution, they will tend to become very skeptical,” he said. “And if they see … obscene wealth being created within very few companies, this is a recipe for significant social unrest.”

Mitsotakis thanked Hassabis, whose father is Greek Cypriot, for rescheduling the presentation to avoid conflicting with the European basketball championship semifinal between Greece and Turkey. Greece later lost the game 94-68.

____

Kelvin Chan in London contributed to this story.

Ray Dalio calls for ‘redistribution policy’ when AI and humanoid robots start to benefit the top 1% to 10% more than everyone else

Published

September 12, 2025

Nick Lichtenberg

Legendary investor Ray Dalio, founder of Bridgewater Associates, has issued a stark warning regarding the future impact of artificial intelligence (AI) and humanoid robots, predicting a dramatic increase in wealth inequality that will necessitate a new “redistribution policy”. Dalio articulated his concerns, suggesting that these advanced technologies are poised to benefit the top 1% to 10% of the population significantly more than everyone else, potentially leading to profound societal challenges.

Speaking on “The Diary Of A CEO” podcast, Dalio described a future where humanoid robots, smarter than humans, and advanced AI systems, powered by trillions of dollars in investment, could render many current professions obsolete. He questioned the need for lawyers, accountants, and medical professionals if highly intelligent robots with PhD-level knowledge become commonplace, stating, “we will not need a lot of those jobs.” This technological leap, while promising “great advances,” also carries the potential for “great conflicts.”

He predicted “a limited number of winners and a bunch of losers,” with the likely result being much greater polarity. With the top 1% to 10% “benefiting a lot,” he foresees that being a dividing force. He described the current business climate on AI and robotics as a “crazy boom,” but the question that’s really on his mind is: why would you need even a highly skilled professional if there’s a “humanoid robot that is smarter than all of us and has a PhD and everything.” Perhaps surprisingly, the founder of the biggest hedge fund in history suggested that redistribution will be sorely needed.

Five big forces

“There certainly needs to be a redistribution policy,” Dalio told host Steven Bartlett, without directly mentioning universal basic income. He clarified that this will have to be more than “just a redistribution of money policy because uselessness and money may not be a great combination.” In other words, if you redistribute money but don’t think about how to put people to work, that could have negative effects in a world of autonomous agents. The ultimate takeaway, Dalio said, is “that has to be figured out, and the question is whether we’re too fragmented to figure that out.”

Dalio’s remarks echo those of computer science professor Roman Yampolskiy, who sees AI creating up to 80 hours of free time per week for most people. But AI is also showing clear signs of shrinking the jobs market for recent grads, with one study seeing a 13% drop in AI-exposed jobs since 2022. Major revisions from the Bureau of Labor Statistics show that AI has begun “automating away tech jobs,” an economist said in a statement to Fortune in early September.

Dalio said he views this technological acceleration as the fifth of five “big forces” that create an approximate 80-year cycle throughout history. He explained that human inventiveness, particularly with new technologies, has consistently raised living standards over time. However, when people don’t believe the system works for them, he said, internal conflicts and “wars between the left and the right” can erupt. Both the U.S. and UK are currently experiencing these kinds of wealth and values gaps, he said, leading to internal conflict and a questioning of democratic systems.

Drawing on his extensive study of history, which spans 500 years and covers the rise and fall of empires, Dalio sees a historical precedent for such transformative shifts. He likened the current era to previous evolutions, from the agricultural age, where people were treated “essentially like oxen,” to the industrial revolutions where machines replaced physical labor. He said he’s concerned about a similar thing with mental labor, as “our best thinking may be totally replaced.” Dalio highlighted that throughout history, “intelligence matters more than anything” as it attracts investment and drives power.

Pessimistic outlook

Despite the “crazy boom” in AI and robotics, Dalio’s outlook on the future of major powers like the UK and U.S. was not optimistic, citing high debt, internal conflict, and geopolitical factors, in addition to a lack of innovative culture and capital markets in some regions. While personally “excited” by the potential of these technologies, Dalio’s ultimate concern rests on “human nature”. He questions whether people can “rise above this” to prioritize the “collective good” and foster “win-win relationships,” or if greed and power hunger will prevail, exacerbating existing geopolitical tensions.

Not all market watchers see a crazy boom as such a good thing. Even OpenAI CEO Sam Altman himself has said it resembles a “bubble” in some respects. Goldman Sachs has calculated that a bubble popping could wipe out up to 20% of the S&P 500’s valuation. And some long-time critics of the current AI landscape, such as Gary Marcus, disagree with Dalio entirely, arguing that the bubble is due to pop because the AI technology currently on the market is too error-prone to be relied upon, and therefore can’t be scaled away. Stanford computer science professor Jure Leskovec told Fortune that AI is a powerful but imperfect tool and it’s boosting “human expertise” in his classroom, including the hand-written and hand-graded exams that he’s using to really test his students’ knowledge.

For this story, Fortune used generative AI to help with an initial draft. An editor verified the accuracy of the information before publishing.
