AI Research

Study finds filtered data stops openly-available AI models from performing dangerous tasks


Senior author Yarin Gal, Associate Professor of Machine Learning at Oxford’s Department of Computer Science, said: ‘The research community has made great progress with AI safeguards over the past few years, but a remaining massive challenge is safeguarding open weight models – how do we build models that we can distribute to all without raising risks of misuse. Our study makes a significant stride in this direction.’

Embedding safety from the start

This work represents a shift in the approach to AI safety: rather than retrofitting safeguards, safety is embedded from the start. The method reduces risk without sacrificing openness, enabling transparency and research without compromising security.


Open-weight models are a cornerstone of transparent, collaborative AI research. Their availability promotes red teaming, mitigates market concentration, and accelerates scientific progress. With the recent releases of prominent models like Kimi-K2, GLM-4.5, and gpt-oss, open-weight models are steadily increasing in capability and influence, reportedly lagging behind the best closed models by just 6-12 months.

However, openness brings risk. Just as open models can be refined for positive applications, they can also be modified for harm. Modified text models lacking safeguards are already widespread, while open image generators have become tools for producing illegal content. Because these models can be downloaded, altered, and redistributed by anyone, developing robust protections against tampering is critical.

Instead of training a general-purpose model and then adding filters, this work builds safeguards throughout the entire training process by filtering unwanted knowledge from the training data. The team focused on a biothreat setting and filtered biology-related content from the model’s training data, aiming to deny the model this knowledge entirely, rather than suppressing it post hoc, which can often be easily reversed.

The filtered model was able to resist training on up to 25,000 papers on biothreat-related topics (such as virology, bioweapons, reverse genetics, and viral vectors), proving over ten times more effective than prior state-of-the-art methods. Unlike traditional fine-tuning or access-limiting strategies, which can often be bypassed, filtering pretraining data proved resilient even under sustained adversarial attack—surviving 10,000 steps and over 300 million tokens of targeted fine-tuning.
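
To make that evaluation concrete, here is a minimal sketch of what an adversarial fine-tuning ("tampering") loop of this kind can look like: repeatedly fine-tune the released model on in-domain material and periodically check whether scores on a hazardous-capability benchmark recover. The helper functions, threshold, and evaluation interval below are illustrative placeholders, not the study's actual attack setup.

```python
# Illustrative outline of a tamper-resistance evaluation: fine-tune an
# open-weight model on domain-specific papers and track whether accuracy
# on a held-out hazardous-capability benchmark ever recovers.
# `finetune_one_step` and `score_hazard_benchmark` are hypothetical stubs.

def finetune_one_step(model, batch):
    """Placeholder for one gradient step of adversarial fine-tuning."""
    return model  # a real attack would update the model weights here


def score_hazard_benchmark(model) -> float:
    """Placeholder for accuracy on a held-out hazard benchmark."""
    return 0.25  # e.g. chance level on a 4-way multiple-choice test


def tamper_resistance_eval(model, attack_batches, eval_every=500, recovery_threshold=0.5):
    """Run the attack and report whether dangerous capability re-emerges."""
    history = []
    for step, batch in enumerate(attack_batches, start=1):
        model = finetune_one_step(model, batch)
        if step % eval_every == 0:
            accuracy = score_hazard_benchmark(model)
            history.append((step, accuracy))
            if accuracy >= recovery_threshold:
                return history, f"capability recovered at step {step}"
    return history, "no recovery within the attack budget"
```

In this framing, a tamper-resistant model is one whose hazard-benchmark scores stay near the post-filtering baseline across the full attack budget.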

How the method works:


The team used a multi-stage filtering pipeline combining keyword blocklists and a machine-learning classifier trained to detect high-risk content. This allowed them to remove only the relevant materials—around 8–9% of the dataset—while preserving the breadth and depth of general information.
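
As an illustration only, the sketch below shows one way such a two-stage filter could be structured: a cheap keyword screen flags candidate documents, and a classifier makes the final keep-or-drop call on the flagged subset. The blocklist terms, threshold, and stub classifier are hypothetical placeholders rather than the team's actual pipeline.

```python
# A minimal, illustrative sketch of a two-stage pretraining-data filter:
# a keyword blocklist flags candidate documents, and a trained classifier
# decides whether flagged documents are removed from the corpus.

from typing import Callable, Iterable

BLOCKLIST = {"reverse genetics", "viral vector", "virulence factor"}  # illustrative terms only


def contains_blocked_term(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def filter_corpus(
    docs: Iterable[str],
    risk_classifier: Callable[[str], float],  # returns an estimated probability of high-risk content
    threshold: float = 0.5,
) -> list[str]:
    kept = []
    for doc in docs:
        # Stage 1: cheap keyword screen; most documents pass untouched.
        if not contains_blocked_term(doc):
            kept.append(doc)
            continue
        # Stage 2: only flagged documents pay for a classifier call.
        if risk_classifier(doc) < threshold:
            kept.append(doc)
        # Otherwise the document is dropped from the pretraining corpus.
    return kept


if __name__ == "__main__":
    # Toy usage with a stub classifier standing in for the trained model.
    corpus = ["A paper on protein folding.", "Methods for reverse genetics of influenza."]
    stub_classifier = lambda text: 0.9 if "reverse genetics" in text.lower() else 0.1
    print(filter_corpus(corpus, stub_classifier))  # keeps only the first document
```

In a design like this, the keyword screen keeps the classifier's cost manageable at pretraining scale, while the classifier keeps the removal targeted to a small slice of the data.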

They then trained AI models from scratch using this filtered data, benchmarking them against both unfiltered models and models using state-of-the-art safety fine-tuning methods. Across evaluations, the filtered models performed just as well as their unfiltered counterparts on standard tasks such as commonsense reasoning and scientific Q&A.

A major advance for global AI governance

The findings come at a critical moment for global AI governance. Several recent AI safety reports from OpenAI, Anthropic and DeepMind have warned that frontier models may soon be able to assist with the creation of biological or chemical threats. Many governments have expressed concern about the lack of safeguards for openly available models, which cannot be recalled once released.

Study co-author Stephen Casper (UK AI Security Institute) said: ‘By removing the unwanted knowledge from the start, the resulting model had no basis for acquiring dangerous capabilities, even after further training attempts. Our study therefore shows that data filtration can be a powerful tool in helping developers balance safety and innovation in open-source AI.’

This research was conducted by the University of Oxford, EleutherAI, and the UK AI Security Institute.

The study ‘Deep Ignorance: Filtering pretraining data builds tamper-resistant safeguards into open-weight LLMs’ has been published as a preprint on arXiv.




AI Research

Panelists Will Question Who Controls AI | ACS CC News


Artificial intelligence (AI) has become one of the fastest-growing technologies in the world today. In many industries, individuals and organizations are racing to better understand AI and incorporate it into their work. Surgery is no exception, and that is why Clinical Congress 2025 has made AI one of the six themes of its Opening Day Thematic Sessions.

The first full day of the conference, Sunday, October 5, will include two back-to-back Panel Sessions on AI. The first session, “Using ChatGPT and AI for Beginners” (PS104), offers a foundation for surgeons not yet well versed in AI. The second, “AI: Who Is In Control?” (PS110), will offer insights into the potential upsides and drawbacks of AI use, as well as its limitations and possible future applications, so that surgeons can incorporate this technology into their clinical care safely and effectively.

“AI: Who Is In Control?” will be moderated by Anna N. Miller, MD, FACS, an orthopaedic surgeon at Dartmouth Hitchcock Medical Center in Lebanon, New Hampshire, and Gabriel Brat, MD, MPH, MSc, FACS, a trauma and acute care surgeon at Beth Israel Deaconess Medical Center and an assistant professor at Harvard Medical School, both in Boston, Massachusetts.

In an interview, Dr. Brat shared his view that the use of AI is not likely to replace surgeons or decrease the need for surgical skills or decision-making. “It’s not an algorithm that’s going to be throwing the stitch. It’s still the surgeon.”

Nonetheless, he said that the starting presumption of the session is that AI is likely to be highly transformative to the profession over time.  

“Once it has significant uptake, it’ll really change elements of how we think about surgery,” he said, including creating meaningful opportunities for improvements.

The key question of the session, therefore, is not whether to engage with AI, but how to do so in ways that ensure the best outcomes: “We as surgeons need to have a role in defining how to do so safely and effectively. Otherwise, people will start to use these tools, and we will be swept along with a movement as opposed to controlling it.”

To that end, Dr. Brat explained that the session will offer “a really strong translational focus by people who have been in the trenches working with these technologies.” He and Dr. Miller have specifically chosen an “all-star panel” designed to represent academia, healthcare associations, and industry. 

The panelists include Rachael A. Callcut, MD, MSPH, FACS, who is the division chief of trauma, acute care surgery and surgical critical care as well as associate dean of data science and innovation at the University of California-Davis Health in Sacramento, California. She will share the perspective on AI from academic surgery.

Genevieve Melton-Meaux, MD, PhD, FACS, FACMI, the inaugural ACS Chief Health Informatics Officer, will present on AI usage in healthcare associations. She also is a colorectal surgeon and the senior associate dean for health informatics and data science at the University of Minnesota and chief health informatics and AI officer for Fairview Health Services, both in Minneapolis.

Finally, Khan Siddiqui, MD, a radiologist and serial entrepreneur who is the cofounder, chairman, and CEO of a company called HOPPR AI, will present the view from industry. HOPPR AI is a for-profit company focused on building AI apps for medical imaging. As a radiologist, Dr. Siddiqui represents a medical specialty widely expected to undergo sweeping change as AI is incorporated into image-reading and diagnosis. His comments will focus on professional insights relevant to surgeons.

Their presentations will provide insights on general usage of AI at present, as well as predictions on what the landscape for AI in healthcare will look like in approximately 5 years. The session will include advice on what approaches to AI may be most effective for surgeons interested in ensuring positive outcomes and avoiding negative ones.

AI is a pervasive topic throughout Clinical Congress 2025. In addition to the sessions that address AI across the 4 days of the conference, many researchers will present studies that involve AI in their methods, starting presumptions, and/or potential applications to practice.

Access the Interactive Program Planner for more details about Clinical Congress 2025 sessions.




AI Research

Our new study found AI is wreaking havoc on uni assessments. Here’s how we should respond


Artificial intelligence (AI) is wreaking havoc on university assessments and exams.

Thanks to generative AI tools such as ChatGPT, students can now generate essays and assessment answers in seconds. As we noted in a study earlier this year, this has left universities scrambling to redesign tasks, update policies, and adopt new cheating detection systems.

But the technology keeps changing as they do this, and there are constant reports of students cheating their way through their degrees.

The AI and assessment problem has put enormous pressure on institutions and teachers. Today’s students need assessment tasks to complete, as well as confidence the work they are doing matters. The community and employers need assurance university degrees are worth something.

In our latest research, we argue the problem of AI and assessment is even more difficult than media debates have made out.

It’s not something that can just be fixed once we find the “correct solution”. Instead, the sector needs to recognise AI in assessment is an intractable “wicked” problem, and respond accordingly.

What is a wicked problem?

The term “wicked problem” was made famous by theorists Horst Rittel and Melvin Webber in the 1970s. It describes problems that defy neat solutions.

Well-known examples include climate change, urban planning and healthcare reform.

Unlike “tame” problems, which can be solved with enough time and resources, wicked problems have no single correct answer. In fact there is no “true” or “false” answer, only better or worse ones.

Wicked problems are messy, interconnected and resistant to closure. There is no way to test the solution to a wicked problem. Attempts to “fix” the issue inevitably generate new tensions, trade-offs and unintended consequences.

However, admitting there are no “correct” solutions does not mean there are not better and worse ones. Rather, it allows us the space to appreciate the nature and necessity of the trade-offs involved.

Our research

In our latest research, we interviewed 20 university teachers leading assessment design work at Australian universities.

We recruited participants by asking for referrals across four faculties at a large Australian university.

We wanted to speak to teachers who had made changes to their assessments because of generative AI. Our aim was to better understand what assessment choices were being made, and what challenges teachers were facing.

When we were setting up our research we didn’t necessarily think of AI and assessment as a “wicked problem”. But this is what emerged from the interviews.

Our results

Interviewees described dealing with AI as an impossible situation, characterised by trade-offs. As one teacher explained:

We can make assessments more AI-proof, but if we make them too rigid, we just test compliance rather than creativity.

In other words, the solution to the problem was not “true or false”, only better or worse.

Or as another teacher asked:

Have I struck the right balance? I don’t know.

There were other examples of imperfect trade-offs. Should assessments allow students to use AI (like they will in the real world)? Or totally exclude it to ensure they demonstrate independent capability?

Should teachers set more oral exams – which appear more AI-resistant than other assessments – even if this increases workload and disadvantages certain groups?

As one teacher explained,

250 students by […] 10 min […] it’s like 2,500 min, and then that’s how many days of work is it just to administer one assessment?

Teachers could also set in-person hand-written exams, but this does not necessarily test other skills students need for the real world. Nor can this be done for every single assessment in a course.

The problem keeps shifting

Meanwhile, teachers are expected to redesign assessments immediately, while the technology itself keeps changing. GenAI tools such as ChatGPT are constantly being updated with new models and new functionalities, while new AI learning tools (such as AI text summarisers for unit readings) are increasingly ubiquitous.

At the same time, educators need to keep up with all their usual teaching responsibilities (where we know they are already stressed and stretched).

This is a sign of a messy problem, which has no closure or end point. Or as one interviewee explained:

We just do not have the resources to be able to detect everything and then to write up any breaches.

What do we need to do instead?

The first step is to stop pretending AI in assessment is a simple, “solvable” problem.

This not only misreads what’s going on, it can also lead to paralysis, stress, burnout and trauma among educators, and to policy churn as institutions keep trying one “solution” after the next.

Instead, AI and assessment must be treated as something to be continually negotiated rather than definitively resolved.

This recognition can lift a burden from teachers. Instead of chasing the illusion of a perfect fix, institutions and educators can focus on building processes that are flexible and transparent about the trade-offs involved.

Our study suggests universities give teaching staff certain “permissions” to better address AI.

This includes the ability to compromise to find the best approach for their particular assessment, unit and group of students. All potential solutions will have trade-offs – oral examinations might be better at assuring learning, but may also disadvantage certain groups, for example those for whom English is a second language.

Perhaps it also means teachers don’t have time for other course components and this might be OK.

But, like so many of the trade-offs involved in this problem, the weight of responsibility for making the call will rest on the shoulders of teachers. They need our support to make sure the weight doesn’t crush them.




AI Research

Stony Brook University Receives $13.77M NSF Grant to Deploy a National Supercomputer to Democratize Access to Artificial Intelligence and Research Computing


Grant Includes Collaboration with the University at Buffalo

Professor Robert Harrison

STONY BROOK, NY – September 16, 2025 – The U.S. National Science Foundation (NSF) has awarded a $13.77 million grant to Stony Brook University’s Institute for Advanced Computational Science (IACS), in collaboration with the University at Buffalo. The award, titled Sustainable Cyber-infrastructure for Expanding Participation, will deliver cutting-edge computing and data resources to power advanced research nationwide.

This funding will be used to procure and operate a high-performance, highly energy-efficient computer designed to handle the growing needs of artificial intelligence research and other scientific fields that require large amounts of memory and computing power. By making this resource widely available to researchers, students, and educators across the country, the project will expand access to advanced tools, support groundbreaking discoveries, and train the next generation of scientists.

The new system will utilize low-cost and low-energy AmpereOne® M Advanced Reduced Instruction Set Computer (RISC) Machine processors that are designed to excel in artificial intelligence (AI) inference and imperfectly optimized workloads that presently characterize much of academic research computing. Multiple Qualcomm® Cloud AI inference accelerators will also increase energy efficiency, enabling the use of the largest AI models. The AmpereOne® M processors, in combination with the efficient generative AI inference performance and large memory capacity of the Qualcomm Cloud AI inference accelerators, will directly advance the mission of the NSF-led National Artificial Intelligence Research Resource (NAIRR).

This is the first academic deployment of both of these technologies, which have transformed computing in the commercial cloud. The new IACS-led supercomputer will execute diverse workloads in an energy- and cost-efficient manner, providing easily accessible, competitive and consistent performance without requiring sophisticated programming skills or knowledge of advanced hardware features.

“This project employs a comprehensive, multilayered strategy, with regional and national elements to ensure the widest possible benefits,” said IACS director Robert J. Harrison. “The team will collaborate with multiple initiatives and projects, to reach a broad audience that spans all experience levels from high school students beginning to explore science and technology to faculty members advancing innovation through scholarship and teaching.”

“The University at Buffalo is excited to partner with Stony Brook on this new project that will advance research, innovation and education by expanding the nation’s cyber-infrastructure to scientific disciplines that were not high performance computing-heavy prior to the AI boom, as well as expanding to non-R1 universities, which also didn’t have much of high-performance computing usage in the past,” says co-principal investigator Nikolay Simakov, a computational scientist at the University at Buffalo Center for Computational Research.

“AmpereOne® M delivers the performance, memory and energy footprint required for modern research workloads—helping democratize access to AI and data-driven science by lowering the barriers to large-scale compute,” said Jeff Wittich, Chief Product Officer at Ampere. “We look forward to working with Stony Brook University to integrate this platform into research and education programs, accelerating discoveries in genomics, bioinformatics and AI.”

“Qualcomm Technologies is proud to contribute our expertise in high-performance, energy-efficient AI inference and scalable Qualcomm Cloud AI Inference solutions to this groundbreaking initiative,” said Dr. Richard Lethin, VP, Engineering, Qualcomm Technologies, Inc. “Our technologies enable seamless integration into a wide range of applications, enabling researchers and students to easily leverage advanced AI capabilities.”

Nationally and regionally, this funding will support a variety of projects, with an emphasis on fields of research that are not targeted by other national resources (e.g., life sciences and computational linguistics). In particular, the AmpereOne® M system will excel on high-throughput workloads common to genomics and bioinformatics research, AI/ML inference, and statistical analysis, among others. To help domain scientists achieve excellent performance on the system, software applications in these and related fields will be optimized for Ampere hardware and made readily available. This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.

The awarded funds are primarily for purchase of the supercomputer and first year activities, with additional funds to be provided for operations over five years, subject to external review.

# # #

About the U.S. National Science Foundation (NSF)

The U.S. National Science Foundation (NSF) is an independent federal agency that supports science and engineering in all 50 states and U.S. territories. NSF was established in 1950 by Congress to:

  • Promote the progress of science.
  • Advance the national health, prosperity and welfare.
  • Secure the national defense.

NSF fulfills its mission chiefly by making grants. NSF’s investments account for about 25% of federal support to America’s colleges and universities for basic research: research driven by curiosity and discovery. They also support solutions-oriented research with the potential to produce advancements for the American people.

About Stony Brook University

Stony Brook University is New York’s flagship university and No. 1 public university. It is part of the State University of New York (SUNY) system. With more than 26,000 students, more than 3,000 faculty members, more than 225,000 alumni, a premier academic healthcare system and 18 NCAA Division I athletic programs, Stony Brook is a research-intensive distinguished center of innovation dedicated to addressing the world’s biggest challenges. The university embraces its mission to provide comprehensive undergraduate, graduate and professional education of the highest quality, and is ranked as the #58 overall university and #26 among public universities in the nation by U.S. News & World Report’s Best Colleges listing. Fostering a commitment to academic research and intellectual endeavors, Stony Brook’s membership in the Association of American Universities (AAU) places it among the top 71 research institutions in North America. The university’s distinguished faculty have earned esteemed awards such as the Nobel Prize, Pulitzer Prize, Indianapolis Prize for animal conservation, Abel Prize, Fields Medal and Breakthrough Prizes in Mathematics and Physics. Stony Brook has the responsibility of co-managing Brookhaven National Laboratory for the U.S. Department of Energy — one of only eight universities with a role in running a national laboratory. In 2023, Stony Brook was named the anchor institution for The New York Climate Exchange on Governors Island in New York City. Providing economic growth for neighboring communities and the wider geographic region, the university totals an impressive $8.93 billion in increased economic output on Long Island. Follow us on Facebook https://www.facebook.com/stonybrooku/ and X @stonybrooku.


