AI Research

Top AI companies have spent months working with US, UK governments on model safety

Published

1 day ago

September 15, 2025

Both OpenAI and Anthropic said earlier this month they are working with the U.S. and U.K. governments to bolster the safety and security of their commercial large language models in order to make them harder to abuse or misuse.

In a pair of blogs posted to their websites Friday, the companies said for the past year or so they have been working with researchers at the National Institute of Standards and Technology’s U.S. Center for AI Standards for Innovation and the U.K. AI Security Institute.

That collaboration included granting government researchers access to the companies’ models, classifiers, and training data. Its purpose has been to enable independent experts to assess how resilient the models are to outside attacks from malicious hackers, as well as their effectiveness in blocking legitimate users from leveraging the technology for legally or ethically questionable purposes.

OpenAI’s blog details the work with the institutes, which studied the capabilities of ChatGPT in cyber, chemical-biological and “other national security relevant domains.”That partnership has since been expanded to newer products, including red-teaming the company’s AI agents and exploring new ways for OpenAI “to partner with external evaluators to find and fix security vulnerabilities.”

OpenAI already works with selected red-teamers who scour their products for vulnerabilities, so the announcement suggests the company may be exploring a separate red-teaming process for its AI agents.

According to OpenAI, the engagement with NIST yielded insights around two novel vulnerabilities affecting their systems. Those vulnerabilities “could have allowed a sophisticated attacker to bypass our security protections, and to remotely control the computer systems the agent could access for that session and successfully impersonate the user for other websites they’d logged into,” the company said.

Initially, engineers at OpenAI believed the vulnerabilities were unexploitable and “useless” due to existing security safeguards. But researchers identified a way to combine the vulnerabilities with a known AI hijacking technique — which corrupts the underlying context data the agent relies on to guide its behavior — that allowed them to take over another user’s agent with a 50% success rate.

Between May and August, OpenAI worked with researchers at the U.K. AI Security Institute to test and improve safeguards in GPT5 and ChatGPT Agent. The engagement focused on red-teaming the models to prevent biological misuse — preventing the model from providing step-by-step instructions for making bombs, chemical or biological weapons.

The company said it provided the British government with non-public prototypes of its safeguard systems, test models stripped of any guardrails, internal policy guidance on its safety work, access to internal safety monitoring models and other bespoke tooling.

Anthropic also said it gave U.S. and U.K. government researchers access to its Claude AI systems for ongoing testing and research at different stages of development, as well as its classifier system for finding jailbreak vulnerabilities.

That work identified several prompt injection attacks that bypassed safety protections within Claude — again by poisoning the context the model relies on with hidden, malicious prompts — as well as a new universal jailbreak method capable of evading standard detection tools. The jailbreak vulnerability was so severe that Anthropic opted to restructure its entire safeguard architecture rather than attempt to patch it.

Anthropic said the collaboration taught the company that giving government red-teamers deeper access to their systems could lead to more sophisticated vulnerability discovery.

“Governments bring unique capabilities to this work, particularly deep expertise in national security areas like cybersecurity, intelligence analysis, and threat modeling that enables them to evaluate specific attack vectors and defense mechanisms when paired with their machine learning expertise,” Anthropic’s blog stated.

OpenAI and Anthropic’s work with the U.S. and U.K. comes as some AI safety and security experts have questioned whether those governments and AI companies may be deprioritizing technical safety guardrails as policymakers seek to give their domestic industries maximal freedom to compete with China and other competitors for global market dominance.

After coming into office, U.S. Vice President JD Vance downplayed the importance of AI safety at international summits, while British Labour Party Prime Minister Keir Starmer reportedly walked back a promise in the party’s election manifesto to enforce safety regulations on AI companies following Donald Trump’s election. A more symbolic example: both the U.S. and U.K. government AI institutes changed their names this earlier year to remove the word “safety.”

But the collaborations indicate that some of that work remains ongoing, and not every security researcher agrees that the models are necessarily getting worse.

Md Raz, a Ph.D student at New York University who is part of a team of researchers that study cybersecurity and AI systems, told CyberScoop that in his experience commercial models are getting harder, not easier, to jailbreak with each new release.

“Definitely over the past few years I think between GPT4 and GPT 5 … I saw a lot more guardrails in GPT5, where GPT5 will put the pieces together before it replies and sometimes it will say, ‘no, I’m not going to do that.’”

Other AI tools, like coding models “are a lot less thoughtful about the bigger picture” of what they’re being asked to do and whether it’s malicious or not, he added, while open-source models are “most likely to do what you say” and existing guardrails can be more easily circumvented.

Source link

Related Topics:AI safety Anthropic Artificial Intelligence (AI)chatgpt Claude NIST OpenAI red team uk

Up Next

Arista touts liquid cooling, optical tech to reduce power consumption for AI networking

Don't Miss

Learning by Doing: AI, Knowledge Transfer, and the Future of Skills | American Enterprise Institute

djohnson

Click to comment

AI Research

(Policy Address 2025) HK earmarks HK$3B for AI research and talent recruitment – The Standard (HK)

Published

1 hour ago

September 17, 2025

The Standard 英文虎報

(Policy Address 2025) HK earmarks HK$3B for AI research and talent recruitment The Standard (HK)

Source link

AI Research

[2506.08171] Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models

Published

2 hours ago

September 17, 2025

Daniel Koh, Yannic Noller, Corina S. Pasareanu, Adrians Skapars, Youcheng Sun

[Submitted on 9 Jun 2025 (v1), last revised 16 Sep 2025 (this version, v2)]

View a PDF of the paper titled Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models, by Daniel Koh and 4 other authors

View PDF
HTML (experimental)

Abstract:Large language models (LLMs) have demonstrated strong performance on coding tasks such as generation, completion and repair, but their ability to handle complex symbolic reasoning over code still remains underexplored. We introduce the task of worst-case symbolic constraints analysis, which requires inferring the symbolic constraints that characterise worst-case program executions; these constraints can be solved to obtain inputs that expose performance bottlenecks or denial-of-service vulnerabilities in software systems. We show that even state-of-the-art LLMs (e.g., GPT-5) struggle when applied directly on this task. To address this challenge, we propose WARP, an innovative neurosymbolic approach that computes worst-case constraints on smaller concrete input sizes using existing program analysis tools, and then leverages LLMs to generalise these constraints to larger input sizes. Concretely, WARP comprises: (1) an incremental strategy for LLM-based worst-case reasoning, (2) a solver-aligned neurosymbolic framework that integrates reinforcement learning with SMT (Satisfiability Modulo Theories) solving, and (3) a curated dataset of symbolic constraints. Experimental results show that WARP consistently improves performance on worst-case constraint reasoning. Leveraging the curated constraint dataset, we use reinforcement learning to fine-tune a model, WARP-1.0-3B, which significantly outperforms size-matched and even larger baselines. These results demonstrate that incremental constraint reasoning enhances LLMs’ ability to handle symbolic reasoning and highlight the potential for deeper integration between neural learning and formal methods in rigorous program analysis.

Submission history

From: Daniel Koh [view email]
[v1]
Mon, 9 Jun 2025 19:33:30 UTC (1,462 KB)
[v2]
Tue, 16 Sep 2025 10:35:33 UTC (1,871 KB)

Source link

AI Research

‘AI Learning Day’ spotlights smart campus and ecosystem co-creation

Published

3 hours ago

September 17, 2025

The Editors

When artificial intelligence (AI) can help you retrieve literature, support your research, and even act as a “super assistant”, university education is undergoing a profound transformation.

On 9 September, XJTLU’s Centre for Knowledge and Information (CKI) hosted its third AI Learning Day, themed “AI-Empowered, Ecosystem-Co-created”. The event showcased the latest milestones of the University’s “Education + AI” strategy and offered in-depth discussions on the role of AI in higher education.

In her opening remarks, Professor Qiuling Chao, Vice President of XJTLU, said: “AI offers us an opportunity to rethink education, helping us create a learning environment that is fairer, more efficient and more personalised. I hope today’s event will inspire everyone to explore how AI technologies can be applied in your own practice.”

Professor Qiuling Chao

In his keynote speech, Professor Youmin Xi, Executive President of XJTLU, elaborated on the University’s vision for future universities. He stressed that future universities would evolve into human-AI symbiotic ecosystems, where learning would be centred on project-based co-creation and human-AI collaboration. The role of educators, he noted, would shift from transmitters of knowledge to mentors for both learning and life.

Professor Youmin Xi

At the event, Professor Xi’s digital twin, created by the XJTLU Virtual Engineering Centre in collaboration with the team led by Qilei Sun from the Academy of Artificial Intelligence, delivered Teachers’ Day greetings to all staff.

(Teachers’ Day message from President Xi’s digital twin)

“Education + AI” in diverse scenarios

This event also highlighted four case studies from different areas of the University. Dr Ling Xia from the Global Cultures and Languages Hub suggested that in the AI era, curricula should undergo de-skilling (assigning repetitive tasks to AI), re-skilling, and up-skilling, thereby enabling students to focus on in-depth learning in critical thinking and research methodologies.

Dr Xiangyun Lu from International Business School Suzhou (IBSS) demonstrated how AI teaching assistants and the University’s Junmou AI platform can offer students a customised and highly interactive learning experience, particularly for those facing challenges such as information overload and language barriers.

Dr Juan Li from the School of Science shared the concept of the “AI amplifier” for research. She explained that the “double amplifier” effect works in two stages: AI first amplifies students’ efficiency by automating tasks like literature searches and coding. These empowered students then become the second amplifier, freeing mentors from routine work so they can focus on high-level strategy. This human-AI partnership allows a small research team to achieve the output of a much larger one.

Jing Wang, Deputy Director of the XJTLU Learning Mall, showed how AI agents are already being used to support scheduling, meeting bookings, news updates and other administrative and learning tasks. She also announced that from this semester, all students would have access to the XIPU AI Agent platform.

Students and teachers are having a discussion at one of the booths

AI education system co-created by staff and students

The event’s AI interactive zone also drew significant attention from students and staff. From the Junmou AI platform to the E

-Support chatbot, and from AI-assisted creative design to 3D printing, 10 exhibition booths demonstrated the integration of AI across campus life.

These innovative applications sparked lively discussions and thoughtful reflections among participants. In an interview, Thomas Durham from IBSS noted that, although he had rarely used AI before, the event was highly inspiring and motivated him to explore its use in both professional and personal life. He also shared his perspective on AI’s role in learning, stating: “My expectation for the future of AI in education is that it should help students think critically. My worry is that AI’s convenience and efficiency might make students’ understanding too superficial, since AI does much of the hard work for them. Hopefully, critical thinking will still be preserved.”

Year One student Zifei Xu was particularly inspired by the interdisciplinary collaboration on display at the event, remarking that it offered her a glimpse of a more holistic and future-focused education.

Dr Xin Bi, XJTLU’s Chief Officer of Data and Director of the CKI, noted that, supported by robust digital infrastructure such as the Junmou AI platform, more than 26,000 students and 2,400 staff are already using the University’s AI platforms. XJTLU’s digital transformation is advancing from informatisation and digitisation towards intelligentisation, with AI expected to empower teaching, research and administration, and to help staff and students leap from knowledge to wisdom.

Dr Xin Bi

“Looking ahead, we will continue to advance the deep integration of AI in education, research, administration and services, building a data-driven intelligent operations centre and fostering a sustainable AI learning ecosystem,” said Dr Xin Bi.

By Qinru Liu

Edited by Patricia Pieterse

Translated by Xiangyin Han

Source link