Safeguarding Third-Party AI Research
Key Takeaways

- Third-party AI research is essential to ensure that AI companies do not grade their own homework, yet few companies actively protect or promote such research.
- We found that no major foundation model developer currently offers comprehensive protections for third-party evaluation. Instead, their policies often disincentivize it.
- A safe harbor for good-faith research should be a top priority for policymakers: it would protect researchers who test AI systems in good faith and increase the scale, diversity, and independence of evaluations.
Executive Summary
Third-party evaluation is a cornerstone of efforts to reduce the substantial risks posed by AI systems. AI is a vast field with thousands of highly specialized experts around the world who can help stress-test the most powerful systems. But few companies empower these researchers to test their AI systems, for fear of exposing flaws in their products. AI companies often block safety research with restrictive terms of service or by suspending researchers who report flaws.
In our paper, “A Safe Harbor for AI Evaluation and Red Teaming,” we assess the policies and practices of seven top developers of generative AI systems and find that none offers comprehensive protections for third-party AI research. Unlike cybersecurity, generative AI is a new field without well-established norms for flaw disclosure, safety standards, or mechanisms for conducting third-party research. We propose that developers adopt safe harbors to enable good-faith, adversarial testing of AI systems.
Introduction
Generative AI systems pose a wide range of potential risks, from enabling the creation of nonconsensual intimate imagery to facilitating the development of malware. Evaluating generative AI systems is crucial to understanding the technology, ensuring public accountability, and reducing these risks.
In July 2023, many prominent AI companies signed voluntary commitments at the White House, pledging to “incent third-party discovery and reporting of issues and vulnerabilities.” More than a year later, implementation of this commitment has been uneven. While some companies do reward researchers for finding security flaws in their AI systems, few companies strongly encourage research on safety or provide concrete protections for good-faith research practices. Instead, leading generative AI companies’ terms of service legally prohibit third-party safety and trustworthiness research, in effect threatening anyone who conducts such research with bans from their platforms or even legal action. For example, companies’ policies do not allow researchers to jailbreak AI systems like ChatGPT, Claude, or Gemini to assess potential threats to U.S. national security.
In March 2024, we penned an open letter signed by over 350 leading AI researchers and advocates calling for a safe harbor for third-party AI evaluation. The researchers noted that while security research on traditional software is protected by voluntary company protections (safe harbors), established vulnerability disclosure norms, and legal safeguards from the Department of Justice, AI safety and trustworthiness research lacks comparable protections.
Companies have continued to be opaque about key aspects of their most powerful AI systems, such as the data used to build their models. Developers of generative AI models tout the safety of their systems based on internal red teaming, but there is no way for the government or independent researchers to validate these results, as companies do not release reproducible evaluations.
Generative AI companies also impose barriers on their platforms that limit good-faith research. Similar issues plague social media: companies have taken steps to prevent researchers and journalists from investigating their platforms, and these measures, together with federal legislation, have had a chilling effect on such research and worsened the spread of harmful content online. But conducting research on generative AI systems comes with additional challenges, as the content on generative AI platforms is not publicly available. Users need accounts to access AI-generated content, and those accounts can be restricted by the company that owns the platform. Many AI companies also block certain user requests and limit the functionality of their models, which prevents researchers from unearthing issues related to safety or trustworthiness. The stakes are also higher for AI, which has the potential not only to turbocharge misinformation but also to provide U.S. adversaries like China and Russia with material strategic advantages.
To assess the state of independent evaluation for generative AI, our team of machine learning, law, and policy experts conducted a thorough review of seven major AI companies’ policies, access provisions, and related enforcement processes. We detail our own experiences evaluating AI systems, identify barriers other third-party evaluators may face, and propose alternative practices and policies to enable broader community participation in AI evaluation.