Connect with us

AI Insights

Experts worry about transparency, unforeseen risks as DOD forges ahead with new frontier AI projects

Published

on


Pentagon leadership recently tapped four major tech companies for separate contracts — each worth up to $200 million — to accelerate the Defense Department’s enterprise-wide adoption of some of the most advanced commercial algorithms and machine learning capabilities and deploy them against contemporary national security challenges.

Also called foundation models, frontier AI refers to sophisticated, constantly evolving, cutting-edge systems that are becoming increasingly intelligent at completing tasks like natural language processing, computer vision and reasoning. 

They’re rapidly pushing the boundaries of what existing AI can achieve. 

But while experts have warned about the unknowns associated with frontier AI development creating risks to humanity, the Defense Department’s Chief Digital and AI Office (CDAO) is not responding with full transparency regarding if and how those four powerful models were vetted to ensure they are safe for responsible operational use, before the high-dollar contracts were awarded.

In an emailed response to multiple questions DefenseScoop asked the CDAO regarding any early efforts conducted to demonstrate the models could be trusted for DOD-specific applications, a defense official said:

“The contract awards to frontier AI companies are designed to enable the department to leverage the technology and talent at these companies to develop agentic AI workflows across a variety of mission areas. DOD will tailor technical approaches based on mission need and industry capabilities. As part of the protoyping effort, DOD will assess both the opportunities and risks of frontier AI models in DOD use cases. The DOD is committed to ensuring that all deployments of AI technologies comply with applicable security policies and executive orders. In our prototyping and experimentation efforts with frontier AI companies, we are exercising risk management practices throughout the technology lifecycle to safeguard our people, our data, and our mission, while preventing unauthorized use.”

The official declined to provide more information beyond that statement in response to follow-up and clarification questions.

Not enough info

During his 36 years of service in the U.S. Air Force, now retired Lt. Gen. Jack Shanahan accumulated more than 2,800 flight hours. He went on to work in DOD’s Intelligence and Security directorate, before the Joint Artificial Intelligence Center (JAIC) was formed in 2018 and he was tapped to serve as its first chief. The JAIC was one of the original DOD organizations that were fused to form the CDAO in 2022.

“I love talking about test and evaluation. When I was at the JAIC, I would always make it abundantly clear — we’re not going to talk about anything sensitive, I’m not going to give the media access to exactly what we’re doing — but I owe it to people to talk about this. Why would we not?” Shanahan told DefenseScoop in a recent interview. 

The defense official’s responses, in his view, should have included a more straightforward acknowledgement that the CDAO is or will work closely with each of the four companies — OpenAI, Anthropic, Google and xAI — to understand their T&E and red-teaming processes used to prove and refine the models.

“Where I think you gave them a softball and they swung and missed at it was, rather than just saying anything at all about test and evaluation, they said ‘risk management practices.’ That’s a missed opportunity. I understand why somebody would use that phrase, but why not just say ‘Risk management is part of T&E … so we’re going to partner closely with these companies, and we’re going to get this thing right?’” Shanahan said. “It is as simple as that.”

Further, he would’ve liked to see more information regarding whether the vendors supplied DOD with their raw model weights, which essentially encapsulate all that the AI systems have learned in training and ultimately represent the core intelligence of each model.

“If they shared those raw model weights with the government, that’s a big deal, because then the government can do a lot more than just getting access to the model itself,” Shanahan said.

In a separate discussion with DefenseScoop about the CDAO’s recent foundation model awards, AI safety engineer Dr. Heidy Khlaaf pointed out that T&E and related risk assessments typically take significantly longer time than the timescales observed for the four contracts. 

Khlaaf currently serves as the chief AI scientist at the AI Now Institute, where she concentrates on the assessment and safety of AI within autonomous weapon systems.

“The DOD recently cutting the size of the Office of the Director of Operational Test and Evaluation in half speaks for itself. In a lot of ways, there is signalling for much faster AI adoption without the rigorous processes that have existed since the 1980s to ensure new technologies are safe or effective,” she said.

Pointing to publicly available information regarding the four commercial models and latest evaluation results, Khlaaf argued that they would likely not meet the standard defense thresholds expected for systems to be used in critical military-supporting settings.  

“We’ve particularly warned before that commercial models pose a much more significant safety and security threat than military purpose-built models, and instead this announcement has disregarded these known risks and boasts about commercial use as an accelerator for AI, which is indicative of how these systems have clearly not been appropriately assessed,” Khlaaf explained.


There are certain contracts, such as experimental use cases and research and development projects, that might not require T&E or risk assessments. However, Khlaaf noted, such checks would be exceedingly necessary in the CDAO’s current frontier AI efforts — as the announcement explicitly calls out the use of “AI capabilities to address critical national security challenges.”

“An independent assessment to substantiate these companies’ claims has always been an existing core requirement of military and safety-critical systems, and it guarantees that no aspect of the system’s pipeline is compromised, while ensuring a system’s security and fitness for use,” she said.

Existing, relevant risks that accompany discarding T&E practices, Khlaaf added, were already evident in a recent viral incident where Elon Musk-owned xAI’s model — Grok — praised Adolf Hitler, referred to itself as MechaHitler, and generated other antisemitic content. 

“This was due to an updated system prompt by the Grok team itself to nudge it towards a specific view. It dispels the myth that frontier AI is somehow objective or in control of its learning. A model can always be nudged and tampered by AI companies and even adversaries to output a specific view, which gives them far too much control over our military systems. And this is just one security issue out of dozens that have been unveiled over the last several years that have yet to be addressed,” Khlaaf told DefenseScoop.

‘Not risk-free’

Shanahan pointed out that the new AI Action Plan issued by President Donald Trump days after the CDAO announced the frontier AI partnerships “says explicitly, ‘evaluate frontier AI systems for national security risk.’”

Drawing from his personal experiences, the former fighter pilot said it is important to consider the different sets of risks between prompting commercial frontier AI capabilities on one’s home computer, versus applications inside the Pentagon.

“They’re going to be used potentially for intelligence analysis, for the development of operational plans and courses of action. And maybe only a subset of [these use cases] will be true, life or death [warfare-type applications], but there will be serious consequences if and when these models confabulate, get things wrong or spit garbage out the other end,” he said.

The Pentagon’s awards to the four companies will be made under indefinite-delivery, indefinite-quantity (IDIQ) contracts that will be paid out as a now undetermined amount of the services and technology — worth up to $800 million across the four companies — are purchased and delivered based on adapting demands during a fixed period of time. 

The defense official declined to directly answer DefenseScoop’s questions about whether the CDAO has made any awards under the four IDIQ deals, to date. 

Meanwhile, Shanahan expressed concerns that Pentagon officials may already be accessing the foundation models inside the building, even as it is against current policies.

“I promise you, people are using these in their lives in the Pentagon, even though they haven’t officially been allowed to do that. But that’s different than saying, ‘OK, I’m going to develop an intelligence assessment and an intel analysis using one of these models, and I’m going to give that forward information.’ Well, what if the data [that trained the model] was corrupted? What if China had access to that data and the model was spitting out something exactly the opposite of what it should have been? Who knows that?” he said.

Taken one way, the response to DefenseScoop’s questions might suggest that officials plan to assess the risks of each model based on each experimental use case they run, he noted.  

“Play around with it. That’s great — I’m all for it. But if you’re going to use these operationally, you’ve got to have a level of confidence [that they are safe]. And how do you get that? Well, it’s a combination of the companies sharing with you their own internal testing and their benchmarks that they did, but also the government’s ability to do this themselves,” Shanahan said.

According to Khlaaf, not conducting T&E sets a precedent that paves the way for faster adoption without the due diligence that systems are ensured to have an accepted minimal baseline of safety and security guarantees. And on a technical level, without proper T&E, military data and operations can be threatened with experimental uses. 

“If, for example, critical military data is used to fine-tune these models, inherent vulnerabilities within LLMs can allow for the extraction of this critical data through observed model predictions alone by unapproved parties. Other attack vectors include poisoning web-scale training datasets and ‘sleeper agents’ within commercial foundation models that compromise their supply chain, which may only be triggered during specific instances to intentionally or inadvertently subvert models used within military applications and compromise their behavior,”
Khlaaf said. “So unfortunately experimental use is not risk-free, especially without preliminary T&E to ensure that such experimentation would in fact be risk-free.”

And once the models are deployed in the wild beyond prototyping, even those used for the most banal applications that are often associated with bureaucratic functions — like communications, coding, resolving IT tickets and data processing — can introduce threats. 

The speed DOD takes to deploy them holds potential for compromising the safety of civil and defense infrastructure, because administrative tasks can feed into mission-critical decisions. 

“But as repeated research [efforts] have shown, AI tools consistently fabricate outputs — known as hallucinations — and introduce novel vulnerabilities,” Khlaaf said. “Their use might lead to an accumulation of errors, and over time small errors will propagate to intelligence or decision-making, which could result in decisions that cause civilian harm and tactical mistakes.”

OpenAI, Google and xAI did not respond to DefenseScoop’s requests for more information about the new frontier AI partnerships with CDAO.

An official from Anthropic did not provide details about T&E conducted with or for DOD, but said that the company conducts “rigorous safety testing” — and that it was “one of the first AI labs to receive ISO/IEC 42001:2023 certification for responsible and safe AI,” which marks an early international standard for AI Management Systems. 

Anthropic’s “Claude models are the most resistant to prompt injection and jailbreak techniques (CalypsoAI model security leaderboards, Huggingface) and are the least likely to hallucinate models on the market (MASK leaderboard),” the official said in an email. They noted that Anthropic will continue to work with its commercial cloud partners to ensure that its models are available while meeting the DOD’s most stringent requirements for information-handling at the controlled unclassified level and above. 

“On model weights, we do not share our model weights,” the Anthropic official told DefenseScoop.


Written by Brandi Vincent

Brandi Vincent is DefenseScoop’s Pentagon correspondent. She reports on emerging and disruptive technologies, and associated policies, impacting the Defense Department and its personnel. Prior to joining Scoop News Group, Brandi produced a long-form documentary and worked as a journalist at Nextgov, Snapchat and NBC Network. She grew up in Louisiana and received a master’s degree in journalism from the University of Maryland.



Source link

AI Insights

AI drug companies are struggling—but don’t blame the AI

Published

on


Moonshot hopes of artificial intelligence being used to expedite the development of drugs are coming back down to earth. 

More than $18 billion has flooded into more than 200 biotechnology companies touting AI to expedite development, with 75 drugs or vaccines entering clinical trials, according to Boston Consulting Group. Now, investor confidence—and funding—is starting to waver.

In 2021, venture capital investment in AI drug companies reached an apex with more than 40 deals being made worth about $1.8 billion. This year, there have been fewer than 20 deals worth about half of that peak sum, the Financial Times reported, citing data from Pitchbook. 

Some existing companies have struggled in the face of challenges. In May, biotech company Recursion tabled three of its prospective drugs in a cost-cutting effort following a merger with Exscientia, a similar biotech firm, last year. Fortune previously reported that none of Recursion’s discovered AI-compounds have reached the market as approved drugs. After a major restructuring in December 2024, biotech company BenevolentAI delisted from the Euronext Amsterdam stock exchange in March before merging with Osaka Holdings. 

A Recursion spokesperson told Fortune the decision to shelve the drugs was “data-driven” and a planned outcome of its merger with Exscientia.

“Our industry’s 90% failure rate is not acceptable when patients are waiting, and we believe approaches like ours that integrate cutting-edge tools and technologies will be best positioned for long-term success,” the spokesperson said in a statement.

BenevolentAI did not respond to a request for comment.

The struggles of the industry coincide with a broader conversation around the failure of generative AI to deliver more quickly on its lofty promises of productivity and efficiency. An MIT report last month found 95% of generative AI pilots at companies failed to accelerate revenue. A U.S. Census Bureau survey this month found AI adoption in large U.S. companies has declined from its 14% peak earlier this year to 12% as of August.

But the AI technology used to help develop drugs is far different than those from large language models used in most workplace initiatives and should therefore not be held to the same standards, according to Scott Schoenhaus, managing director and equity research analyst for KeyBanc Capital Markets Inc. Instead, the industry faces its own set of challenges.

“No matter how much data you have, human biology is still a mystery,” Schoenhaus told Fortune.

Macro and political factors drying up AI drug development funding

At the crux of the slowed funding and slower development results may not be the limitations of the technology itself, but rather a slew of broader factors, Schoenhaus said.

“Everyone acknowledges the funding environment has dried up,” he said. “The biotech market is heavily influenced by low interest rates. Lower interest rates equals more funding coming into biotechs, which is why we’re seeing funding for biotech at record lows over the last several years, because interest rates have remained elevated.”

It wasn’t always this way. Leveraging AI in drug development is not only thanks to growing access to semiconductor chips, but also how technology has allowed for quick and now cheap ways of mapping the entire human genome. In 2001, it cost more than $100 million to map the human genome. Two decades later, that undertaking cost about $1,000.

Beyond having the pandemic to thank for next-to-nothing interest rates in 2021, COVID also expedited partnerships between AI drug development start ups and Big Pharma companies. In early 2022 biotechnology startup AbCellera and Eli Lilly got emergency FDA approval for an antibody used in the early COVID vaccines, a tangible example of how the tech could be used to aid in drug discoveries.

But since then, there have been other industry hurdles, Schoenhaus said, including Big Pharma cutting back on research and development costs amid slowing demand, as well as uncertainty surrounding whether President Donald Trump would impose a tariff on pharmaceuticals as the U.S. and European Union tussled over a trade deal. Trump signed a memo this week threatening to ban direct-to-consumer advertising for prescription medications, theoretically driving down pharma revenues.

Limitations of AI

That’s not to say there haven’t been technological hiccups in the industry.

“There is scrutiny around the technology themselves,” Schoenhaus said. “Everyone’s waiting for these readouts to prove that.”

The next 12 months of emerging data from AI drug development startups will be critical in determining how successful these companies stand to be, Schoenhaus said. Some of the results so far have been mixed. For example, Recursion released data from a mid-stage clinical trial of a drug to treat a neurovascular condition in September last year, finding the drug was safe but that there was little evidence of how effective it was. Company shares fell double digits following the announcement. 

These companies are also limited by how they’re able to leverage AI. The drug development process is one that takes 10 years and is intentionally bottlenecked to ensure the safety and efficacy of the drugs in question, according to according to David Siderovski, chair of University of North Texas Health Science Center’s Department of Pharmacology & Neuroscience, who has previously worked with AI drug development companies in the private sector. Biotechnology companies using AI to make these processes more efficient are usually only tackling one small part of this bottleneck, such as being able to screen and identify a drug-like molecule faster than previously.

“There are so many stages that have to be jumped over before you can actually declare the [European Medicines Agency], or the FDA, or Health Canada, whoever it is, will designate this as a safe, approved drug to be marketed to patients out in the world,” Siderovski told Fortune. “That one early bottleneck of auditioning compounds is not the be-all and end-all of satisfying shareholders by announcing, ‘We have approval for this compound as a drug.’”

Smaller companies in the sector have also made a concerted effort to partner less with Big Pharma companies, preferring instead to build their own pipelines, even if it means no longer having access to the franchise resources of industry giants. 

“They want to be able to pursue their technology and show the validation of their platform sooner than later,” Schoenhaus said. “They’re not going to wait around for large pharma to pursue a partnered molecule. They’d rather just do it themselves and say, ‘Hey, look, our technology platform works.’”

Schoenhaus sees this strategy as a way for companies looking to prove themselves by perfecting the use of AI to better understand the slippery, mysterious, and still greatly unknown frontier of human biology.

“It’s just a very much more complex application of AI,” he said, “hence why I think we are still seeing these companies focus on their own internal pipelines so that they can really, squarely focus their resources on trying to better their technology.”



Source link

Continue Reading

AI Insights

Companies Rehire Human Workers to Fix Artificial Intelligence Generated Content After Mass Layoffs

Published

on


IN A NUTSHELL
  • 🤖 Companies increasingly use AI to replace human workers, highlighting the trend of automation.
  • 🔄 Many businesses find that AI outputs lack quality, leading to a return to human expertise.
  • 👥 Freelancers like Lisa Carstens and Harsh Kumar are rehired to fix AI-generated content.
  • 💼 The evolving landscape poses questions about fair compensation for human improvements to AI work.

The integration of artificial intelligence (AI) into workplaces has become a prevalent trend, often at the expense of human employees. This shift, while aiming to optimize efficiency and cut costs, has exposed the limitations of relying solely on AI. As companies increasingly replace human roles with AI, they encounter unforeseen challenges that highlight the irreplaceable value of human expertise. The journey reveals the complex dynamics between technology adoption and workforce sustainability, raising important questions about the future of work and the role of AI in it.

AI’s Shortcomings Lead to Reemployment

While AI promises to revolutionize industries by automating tasks, its execution often falls short, leading companies to reconsider their human workforce. AI-generated outputs frequently lack the nuance and precision that human creativity and expertise bring. For instance, textual content may appear repetitive, designs might lack clarity, and AI-generated code could result in unstable applications. These deficiencies compel businesses to turn back to the very employees they had previously let go.

Lisa Carstens, an independent illustrator and designer, experienced firsthand the limitations of AI. Based in Spain, Carstens found herself rehired to fix AI-generated visuals that were, at best, superficially appealing and, at worst, unusable. She noted that many companies assumed AI could operate without human intervention, only to realize the opposite.

“ChatGPT Crushes All Competition”: OpenAI’s Bot Gets 46.6 Billion Visits While Controlling 83% of Global AI Traffic Right Now

“There are people who understand AI’s imperfections and those who become frustrated when it doesn’t perform as expected,” Carstens explains, highlighting the delicate balance freelancers must maintain when rectifying AI’s mistakes.

The Emergence of a New Freelance Economy

AI has inadvertently given rise to a new type of freelance work focused on improving AI-generated content. Developers like Harsh Kumar, based in India, have seen a resurgence in demand for their skills as AI’s limitations become apparent. Clients who invested heavily in AI coding tools often found the results to be unsatisfactory, leading them to seek human expertise to salvage projects.

“We Built A Walking Robot From Just 18 Metal Parts”: Tokyo Engineers Create Open-Source Bipedal Robot That Anyone Can Assemble At Home

Kumar echoes the sentiment that AI can enhance productivity but cannot entirely replace human input. “Humans will remain essential for long-term projects,” he asserts, emphasizing that AI, created by humans, still requires human oversight. While work is plentiful, the nature of assignments has evolved, with a focus on refining and iterating upon AI’s initial attempts at content creation.

The Challenges of Human-AI Collaboration

The dynamic between AI and human workers is not without its challenges. While companies that over-relied on AI often seek to rehire their former employees, they also attempt to reduce compensation for these roles. The justification is that the work now involves refining existing AI-generated content rather than creating it from scratch.

“Robotic Arms Move Like Dancers”: AI System Choreographs Factory Robots That Solve 40 Tasks In Seconds While Traditional Programming Takes Hours Of Human Work

This shift highlights a more integrated human-machine collaboration where both entities contribute uniquely to the final product. However, it also raises questions about fair compensation and the value of human expertise in a world increasingly influenced by AI. As companies attempt to balance cost-cutting with quality assurance, the debate over appropriate remuneration for freelance revisions of AI work continues.

AI in the Workplace: A Double-Edged Sword

While AI offers numerous advantages, such as increased efficiency and cost savings, it also presents significant challenges. Businesses must navigate the delicate balance between adopting AI technologies and maintaining a skilled human workforce. The experiences of freelancers like Carstens and Kumar underline the necessity of human oversight in ensuring AI-generated content meets industry standards.

As AI continues to evolve, companies must critically assess its role in their operations. The initial allure of AI-driven cost reductions must be weighed against the potential for subpar results and the subsequent need for human intervention. This ongoing evaluation highlights the importance of strategic planning in technology adoption, ensuring that businesses maximize AI’s benefits without compromising quality.

As AI becomes further entrenched in workplaces, companies must decide how best to leverage technology while valuing human contributions. The need for skilled professionals to enhance AI outputs underscores the irreplaceable nature of human expertise. Will businesses find a sustainable model that harmonizes technological advancements with human creativity and skill, or will the pendulum swing back toward a more human-centric approach?

This article is based on verified sources and supported by editorial technologies.

Did you like it? 4.5/5 (24)



Source link

Continue Reading

AI Insights

Artificial Intelligence goes for a test run at the Sheriff’s Office | Top Stories

Published

on

























Artificial Intelligence goes for a test run at the Sheriff’s Office | Top Stories | wrex.com

We recognize you are attempting to access this website from a country belonging to the European Economic Area (EEA) including the EU which
enforces the General Data Protection Regulation (GDPR) and therefore access cannot be granted at this time.

For any issues, contact wrex@wrex.com or call 815-335-2213.



Source link

Continue Reading

Trending