Bad code, malicious models and rogue agents: Cybersecurity researchers scramble to prevent AI exploits

When it comes to dealing with artificial intelligence, the cybersecurity industry has officially moved into overdrive.
Vulnerabilities in coding tools, malicious injections into models used by some of the largest companies in the world and agents that move across critical infrastructure without security protection have created a whole new threat landscape seemingly overnight.
“Who’s feeling like they really understand what’s going on?” asked Jeff Moss, president of DEF CON Communications Inc. and founder of the Black Hat conference, during his opening keynote remarks on Wednesday. “Nobody. It’s because we have a lot of change occurring at the same time. We don’t fully know what AI will disrupt yet.”
Bad code vs. secure code
While the full scope of the change has yet to become apparent, this year’s Black Hat USA gathering in Las Vegas provided plenty of evidence that AI is fueling a whole new class of vulnerabilities. A starting point identified by security researchers is in the code itself, which is increasingly written by autonomous AI agents.
“These systems are mimics, they’re incredible mimics,” cognitive scientist and AI company founder Gary Marcus said during a panel discussion at the conference. “Lots of bad code is going to be written because these systems don’t understand secure code.”
One problem identified by the cybersecurity community is that developers are taking shortcuts with AI coding tools without thinking through the security consequences. Researchers from Nvidia Corp. presented findings that an auto-run mode on the AI-powered code editor Cursor allowed agents to run command files on a user’s machine without explicit permission. When Nvidia reported this potential vulnerability to Anysphere Inc.’s Cursor in May, the vibe coding company responded by offering users the ability to disable the auto-run feature, according to Becca Lynch, offensive security researcher at Nvidia, who spoke at the conference on Wednesday.
Vulnerabilities in the interfaces that support AI, such as coding tools, represent a growing area of concern in the security world. Part of the issue lies in the sheer number of application programming interface endpoints being generated to run AI. Companies with generative AI have at least five times more API endpoints than those without, according to Chuck Herrin, field chief information security officer at F5 Inc.
Black Hat USA drew more than 22,000 security professionals to Las Vegas this week.
“We’re blowing up that attack surface because a world of AI is a world of APIs,” said Herrin, who spoke at Black Hat’s AI Summit on Tuesday. “There’s no securing AI without securing the interfaces that support it.”
Securing those interfaces may be more difficult than originally imagined. Running AI involves a reliance on vector databases, training frameworks and inference servers, such as those provided by Nvidia. The Nvidia Container Toolkit enables use of the chipmaker’s GPUs within Docker containers, including those hosting inference servers.
Security researchers from Wiz Inc. presented recent findings on an Nvidia Container Toolkit vulnerability that posed a major threat to managed AI cloud services. Wiz found that the vulnerability could have allowed attackers to access or manipulate customer data and proprietary models in 37% of cloud environments. Nvidia issued an advisory in July and provided a fix in its latest update.
“Any provider of cloud services was vulnerable to our attack,” said Hillai Ben Sasson, senior security researcher at Wiz. “AI security is first and foremost infrastructure security.”
Lack of protection for AI models
The expanding use of AI is being driven by adoption of large language models, an area of particular interest to the security community. The sheer volume of model downloads has attracted attention, with Meta Platforms Inc. reporting that its open AI model family, Llama, reached 1 billion downloads in March.
Yet despite the popularity of LLMs, security controls for them have not kept pace. “The $300 billion we spend on information security does not protect AI models,” Malcolm Harkins, chief security and trust officer at HiddenLayer Inc., said in an interview with SiliconANGLE. “The models are exploitable because there is no mitigation against vulnerability.”
This threat of exploitation has cast a spotlight on popular repositories where models are stored and downloaded. At last year’s Black Hat gathering, researchers presented evidence they had breached three of the largest AI model repositories.
This has become an issue of greater concern as enterprises continue to implement AI agents, which rely on LLMs to perform key tasks. “The LLM that drives and controls your agents can potentially be controlled by attackers,” Nvidia’s Lynch said this week. “LLMs are uniquely vulnerable to adversarial manipulation.”
Though major repositories have responded to breach vulnerabilities identified and shared by security researchers, there has been little evidence that the model repository platforms are interested in vetting their inventories for malicious code. That reluctance is not a matter of technological limitations, according to Chris Sestito, co-founder and CEO of HiddenLayer.
“I believe you need to embrace the technology that exists,” Sestito told SiliconANGLE. “I don’t think the lift is that big.”
Agents fail breach test
If model integrity fails to be protected, this will likely have repercussions for the future of AI agents as well. Agentic AI is booming, yet the lack of security controls around the autonomous software is also beginning to generate concern.
Last month, cybersecurity company Coalfire Inc. released a report documenting its success in hacking agentic AI applications. Using adversarial prompts and working with standards from partners such as the National Institute of Standards and Technology, or NIST, the company was able to demonstrate new risks of compromise and data leakage.

Apostol Vassilev of NIST, Jess Burn of Forrester, and Nathan Hamiel of Kudelski Security spoke at the Black Hat AI Summit.
“There was a success rate of 100%,” Apostol Vassilev, research team supervisor at NIST, said during the AI Summit. “Agents are touching the same cyber infrastructure that we’ve been trying to protect for decades. Make sure you are exposing this technology only to assets and data you are willing to live without.”
Despite the concerns around agentic AI vulnerability, the security industry is also looking to adopt agents to bolster protection. An example can be found at Simbian Inc., which provides fully autonomous AI security operations center agents that use toolchains and memory graphs to ingest signals, synthesize insights and make decisions in real time for threat containment.
Implementing agents for security has been a challenging problem, as Simbian co-founder and CEO Ambuj Kumar readily admitted. He told SiliconANGLE that his motivation was a need to protect critical infrastructure and keep essential services such as medical care safe.
“The agents we are building are inside your organization,” Kumar said. “They know where the gold coins are and they secure them.”
Solving the identity problem
Another approach being taken within the cybersecurity industry to safeguard agents is to bake attestation into the autonomous software through certificate chains at the silicon level. Anjuna Security Inc. is pursuing this solution through an approach known as “confidential computing.” The concept is to process data through a Trusted Execution Environment, a secure area within the processor where code can be executed safely.
This is the path forward for agentic AI, according to Ayal Yogev, co-founder and CEO of Anjuna. His company now counts three of the world’s top 10 banks among its customers, along with five next-generation payments firms and the U.S. Navy.
“It becomes an identity problem,” said Yogev, who spoke with SiliconANGLE in advance of the Black Hat gathering. “If an agent is doing something for me, I need to make sure they don’t have permissions beyond what the user has. Confidential computing is the future of computing.”
For the near term, the future of computing is heavily dependent on the AI juggernaut, and this dynamic is forcing the cybersecurity community to speed up the research process to identify vulnerabilities and pressure platform owners to fix them. During much of the Black Hat conference this week, numerous security practitioners noted that even though the technology may be spinning off new solutions almost daily, the security problems have been seen before.
This will involve a measure of discipline and control, a message that notable industry figures such as Chris Inglis, the country’s first National Cyber Director and former deputy director of the National Security Agency, have been reinforcing for at least the past two years. In a conversation with SiliconANGLE, the former U.S. Air Force officer and command pilot noted that today’s cars are nothing more than controllable computers on wheels.
“I do have the ability to tell that car what to do,” Inglis said. “We need to fly this airplane.”
Can the cybersecurity industry regain a measure of control as AI hurtles through the skies? As seen in the sessions and side conversations at Black Hat this week, the security community is trying hard, but there remains a nagging concern that AI itself may prove to be ultimately ungovernable.
During the AI Summit on Tuesday, F5’s Herrin was asked what the one thing was that should never be done in AI security. “Trust it,” Herrin replied.
Artificial Intelligence at Bayer – Emerj Artificial Intelligence Research

Bayer is a global life sciences company operating across Pharmaceuticals, Consumer Health, and Crop Science. In fiscal 2024, the group reported €46.6 billion in sales and 94,081 employees, a scale that makes internal AI deployments consequential for workflow change and ROI.
The company invests heavily in research, with more than €6 billion allocated to R&D in 2024, and its leadership frames AI as an enabler for both sustainable agriculture and patient-centric medicine. Bayer’s own materials highlight AI’s role in planning and analyzing clinical trials as well as accelerating crop protection discovery pipelines.
This article examines two mature, internally used applications that convey the central role AI plays in Bayer’s core business goals:
- Herbicide discovery in crop science: Applying AI to narrow down molecular candidates and identify new modes of action.
- Clinical trial analytics in pharmaceuticals: Ingesting heterogeneous trial and device data to accelerate compliant analysis.
AI-Assisted Herbicide Discovery
Weed resistance is a mounting global challenge. Farmers in the US and Brazil are facing species resistant to multiple herbicide classes, driving up costs and threatening crop yields. Traditional herbicide discovery is slow — often 12 to 15 years from concept to market — and expensive, with high attrition during early screening.
Bayer’s Crop Science division has turned to AI to help shorten these timelines. Independent reporting notes Bayer’s pipeline includes Icafolin, its first new herbicide mode of action in decades, expected to launch in Brazil in 2028, with AI used upstream to accelerate the discovery of new modes of action.
Reuters reports that Bayer’s approach uses AI to match weed protein structures with candidate molecules, compressing the early discovery funnel by triaging millions of possibilities against pre-determined criteria. Bayer’s CropKey overview describes a profile-driven approach, where candidate molecules are designed to meet safety, efficacy, and environmental requirements from the start.
The company claims that CropKey has already identified more than 30 potential molecular targets and validated over 10 as entirely new modes of action. These figures, while promising, remain claims until independent verification.
For Bayer’s discovery scientists, AI-guided triage changes workflows by:
- Reducing early-stage wet-lab cycles by focusing on higher-probability matches between proteins and molecules.
- Integrating safety and environmental criteria into the digital screen, filtering out compounds unlikely to meet regulatory thresholds.
- Advancing promising molecules sooner, enabling earlier testing and potentially compressing development timelines from 15 years to 10.
Coverage by both Reuters and the Wall Street Journal notes this strategy is expected to reduce attrition and accelerate discovery-to-commercialization timelines.
The CropKey program has been covered by multiple independent outlets, a signal of maturity beyond a single press release. Reuters reports Bayer’s assertion that AI has tripled the number of new modes of action identified in early research compared to a decade ago.
The upcoming Icafolin herbicide, expected for commercial release in 2028, demonstrates that CropKey outputs are making their way into the regulatory pipeline. The presence of both media scrutiny and near-term launch candidates suggests CropKey is among Bayer’s most advanced AI deployments.
Video explaining Bayer’s CropKey process in crop protection discovery. (Source: Bayer)
By focusing AI on high-ROI bottlenecks in research and development, Bayer demonstrates how machine learning can trim low-value screening cycles, advancing only the most promising candidates into experimental trials. At the same time, acceleration figures reported by the company should be treated as claims until they are corroborated across multiple seasons, geographies, and independent trials.
Clinical Trial Analytics Platform (ALYCE)
Pharmaceutical development increasingly relies on complex data streams: electronic health records (EHR), site-based case report forms, patient-reported outcomes, and telemetry from wearables in decentralized trials. Managing this data volume and variety strains traditional data warehouses and slows regulatory reporting.
Bayer developed ALYCE (Advanced Analytics Platform for the Clinical Data Environment) to handle this complexity. In a PHUSE conference presentation, Bayer engineers describe the platform as a way to ingest diverse data, ensure governance, and deliver analytics more quickly while maintaining compliance.
The presentation describes ALYCE’s architecture as using a layered “Bronze/Silver/Gold” data lake approach. An example trial payload included approximately 300,000 files (1.6 TB) for 80 patients, requiring timezone harmonization, device ID mapping, and error handling before data could be standardized to SDTM (Study Data Tabulation Model) formats. Automated pipelines provide lineage, quarantine checks, and notifications. These technical details were presented publicly to peers, reinforcing their credibility beyond internal marketing.
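To make the layered approach concrete, below is a minimal, hypothetical sketch of a Bronze/Silver/Gold flow of the kind the presentation describes: raw device readings land as received, are harmonized and mapped to subjects (with failing records quarantined), and are then reshaped into an SDTM-like structure. The field names, mapping table, and rules are illustrative assumptions, not Bayer’s actual schema or code.

```python
# Hypothetical Bronze/Silver/Gold ("medallion") sketch; names and rules are assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class RawReading:          # Bronze: a wearable-device record as received
    device_id: str
    local_timestamp: str   # e.g. "2024-05-01T08:30:00+02:00"
    value: float

DEVICE_TO_SUBJECT = {"DEV-001": "SUBJ-0042"}  # assumed device-ID-to-subject mapping

def to_silver(reading: RawReading) -> dict | None:
    """Harmonize timezones and map device IDs; reject records that fail basic checks."""
    try:
        utc_time = datetime.fromisoformat(reading.local_timestamp).astimezone(timezone.utc)
        subject = DEVICE_TO_SUBJECT[reading.device_id]
    except (ValueError, KeyError):
        return None  # a real pipeline would route this to a quarantine area and send a notification
    return {"subject": subject, "utc_time": utc_time, "value": reading.value}

def to_gold(silver_rows: list[dict]) -> list[dict]:
    """Reshape cleaned rows into an SDTM-like findings structure for analysis."""
    return [
        {"USUBJID": r["subject"], "DTC": r["utc_time"].isoformat(), "ORRES": r["value"]}
        for r in silver_rows
    ]

if __name__ == "__main__":
    bronze = [RawReading("DEV-001", "2024-05-01T08:30:00+02:00", 72.0),
              RawReading("DEV-999", "not-a-timestamp", 0.0)]  # second record gets quarantined
    silver = [row for row in (to_silver(r) for r in bronze) if row is not None]
    print(to_gold(silver))
```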
For statisticians and clinical programmers, ALYCE claims to:
- Standardize ingestion across structured (CRFs), semi-structured (EHR extracts), and unstructured (device telemetry) sources.
- Automate quality checks through pipelines that reduce manual intervention and free staff up to focus on analysis.
- Enable earlier insights by preparing analysis-ready datasets faster, shortening the lag between data collection and review.
These objectives are consistent with Bayer’s broader statement that AI is being used to plan and analyze clinical trials safely and efficiently.
PHUSE is a respected industry forum where sponsors share methods with peers, and Bayer’s willingness to disclose technical details indicates ALYCE is in production. While Bayer has not released precise cycle-time savings, its emphasis on elastic storage, regulatory readiness, and speed suggests measurable efficiency gains.
Given the specificity of the presentation — real-world payloads, architecture diagrams, and validation processes — ALYCE appears to be a mature platform actively supporting Bayer’s clinical trial programs.
Screenshot from Bayer’s PHUSE presentation illustrating ALYCE’s automated ELTL pipeline. (Source: PHUSE)
Bayer’s commitment to ALYCE reflects its broader effort to modernize and scale clinical development. By consolidating varied data streams into a single, automated environment, the company positions itself to shorten study timelines, reduce operational overhead, and accelerate the movement of promising therapies from discovery to patients. This infrastructure also prepares Bayer to expand AI-driven analytics across additional therapeutic areas, supporting long-term competitiveness in a highly regulated industry.
While Bayer has not published specific cycle-time reductions or quantified cost savings tied directly to ALYCE, the company’s willingness to present detailed payload volumes and pipeline architecture at PHUSE indicates that the platform is actively deployed and has undergone peer-level scrutiny. Based on those disclosures and parallels with other pharma AI implementations, reasonable expectations include faster data review cycles, earlier anomaly detection, and improved compliance readiness. These outcomes—though not yet publicly validated—suggest ALYCE is reshaping Bayer’s trial workflows in ways that could yield significant long-term returns.
The STARD-AI reporting guideline for diagnostic accuracy studies using artificial intelligence

Tracking AI’s role in the US and global economy – Anthropic

Travel planning in Hawaii, scientific research in Massachusetts, and building web applications in India. On the face of it, these three activities have very little in common. But it turns out that each is among the uses of Claude most overrepresented in its respective place.
That doesn’t mean these are the most popular tasks: software engineering is still by far in the lead in almost every state and country in the world. Instead, it means that people in Massachusetts have been more likely to ask Claude for help with scientific research than people elsewhere – or, for instance, that Claude users in Brazil appear to be particularly enthusiastic about languages: they use Claude for translation and language-learning about six times more than the global average.
These are statistics we found in our third Anthropic Economic Index report. In this latest installment, we’ve expanded our efforts to document the early patterns of AI adoption that are beginning to reshape work and the economy. We measure how Claude is being used differently…
- …within the US: we provide the first-ever detailed assessment of how AI use differs between US states. We find that the composition of states’ economies informs which states use Claude the most per capita – and, surprisingly, that the very highest-use states aren’t the ones where coding dominates.
- …across different countries: our new analysis finds that countries’ use of Claude is strongly correlated with income, and that people in lower-use countries use Claude to automate work more frequently than those in higher-use ones.
- …over time: we compare our latest data with December 2024-January 2025 and February–March 2025. We find that the proportion of ‘directively’ automated tasks increased sharply from 27% to 39%, suggesting a rapid increase in AI’s responsibility (and in users’ trust).
- …and by business users: we now include anonymized data from Anthropic’s first-party API customers (in addition to users of Claude.ai), allowing us to analyze businesses’ interactions for the first time. We find that API users are significantly more likely to automate tasks with Claude than consumers are, which suggests that major labor market implications could be on the horizon.
We summarize the report below. In addition, we’ve designed an interactive website where you can explore our data yourself. For the first time, you can search for trends and results in Claude.ai use across every US state and all occupations we track, to see how AI is used where you live or by people in similar jobs. Finally, if you’d like to build on our analysis, we’ve made our dataset openly available, alongside the data from our previous Economic Index reports.
Geography
We’ve expanded the Anthropic Economic Index to include geographic data. Below we cover what we’ve learned about how Claude is used across countries and US states.
Across countries
The US uses Claude far more than any other nation. India is in second place, followed by Brazil, Japan, and South Korea, each with similar shares.
However, there is huge variation in population size across these countries. To account for this, we adjust each country’s share of Claude.ai use by its share of the world’s working population. This gives us our Anthropic AI Usage Index, or AUI. Countries with an AUI greater than 1 use Claude more often than we’d expect based on their working-age population alone, and vice-versa.
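As a rough illustration of the index as defined above, the sketch below divides a country’s share of Claude.ai use by its share of working-age population. The country shares and population figures are placeholders rather than Anthropic’s data, and a real calculation would use world totals rather than just three countries.

```python
# Sketch of the Anthropic AI Usage Index (AUI): share of Claude.ai use divided by
# share of working-age population. All numbers below are made-up placeholders.
def usage_index(claude_share: dict[str, float],
                working_age_pop: dict[str, float]) -> dict[str, float]:
    total_pop = sum(working_age_pop.values())  # stand-in for the world's working-age total
    return {
        country: claude_share[country] / (working_age_pop[country] / total_pop)
        for country in claude_share
    }

claude_share = {"US": 0.40, "IN": 0.10, "BR": 0.05}        # share of global Claude.ai use (placeholder)
working_age_pop = {"US": 220e6, "IN": 950e6, "BR": 150e6}  # working-age population (placeholder)

for country, aui in usage_index(claude_share, working_age_pop).items():
    print(f"{country}: AUI = {aui:.2f}")  # AUI > 1 means more use than population alone predicts
```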

From the AUI data, we can see that some small, technologically advanced countries (like Israel and Singapore) lead in Claude adoption relative to their working-age populations. This might to a large degree be explained by income: we found a strong correlation between GDP per capita and the Anthropic AI Usage Index (a 1% higher GDP per capita was associated with a 0.7% higher AUI). This makes sense: the countries that use Claude most often generally also have robust internet connectivity, as well as economies oriented around knowledge work rather than manufacturing. But it does raise a question of economic divergence: previous general-purpose technologies, like electrification or the combustion engine, led to both vast economic growth and a great divergence in living standards around the world. If the effects of AI prove to be largest in richer countries, this general-purpose technology might have similar economic implications.
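The reported relationship is a log-log (elasticity) one; a minimal sketch of how such an estimate could be produced follows. The data are synthetic, generated with a built-in elasticity of 0.7 purely to illustrate the fit, and this is not Anthropic’s estimation code.

```python
# Log-log regression sketch: recovering an elasticity of ~0.7 from synthetic data.
import numpy as np

rng = np.random.default_rng(0)
gdp_per_capita = rng.uniform(5_000, 80_000, size=50)                # synthetic country incomes
aui = 0.02 * gdp_per_capita**0.7 * rng.lognormal(0, 0.2, size=50)   # built-in elasticity of 0.7

# Fitting log(AUI) = a + b*log(GDP per capita); the slope b is the elasticity.
slope, intercept = np.polyfit(np.log(gdp_per_capita), np.log(aui), deg=1)
print(f"Estimated elasticity: {slope:.2f}")  # ~0.7: a 1% higher GDP per capita -> ~0.7% higher AUI
```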

Patterns within the United States
The link between per capita GDP and per capita use of Claude also holds when comparing between US states. In fact, use rises more quickly with income here than across countries: a 1% higher per capita GDP inside the US is associated with a 1.8% higher population-adjusted use of Claude. That said, income actually has less explanatory power within the US than across countries, as there’s much higher variance around the overall trend. That is: other factors, beyond income, must explain more of the variation in population-adjusted use.
What else could explain this adoption gap? Our best guess is that it’s differences in the composition of states’ economies. The highest AUI in the US belongs to the District of Columbia (3.82), where the most disproportionately frequent uses of Claude are editing documents and searching for information, among other tasks associated with knowledge work in DC. Similarly, coding-related tasks are especially common in California (the state with the third-highest AUI overall), and finance-related tasks are especially common in New York (which comes in fourth).1 Even among states with lower population-adjusted use of Claude, like Hawaii, use is closely correlated with the structure of the economy: Hawaiians request Claude’s assistance for tourism-related tasks at twice the rate of the rest of America. Our interactive website contains plenty of other statistics like these.

Trends in Claude use
We’ve been tracking how people use Claude since December 2024. We use a privacy-preserving classification method that categorizes anonymized conversation transcripts into task groups defined by O*NET, a US government database that classifies jobs and the tasks associated with them.2 By doing this, we can analyze both how the tasks that people give Claude have changed since last year, and how the ways people choose to collaborate—how much oversight and input into Claude’s work they choose to have—have changed too.
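The classification step itself is Anthropic’s privacy-preserving system and is not reproduced here, but once each anonymized conversation carries an O*NET-style category label, tracking the task mix over time reduces to counting shares per period, roughly as in this sketch. The category labels and counts below are invented for illustration.

```python
# Tallying per-period category shares from already-labeled conversations (labels are invented).
from collections import Counter

labeled = [
    ("2024-12", "Computer and Mathematical"), ("2024-12", "Educational Instruction"),
    ("2024-12", "Computer and Mathematical"), ("2025-08", "Computer and Mathematical"),
    ("2025-08", "Life, Physical, and Social Science"), ("2025-08", "Educational Instruction"),
]

def category_shares(rows: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Return, for each period, each category's share of that period's conversations."""
    by_period: dict[str, Counter] = {}
    for period, category in rows:
        by_period.setdefault(period, Counter())[category] += 1
    return {
        period: {cat: n / sum(counts.values()) for cat, n in counts.items()}
        for period, counts in by_period.items()
    }

print(category_shares(labeled))
```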
Tasks
Since December 2024, computer and mathematical uses of Claude have predominated among our categories, representing around 37-40% of conversations.
But a lot has changed. Over the past nine months, we’ve seen consistent growth in “knowledge-intensive” fields. For example, educational instruction tasks have risen by more than 40 percent (from 9% to 13% of all conversations), and the share of tasks associated with the physical and social sciences has increased by a third (from 6% to 8%). In the meantime, the relative frequency of traditional business tasks has declined: management-related tasks have fallen from 5% of all conversations to 3%, and the share of tasks related to business and financial operations has halved, from 6% to 3%. (In absolute terms, of course, the number of conversations in each category has still risen significantly.)

The overall trend is noisy, but generally, as the GDP per capita of a country increases, the use of Claude shifts away from tasks in the Computer and Mathematical occupation group, and towards a diverse range of other activities, like education, art and design; office and administrative support; and the physical and social sciences. Compare the trend line in the first graph below to the remaining three:

All that said, software development remains the most common use in every single country we track. The picture looks similar in the US, although our sample size limits our ability to explore in more detail how the task mix varies with adoption rates.
Patterns of interaction
As we’ve discussed previously, we generally distinguish between tasks that involve automation (in which AI directly produces work with minimal user input) and augmentation (in which the user and AI collaborate to get things done). We further break automation down into directive and feedback loop interactions, where directive conversations involve the minimum of human interaction, and in feedback loop tasks, humans relay real-world outcomes back to the model. We also break augmentation down into learning (asking for information or explanations), task iteration (working with Claude collaboratively), and validation (asking for feedback).
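A small sketch of that taxonomy as a data structure, with a roll-up from per-mode shares to the overall automation/augmentation split, is below; the example shares are placeholders rather than the report’s published figures.

```python
# The interaction taxonomy described above as a simple mapping, plus a roll-up to
# the automation/augmentation split. The example shares are placeholders.
PARENT = {
    "directive": "automation",
    "feedback loop": "automation",
    "learning": "augmentation",
    "task iteration": "augmentation",
    "validation": "augmentation",
}

def rollup(mode_shares: dict[str, float]) -> dict[str, float]:
    """Sum per-mode shares into their parent automation/augmentation categories."""
    totals = {"automation": 0.0, "augmentation": 0.0}
    for mode, share in mode_shares.items():
        totals[PARENT[mode]] += share
    return totals

print(rollup({"directive": 0.39, "feedback loop": 0.10,
              "learning": 0.20, "task iteration": 0.17, "validation": 0.10}))
```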
Since December 2024, we’ve found that the share of directive conversations has risen sharply, from 27% to 39%. The shares of other interaction patterns (particularly learning, task iteration, and feedback loops) have fallen slightly as a result. This means that for the first time, automation (49.1%) has become more common than augmentation (47%) overall. One potential explanation for this is that AI is rapidly winning users’ confidence, and becoming increasingly responsible for completing sophisticated work.
This could be the result of improved model capabilities. (In December 2024, when we first collected data for the Economic Index, the latest version of Claude was Sonnet 3.6.) As models get better at anticipating what users want and at producing high-quality work, users are likely more willing to trust the model’s outputs at the first attempt.

Perhaps surprisingly, in countries with higher Claude use per capita, Claude’s uses tend towards augmentation, whereas people in lower-use countries are much more likely to prefer automation. Controlling for the mix of tasks in question, a 1% increase in population-adjusted use of Claude is correlated with a roughly 3% reduction in automation. Similarly, increases in population-adjusted Claude use are associated with a shift away from automation (as in the chart below), not towards.
We’re not yet sure why this is. It could be because early adopters in each country feel more comfortable allowing Claude to automate tasks, or it could be down to other cultural and economic factors.

Businesses
Using the same privacy-preserving methodology we use for conversations on Claude.ai, we have begun sampling interactions from a subset of Anthropic’s first-party API customers, in a first-of-its-kind analysis.3 API customers, who tend to be businesses and developers, use Claude very differently to those who access it through Claude.ai: they pay per token, rather than a fixed monthly subscription, and can make requests through their own programs.
These customers’ use of Claude is especially concentrated in coding and administrative tasks: 44% of the API traffic in our sample maps to computer or mathematical tasks, compared to 36% of tasks on Claude.ai. (As it happens, around 5% of all API traffic focuses specifically on developing and evaluating AI systems.) This is offset by a smaller proportion of conversations related to educational occupations (4% in the API relative to 12% on Claude.ai), and arts and entertainment (5% relative to 8%).
We also find that our API customers use Claude for task automation much more often than Claude.ai users. 77% of our API conversations show automation patterns, of which the vast majority are directive, while just 12% show augmentation. On Claude.ai, the split is almost even. This could have significant economic implications: in the past, the automation of tasks has been associated with large economic transitions, as well as major productivity gains.

Finally, given how API use is paid for, we can also explore whether differences in the cost of tasks (caused by differences in the number of tokens they consume) affect which tasks businesses choose to “buy”. Here, we find a positive correlation between price and use: higher-cost task categories tend to see more frequent use, as in the graph below. This suggests to us that fundamental model capabilities, and the economic value the models generate, matter more to businesses than the cost of completing the task itself.
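A minimal sketch of that check, correlating a task category’s average token consumption with its share of API traffic, might look like the following; the numbers are synthetic placeholders, not values from our dataset.

```python
# Price-versus-use sketch: correlate per-category token cost with usage share (synthetic data).
import numpy as np

avg_tokens_per_task = np.array([1200, 3500, 8000, 15000, 30000])  # proxy for per-task cost
share_of_api_use    = np.array([0.04, 0.07, 0.12, 0.20, 0.30])    # share of sampled API traffic

corr = np.corrcoef(np.log(avg_tokens_per_task), share_of_api_use)[0, 1]
print(f"Correlation between (log) cost and use: {corr:.2f}")  # positive here, as in the report
```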

Conclusion
The Economic Index is designed to provide an early, empirical assessment of how AI is affecting people’s jobs and the economy. What have we found so far?
Across each of the measures we cover in this report, the adoption of AI appears remarkably uneven. People in higher-income countries are more likely to use Claude, more likely to seek collaboration rather than automation, and more likely to pursue a breadth of uses beyond coding. Within the US, AI use seems to be strongly influenced by the dominant industries in local economies, from technology to tourism. And businesses are more likely to entrust Claude with agency and autonomy than consumers are.
Beyond the fact of unevenness, it’s especially notable to us that directive automation has become much more common in conversations on Claude.ai over the past nine months. The nature of people’s use of Claude is evidently still being defined: we’re still collectively deciding how much confidence we have in AI tools, and how much responsibility we should give them. So far, though, it looks like we’re becoming increasingly comfortable with AI, and willing to let it work on our behalf. We’re looking forward to revisiting this analysis over time, to see where—or, indeed, if—users’ choices settle as AI models improve.
If you’d like to explore our data yourself, you can do so on our dedicated Anthropic Economic Index website, which contains interactive visualizations of our country, state, and occupational data. We’ll update this website with more data in future, so you can continue to track the evolution of AI’s effects on jobs and the economy in the ways that interest you.
Our full report is available here. We hope it helps policymakers, economists and others more effectively prepare for the economic opportunities and risks that AI provides.
Open data
As with our past reports, we’re releasing a comprehensive dataset for this release, including geographic data, task-level use patterns, automation/augmentation breakdowns by task, and an overview of API use. Data are available for download at the Anthropic Economic Index website.
Work with us
If you’re interested in working at Anthropic to help build the systems powering this research, we encourage you to apply for our Research Engineer role.