AI Research

Beyond ‘we used ChatGPT’: a new way to declare AI in research

Published

September 11, 2025

Recording which research tasks were delegated to AI, and at which stage, builds trust and transparency, say Yana Suchikova and colleagues

Academics are making ever-more use of generative artificial intelligence. But while large language models such as ChatGPT or Claude are enlisted for tasks such as drafting abstracts or supporting literature reviews, institutions, funders and publishers are still struggling to keep pace.

Many journals now ask authors to declare whether AI has been used, yet the resulting statements are often vague or unhelpful. A line such as “we used ChatGPT to improve clarity” leaves readers, reviewers and editors uncertain about the extent of AI’s involvement.

The problem is not entirely new. Modern science is collaborative, with research teams often numbering in the dozens. This makes it difficult to see who contributed what, and who deserves credit for it.

The most widely used solution so far is the CRediT taxonomy, which sets out 14 possible roles for human contributors, from conceptualisation to supervision. More than 50 organisations and publishers have adopted CRediT as a way to improve transparency.

While CRediT applies only to people, other frameworks exist for AI. The AI Use Taxonomy from the US National Institute of Standards and Technology describes 16 categories of activity, including monitoring, prediction and content creation. It is useful as a general map of AI’s capacities, but it does not match the step-by-step, collaborative nature of most research projects.

Delegation declaration

In a recent paper, we present a framework designed specifically for science and aimed at accounting for AI use in a way that fits how research is done. Called the Generative AI Delegation Taxonomy (GAIDeT), it focuses not on authorship, which AI cannot assume, but on delegation: which tasks, at which stages, were delegated to AI?

At the macro level, GAIDeT covers the main stages of a research project: conceptualisation, literature review, methodology, software development, data management, writing, ethics review and supervision. At the micro level, it breaks these down into concrete tasks: generating hypotheses, cleaning data, translating text, checking for bias or preparing a cover letter for a submission. For each, the principle is the same: AI can assist, but the researcher remains accountable.

To make the taxonomy usable, we also built a freely available GAIDeT Declaration Generator. This collects basic information from authors on their use of AI, including the stage of the project, the specific task and the tool used. It then produces a statement that can be pasted into a manuscript, ideally just below the CRediT contribution list. The goal is to remove ambiguity while avoiding extra burden.
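As a rough illustration of what such a generator does, the Python sketch below turns (stage, task, tool) records into a single pasteable statement. It is not the GAIDeT Declaration Generator itself: the `Delegation` record, the `declaration` function and the wording of the output are hypothetical. Only the stage names come from the macro-level stages listed in the taxonomy.

```python
from dataclasses import dataclass

# Macro-level stages named in the article; the check against this set is
# an illustrative assumption, not the real generator's behaviour.
STAGES = {
    "conceptualisation", "literature review", "methodology",
    "software development", "data management", "writing",
    "ethics review", "supervision",
}

@dataclass(frozen=True)
class Delegation:
    stage: str  # macro-level stage of the project
    task: str   # micro-level task handed to the AI tool
    tool: str   # AI tool (and version) that was used

def declaration(delegations: list[Delegation]) -> str:
    """Build a plain-text AI-use statement (hypothetical wording)."""
    if not delegations:
        return "No generative AI tools were used in this work."
    parts = []
    for d in delegations:
        if d.stage not in STAGES:
            raise ValueError(f"unknown stage: {d.stage!r}")
        parts.append(f"{d.task} ({d.stage}) was delegated to {d.tool}")
    return (
        "Generative AI use: " + "; ".join(parts) + ". "
        "The authors reviewed all AI output and remain fully accountable for the content."
    )

print(declaration([
    Delegation("writing", "Translation", "ChatGPT-5"),
    Delegation("writing", "Proofreading", "ChatGPT-5"),
]))
```

Restricting stages to a fixed list is what gives every statement a shared vocabulary, while the free-text task field lets authors stay specific.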

Why does this matter? First, because it provides a common language. Editors and reviewers no longer have to guess what “help with clarity” means. Readers can see precisely where AI was involved. Funders and institutions can assess whether its use meets their standards. Second, it helps researchers themselves reflect more carefully on how they use these tools.

Grey areas

Challenges remain. One is how to capture proportional contributions. If the AI generates the first draft of a paragraph but a human rewrites it, should the degree of delegation be expressed, and if so how?

Another issue is whether disclosure should identify which author used the AI. In a 10-author paper, does it matter if only one person used a model extensively? Or should the paper as a whole be treated as a collective product, with collective responsibility? Clearer guidance on how responsibility is assigned is still needed.

There is also the question of placement. Some journals put AI disclosures in acknowledgements, others in methods. Neither is ideal: acknowledgements are traditionally reserved for people, while methods may not cover all the uses of AI in writing. Our proposal is a dedicated section where AI contributions are described with the same clarity as human ones.

GAIDeT does not claim to settle these debates, and it will need updating as AI evolves and research practices change. Early feedback from colleagues suggests it can be useful for journals, repositories, funders and institutions trying to build policies around responsible AI use. We are keen for researchers, editors and policymakers to test the declaration generator and share comments.

Sharing builds standards

Transparency and disclosure of AI use will not eliminate all its risks. Authors intent on hiding misuse are unlikely to own up. But for the vast majority wanting to act responsibly, GAIDeT offers a way to show it.

In that sense, it is more a trust-building tool than an enforcement mechanism. Like conflict-of-interest statements or funding disclosures, its power lies in making openness routine.

The debate about AI in research is often polarised: some call for bans, others for full integration. GAIDeT offers a middle path, where AI’s contributions are neither ignored nor exaggerated but documented in a clear and structured way. That may not sound dramatic, but in science, shared language and shared standards are often the most effective drivers of integrity.

Yana Suchikova and Natalia Tsybuliak are at Berdyansk State Pedagogical University, Ukraine. Jaime Teixeira da Silva is an independent researcher based in Ikenobe, Japan. Serhii Nazarovets is at Borys Grinchenko Kyiv Metropolitan University, Ukraine.

The authors used ChatGPT-5 for translation, proofreading and editing in their draft of this article.

OpenAI makes $300 billion gamble on Oracle computing power to expand artificial intelligence capacity

Published

September 15, 2025

Wayne Williams

OpenAI signs $300 billion Oracle contract starting in 2027 to expand AI capacity
Oracle shares jump over 40 percent after reporting $317 billion in future revenue
Deal raises risks as OpenAI loses money and Oracle takes on heavy debt

OpenAI has signed a contract with Oracle to buy $300 billion worth of computing power over the next five years, according to the Wall Street Journal.

This makes it one of the largest cloud deals ever struck.

The contract will begin in 2027 and is expected to reshape how OpenAI builds and runs its artificial intelligence models.

A huge gamble

The agreement will require 4.5 gigawatts of power capacity, which is enough electricity to supply about four million homes.
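The homes comparison can be sanity-checked with simple arithmetic. Assuming it refers to average continuous draw (the article does not say how it was calculated), 4.5 GW spread across four million homes works out to about 1.1 kW per home, or just under 10,000 kWh a year, which is roughly in line with typical US household consumption.

```python
# Back-of-envelope check; treats "homes" as average continuous draw (an assumption).
capacity_watts = 4.5e9   # 4.5 gigawatts of capacity
homes = 4_000_000        # "about four million homes"

watts_per_home = capacity_watts / homes
kwh_per_home_per_year = watts_per_home * 24 * 365 / 1000

print(f"{watts_per_home:.0f} W per home, ~{kwh_per_home_per_year:,.0f} kWh/year")
```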

It shows how the rush to build AI data centers is driving new highs in technology spending even as questions remain over whether demand will justify such commitments.

Oracle disclosed in its latest earnings report that it added $317 billion in future contract revenue during the quarter ending August 31, partly due to the OpenAI deal.

The news sent Oracle shares soaring by more than 40 percent in a single day. That surge increased Oracle Chairman Larry Ellison’s wealth by more than $100 billion, and saw him overtake Elon Musk as the world’s richest person with a net worth close to $400 billion.

The deal is not without massive risk for both parties, however.

For OpenAI, the agreement provides a new source of computing power after years of relying exclusively on Microsoft’s Azure cloud. But according to the WSJ, the company, which reported about $10 billion in revenue this year, will owe Oracle an average of $60 billion a year under the agreement.
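The $60 billion average follows from spreading the $300 billion evenly over five years. That is a simplifying assumption: the article gives no payment schedule, and with the contract starting in 2027 the real payments may be uneven. Under it, the annual obligation is about six times the revenue OpenAI reported this year.

```python
# Even-spread assumption; the actual payment schedule is not given in the article.
total_commitment = 300e9  # $300 billion contract
years = 5                 # "over the next five years"
annual_revenue = 10e9     # about $10 billion reported this year

annual_payment = total_commitment / years
ratio = annual_payment / annual_revenue

print(f"${annual_payment / 1e9:.0f}B per year, {ratio:.0f}x current revenue")
```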

The startup is losing money and has told investors it does not expect to turn a profit until 2029.

Oracle, meanwhile, will have to borrow heavily to finance the AI chips and infrastructure needed to deliver the contract.

And, as the WSJ reported, “The deal rests on the assumption ChatGPT will continue its explosive growth and be adopted by billions of people across the world, as well as major businesses and governments.”

Industry analysts say the partnership underlines both the promise and the strain of the AI boom. Spending on chips, servers, and data centers worldwide is projected to reach $2.9 trillion by 2028.

Whether OpenAI’s growth can keep pace with its commitments remains an open question.

Artificial Intelligence Technology Solutions Inc. Announces Commercial Availability of Radcam Enterprise

Published

September 15, 2025

S&P Capital IQ

Artificial Intelligence Technology Solutions Inc., along with its subsidiary Robotic Assistance Devices Inc. (RAD-I), announced the commercial availability of RADCam Enterprise, a proactive video security platform now compatible with the industry’s leading Video Management Systems (VMS). The intelligent talking camera can be integrated quickly and seamlessly into virtually any professional-grade video system.

The Company first introduced the RADCam Enterprise initiative on May 5, 2025, highlighting its expansion beyond residential applications into small and medium-sized business (SMB) and enterprise markets. With today’s availability, RAD-I will deliver the solution through what it describes as an untapped niche in the security industry: security system integrators and distributors. RADCam Enterprise brings an intelligent “operator in the box” capability, enabling immediate talk-down to potential threats before human intervention is required.

The device integrates a speaker, microphone, and high-intensity lighting, allowing it not only to record but also to actively engage. The solution is expected to deliver gross margins consistent with the Company’s established benchmarks. RADCam Enterprise differs from the original residential version of RADCam in that it integrates RAD’s agentic AI platform, SARA (Speaking Autonomous Responsive Agent), and is compatible with RADSoC and industry-leading Video Management Systems. RADCam Enterprise is available immediately through RAD-I’s network of channel partners and distributors.

Pre-orders are open at giving clients the opportunity to be among the first to deploy the solution. Designed for broad use across industries including logistics, retail, education, and commercial real estate, RADCam Enterprise provides clients and integrators with new ways to modernize security operations using proven AI-driven tools. RAD delivers cost savings via a suite of stationary and mobile robotic solutions that complement human personnel and, at times, directly replace them in environments better suited for machines.

All RAD technologies, AI-based analytics and software platforms are developed in-house. The Company’s operations and internal controls have been validated through successful completion of its SOC 2 Type 2 audit, a formal, independent audit that evaluates a service organization’s internal controls for handling customer data and determines whether those controls are not only designed properly but also operating effectively to protect customer data. Each Fortune 500 client has the potential to place numerous orders over time.

AITX is an innovator in the delivery of artificial intelligence-based solutions that empower organizations to gain new insight, solve complex challenges and fuel new business ideas. Through its next-generation robotic product offerings, AITX’s RAD, RAD-R, RAD-M and RAD-G companies help organizations streamline operations, increase ROI, and strengthen business. The Company has no obligation to provide the recipient with additional updated information.

No information in this publication should be interpreted as any indication whatsoever of the Company’s future revenues, results of operations, or stock price.

Stanford Develops Real-World Benchmarks for Healthcare AI Agents

Published

September 15, 2025

The Editors

Beyond the hype and hope surrounding the use of artificial intelligence in medicine lies the real-world need to ensure that, at the very least, AI in a healthcare setting can carry out tasks that a doctor would perform in electronic health records.

Creating benchmark standards to measure that is what drives the work of a team of Stanford researchers. While the researchers note the enormous potential of this new technology to transform medicine, the tech ethos of moving fast and breaking things doesn’t work in healthcare. Ensuring that these tools can reliably perform such tasks is vital; only then can they be used to augment the care clinicians provide every day.

“Working on this project convinced me that AI won’t replace doctors anytime soon,” said Kameron Black, co-author on the new benchmark paper and a Clinical Informatics Fellow at Stanford Health Care. “It’s more likely to augment our clinical workforce.”

MedAgentBench: Testing AI Agents in Real-World Clinical Systems

Black is one of a multidisciplinary team of physicians, computer scientists, and researchers from across Stanford University who worked on the new study, MedAgentBench: A Virtual EHR Environment to Benchmark Medical LLM Agents, published in the New England Journal of Medicine AI.

Although large language models (LLMs) have performed well on the United States Medical Licensing Examination (USMLE) and at answering medical questions in studies, there is currently no benchmark testing how well LLMs can function as agents by performing tasks that a doctor would normally do, such as ordering medications, inside a real-world clinical system where data input can be messy.

Unlike chatbots or stand-alone LLMs, AI agents can work autonomously, performing complex, multistep tasks with minimal supervision. AI agents integrate multimodal data inputs, process information, and then use external tools to accomplish tasks, Black explained.
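To make the chatbot/agent distinction concrete, here is a toy observe-decide-act loop in Python. The `policy` function stands in for the language model; the two tools, the patient ID, the lab code and the value are all invented for illustration, and nothing here reflects MedAgentBench’s actual harness.

```python
from typing import Callable, Optional

# Toy EHR data; the patient ID, lab code and value are invented.
FAKE_EHR = {("P001", "K"): 5.9}

def get_latest_lab(patient_id: str, code: str) -> float:
    """Hypothetical read tool: look up the latest lab value."""
    return FAKE_EHR[(patient_id, code)]

def order_test(patient_id: str, code: str) -> str:
    """Hypothetical write tool: pretend to place an order."""
    return f"ordered {code} for {patient_id}"

TOOLS: dict[str, Callable[..., object]] = {
    "get_latest_lab": get_latest_lab,
    "order_test": order_test,
}

def policy(goal: str, history: list) -> Optional[tuple[str, dict]]:
    """Stand-in for the LLM: choose the next tool call, or None to stop.

    Toy rule: read the lab; if it is above 5.0, order a repeat test.
    """
    if not history:
        return "get_latest_lab", {"patient_id": "P001", "code": "K"}
    if len(history) == 1 and history[0][1] > 5.0:
        return "order_test", {"patient_id": "P001", "code": "K"}
    return None

def run_agent(goal: str, max_steps: int = 5) -> list:
    """Ask the policy for an action, run the tool, record the result; repeat."""
    history: list = []
    for _ in range(max_steps):
        action = policy(goal, history)
        if action is None:
            break
        name, args = action
        history.append((name, TOOLS[name](**args)))
    return history

print(run_agent("check the patient's potassium"))
```

The `max_steps` cap is a common safeguard so that a confused agent cannot loop indefinitely; a real agent would replace `policy` with a model call that returns a structured tool request.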

Overall Success Rate (SR) Comparison of State-of-the-Art LLMs on MedAgentBench
Model	Overall SR
Claude 3.5 Sonnet v2	69.67%
GPT-4o	64.00%
DeepSeek-V3 (685B, open)	62.67%
Gemini-1.5 Pro	62.00%
GPT-4o-mini	56.33%
o3-mini	51.67%
Qwen2.5 (72B, open)	51.33%
Llama 3.3 (70B, open)	46.33%
Gemini 2.0 Flash	38.33%
Gemma2 (27B, open)	19.33%
Gemini 2.0 Pro	18.00%
Mistral v0.3 (7B, open)	4.00%

While previous tests only assessed AI’s medical knowledge through curated clinical vignettes, this research evaluates how well AI agents can perform actual clinical tasks such as retrieving patient data, ordering tests, and prescribing medications.
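Prescribing, in a FHIR (Fast Healthcare Interoperability Resources) system, typically means creating a `MedicationRequest` resource, which a client sends to the server with an HTTP POST. The sketch below only builds a minimal FHIR R4 `MedicationRequest`; the server address, patient ID, medication and dose text are placeholders, and the code is not taken from MedAgentBench.

```python
import json
from datetime import datetime, timezone

FHIR_BASE = "http://localhost:8080/fhir"  # hypothetical server address

def medication_order(patient_id: str, medication_text: str, dose_text: str) -> dict:
    """Build a minimal FHIR R4 MedicationRequest resource (illustrative only)."""
    return {
        "resourceType": "MedicationRequest",
        "status": "active",   # required element
        "intent": "order",    # required element
        "medicationCodeableConcept": {"text": medication_text},
        "subject": {"reference": f"Patient/{patient_id}"},
        "authoredOn": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "dosageInstruction": [{"text": dose_text}],
    }

order = medication_order("example-patient", "placeholder medication", "placeholder dose")
# An agent would send this JSON body via HTTP POST to f"{FHIR_BASE}/MedicationRequest".
print(json.dumps(order, indent=2))
```

A real order would normally carry coded medication and structured dosage data rather than free text; free text keeps the sketch short without inventing codes.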

“Chatbots say things. AI agents can do things,” said Jonathan Chen, associate professor of medicine and biomedical data science and the paper’s senior author. “This means they could theoretically directly retrieve patient information from the electronic medical record, reason about that information, and take action by directly entering in orders for tests and medications. This is a much higher bar for autonomy in the high-stakes world of medical care. We need a benchmark to establish the current state of AI capability on reproducible tasks that we can optimize toward.”

The study tested this by evaluating whether AI agents could utilize FHIR (Fast Healthcare Interoperability Resources) API endpoints to navigate electronic health records.
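Retrieval through FHIR usually means a REST search such as `GET [base]/Observation?patient=...&code=...`, which returns a `Bundle` of matching resources. The sketch below builds such a query and pulls the value out of a response; the base URL, patient ID, the code `MG` and the sample Bundle are all made up. It assumes the server honours `_sort=-date`, so the first entry is the newest.

```python
from typing import Optional
from urllib.parse import urlencode

FHIR_BASE = "http://localhost:8080/fhir"  # hypothetical server address

def observation_search_url(patient_id: str, code: str) -> str:
    """FHIR search for a patient's observations with a given code, newest first."""
    params = {"patient": patient_id, "code": code, "_sort": "-date", "_count": 1}
    return f"{FHIR_BASE}/Observation?{urlencode(params)}"

def latest_value(bundle: dict) -> Optional[tuple]:
    """Return (value, unit, time) from the first Observation with a valueQuantity."""
    for entry in bundle.get("entry", []):
        obs = entry["resource"]
        if obs.get("resourceType") == "Observation" and "valueQuantity" in obs:
            q = obs["valueQuantity"]
            return q["value"], q.get("unit"), obs.get("effectiveDateTime")
    return None

# Toy response shaped like a FHIR searchset Bundle (all values invented).
sample = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [{"resource": {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"code": "MG"}]},
        "subject": {"reference": "Patient/example-patient"},
        "effectiveDateTime": "2025-09-14T08:30:00+00:00",
        "valueQuantity": {"value": 1.8, "unit": "mg/dL"},
    }}],
}

print(observation_search_url("example-patient", "MG"))
print(latest_value(sample))
```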

The team created a virtual electronic health record environment with 100 realistic patient profiles (comprising 785,000 records, including labs, vitals, medications, diagnoses and procedures) and used it to test 12 large language models on 300 clinical tasks developed by physicians. In initial testing, the best model, Claude 3.5 Sonnet v2, achieved a success rate of about 70%.
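Assuming the overall success rate in the table is simply tasks completed divided by the 300 tasks (an assumption, though every listed percentage does correspond to a whole number of tasks), the rates can be turned back into task counts. Claude 3.5 Sonnet v2’s 69.67% is 209 of 300, the roughly 70% quoted.

```python
TOTAL_TASKS = 300  # clinical tasks in MedAgentBench

# Overall success rates (%) copied from the comparison table.
success_rate = {
    "Claude 3.5 Sonnet v2": 69.67,
    "GPT-4o": 64.00,
    "DeepSeek-V3 (685B, open)": 62.67,
    "Gemini-1.5 Pro": 62.00,
    "GPT-4o-mini": 56.33,
    "o3-mini": 51.67,
    "Qwen2.5 (72B, open)": 51.33,
    "Llama 3.3 (70B, open)": 46.33,
    "Gemini 2.0 Flash": 38.33,
    "Gemma2 (27B, open)": 19.33,
    "Gemini 2.0 Pro": 18.00,
    "Mistral v0.3 (7B, open)": 4.00,
}

# Convert each percentage back to a count of completed tasks.
solved = {m: round(sr / 100 * TOTAL_TASKS) for m, sr in success_rate.items()}

for model, n in sorted(solved.items(), key=lambda kv: -kv[1]):
    print(f"{model:<26} {n:>3}/{TOTAL_TASKS}")
```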

“We hope this benchmark can help model developers track progress and further advance agent capabilities,” said Yixing Jiang, a Stanford PhD student and co-author of the paper.

Many of the models struggled with scenarios that required nuanced reasoning, involved complex workflows, or necessitated interoperability between different healthcare systems, all issues a clinician might face regularly.

“Before these agents are used, we need to know how often and what type of errors are made so we can account for these things and help prevent them in real-world deployments,” Black said.
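Knowing “how often and what type of errors are made” amounts to tallying failures by category across tasks. A minimal sketch, with invented outcomes and error labels that are not the categories used in the paper:

```python
from collections import Counter
from typing import Optional

def error_profile(outcomes: list[Optional[str]]) -> tuple[float, Counter]:
    """Return the success rate and a count of failures by error label.

    Each outcome is None for a successful task, or a string naming the error.
    """
    if not outcomes:
        raise ValueError("no outcomes to summarise")
    errors = Counter(o for o in outcomes if o is not None)
    successes = len(outcomes) - sum(errors.values())
    return successes / len(outcomes), errors

# Invented outcomes and labels, for illustration only.
outcomes = [None, "wrong API parameters", None, "invalid output format",
            None, "wrong API parameters", None, None]

rate, errors = error_profile(outcomes)
print(f"success rate: {rate:.1%}")
for label, count in errors.most_common():
    print(f"{label}: {count} of {len(outcomes)} tasks")
```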

What does this mean for clinical care? Co-author James Zou and Dr. Eric Topol argue that AI is shifting from a tool to a teammate in care delivery. With MedAgentBench, the Stanford team has shown that this is closer to reality than it may seem, demonstrating that several frontier LLMs can carry out many of the day-to-day clinical tasks a physician would perform.

The team has already observed performance improvements in the newest versions of these models. With this in mind, Black believes that AI agents might be ready to handle basic clinical “housekeeping” tasks sooner than previously expected.

“In our follow-up studies, we’ve shown a surprising amount of improvement in the success rate of task execution by newer LLMs, especially when accounting for specific error patterns we observed in the initial study,” Black said. “With deliberate design, safety, structure, and consent, it will be feasible to start moving these tools from research prototypes into real-world pilots.”

The Road Ahead

Black says benchmarks like these are necessary as more hospitals and healthcare systems incorporate AI into tasks such as note-writing and chart summarization.

Accurate and trustworthy AI could also help alleviate a looming crisis, he adds. Pressed by patient needs, compliance demands, and staff burnout, healthcare providers face a worsening global shortage of health workers, estimated to exceed 10 million by 2030.

Instead of replacing doctors and nurses, Black hopes that AI can be a powerful tool for clinicians, lessening the burden of some of their workload and bringing them back to the patient bedside.

“I’m passionate about finding solutions to clinician burnout,” Black said. “I hope that by working on agentic AI applications in healthcare that augment our workforce, we can help offload burden from clinicians and divert this impending crisis.”

Paper authors: Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, James Zou, Andrew Y. Ng, and Jonathan H. Chen

Read the piece in the New England Journal of Medicine AI.
