

A practical framework for appropriate implementation and review of artificial intelligence (FAIR-AI) in healthcare



Best practices and key considerations—narrative review

As a first step to inform the construction of FAIR-AI, we conducted a narrative review to identify best practices and key considerations for responsibly deploying AI in healthcare; these are summarized in Table 1. The results are organized into several themes, including validation, usefulness, transparency, and equity.

Table 1 Best practices and key considerations in implementation of artificial intelligence

Numerous publications and guidelines such as TRIPOD and TRIPOD-AI have described the reporting necessary to properly evaluate a risk prediction model, regardless of the underlying statistical or machine learning method12,13. An important consideration in model validation is careful selection of performance metrics14. Beyond discrimination metrics like AUC, it is important to assess other aspects of model performance, such as calibration and the F-score, the latter being particularly useful in settings with imbalanced data. For models that produce a continuous risk probability, decision thresholds can be adjusted to maximize classification measures such as positive predictive value (PPV) depending on the specific clinical scenario. Decision Curve Analysis can help evaluate the tradeoff between true positives and false positives to determine whether a model offers practical value at a given clinical threshold15. For regression problems, besides Mean Squared Error (MSE), other metrics such as Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) can also be examined16. It is important to establish a model’s real-world applicability through dedicated validation studies17,18. The strength of evidence supporting validation and minimum performance standards should align with the intended use case, its potential risks, and the likelihood of performance variability once deployed, based on the analytic approach or data sources (Supplementary Fig. 1)14,17,18. Applying these traditional standards to evaluate the validity of generative AI models is uniquely challenging and frequently not possible. While the literature in this area is nascent, evaluation should still be performed and may require qualitative metrics such as user feedback and expert reviews, which can provide insights into performance, risks, and usefulness19,20.
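As an illustration of how such an evaluation might be operationalized, the minimal sketch below computes several of the metrics discussed above using scikit-learn; the variable names (y_true, y_prob, y_hat) and the choice of library are assumptions for the example rather than part of any reporting guideline.

```python
# Minimal sketch of a validation report, assuming a held-out set with
# true labels (y_true) and predicted probabilities (y_prob); names are
# illustrative only. Requires a reasonably recent scikit-learn.
import numpy as np
from sklearn.metrics import (
    roc_auc_score, f1_score, precision_score, recall_score, brier_score_loss,
    mean_absolute_error, mean_absolute_percentage_error,
)

def classification_report_at_threshold(y_true, y_prob, threshold=0.5):
    """Discrimination, calibration, and threshold-dependent metrics."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "auc": roc_auc_score(y_true, y_prob),           # discrimination
        "brier": brier_score_loss(y_true, y_prob),      # calibration (lower is better)
        "f1": f1_score(y_true, y_pred),                 # useful with imbalanced data
        "ppv": precision_score(y_true, y_pred, zero_division=0),  # positive predictive value
        "sensitivity": recall_score(y_true, y_pred),
    }

def regression_report(y_true, y_hat):
    """Error metrics beyond MSE for continuous outcomes."""
    y_true, y_hat = np.asarray(y_true), np.asarray(y_hat)
    return {
        "mse": float(np.mean((y_true - y_hat) ** 2)),
        "mae": mean_absolute_error(y_true, y_hat),
        "mape": mean_absolute_percentage_error(y_true, y_hat),
    }
```

In practice, the decision threshold can be swept across a range of values to find the one that maximizes PPV or another measure relevant to the specific clinical scenario.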

Deploying and maintaining AI solutions in healthcare requires significant resources and carries the potential for both risks and benefits, making it essential to evaluate whether a tool delivers actual usefulness, or a net benefit, to the organization, clinical team, and patients21,22. Decision analyses can quantify the expected value of medical decisions, but they often require detailed cost estimates and complex modeling. Formal net benefit calculations simplify this process by integrating the relative value of benefits versus harms into a single metric18,23. However, a lack of objective data, the specific context, or the nature of the solution may render these calculations impractical. In such cases, net benefit provides a construct to guide qualitative discussions among subject matter experts, helping to weigh benefits and risks while considering workflows that mitigate risks. Additionally, a thorough assessment of clinical utility may require an impact study to evaluate a solution’s effects on factors such as resource utilization, time savings, ease of use, workflow integration, end-user perception, alert characteristics (e.g., mode, timing, and targets), and unintended consequences9,22,24.
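As a concrete anchor for such discussions, the sketch below implements the standard net benefit calculation from the decision curve analysis literature, net benefit = TP/n - (FP/n) x pt/(1 - pt) at threshold probability pt; the function names and input arrays are illustrative and are not a FAIR-AI-specific requirement.

```python
# Minimal sketch of the net-benefit calculation used in decision curve
# analysis; threshold_prob (pt) is the clinical decision threshold and the
# input arrays are illustrative placeholders.
import numpy as np

def net_benefit(y_true, y_prob, threshold_prob):
    """Net benefit of the model at threshold pt: TP/n - FP/n * (pt / (1 - pt))."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold_prob
    n = len(y_true)
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    weight = threshold_prob / (1.0 - threshold_prob)
    return tp / n - (fp / n) * weight

def treat_all_net_benefit(y_true, threshold_prob):
    """Reference strategy of treating everyone, for comparison on a decision curve."""
    prevalence = np.mean(np.asarray(y_true) == 1)
    return prevalence - (1 - prevalence) * threshold_prob / (1 - threshold_prob)
```

Comparing the model's net benefit against the "treat all" and "treat none" (net benefit of zero) strategies across a range of thresholds is what produces a decision curve.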

Given the potential for ethical and equity risks when deploying AI solutions in healthcare, transparency should be present, to the degree possible, across all levels of the design, development, evaluation, and implementation of AI solutions to ensure fairness and accountability (https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf; http://data.europa.eu/eli/reg/2024/1689/oj)25,26. Specifically, because of the potential for AI to perpetuate biases that could result in over- or under-treatment of certain populations, there must be a clear and defensible justification for including predictor variables that have historically been associated with discrimination, such as those outlined in the PROGRESS-Plus framework: place of residence, race/ethnicity/culture/language, occupation, gender/sex, religion, education, socioeconomic status, social capital, and personal characteristics linked to discrimination (e.g., age, disability, sexual orientation)21,27,28,29. This is particularly important when these variables may act as proxies for other, more meaningful determinants of health. It is equally important to evaluate for patterns of algorithmic bias by monitoring outcomes for discordance between patient subgroups, as well as ensuring equal access to the AI solution itself when applicable10,25,30,31. Once an AI solution is implemented, transparency for end-users becomes a critical element for building trust and confidence, as well as for empowering users to play a role in vigilance for potential unintended consequences. To achieve this post-implementation transparency, end-users should have readily available information that explains an AI solution’s intended use, limitations, and potential risks (https://www.fda.gov/medical-devices/software-medical-device-samd/transparency-machine-learning-enabled-medical-devices-guiding-principles)32. Transparency is also critical from the patient’s perspective. There is an ethical imperative to notify patients when AI is being used and, when appropriate, to obtain their consent—particularly in sensitive or high-stakes situations33,34. This obligation is heightened when there is no human oversight, when the technology is experimental, or when the use of AI is not readily apparent. Failing to disclose the use of AI in such contexts may undermine patient autonomy and erode trust in the healthcare system. Generative AI presents unique challenges in terms of transparency. For example, deep learning models rely on vast numbers of parameters learned from increasingly large datasets and may be inherently unexplainable. When transparency is lacking, there should be a greater emphasis on human oversight and on education about limitations and risks; this is an area of ongoing research20.
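One way to make the subgroup monitoring described above concrete is sketched below: it computes performance and alert rates per subgroup so that discordance between groups can be flagged for review. The DataFrame column names (y_true, y_prob) and the use of pandas and scikit-learn are assumptions for illustration, not a prescribed tooling choice.

```python
# Minimal sketch of subgroup performance monitoring for algorithmic bias,
# assuming a DataFrame with columns "y_true", "y_prob", and a subgroup
# column (e.g., one of the PROGRESS-Plus characteristics).
import pandas as pd
from sklearn.metrics import roc_auc_score, precision_score

def subgroup_report(df: pd.DataFrame, group_col: str, threshold: float = 0.5) -> pd.DataFrame:
    rows = []
    for group, sub in df.groupby(group_col):
        if sub["y_true"].nunique() < 2:
            continue  # AUC undefined when a subgroup has only one outcome class
        y_pred = (sub["y_prob"] >= threshold).astype(int)
        rows.append({
            group_col: group,
            "n": len(sub),
            "auc": roc_auc_score(sub["y_true"], sub["y_prob"]),
            "ppv": precision_score(sub["y_true"], y_pred, zero_division=0),
            "alert_rate": y_pred.mean(),  # discordant alert rates across groups can flag bias
        })
    return pd.DataFrame(rows)
```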

Stakeholder needs and priorities—interviews

Several systematic reviews emphasize the importance of stakeholder engagement in the design and implementation of AI solutions in healthcare; however, this aspect is often overlooked in the existing frameworks35,36. To create a practical and useful framework for health systems, we borrowed from user-centric design principles to first assess stakeholders’ priorities for an AI framework and their criteria for evaluating its successful implementation. We interviewed stakeholders including health system leaders, AI developers, providers, and patients. Our findings were previously presented at the 17th Annual Conference on the Science of Dissemination and Implementation, hosted by AcademyHealth37.

The stakeholders expressed multiple priorities for an AI framework, particularly the need for: (1) risk tolerance assessments to weigh the potential patient harms of an AI solution against expected benefits, (2) a human in the loop for any medical decision made using an AI solution, (3) consideration that available, rigorous evidence may be limited when reviewing new AI solutions, and (4) awareness that solutions may not have been developed on diverse patient populations or on data similar to the population in which a use case is proposed. Interviewees also highlighted the importance of ensuring that AI solutions are matched to institutional priorities and conform to all relevant regulations. They noted that regulations can pose unique challenges for large, multi-state health systems. While patient safety and outcomes were identified as paramount, stakeholders also detailed the need for an AI framework to evaluate the impact of potential solutions on health system employees.

When evaluating the successful implementation and utilization of an AI framework, stakeholders were consistent in explaining that the review process must operate in a timely manner, provide clear guidelines for AI developers, and ensure fair and consistent review processes that are applicable for both internally and externally developed solutions. Multiple interviewees cited the challenges presented by the rapid pace of AI innovation, expressing concerns that an overly bureaucratic and time-consuming review process could hinder the health system’s ability to keep pace with the wider healthcare market. Similarly, multiple senior leaders and AI developers explained that a successful AI framework would both encourage internal innovation and streamline the implementation of AI solutions in a safe manner.

Framework for the appropriate implementation and review of AI (FAIR-AI) in healthcare

Findings from the stakeholder interviews informed our design workshops, which included health system leaders and experts in AI; workshop participants provided explicit guidance on how best to construct FAIR-AI so that it meaningfully integrates stakeholder feedback. The project team leveraged design workshop activities and participant expertise to develop a set of requirements for health systems seeking to implement AI responsibly. FAIR-AI provides a detailed outline of: (i) foundational health system requirements—artifacts, personnel, processes, and tools; (ii) inclusion and exclusion criteria that specify which AI solutions ought to be evaluated by FAIR-AI, thus defining scope and ensuring accountability; (iii) review questions in the form of a low-risk screening checklist and an in-depth review that provides a comprehensive evaluation of risks and benefits across the areas of development, validation, performance, ethics and equity, usefulness, and compliance and regulations; (iv) discrete risk categories that map to the review criteria and are assigned to each AI solution and its intended use case; (v) safe implementation plans, including monitoring and transparency requirements; and (vi) an AI Label that consolidates this information in an understandable format. These core components of FAIR-AI are also displayed in Fig. 1.

Fig. 1: Core components of FAIR-AI.

FAIR-AI aims to support safe innovation and responsible deployment of AI solutions in healthcare. This is achieved through a process centered on (1) a comprehensive review organized by risk domains; (2) categorization of an AI solution as low, moderate, or high risk; and (3) a Safe AI Plan consisting of monitoring and end-user transparency requirements.

Implementing a responsible AI framework requires that health systems have certain foundational elements in place: (i) artifacts, including a set of guiding principles for AI implementation and an AI ethics statement (examples are shown in Supplementary Table 1), both of which should be endorsed at the highest level of the organization; (ii) personnel, including an individual or team with data science training accountable for reviews; (iii) a process for escalation to an institutional decision-making body with the multidisciplinary expertise needed to assess ethical, legal, technical, operational, and clinical implications, and with the authority to act; and (iv) an inventory tool that serves as a single-source-of-truth catalog enabling accountability for review, monitoring, and transparency requirements. It is important to establish that the AI evaluation framework does not replace but rather supports existing governance structures. Additionally, while the overarching structure of an AI governance framework like FAIR-AI may remain consistent over time, the rapid pace of change in technology and regulations requires a process for regular review and updating by subject matter experts.

As the first step in FAIR-AI, an AI solution needs to go through an intake process. Individual leaders who are responsible for the deployment of AI solutions within the enterprise are designated as business owners; for clinical solutions, the business owner is a clinical leader. In this framework, we require the business owner of an AI solution to provide a set of descriptive items through an intake form, including: (i) the existing problem to be solved; (ii) a clearly outlined intended use case; (iii) expected benefits; (iv) risks, including worst-case scenario(s); (v) published and unpublished information on development, validation, and performance; and (vi) FDA approvals, if applicable.
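A minimal sketch of how these intake items might be captured as a structured record is shown below; the class and field names mirror the list above but are illustrative, not a mandated schema.

```python
# Minimal sketch of an intake record; field names are illustrative
# assumptions mapping to the six intake items described in the text.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AIIntakeForm:
    business_owner: str                              # clinical leader for clinical solutions
    problem_statement: str                           # (i) existing problem to be solved
    intended_use_case: str                           # (ii) clearly outlined intended use case
    expected_benefits: list[str] = field(default_factory=list)   # (iii)
    risks_worst_case: list[str] = field(default_factory=list)    # (iv) incl. worst-case scenario(s)
    evidence: list[str] = field(default_factory=list)            # (v) development, validation, performance
    fda_approval: Optional[str] = None               # (vi) e.g., clearance identifier, if applicable
```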

Next, we describe the inclusion and exclusion criteria that determine which AI solutions fall within the scope of FAIR-AI. Based on the premise that enterprise risk management must cast a wide net to be aware of potential risks, inclusion for FAIR-AI review starts with a broad, general definition of AI solutions, which intentionally also includes solutions that do not directly relate to clinical care. We adopted the definition of AI from Matheny et al., as “computer system(s) capable of activities normally associated with human cognitive effort”38. We then provide additional scope specificity by excluding three general areas of AI. First, we exclude simple scoring systems and rules-based tools for which an end-user can reasonably be expected to evaluate and take responsibility for performance. Second, we exclude any physical medical device that also incorporates AI into its function, as well-established FDA regulations are in place to evaluate and monitor risks associated with these devices (https://www.fda.gov/medical-devices/classify-your-medical-device/how-determine-if-your-product-medical-device). Third, we exclude any AI solution being considered under an Institutional Review Board (IRB)-approved research protocol that includes informed consent for the use of AI when human subjects are involved. Inclusion and exclusion criteria like these will need to be adapted to a health system’s local context.
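The scope decision implied by these criteria can be expressed as a simple screen, sketched below; the boolean inputs are hypothetical stand-ins for intake answers, and a real implementation would follow a health system's locally adapted criteria.

```python
# Minimal sketch of the scope screen implied by the inclusion/exclusion
# criteria above; the flag names are illustrative assumptions.
def in_fair_ai_scope(is_ai_solution: bool,
                     is_simple_rule_based: bool,
                     is_fda_regulated_device: bool,
                     under_irb_protocol_with_consent: bool) -> bool:
    """Return True if the solution should enter FAIR-AI review."""
    if not is_ai_solution:
        return False
    excluded = (
        is_simple_rule_based                # end-user can evaluate and own performance
        or is_fda_regulated_device          # physical device with established FDA oversight
        or under_irb_protocol_with_consent  # IRB-approved research with informed consent
    )
    return not excluded
```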

Risk evaluation considers the magnitude and importance of adverse consequences from a decision; in the case of FAIR-AI, the decision to implement a new AI solution39. As there are numerous approaches and nomenclatures for defining risk, local consensus on a clear definition is a critical initial step for a health system. We aimed for simplicity in our risk definition and in the number of risk categories to ensure interpretability by diverse stakeholders. Additionally, we opted to pursue a qualitative determination of risk and to avoid a purely quantitative, composite risk score. The requisite data rarely exist to perform such risk calculations reliably, and composites of weighted scores can dilute important individual risk factors as well as the nuance of risk mitigation offered by the workflows surrounding AI solutions (for example, requiring a human review of AI output before an action is taken). Thus, FAIR-AI determines the magnitude and importance of potential adverse effects through consensus among subject matter experts from a data science team, the business leader requesting the AI solution, and ad hoc consultants when additional expertise is needed. In this exercise, the group leverages published data and expert opinion to outline hypothetical worst-case scenarios and the harms that could occur as a direct or indirect result of output from the proposed AI solution. The consensus determines whether those harms are minor or not minor and, if not minor, whether they are sufficiently mitigated by the related implementation workflow and monitoring plan. This risk framework is similar to that proposed by the International Medical Device Regulators Forum (https://www.imdrf.org/documents/software-medical-device-possible-framework-risk-categorization-and-corresponding-considerations). It is important to note that every AI solution should be reviewed within the context of its intended use case, which includes the surrounding implementation workflows.

As prioritized by our stakeholders, a responsible AI framework should be nimble enough to allow quick but thorough reviews of AI solutions that have a low chance of causing any harm to an individual or the organization. To that end, FAIR-AI incorporates a two-step process: an initial low-risk screening pathway and a subsequent in-depth review pathway for all solutions that do not pass through the low-risk screen. For an AI solution to be designated low risk, it must pass all the low-risk screening questions (Table 2). Should answers to any of the screening questions suggest potential risks, the AI solution moves on to an in-depth review guided by the questions presented in Table 3. The in-depth review involves closer scrutiny of the AI solution by the data scientist and business owner and mandates a higher burden of proof that the potential benefits of the solution outweigh the potential risks identified during the screening process. If any of the in-depth review questions results in a determination of high risk, then the solution is considered high risk. It is also possible that the discussion between the data scientist and business owner will lead to a better understanding of the solution that changes the answers to one or more of the low-risk screening questions, resulting in a low-risk designation.

Table 2 Low-risk screening questions
Table 3 In-depth review questions

After the FAIR-AI review, which is described in detail in the next section, each AI solution is designated as low, moderate, or high risk according to the following definitions (Fig. 2):

  • Low risk: Potential adverse effects are expected to be minor and should be apparent to the end-user and business owner. No ethical, equity, compliance, or regulatory concerns are identified during the low-risk screen.

  • Moderate risk: Based on an in-depth review, one or more of the following are present: (1) potential adverse effects are not minor but are adequately addressed by workflows; (2) ethical, equity, compliance, or regulatory issues are suspected or present but are appropriately mitigated.

  • High risk: Based on an in-depth review, one or more of the following are present: (1) potential adverse effects are notable and could have a significant negative impact on patients, teammates, individuals, or the enterprise; (2) ethical, equity, compliance, or regulatory issues are suspected or present but are not adequately addressed; (3) insufficient evidence exists to recommend proceeding with implementation.
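Taken together, the two-step process and the risk definitions above amount to a simple triage rule, sketched below under the assumption that the screening and in-depth questions have already been reduced to boolean findings; the enum and function names are illustrative only.

```python
# Minimal sketch of the FAIR-AI triage logic; inputs are illustrative
# summaries of the Table 2 and Table 3 answers, not the questions themselves.
from enum import Enum

class RiskCategory(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"

def categorize(passes_all_low_risk_screens: bool,
               any_high_risk_finding_in_depth: bool,
               residual_issues_adequately_mitigated: bool) -> RiskCategory:
    """Step 1: low-risk screen; step 2: in-depth review for anything that fails it."""
    if passes_all_low_risk_screens:
        return RiskCategory.LOW       # no ethical, equity, compliance, or regulatory concerns
    if any_high_risk_finding_in_depth or not residual_issues_adequately_mitigated:
        return RiskCategory.HIGH      # escalated to AI Governance for a final determination
    return RiskCategory.MODERATE      # risks present but addressed by workflows and mitigations
```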

Fig. 2: Risk categories as determined by FAIR-AI evaluation and escalation to AI Governance.

This figure outlines the risk categorization process in FAIR-AI. The Data Science Team uses the Low-risk Screening Questions and In-depth Review Questions to categorize the risk level of an AI solution. If given a high-risk designation, the AI solution is escalated to the AI Governance Committee to determine whether deployment can proceed.

For our health system, all AI solutions designated as high risk are escalated to the AI Governance committee, where they undergo a multidisciplinary discussion. The discussion results in one of three final designations: (i) proceed to implementation under high-risk conditions; (ii) proceed to a pilot or research study; or (iii) do not proceed, as implementation would create an intolerable risk for the organization.

The FAIR-AI framework is designed to encompass the full range of AI solutions in healthcare, including many that will not require in-depth review and can be designated low risk—such as those supporting back-office functions, cybersecurity, or administrative automation. Examples of moderate-risk AI tools in healthcare include solutions that support—but do not replace—clinical or administrative decision-making. These tools may influence patient care or documentation, but their outputs are generally explainable, subject to human review, and integrated into existing workflows that help mitigate risk. Examples of high-risk AI tools in healthcare include those that directly influence clinical care, diagnostics, or billing—particularly when used without consistent human oversight. They may also be deployed in sensitive contexts, such as end-of-life care or other high-stakes medical decisions. These tools often rely on complex, opaque models that can perpetuate bias, affect decision-making, and lead to significant downstream consequences if not rigorously validated and continuously monitored.

After application of the low-risk screening questions, the in-depth review questions (if necessary), and completion of the AI Governance committee review (if necessary), the proposed solution is assigned a final risk category, and a FAIR-AI Summary Statement is completed (an example is presented in Supplementary Box 1). At this point, an AI solution may need to go through other traditional governance requirements, such as cybersecurity review and financial approvals. If the AI solution ultimately is designated to move forward with implementation, then the data science team and business owners collaboratively develop a Safe AI Plan as outlined below.

The first component of the Safe AI Plan concerns monitoring requirements. Implemented AI solutions need continuous monitoring, as they may fail to adapt to new data or practice changes, which can lead to inaccurate results and increasing bias over time40,41. Similarly, when AI solutions are made readily available in workflows, it becomes easier for a solution to be used outside of its approved intended use case, which may change its inherent risk profile. For these reasons, FAIR-AI requires a monitoring plan for every deployed AI solution, consisting of an attestation by the business owner at regular intervals. The attestation affirms that: (i) the deployment is still aligned with the approved use case; (ii) the underlying data and related workflows have not substantially changed; (iii) the AI solution is delivering the expected benefit(s); (iv) no unforeseen risks have been identified; and (v) there are no concerns related to new regulations. If the original FAIR-AI review identified specific risks, then the attestation also includes an approach to evaluate each risk along with metrics (if applicable). These evaluation metrics may range from repeating a standard model performance evaluation to obtaining periodic end-user feedback on accuracy (e.g., for a generative AI solution). The second component of the Safe AI Plan is transparency requirements. All solutions categorized as high risk also require an AI Label (Fig. 3) and end-user education at regular intervals. In situations where an end-user might not be aware they are interacting with AI rather than a human, the business owner must also design implementation workflows that create transparency for the end-user (e.g., an alert, disclaimer, or consent, as applicable).
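As one way to operationalize the attestation described above, the sketch below models a periodic monitoring record whose fields mirror the five attestation items; the class and field names are illustrative assumptions rather than a required format.

```python
# Minimal sketch of a periodic monitoring attestation under the Safe AI Plan;
# field names map to the five attestation items and are illustrative only.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MonitoringAttestation:
    solution_name: str
    business_owner: str
    attestation_date: date
    aligned_with_approved_use: bool        # (i) still matches the approved use case
    data_and_workflows_unchanged: bool     # (ii) no substantial data or workflow drift
    delivering_expected_benefit: bool      # (iii) expected benefit(s) realized
    no_unforeseen_risks: bool              # (iv) no new risks identified
    no_new_regulatory_concerns: bool       # (v) no concerns from new regulations
    risk_specific_metrics: dict[str, float] = field(default_factory=dict)  # e.g., repeat AUC, user-rated accuracy

    def requires_escalation(self) -> bool:
        """Any failed attestation item triggers re-review of the deployment."""
        return not all([
            self.aligned_with_approved_use,
            self.data_and_workflows_unchanged,
            self.delivering_expected_benefit,
            self.no_unforeseen_risks,
            self.no_new_regulatory_concerns,
        ])
```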

Fig. 3: AI transparency label.

As part of the Safe AI Plan, some AI solutions may be required to have an AI Label to provide end-user transparency. The AI Label includes an overview of the AI solution and a concise summary of the FAIR-AI review.





Robinhood CEO says just like every company became a tech company, every company will become an AI company



Earlier advances in software, cloud, and mobile capabilities forced nearly every business—from retail giants to steel manufacturers—to invest in digital transformation or risk obsolescence. Now, it’s AI’s turn.

Companies are pumping billions of dollars into AI investments to keep pace with a rapidly changing technology that’s transforming the way business is done.

Robinhood CEO Vlad Tenev told David Rubenstein this week on Bloomberg Wealth that the race to implement AI in business is a “huge platform shift” comparable to the mobile and cloud transformations in the mid-2000s, but “perhaps bigger.”

“In the same way that every company became a technology company, I think that every company will become an AI company,” he explained. “But that will happen at an even more accelerated rate.”

Tenev, who co-founded the brokerage platform in 2013, pointed out that traders are not just trading to make money, but also because they love it and are “extremely passionate about it.”

“I think there will always be a human element to it,” he added. “I don’t think there’s going to be a future where AI just does all of your thinking, all of your financial planning, all the strategizing for you. It’ll be a helpful assistant to a trader and also to your broader financial life. But I think the humans will ultimately be calling the shots.”

Yet Tenev anticipates AI will change jobs. During an August episode of the Iced Coffee Hour podcast, he advised people to become “AI native” quickly to avoid being left behind, adding that AI will be able to scale businesses far faster than previous tech booms did.

“My prediction over the long run is you’ll have more single-person companies,” Tenev said on the podcast. “One individual will be able to use AI as a huge accelerant to starting a business.”

Global businesses are banking on artificial intelligence technologies to move rapidly from the experimental stage to daily operations, though a recent MIT survey found that 95% of pilot programs failed to deliver.

U.S. tech giants are racing ahead, with the so-called hyperscalers planning to spend $400 billion on capital expenditures in the coming year, most of it going to AI.

Studies show AI has already permeated a majority of businesses. A recent McKinsey survey found 78% of organizations use AI in at least one business function, up from 72% in early 2024 and 55% in early 2023. Now, companies are looking to continually update cutting-edge technology.

In the finance world, JPMorgan Chase’s Jamie Dimon believes AI will “augment virtually every job,” and described its impact as “extraordinary and possibly as transformational as some of the major technological inventions of the past several hundred years: think the printing press, the steam engine, electricity, computing, and the Internet.”







California Lawmakers Once Again Challenge Newsom’s Tech Ties with AI Bill



Last year, California Governor Gavin Newsom vetoed a wildly popular (among the public) and wildly controversial (among tech companies) bill that would have established robust safety guidelines for the development and operation of artificial intelligence models. Now he’ll have a second shot—this time with at least part of the tech industry giving him the green light. On Saturday, California lawmakers passed Senate Bill 53, a landmark piece of legislation that would require AI companies to submit to new safety tests.

Senate Bill 53, which now awaits the governor’s signature to become law in the state, would require companies building “frontier” AI models—systems that require massive amounts of data and computing power to operate—to provide more transparency into their processes. That would include disclosing safety incidents involving dangerous or deceptive behavior by autonomous AI systems, providing more clarity into safety and security protocols and risk evaluations, and providing protections for whistleblowers who are concerned about the potential harms that may come from models they are working on.

The bill—which would apply to the work of companies like OpenAI, Google, xAI, Anthropic, and others—has certainly been dulled from previous attempts to set up a broad safety framework for the AI industry. The bill that Newsom vetoed last year, for instance, would have established a mandatory “kill switch” for models to address the potential of them going rogue. That’s nowhere to be found here. An earlier version of SB 53 also applied the safety requirements to smaller companies, but that has changed. In the version that passed the Senate and Assembly, companies bringing in less than $500 million in annual revenue only have to disclose high-level safety details rather than more granular information, per Politico—a change made in part at the behest of the tech industry.

Whether that’s enough to satisfy Newsom (or more specifically, satisfy the tech companies from whom he would like to continue receiving campaign contributions) is yet to be seen. Anthropic recently softened on the legislation, opting to throw its support behind it just days before it officially passed. But trade groups like the Consumer Technology Association (CTA) and Chamber of Progress, which count companies like Amazon, Google, and Meta among their members, have come out in opposition to the bill. OpenAI also signaled its opposition to regulations California has been pursuing without specifically naming SB 53.

After the Trump administration tried and failed to implement a 10-year moratorium on states implementing regulations on AI, California has the opportunity to lead on the issue—which makes sense, given most of the companies at the forefront of the space are operating within its borders. But that fact also seems to be part of the reason Newsom is so hesitant to pull the trigger on regulations despite all his bluster on many other issues. His political ambitions require money to run, and those companies have a whole lot of it to offer.






Will Smith allegedly used AI in concert footage. We’re going to see a lot more of this…



Earlier this month, allegedly AI-generated footage of one of Will Smith’s gigs was released.

Snopes agreed that the crowd shots in the clip featured ‘some AI manipulation’.






