Scientists call for rules on evaluating predictive AI in medicine
Illustration: Rebecca Zisser/Axios
Some scientists are calling on the Food and Drug Administration to establish standards for advanced algorithms, which are developing at a "staggering" pace, before they are put in medical devices to help predict patients' outcomes.
What's new: Advanced algorithms are starting to be deployed in some devices to provide automated real-time predictions, but they bring a whole new level of possibilities and challenges compared with older predictive tools. Standards are needed to check their safety and effectiveness before they are used in a clinical setting, the scientists say in a policy forum published in Science Thursday.
The FDA tells Axios it is working on developing a framework to handle advances in AI and medicine, as pointed out by Commissioner Scott Gottlieb last year. While unable to comment on this paper, a spokesperson says the FDA has used its current process for novel medical devices to authorize these AI algorithms:
- Viz.ai for helping providers detect stroke in CT scans.
- IDx-DR for detecting diabetic retinopathy.
- OsteoDetect for detecting bone fractures.
Meanwhile, Ravi B. Parikh, co-author of the paper and a fellow at the University of Pennsylvania's School of Medicine, tells Axios that the FDA needs to set standards to keep up with the "staggering" pace of AI development. He adds:
"Five years ago, AI and predictive analytics had yet to make a meaningful impact in clinical practice. In just the past 2-3 years, premarket clearances have been granted for AI applications ranging from sepsis prediction to radiology interpretation."
"But if these tools are going to be used to determine patient care ... they should meet standards of clinical benefit just as the majority of our drugs and diagnostic tests do. We think being proactive in creating and formalizing these standards is essential to protecting patients and safely translating algorithms to clinical interventions."
Why it matters: Advanced algorithms present both opportunities and challenges, says Amol S. Navathe, co-author and assistant professor at Penn's School of Medicine. He tells Axios:
"The real opportunity is that these algorithms outperform clinicians in medical decisions, not a small feat. The challenge is that the data generated for algorithms is not randomly generated, rather, most of [what] the data algorithms 'see' is a result of a human decision. We have a ways to go in our scientific approaches to overcome this challenge and uniformly develop algorithms that can help improve upon human clinician decisions."
Details: The authors list the following as recommended standards...
- Meaningful endpoints for clinical benefit from the algorithms, such as downstream outcomes like overall survival or clinically relevant metrics like the number of misdiagnoses, should be rigorously validated by the FDA.
- Appropriate benchmarks should be determined, as in the recent example of the FDA approving Viz.ai, the deep-learning algorithm for diagnosing strokes, after it was able to diagnose strokes on computed tomography imaging more rapidly than neuroradiologists.
- Variable input specifications should be clarified for all institutions, such as defining inputs for electronic health records so results are reliable across institutions. In addition, algorithms should be trained on data from populations that are as broadly representative as possible so they are generalizable across all populations.
- Guidance on possible interventions that would be connected to an algorithm's findings to improve patient care should be considered.
- Rigorous audits should be run after FDA clearance or approval, particularly to check how new variables incorporated by deep learning may have altered an algorithm's performance over time. For instance, regular audits could find that an algorithm had developed a systematic bias against certain groups after being deployed across large populations. This could be tracked in a manner similar to the FDA's current Sentinel Initiative for approved drugs and devices.
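To make the audit recommendation concrete, here is a minimal sketch of what a post-deployment check for systematic bias might look like. It is not the authors' method or anything the FDA has specified: the group labels, the sample records, the choice of sensitivity as the metric and the 10-percentage-point threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return the share of true cases the algorithm caught, per patient group.

    records: iterable of (group, predicted_positive, actually_positive).
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for group, predicted, actual in records:
        if actual:
            actual_pos[group] += 1
            if predicted:
                true_pos[group] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def flag_gaps(rates, max_gap=0.10):
    """List groups whose sensitivity trails the best group by more than max_gap.

    max_gap is a hypothetical audit threshold, not a regulatory figure.
    """
    best = max(rates.values())
    return sorted(g for g, r in rates.items() if best - r > max_gap)


# Hypothetical audit sample: groups "A" and "B" are placeholders.
records = (
    [("A", True, True)] * 9 + [("A", False, True)] * 1
    + [("B", True, True)] * 6 + [("B", False, True)] * 4
    + [("A", False, False)] * 5 + [("B", True, False)] * 2
)

rates = sensitivity_by_group(records)
flagged = flag_gaps(rates)
print(rates)
print(flagged)
```

In this made-up sample the algorithm catches 9 of 10 true cases in group A but only 6 of 10 in group B, so group B is flagged for review. A real audit would need far larger samples and metrics chosen for the clinical context.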
Outside comment: Eric Topol, founder and director of Scripps Research Translational Institute, who was not part of this paper, says the timing of these proposed standards is "very smart," coming before advanced algorithms are placed into too many devices.
- "[The algorithm] doesn't translate necessarily into helping people," Topol tells Axios. "It can actually have no benefit."
- Even worse, he adds, if the variables are off, the predictive analyses can have negative ramifications.
What's next: The scientists hope the FDA considers integrating the proposed standards alongside its current pre-certification program under the Digital Health Innovation Action Plan to study clinical outcomes of AI-based tools, Parikh says.