AI failed to detect critical health conditions: study

Illustration: Annelise Capossela/Axios
AI systems designed to predict the likelihood of a hospitalized patient dying largely aren't detecting worsening health conditions, a new study found.
Why it matters: Some machine learning models trained exclusively on existing patient data didn't recognize about 66% of injuries that could lead to patient death in the hospital, according to the research published in Nature's Communications Medicine journal.
State of play: Hospitals increasingly use tools that harness machine learning, a subset of AI that focuses on systems that continuously learn and adjust as they're given new data.
- A separate study recently published in Health Affairs found that about 65% of U.S. hospitals use AI-assisted predictive models, most commonly to predict inpatient health trajectories.
Zoom in: Researchers took several machine learning models commonly cited in medical literature for predicting patient deterioration and fed them publicly available datasets on the health metrics of ICU and cancer patients.
- The researchers then created test cases in which some patient metrics were altered from the initial datasets, and checked whether the models' predicted health issues and risk scores changed accordingly.
- The models for in-hospital mortality prediction could only recognize an average of 34% of patient injuries, the study found.
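The testing approach described above can be sketched in miniature. This is not the study's actual code or data: the features, thresholds, and perturbation values below are hypothetical stand-ins, and a simple logistic regression stands in for the mortality-prediction models the researchers evaluated. The idea is the same, though: perturb a patient's metrics toward clinically worse values and check whether the model's predicted risk rises.

```python
# Minimal sketch (hypothetical data and model, not the study's):
# train a toy in-hospital mortality classifier on synthetic ICU-style
# features, then test whether it recognizes "deteriorated" versions
# of the same patients as higher risk.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic features: [heart_rate, systolic_bp, spo2] -- invented
# stand-ins for the public ICU datasets used in the study.
X = rng.normal([80, 120, 97], [10, 15, 2], size=(500, 3))
risk = 0.04 * (X[:, 0] - 80) - 0.03 * (X[:, 1] - 120) - 0.5 * (X[:, 2] - 97)
y = (risk + rng.normal(0, 1, 500) > 1).astype(int)  # 1 = in-hospital death

model = LogisticRegression().fit(X, y)

# Perturb metrics toward clinically worse values (tachycardia,
# hypotension, hypoxia) and count how often predicted risk increases,
# as a clinician would expect it to.
X_worse = X + np.array([40.0, -40.0, -10.0])
recognized = model.predict_proba(X_worse)[:, 1] > model.predict_proba(X)[:, 1]
print(f"Perturbations flagged as higher risk: {recognized.mean():.0%}")
```

In this toy setup the classifier flags nearly all perturbations, because the synthetic data is clean and the model is trained on the same distribution it is tested on; the study's finding is that real models trained purely on observational patient data flagged only about a third of such injected deteriorations.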
What they're saying: "We are asking the models to make big decisions, and so we really need to figure out ... in what kind of situations they can perform," said Danfeng (Daphne) Yao, an author of the study and a computer science professor at Virginia Tech.
- It's extremely important for technology being used in patient care decisions to incorporate medical knowledge, Yao said.
- The study shows that "purely data-driven training alone is not sufficient," she added.
What we're watching: Large language models — think ChatGPT-type AI systems — could be more useful in medical settings if they're trained on medical literature. But more research on their trustworthiness is needed before they're deployed in clinical settings, the study says.
