AI guardrails can fall short in health care: study
When physicians use artificial intelligence tools with baked-in systemic bias to help figure out what's wrong with patients, it's perhaps little surprise that they make less accurate diagnoses.
- But a common safeguard against potential bias — transparency about how the AI came to form its predictions — doesn't help mitigate that problem, a new JAMA study finds.
Why it matters: With AI poised to play a greater role in diagnosis and treatment, there's growing emphasis on rooting out models developed with faulty assumptions.
- For instance, if an AI model is trained on data in which female patients are consistently underdiagnosed for heart disease, the model might learn to underdiagnose female patients, the researchers point out.
Details: In the study, about 450 doctors, nurses and physician assistants were shown a handful of cases of patients hospitalized with acute respiratory failure.
- The clinicians were given patients' presenting symptoms, information from their physical examinations, laboratory results and chest radiographs and asked to determine the likelihood of pneumonia, heart failure or chronic obstructive pulmonary disease.
- They were all shown two cases that had no input from an AI model to create a baseline. They were then randomized to see six more cases with AI model input, including three that included systematically biased model predictions.
Zoom in: Clinicians' diagnostic accuracy on their own was 73%.
- When the clinicians had been shown a prediction from an unbiased AI model, their accuracy improved by 2.9 percentage points.
- And when they were given an explanation of how that AI model reached its prediction, their accuracy was 4.4 percentage points higher than the baseline.
- That's indicative of how these tools — by pulling together data from patient health history, including lab results and imaging — have the promise to make clinicians better.
However, when the clinicians were given predictions from intentionally biased AI models, their accuracy dropped to 61.7%, which is 11.3 percentage points lower than when they made diagnoses without any AI input.
- Even when they were given explanations showing that the models had relied on irrelevant factors, their diagnostic accuracy was still 9.1 percentage points below baseline.
- For example, one flawed model was biased to predict heart failure in overweight patients, and its explanation highlighted that body fat tissue was what drove the prediction, Michael Sjoding, a University of Michigan researcher who co-authored the study, told Axios.
The big picture: AI can be a transformational tool, but it can also be a harmful one.
- "This should be a big 'proceed with caution' sign," Sjoding said.
Transparency into clinical AI algorithms can help protect against potential harms, but isn't a cure-all.
- Federal regulators last week announced new rules requiring developers to offer more transparency into how AI tools for clinical decision-making are trained.
- That announcement came just weeks after a White House executive order on AI pushing standards for the technology's use, including guardrails against bias.
- "It's a nice premise, this idea that, if the clinician is using the model, we want to try to help the clinician understand how the AI model is making its decision," Sjoding said. "This study suggests that while that's a good aim for an AI model to have, it's not necessarily going to mitigate a potential, systemic problem."