Jan 3, 2024 - Health

ChatGPT had a high error rate for pediatric cases

Illustration: Maura Losch/Axios

Researchers found ChatGPT incorrectly diagnosed over 8 in 10 selected pediatric case studies, raising questions about some bots' suitability for helping doctors size up complex conditions.

The big picture: Large language models like OpenAI's ChatGPT are trained on massive amounts of internet data and can't discriminate between reliable and unreliable information, researchers at Cohen Children's Medical Center wrote.

  • They also lack real-time access to medical information, preventing them from staying updated on new research and health trends.

What they found: The chatbot misdiagnosed 72 of the 100 selected cases and delivered a diagnosis too broad to be considered correct for another 11, the researchers wrote in JAMA Pediatrics.

  • It wasn't able to identify relationships like the one between autism and vitamin deficiencies, underscoring the continued importance of physicians' clinical experience.
  • But over half of the incorrect diagnoses (56.7%) belonged to the same organ system as the correct diagnosis, indicating more selective training of the AI is needed to get diagnostic accuracy up to snuff.
  • The study is thought to be the first to explore the accuracy of bots in entirely pediatric scenarios, which require the consideration of the patient's age alongside symptoms.

One takeaway is that physicians may need to take a more active role in generating data sets for AI models to intentionally prepare them for medical functions — a process known as tuning.

Between the lines: AI models have passed medical licensing exams and been shown to outperform medical professionals in specific tasks, though doctors are still grappling with what counts as an acceptable success rate for AI-supported diagnosis.

  • Use of the technology in clinical decision-making remains controversial, with critics questioning how much AI has made a real-life difference in medical settings.
  • The study authors say the field is ripe for more study while noting large language models and bots can be useful administrative tools for tasks like writing research articles and generating patient instructions.