April 4, 2026

AI tools enhance pediatric diagnostic accuracy

Author: Celeste Krewson, Assistant Editor

Fact checked by: Kelly King

Key Takeaways

Advanced large language models (LLMs) outperformed human clinicians in diagnosing real-world pediatric cases, particularly rare diseases.
The highest diagnostic accuracy (94.3%) was achieved when AI tools were provided with extended clinical information and used in conjunction with human expertise.
Researchers emphasize that AI shouldn't replace clinicians, but rather act as a supportive tool integrated into a continuous, clinician-guided process with robust oversight.

Research shows that combining AI models with human expertise significantly boosts diagnostic performance, particularly for rare diseases.

Investigators from Hospital Sant Joan de Déu found that pediatric diagnosis improved when artificial intelligence (AI) and human clinicians worked together, according to findings published in Pediatric Investigation.¹

Diagnosing disease in pediatric patients is complicated by rare conditions with subtle or overlapping symptoms, which create uncertainty that may delay diagnosis and lead to adverse outcomes. AI models have been proposed as a way to improve health care, but most studies evaluating them have relied on simplified or curated cases.¹

“Rather than replacing clinicians, these tools may help broaden the differential diagnosis and reduce the likelihood of missed diagnoses—as long as outputs are interpreted critically and within robust oversight frameworks,” said Cristian Launes, MD, MSc, PhD, clinical professor at Hospital Sant Joan de Déu.¹

LLMs vs clinicians

The cross-sectional study was designed to gather real-world data on the diagnostic accuracy, consistency, and clinical usability of large language models (LLMs) as diagnostic support tools in pediatric medicine.² It included 4 LLMs:

DxGPT/GPT-4 (0613)
Claude-3.5 Sonnet
GPT-4o (0513)
o1-preview

Knowledge cutoffs for these models were September 2021, early 2024, October 2023, and October 2023, respectively. The 4 models were selected based on application programming interface accessibility, benchmark performance, latency, pricing, and architectural diversity.²

The models were compared with 78 pediatric clinicians of varying experience levels on 50 real-world cases: 25 involving rare diseases and 25 involving common conditions. Each case was submitted to each LLM 3 times.²

Data extracted for each case included patient sex and age, main symptoms and signs, relevant medical history, physical examination findings, and results from the initial complementary tests. Through discussion, investigators classified scenarios as low, medium, or high complexity.²

Performance was evaluated using top-1 and top-5 diagnostic accuracy (whether the correct diagnosis was ranked first or appeared among the first 5 suggestions), response consistency, and qualitative ratings. To test the effect of additional information, investigators provided extended clinical information for 20 of the cases, yielding 70 unique clinical scenarios.²

Accuracy findings

Advanced LLMs achieved significantly higher diagnostic accuracy than clinicians, with top-1 accuracies of 60% for o1-preview, 59% for Claude-3.5 Sonnet, and 48.2% for clinicians. Investigators reported odds ratios (ORs) of 2.99 for o1-preview and 2.75 for Claude-3.5 Sonnet vs clinicians.²

The LLMs' advantage was also significant for top-5 accuracy, where the highest rates were 78.1% for o1-preview and 77.6% for Claude-3.5 Sonnet. GPT-4o showed intermediate performance, and DxGPT performed the lowest.²

The largest advantage for LLMs was seen in rare diseases, where o1-preview had roughly 6-fold higher odds than clinicians of including the correct diagnosis in its top 5 (OR, 6).²

For rare diseases, o1-preview also had the highest top-1 accuracy among the LLMs, at 50%. For common diseases, Claude-3.5 Sonnet had the highest top-1 accuracy, at 77.1%.²

Synergy between AI and clinical insight

Both LLMs and clinicians achieved higher accuracy when given extended clinical information. Combining o1-preview's output with clinicians' diagnoses produced a union accuracy of 94.3%, a rate 10% greater than clinician accuracy alone. Clinicians also rated DxGPT favorably, giving it a mean score of 3.9 out of 5 overall and 4.1 out of 5 for support with rare cases.²

Overall, the results indicated that newer LLMs outperformed both earlier models and human clinicians. However, investigators stressed the need to address variability in model output, establish regulatory frameworks, and maintain human oversight before these models are implemented.²

“AI systems perform best when they are part of a continuous clinical process, where clinicians iteratively gather, verify, and curate the evolving clinical picture to feed the model,” said Launes.¹

References

1. Pediatric Investigation study finds AI and clinicians together improve pediatric diagnosis. News release. Pediatric Investigation. March 31, 2026. Accessed April 3, 2026. https://www.eurekalert.org/news-releases/1122121
2. Launes C, Esteller-Cucala, Alvarez-Estape M, et al. Large-language-models for pediatric diagnosis: Performance evaluation using real-world clinical notes from common and rare cases. Pediatr Investig. Published online March 25, 2026. doi:10.1002/ped4.70053
