Original Article

Performance of Large Language Models on Diagnostic Radiology Board–Style Questions: A Comparative Evaluation of GPT-4o, Perplexity AI, and OpenEvidence

Objective: The objective of this study was to compare the diagnostic accuracy and internal consistency of GPT-4o (Generative Pre-Trained Transformer-4 omni), Perplexity AI (artificial intelligence), and OpenEvidence when applied to text-based, specialty-level radiology board questions.

Methods: A total of 161 text-based multiple-choice questions from the American College of Radiology (ACR)…