References
1. Sarumi OA, Heider D. Large language models and their applications in bioinformatics. Comput Struct Biotechnol J 2024;23:3498-3505.
2. Egli A. ChatGPT, GPT-4, and other large language models: the next revolution for clinical microbiology? Clin Infect Dis 2023;77:1322-1328.
3. Clusmann J, Kolbinger FR, Muti HS, et al. The future landscape of large language models in medicine. Commun Med 2023;3:141.
4. Meng X, Yan X, Zhang K, et al. The application of large language models in medicine: a scoping review. iScience 2024;27:109713.
5. Gordon W. Growing use and confidence in artificial intelligence for care delivery. NEJM Catal Innov Care Deliv 2022;3:1 CAT.22.0095 5.
6. Cascella M, Montomoli J, Bellini V, et al. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst 2023;47:33.
7. Kim TW. Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review. J Educ Eval Health Prof 2023;20:38.
8. Shen Y, Heacock L, Elias J, et al. ChatGPT and other large language models are double-edged swords. Radiology 2023;307:e230163.
9. Liu CL, Ho CT, Wu TC. Custom GPTs enhancing performance and evidence compared with GPT-3.5, GPT-4, and GPT-4o? A study on the Emergency Medicine Specialist Examination. Healthcare 2024;12:1726.
10. Jo E, Song S, Kim JH, et al. Assessing GPT-4’ performance in delivering medical advice: comparative analysis with human experts. JMIR Med Educ 2024;10:e51282.
11. Bazzari AH, Bazzari FH. Assessing the ability of GPT-4o to visually recognize medications and provide patient education. Sci Rep 2024;14:26749.
12. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2023;2:e0000198.
13. Meyer A, Riese J, Streichert T. Comparison of the performance of GPT-3.5 and GPT-4 with that of medical students on the written German Medical Licensing Examination: observational study. JMIR Med Educ 2024;10:e50965.
14. Nori H, King N, McKinney SM, et al. Capabilities of GPT-4 on medical challenge problems. arXiv 2023;2:1-33.
15. Jiao C, Edupuganti NR, Patel PA, et al. Evaluating the artificial intelligence performance growth in ophthalmic knowledge. Cureus 2023;15:e45700.
16. Joly-Chevrier M, Nguyen AXL, Lesko-Krleza M, et al. Performance of ChatGPT on a practice dermatology board certification examination. J Cutan Med Surg 2023;27:407-409.
17. Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology 2023;307:e230582.
18. Kung JE, Marshall C, Gauthier C, et al. Evaluating ChatGPT performance on the Orthopaedic In-Training Examination. JB JS Open Access 2023;8:e23.00056.
19. Jain N, Gottlich C, Fisher J, et al. Assessing ChatGPT’ orthopedic in-service training exam performance and applicability in the field. J Orthop Surg Res 2024;19:27.
20. Hayes DS, Foster BK, Makar G, et al. Artificial intelligence in orthopaedics: performance of ChatGPT on text and image questions on a complete AAOS Orthopaedic In-Training Examination (OITE). J Surg Educ 2024;81:1645-1649.
21. Urman A, Makhortykh M. The silence of the LLMs: cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat. Telemat Inform 2025;96:102211.
22. Casagrande D, Gobira M. Evaluating the accuracy of Gemini 2.0 Advanced and ChatGPT 4o in cataract knowledge: a performance analysis using Brazilian Council of Ophthalmology board exam questions. Cureus 2025;17:e79565.
23. Rakauskas TR, Da Costa A, Moriconi C, et al. Evaluation of Chat Generative Pre-trained Transformer and Microsoft Copilot performance on the American Society of Surgery of the Hand self-assessment examinations. J Hand Surg Glob Online 2025;7:23-28.
24. Xu AY, Singh M, Balmaceno-Criss M, et al. Comparitive performance of artificial intelligence-based large language models on the Orthopedic In-Training Examination. J Orthop Surg (Hong Kong) 2025;33:10225536241268789.
25. Rossettini G, Rodeghiero L, Corradi F, et al. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Med Educ 2024;24:694.
26. Le HV, Wick JB, Haus BM, et al. Orthopaedic In-Training Examination: history, perspective, and tips for residents. J Am Acad Orthop Surg 2021;29:e427-e437.
27. Nawari A, Zahir J, Kumar S, et al. Artificial intelligence large language models are nearly equivalent to fourth-year orthopaedic residents on the Orthopaedic In-Training Examination: a cause for concern or excitement? J Orthopaed Exp Innov 2025;6:001c.124070.
28. Chen CJ, Bilolikar VK, VanNest D, et al. Artificial intelligence in orthopaedic education: a comparative analysis of ChatGPT and Bing AI’ Orthopaedic In-Training Examination performance. Med Adv 2024;2:284-290.
29. Lubitz M, Latario L. Performance of two artificial intelligence generative language models on the Orthopaedic In-Training Examination. Orthopedics 2024;47:e146-e150.
30. Ghanem D, Covarrubias O, Raad M, et al. ChatGPT performs at the level of a third-year orthopaedic surgery resident on the Orthopaedic In-Training Examination. JB JS Open Access 2023;8:e23.00103.
31. Ozdag Y, Hayes DS, Makar GS, et al. Comparison of artificial intelligence to resident performance on upper-extremity Orthopaedic In-Training Examination questions. J Hand Surg Glob Online 2024;6:164-168.
32. Massey PA, Montgomery C, Zhang AS. Comparison of ChatGPT-3.5, ChatGPT-4, and orthopaedic resident performance on orthopaedic assessment examinations. J Am Acad Orthop Surg 2023;31:1173-1179.
33. Rizzo MG, Cai N, Constantinescu D. The performance of ChatGPT on orthopaedic in-service training exams: a comparative study of the GPT-3.5 Turbo and GPT-4 models in orthopaedic education. J Orthop 2024;50:70-75.
34. Hofmann HL, Guerra GA, Le JL, et al. The rapid development of artificial intelligence: GPT-4’ performance on orthopedic surgery board questions. Orthopedics 2024;47:e85-e89.
35. Vaishya R, Iyengar KP, Patralekh MK, et al. Effectiveness of AI-powered chatbots in responding to orthopaedic postgraduate exam questions-an observational study. Int Orthop 2024;48:1963-1969.
36. Salman IM, Ameer OZ, Khanfar MA, et al. Artificial intelligence in healthcare education: evaluating the accuracy of ChatGPT, Copilot, and Google Gemini in cardiovascular pharmacology. Front Med 2025;12:1495378.
37. Tepe M, Emekli E. Assessing the responses of large language models (ChatGPT-4, Gemini, and Microsoft Copilot) to frequently asked questions in breast imaging: a study on readability and accuracy. Cureus 2024;16:e59960.
38. Thirunavukarasu AJ, Ting DSJ, Elangovan K, et al. Large language models in medicine. Nat Med 2023;29:1930-1940.
39. Dao T, Fu DY, Ermon S, et al. FlashAttention: fast and memory-efficient exact attention with IO-awareness. arXiv 2022; :arXiv:2205.14135.
40. Ali R, Tang OY, Connolly ID, et al. Performance of ChatGPT, GPT-4, and Google Bard on a neurosurgery oral boards preparation question bank. Neurosurgery 2023;93:1090-1098.