Original Article

Performance of Chat Generative Pre-Trained Transformer on Personal Review of Learning in Obstetrics and Gynecology

Authors: Adam Cohen, MD, Jersey Burns, MD, Martina Gabra, MD, Alex Gordon, DO, Nicholas Deebel, MD, Ryan Terlecki, MD, Katherine L. Woodburn, MD

Abstract

Objectives: Chat Generative Pre-Trained Transformer (ChatGPT) is a popular natural language processing model that can analyze and respond to a wide variety of prompts, producing fluent answers drawn from a large corpus of Internet data. In the contemporary literature, ChatGPT has been considered an avenue for resident physician education in the form of board preparation and has been tested against board study material across multiple medical specialties. The purpose of our study was to evaluate the performance of ChatGPT on the Personal Review of Learning in Obstetrics and Gynecology (PROLOG) assessments and to gauge its specialty-specific knowledge for educational applications.

Methods: PROLOG assessments were administered to ChatGPT version 3.5, and the percentage of correct responses was recorded. Questions were categorized by question stem order (first-order vs higher-order) to measure ChatGPT performance, and performance across categories was compared using descriptive statistics.
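The article does not specify how questions were entered (the ChatGPT web interface is the likely route). Purely as an illustration of the evaluation loop, a minimal Python sketch against the OpenAI chat API follows; the model name, prompt format, scoring rule, and sample item are assumptions, not study materials.

```python
# Illustrative sketch only: administer multiple-choice items and score responses.
# Assumes the openai Python package (>=1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in item; actual PROLOG questions are copyrighted ACOG material.
questions = [
    {
        "stem": "Which hormone surge most directly triggers ovulation?",
        "choices": {"A": "FSH", "B": "LH", "C": "Progesterone", "D": "Prolactin"},
        "answer": "B",
        "order": "first",  # first-order (recall) vs higher-order (application)
    },
]

def ask(item: dict) -> str:
    """Present one item to the model and return its single-letter choice."""
    options = "\n".join(f"{letter}. {text}" for letter, text in item["choices"].items())
    prompt = f"{item['stem']}\n{options}\nAnswer with the single letter of the best option."
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for "ChatGPT version 3.5"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()[:1].upper()

correct = sum(ask(item) == item["answer"] for item in questions)
print(f"{correct}/{len(questions)} correct ({100 * correct / len(questions):.1f}%)")
```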

Results: There were 848 questions without visual components; ChatGPT answered 57.8% (n = 490) correctly. ChatGPT performed worse on higher-order questions than on first-order questions (56.8% vs 60.5%, respectively). There were 65 questions containing visual data, of which ChatGPT answered 16.9% correctly.
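As a quick arithmetic check of the figures reported above (counts taken directly from the abstract; the number of correct visual-item answers is inferred from the percentage, not stated):

```python
# Reproduce the reported percentages from the stated counts.
text_total, text_correct = 848, 490
print(f"Text-only items: {100 * text_correct / text_total:.1f}% correct")  # 57.8%

# 16.9% of the 65 visual items implies roughly 11 correct answers.
visual_total = 65
print(f"Visual items: ~{round(0.169 * visual_total)} of {visual_total} correct")  # ~11
```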

Conclusions: The passing score for the PROLOG assessments is 80%; therefore, ChatGPT 3.5 did not perform satisfactorily. Given this result, it is unlikely that the tested version of ChatGPT has sufficient specialty-specific knowledge or logical capability to serve as a reliable tool for trainee education.
