The Southern Medical Journal (SMJ) is the official, peer-reviewed journal of the Southern Medical Association. It has a multidisciplinary and inter-professional focus that covers a broad range of topics relevant to physicians and other healthcare specialists.

Original Article

Assessing the Accuracy and Reliability of ChatGPT-4 to Answer Clinical EHR Messages in Sports Medicine

Authors: Fahad Nadeem, BS, Caleb Berta, BS, Dev Dayal, BS, Clay Rahaman, BA, Alexa Smitherman, MS, Maxwell Harrell, BS, Elizabeth Powell, BS, Bradford Minor, MS, Thomas Evely, DO, Eugene Brabston, MD, Amit Momaya, MD

Abstract

Objectives: Although advancements in electronic health records (EHRs) have improved clinical productivity, the accompanying digital administrative responsibilities have increased physician burnout. Incorporating large language models (LLMs) into medicine is a potential solution to the growing volume of tasks such as charting and responding to patient messages. Previous studies have evaluated the efficacy of LLMs such as Chat Generative Pre-Trained Transformer-4 (ChatGPT-4) in answering clinical knowledge-based questions. Few studies, however, have evaluated LLM responses that involve clinical decision making in sports medicine. This study aims to evaluate the efficiency and clinical accuracy of ChatGPT-4 responses to common sports medicine questions that patients ask in the EHR system.

Methods: ChatGPT-4 was prompted with few-shot exemplars involving different sports medicine injuries to generate 80 EHR scenarios. Next, ChatGPT-4 was programmed to respond to the 80 EHR scenarios, using the programmed approaches that had been created, to generate LLM drafts. In stage 1, four board-certified orthopedic surgeons were asked to respond to the EHR scenarios and then complete a survey evaluating the difficulty and urgency of each situation. In stage 2, they were asked to edit the LLM drafts so that they were clinically acceptable to send to a patient.

Results: In stage 1, the assessing physicians found responding to the LLM-generated clinical question to be trivial in 60 of 80 cases (75%). In 58 of 80 cases (72.5%), most physicians disagreed that the patient was experiencing a severe medical event. In stage 2, the physicians rated the LLM-assisted responses as acceptable without modifications in 58 of 80 cases (72.5%). Furthermore, the physicians agreed that the unedited LLM-assisted responses had a low chance of causing harm in 75 of 80 cases (93.75%). Finally, the physicians rated the responses as generated by artificial intelligence in 65 of 80 cases (81.25%).

Conclusions: Surgeons rated the majority of the LLM responses as both clinically accurate and time-saving, with a low risk of causing harm. This finding suggests that LLMs have the potential to provide adequate responses to EHR messages within the field of sports medicine, potentially lessening physician burden and workload.
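The percentages reported in the Results follow directly from the stated counts out of 80 scenarios. As a minimal sketch of that arithmetic, the Python snippet below recomputes them; the dictionary keys and the `percentage` helper are illustrative names chosen here and do not come from the article.

```python
# Counts reported in the abstract, each out of 80 EHR scenarios.
# The labels are hypothetical names used only for this illustration.
TOTAL_SCENARIOS = 80

reported_counts = {
    "stage1_response_trivial": 60,
    "stage1_not_severe_event": 58,
    "stage2_acceptable_unedited": 58,
    "stage2_low_chance_of_harm": 75,
    "stage2_rated_ai_generated": 65,
}


def percentage(count: int, total: int = TOTAL_SCENARIOS) -> float:
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {
    label: percentage(count) for label, count in reported_counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The recomputed values match the figures given in the Results (75%, 72.5%, 72.5%, 93.75%, and 81.25%).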
