The Southern Medical Journal (SMJ) is the official, peer-reviewed journal of the Southern Medical Association. It has a multidisciplinary and inter-professional focus that covers a broad range of topics relevant to physicians and other healthcare specialists.
Original Article
Assessing the Accuracy and Reliability of ChatGPT-4 to Answer Clinical EHR Messages in Sports Medicine
Abstract
Objectives: Although advancements in electronic health records (EHRs) have improved clinical productivity, digital administrative responsibilities have led to increased physician burnout. With the emergence of large language models (LLMs), their incorporation into medicine is a potential solution to the increase in tasks such as charting and responding to patient messages. Previous studies have evaluated the efficacy of LLMs such as Chat Generative Pre-Trained Transformer-4 (ChatGPT-4) in clinical knowledge-based questions. Few studies, however, have evaluated the responses to clinical decision making in sports medicine. This study aims to evaluate the efficiency and clinical accuracy of ChatGPT-4 responses to common sports medicine questions that patients ask in the EHR system.
Methods: ChatGPT-4 was prompted with few-shot exemplars involving different sports medicine injuries to generate 80 EHR scenarios. Next, ChatGPT-4 was programmed to respond to the 80 EHR scenarios using the created programmed approaches to generate LLM drafts. In stage 1, four board-certified orthopedic surgeons were asked to respond to the EHR responses, followed by a survey evaluating the difficulty and urgency of the situation. In stage 2, they were asked to edit the LLM drafts so that they were clinically acceptable to send to a patient.
Results: In stage 1, the assessing physicians found responding to the LLM clinical question to be trivial in 60 out of 80 cases (75%). Most physicians disagreed that the patients in the LLM drafts were experiencing a severe medical event in 58 out of 80 cases (72.50%). In stage 2, the physicians rated the LLM-assisted responses as acceptable without modifications in 58 out of 80 cases (72.50%). Furthermore, the physicians agreed that the unedited LLM-assisted responses had a low chance of causing harm in 75 out of 80 cases (93.75%). Finally, the physicians rated the responses as generated by artificial intelligence in 65 out of 80 cases (81.25%).
Conclusions: Surgeons rated the majority of the LLM responses as both clinically accurate and time-saving, with a low risk of causing harm. This finding suggests that LLMs have the potential to provide adequate responses to EHR messages within the field of sports medicine, potentially lessening physician burden and workload.
