Page 268                                                         Yoseph et al. Art Int Surg 2024;4:267-77  https://dx.doi.org/10.20517/ais.2024.38

               given to LLM vs. physician-generated responses. Compared with age-matched controls, cervical spine surgery
               patients were more likely to rate physician-generated responses as higher in clarity (H = 6.42, P = 0.011) and
               completeness (H = 7.65, P = 0.006).

               Conclusion: Despite a small sample size, our findings indicate that LLMs offer comparable, and occasionally
               preferred, information in terms of clarity and comprehensiveness of responses to common ACDF questions. It is
               particularly striking that ratings were similar, considering LLM-generated responses were, on average, 80% shorter
               than physician responses. Further studies are needed to determine how LLMs can be integrated into spine surgery
               education in the future.

               Keywords: Anterior cervical discectomy and fusion (ACDF), large language model (LLM), ChatGPT, Gemini,
               patient education, health information quality, patient perspectives




               INTRODUCTION
               Anterior cervical discectomy and fusion (ACDF) is a common surgical intervention for the management of
               cervical spinal pathologies, including degenerative disc disease (central and paracentral disc herniations,
               and cervical stenosis), traumatic injuries, infection, and tumors[1]. The procedure’s technical aspects have
               undergone significant evolution, enhancing surgical outcomes and patient recovery trajectories[2]. Despite
               the procedure’s high prevalence, the complexity of ACDF, the heterogeneous pathologies for which it is
               performed, and the varying surgical techniques pose challenges for patients attempting to understand the
               surgery’s risks, benefits, and postoperative recovery process[3].

               Studies have shown that a significant proportion of patients rely on online resources to gather information
               about surgeries, and that facilitating access to online health information can bolster patient compliance,
               postoperative plan adherence, and support the patient-physician relationship. However, this reliance on
               digital health resources can prove problematic, as outdated, contradictory, or highly technical information
               can complicate the patient’s decision making[4]. In this context, Langford et al. emphasized the importance
               of integrating high-quality online information into medical consultations, significantly impacting patient
               care and the dialogue between patients and physicians[5]. Thus, patient-focused online educational resources
               must be precise, accessible, and, importantly, accurate.

               Studies have reported on the capability of large language models (LLMs), such as OpenAI’s ChatGPT and
               Google’s Gemini (formerly known as Bard), to parse through vast datasets and online surgical information
               to generate patient-specific responses that are coherent, comprehensive, and concise[6-8]. Nonetheless, the
               accuracy, clarity, and completeness with which LLMs navigate complex medical domains, interpret clinical
               nuances, and subsequently deliver patient-friendly explanations warrants validation and continuous
               refinement. While LLMs have the potential to enhance patient comprehension of their medical conditions
               and treatment options, thereby increasing transparency and trust in surgical decision making, they may also
               carry the risk of disseminating inaccurate or biased information that could mislead patients and adversely
               affect their decision making and health outcomes[9]. In this study, we evaluate the clarity and
               comprehensiveness of ChatGPT, Gemini, and two spine surgeons’ responses to ten frequently asked patient
               questions by comparing how cervical spine surgery patients and their age-matched non-surgical patient
               counterparts rated these answers in terms of clarity and completeness.


               METHODS
               This cross-sectional study was approved by the Stanford Institutional Review Board (IRB-eProtocol
               #73097), and informed consent was obtained from all study participants.