Page 268 Yoseph et al. Art Int Surg 2024;4:267-77 https://dx.doi.org/10.20517/ais.2024.38
given to LLM- vs. physician-generated responses. Compared with age-matched controls, cervical spine surgery
patients were more likely to rate physician-generated responses as higher in clarity (H = 6.42, P = 0.011) and
completeness (H = 7.65, P = 0.006).
Conclusion: Despite a small sample size, our findings indicate that LLMs provide responses to common ACDF questions that are comparable to, and occasionally preferred over, physician responses in clarity and comprehensiveness. It is particularly striking that ratings were similar, considering LLM-generated responses were, on average, 80% shorter
than physician responses. Further studies are needed to determine how LLMs can be integrated into spine surgery
education in the future.
Keywords: Anterior cervical discectomy and fusion (ACDF), large language model (LLM), ChatGPT, Gemini,
patient education, health information quality, patient perspectives
INTRODUCTION
Anterior cervical discectomy and fusion (ACDF) is a common surgical intervention for the management of cervical spinal pathologies, including degenerative disc disease (central and paracentral disc herniations, and cervical stenosis), traumatic injuries, infection, and tumors[1]. The procedure's technical aspects have undergone significant evolution, enhancing surgical outcomes and patient recovery trajectories[2]. Despite the procedure's high prevalence, the complexity of ACDF, the heterogeneous pathologies for which it is performed, and the varying surgical techniques pose challenges for patients attempting to understand the surgery's risks, benefits, and postoperative recovery process[3].
Studies have shown that a significant proportion of patients rely on online resources to gather information about surgeries, and that facilitating access to online health information can bolster patient compliance, postoperative plan adherence, and the patient-physician relationship. However, this reliance on digital health resources can prove problematic, as outdated, contradictory, or highly technical information can complicate the patient's decision making[4]. In this context, Langford et al. emphasized the importance of integrating high-quality online information into medical consultations, noting its significant impact on patient care and the dialogue between patients and physicians[5]. Thus, patient-focused online educational resources must be precise, accessible, and, importantly, accurate.
Studies have reported on the capability of large language models (LLMs), such as OpenAI's ChatGPT and Google's Gemini (formerly known as Bard), to parse vast datasets and online surgical information to generate patient-specific responses that are coherent, comprehensive, and concise[6-8]. Nonetheless, the accuracy, clarity, and completeness with which LLMs navigate complex medical domains, interpret clinical nuances, and deliver patient-friendly explanations warrant validation and continuous refinement. While LLMs have the potential to enhance patient comprehension of medical conditions and treatment options, thereby increasing transparency and trust in surgical decision making, they also carry the risk of disseminating inaccurate or biased information that could mislead patients and adversely affect their decision making and health outcomes[9]. In this study, we evaluate ChatGPT's, Gemini's, and two spine surgeons' responses to ten frequently asked patient questions by comparing how cervical spine surgery patients and their age-matched non-surgical counterparts rated these answers in terms of clarity and completeness.
METHODS
This cross-sectional study was approved by the Stanford Institutional Review Board (IRB-eProtocol
#73097), and informed consent was obtained from all study participants.

