
Giakas et al. Art Int Surg 2024;4:233-46  https://dx.doi.org/10.20517/ais.2024.24                                                         Page 243

               ramifications of such programs and tools, it is crucial to evaluate ChatGPT’s utility and accuracy in
               disseminating orthopedic information. Chatbot responses may impact patients’ perceptions of treatment
               options and risks prior to an evaluation by a physician. Several studies have analyzed the utility of ChatGPT
               for patients considering orthopedic surgery[11-17]. Assessing ChatGPT’s usefulness for preoperative patient
               education in spine surgery is especially critical due to the relatively high risk of spine surgery and the
               nuances that often guide decision making regarding the indications for different operations. To our
               knowledge, the present study is the first to use a modified validated scoring system to appraise and evaluate
               ChatGPT’s responses to common patient questions when considering PLD surgery.

               Minimum scores across all ten questions would lead to a total score of 20, whereas a maximum score would
               be 100. ChatGPT’s responses in this analysis earned a score of 59, just under an average score of 3, when
               evaluated by two attending, fellowship-trained orthopedic spine surgeons. A score of 3 denoted a somewhat
               useful response of moderate quality, with some important information adequately discussed but some
               poorly discussed [Figure 1].


               In the present study, ChatGPT was generally able to provide an accurate, albeit cursory, overview of
               relevant surgical indications, techniques, complications, and alternative therapies. However, some of these
               answers, when evaluated individually, lacked the clarification necessary to provide patients with a thorough
               understanding to inform their medical decision making. Some of the answers have the potential to be
               harmful to patients, especially those answers suggesting alternative therapy without the necessary context of
               the patient’s particular history and symptom severity. In some instances, for example, PLD might be
               necessary to reverse or prevent further neurologic injury, especially for urgent and emergent indications.
               Suggesting alternative, non-operative treatment options for these patients could worsen or adversely impact
               patient outcomes. Concordantly, a prior study reported that ChatGPT had a 53% mismanagement rate,
               which would be especially deleterious for serious underlying pathology[36]. Furthermore, non-operative
               treatment option descriptions were often vague, such as physical therapy to “strengthen muscles”. This
               could lead some patients to pursue inadequate or harmful treatment, which may exacerbate or accelerate
               their disease processes.

               Additionally, several of the claims were not fully substantiated by current spine surgery literature and
               several of the listed indications (spondylolisthesis and degenerative disc disease) may be better treated with
               other procedures, such as spinal fusion. As noted in previous literature, ChatGPT has been trained to
               generate definitive responses to questions, even when the existing literature may not be conclusive enough
               to make a specific recommendation[37,38]. In particular, the chatbot seemed to indicate the superiority of
               MISS over the traditional open approach. While there is increasing research regarding the potential benefits
               of minimally invasive surgery, there are still gaps in the literature, which can be most appropriately
               addressed by a trained and experienced surgeon[33,34]. These discrepancies may be confusing to patients
               considering PLD and could potentially lead to a delay in care. Nevertheless, ChatGPT did repeatedly
               emphasize that its responses should be taken in conjunction with consultation with a spine surgeon. This
               inability to address appropriate, patient-specific context affirms the findings of previous literature
               supporting the spine surgeon’s role in providing individualized clinical recommendations[36].

               One limitation of any study attempting to characterize the utility of online sources of medical information
               to patients prior to a doctor’s visit is the inherent subjectivity with which the online source is evaluated. To
               combat this weakness, the present analysis implemented a more objective, validated numeric scoring
               system. Additionally, the responses were analyzed by two attending spine surgeons, both of whose scores
               were presented, providing additional insight from physicians with differing levels of experience and areas of