
Page 60                                                             Ambati et al. Art Int Surg. 2025;5:53-64  https://dx.doi.org/10.20517/ais.2024.45

               Table 3. Summary of AI/ML studies discussed in the postoperative prognostication subsection
                Area of investigation                            Selected studies
                Perioperative complication prediction            General technique: Berven et al., 2023 [38]
                and risk stratification                          Lumbar degenerative disease: Abdelrahman et al., 2022 [39]
                                                                 Pediatric deformity: Comstock et al., 2023 [40]; Lim et al., 2023 [41]
                                                                 Trauma: Yeretsian et al., 2022 [42]; Malacon et al., 2022 [43]
                Long-term outcome prognostication                Eliahu et al., 2022 [44]; Auloge et al., 2020 [45]; Burström et al., 2019 [46];
                                                                 Elmi-Terander et al., 2020 [47]; Charles et al., 2021 [48]
               AI: Artificial intelligence; ML: machine learning.
               consuming tasks (e.g., image segmentation of the spine or robotic navigation).

               Challenge 2: subjective outcome measures
               Beyond the aforementioned challenges that patient heterogeneity poses to developing a broad AI
               understanding of spine surgery, another substantial barrier lies in current outcome
               measures. Many of the endpoints we follow are subjective or are influenced by a wide variety of factors that
               AI may not be able to accurately capture in an unbiased manner. For example, endpoints such as pain and
               functional status may be influenced by psychological factors. Endpoints such as the return to work may be
               influenced by socioeconomic status. Endpoints such as the need for revision surgery may be influenced by
               many factors, including preoperative comorbidities and postoperative access to care in addition to the
               surgery itself. Postoperative pain medication use is influenced by preoperative levels of tolerance and
               patterns of clinical prescription. It is critical that models trained on such endpoints, and their
               predictions, do not lead clinicians to select patients or surgical approaches in a way that
               perpetuates present disparities. Possible solutions include focusing on immediate rather than
               long-term measures, relying on quantitative or radiographic endpoints that can be measured in a
               validated manner, and potentially using AI and new technologies to develop novel outcome metrics
               that better capture the impact of spine surgery on patients’ lives.


               Challenge 3: tradeoffs in data quality and quantity
               One of the central principles of ML is that capabilities and performance increase with ever-larger
               datasets [15]. In particular, cutting-edge approaches such as deep learning and large language models (the
               types of models underlying self-driving cars and ChatGPT, respectively) rely on immense amounts of data
               to tune hundreds of billions of parameters, from which their intelligence emerges [62]. In spine surgery, large
               registries such as the Quality Outcomes Database (QOD), British Spine Registry, and International Spine
               Study Group (ISSG) have aggregated patient data across numerous centers, and the largest ML studies may
               incorporate thousands of patients. However, these numbers are likely sufficient only for tasks that take
               simple categorical and numerical variables as inputs, not for complex data such as cross-sectional
               images, text, and video, which require far larger datasets. Even for such tasks, healthcare databases often encounter
               quality issues such as missing or incomplete data and variable practices across the sites where the data were
               collected. Furthermore, as the number of variables per patient in the database increases, the difficulty of
               expanding the dataset grows, limiting the number of patients incorporated and increasing the
               administrative burden on centers that participate.
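
               The data quality issues described above (missing values and practices that vary by site) can be
               audited before any model is trained. The following is a minimal sketch, not a method from the
               studies cited here: it computes, for each site, the fraction of records in which each variable
               is missing. The function name, the field names (site, age, bmi, odi_12mo), and the toy records
               are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def missingness_by_site(records, variables, site_key="site"):
    """Return {site: {variable: fraction of that site's records with a missing value}}.

    A value counts as missing if the key is absent or its value is None.
    """
    missing = defaultdict(lambda: defaultdict(int))  # missing-value counts per site/variable
    totals = defaultdict(int)                        # number of records per site
    for rec in records:
        site = rec[site_key]
        totals[site] += 1
        for var in variables:
            if rec.get(var) is None:
                missing[site][var] += 1
    return {
        site: {var: missing[site][var] / totals[site] for var in variables}
        for site in totals
    }


# Hypothetical toy registry; all field names and values are invented.
records = [
    {"site": "A", "age": 61, "bmi": 27.4, "odi_12mo": 22},
    {"site": "A", "age": 55, "bmi": None, "odi_12mo": 30},
    {"site": "B", "age": 70, "bmi": None, "odi_12mo": None},
    {"site": "B", "age": 48, "bmi": None, "odi_12mo": 18},
]

report = missingness_by_site(records, ["age", "bmi", "odi_12mo"])
for site, fractions in sorted(report.items()):
    print(site, fractions)
```

               In this toy example, site B never records bmi while site A records it half the time, the kind
               of site-level difference that a pooled missingness rate would hide.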


               Due to limitations in data quantity, many studies are validated using withheld patients or cross-validation
               from the same single-center dataset, which may leave overfitting undetected and limit clinical utility.
               Validation using independently collected external datasets allows for improved assessment of model
               accuracy and generalizability. Even findings from multi-center studies may be affected by this problem, as
               the datasets are not completely independent of one another. In addition, some studies used large national
               datasets that may have limited granularity of clinically relevant variables, potentially limiting their models’