Turlip et al. Art Int Surg 2024;4:324-30 https://dx.doi.org/10.20517/ais.2024.29 Page 328

the ability to achieve adequate clinical outcomes, indicating that increased granularity through additional
metrics is needed to overcome the disordered responses[27].
Despite the promising advancements of AI in spine surgery, a significant limitation in the current literature
is the lack of external validation of many studies. Most models are only internally validated on the same data
from which they were derived, raising concerns about model generalizability to larger patient populations
or different clinical settings. It was estimated that only 5% of published articles on prognostic models
included an external validation framework[29]. Without external validation, it is difficult to ensure that these
AI models will perform reliably in diverse environments, further limiting their clinical application. This
issue is compounded by the scarcity of randomized controlled trials (RCTs) investigating AI in spine
surgery, which are essential for evaluating long-term effectiveness and accuracy.
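
The gap between internal and external validation can be illustrated with a minimal Python sketch. It is not drawn from any study cited here: both cohorts are synthetic, and the names `make_cohort`, `fit_mean_difference`, and `site_effect` are hypothetical. The sketch assumes a development site where one predictor (`artifact`) tracks the outcome only because of local practice, so a model that learns it looks strong on an internal hold-out but weaker on an outside cohort.

```python
import random


def auc(scores, labels):
    """Chance that a random positive case outscores a random negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_cohort(rng, n, site_effect):
    """Synthetic cohort (assumption): two predictors and a binary outcome."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        clinical = rng.gauss(0.8 * y, 1.0)          # signal shared by every site
        artifact = rng.gauss(site_effect * y, 1.0)  # signal only where site_effect > 0
        rows.append((clinical, artifact))
        labels.append(y)
    return rows, labels


def fit_mean_difference(rows, labels):
    """Toy linear model: each weight is the difference between the class means."""
    weights = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        weights.append(sum(pos) / len(pos) - sum(neg) / len(neg))
    return weights


def predict(weights, rows):
    """Linear risk score for each patient row."""
    return [sum(w * x for w, x in zip(weights, r)) for r in rows]


rng = random.Random(0)
dev_rows, dev_labels = make_cohort(rng, 600, site_effect=1.5)  # development site
ext_rows, ext_labels = make_cohort(rng, 600, site_effect=0.0)  # external site

# Internal validation: hold out part of the development data.
train_rows, train_labels = dev_rows[:400], dev_labels[:400]
hold_rows, hold_labels = dev_rows[400:], dev_labels[400:]

weights = fit_mean_difference(train_rows, train_labels)
internal_auc = auc(predict(weights, hold_rows), hold_labels)
external_auc = auc(predict(weights, ext_rows), ext_labels)

print(f"internal hold-out AUC: {internal_auc:.2f}")
print(f"external cohort AUC:   {external_auc:.2f}")
```

Under these assumptions the internal hold-out shares the development site's artifact, so it cannot reveal the model's dependence on it; only the external cohort does. Real validation studies would also assess calibration and use clinically sourced data, which this sketch omits.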

Due to the lack of standardized reporting metrics for AI studies, it is imperative to create clear guidelines
through which the risk of bias and the potential utility of these models can be evaluated. AI studies that
focus primarily on diagnostic applications using medical imaging should adhere to the Checklist for
Artificial Intelligence in Medical Imaging (CLAIM)[30]. The forthcoming Standards for Reporting of
Diagnostic Accuracy Studies for AI (STARD-AI), an AI-specific adaptation of the established STARD
guidelines, is also under development. Upon its release, it is expected to be indexed on the Enhancing the
QUAlity and Transparency Of health Research (EQUATOR) Network, addressing similar methodological
issues as those covered by CLAIM[31].

For ML multivariable prediction models, whether diagnostic or prognostic, the recently published
Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis +
Artificial Intelligence (TRIPOD + AI) provides a structured protocol for reporting predictive algorithms[32].
Despite the advancements since the initial 2015 TRIPOD statement, which has shown promise in improving
methodological transparency[32,33], substantial gaps persist that hinder the broader integration of AI in
clinical practice[34]. As AI prediction algorithms become more pervasive in spine surgery, internal and
external validation frameworks are necessary to appraise model performance and to ensure that variability
across patient populations is reflected, thereby enhancing surgical precision.

CONCLUSION
The integration of AI and ML into spine surgery represents a transformative shift toward precision
medicine, offering enhanced diagnostic and prognostic capabilities. With the advances in automated
radiographic imaging, patient risk stratification, outcomes prediction, and personalized medicine, future
work promises to tailor treatment to individual patients more accurately. Despite the promising
achievements so far, the field must address challenges in data accuracy by expanding training datasets and
implementing robust validation frameworks. As AI becomes more prevalent in spine surgery, successful
integration has the power to refine surgical decision making and improve patient outcomes.

DECLARATIONS
Authors’ contributions
Original draft preparation, methodology, conceptualization: Turlip RW
Original draft presentation, conceptualization: Khela HS
Review and editing, supervision: Dagli MM, Ghenbot Y, Ahmad HS
Review and editing, validation: Chauhan D
Review and editing, supervision, conceptualization: Yoon JW
