
Page 8                                                                                                       Glaser et al. Art Int Surg. 2025;5:1-15  https://dx.doi.org/10.20517/ais.2024.36



1.156°-6.318° | doctors (standard reference) | requires separate datasets for each model order | measurements
Galbusera et al. [31], 2019 | 493 biplanar radiographs; variety of spinal disorders and deformities | X-ray | Fully CNN + differentiable spatial to numerical transform layer | Not explicitly reported | NA | Standard error between DL predictions & ground truth: 2.7°-11.5° for parameters | Compared to parameters extracted from sterEOS 3D reconstructions (ground truth) | Limited training dataset size (n = 443 image pairs); polynomial interpolation introduced error | 50 test cases; statistical analysis (linear regression, Bland-Altman analysis) against ground truth | Fully convolutional network | NA | 100 epochs


CNN: convolutional neural network; NA: not applicable; SGDM: stochastic gradient descent with momentum; AP: anteroposterior; LAT: lateral; MVC-Net: multi-view correlation network; MVE-Net: multi-view extrapolation net; MPF-Net: multi-task, proposal correlation, feature fusion network; MAE: mean absolute error; ICC: intraclass correlation coefficient; AI: artificial intelligence; YOLO: You Only Look Once.



vertebral correlation [25] learning schemes showed benefits for parameter accuracy through inter-relationship modeling, overcoming imaging challenges like occlusion.


Studies assessed accuracy via comparison to expert manual measurement, using metrics such as mean absolute differences (all studies) and voxel overlap measures where segmentation was evaluated [19,23,24,31,35,36]. For Cobb angle measurement, mean errors ranged from 1.7° to 8.1°, but most CNN methods achieved ≤ 5° mean difference [23-25,31,32,34,35], adequate for clinical usage [38]. Similar trends held for other sagittal measurements [19,20,23,24,31,35]. Notably, Wang et al. employed extrapolation methods atop initial estimates to give the best overall accuracies of 6.2°/7.8° Cobb angle errors in lateral/AP views vs. 4.0°/4.1° for MVC-Net [20,37]. Intraclass correlation coefficients of 0.86-0.99 [19,23-25] confirmed agreement between automated and manual measurements.
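The agreement metrics named above (mean absolute difference, the ≤ 5° threshold, and Bland-Altman analysis) can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the function name `agreement_summary`, its parameters, and the sample angles are hypothetical and chosen only for illustration, and the limits of agreement use the conventional bias ± 1.96 SD form.

```python
from statistics import mean, stdev


def agreement_summary(auto_deg, manual_deg, threshold_deg=5.0):
    """Summarize agreement between automated and manual angle measurements.

    Hypothetical helper for illustration; inputs are paired angles in degrees.
    """
    if len(auto_deg) != len(manual_deg) or len(auto_deg) < 2:
        raise ValueError("need two equal-length series with at least 2 cases")

    # Signed differences (automated minus manual) and their magnitudes.
    diffs = [a - m for a, m in zip(auto_deg, manual_deg)]
    abs_diffs = [abs(d) for d in diffs]

    bias = mean(diffs)   # Bland-Altman mean difference
    sd = stdev(diffs)    # sample standard deviation of the differences

    return {
        "mae": mean(abs_diffs),
        # Share of cases whose absolute difference is within the threshold.
        "within_threshold": sum(d <= threshold_deg for d in abs_diffs) / len(abs_diffs),
        "bias": bias,
        # Conventional 95% limits of agreement: bias +/- 1.96 * SD.
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


# Made-up Cobb angles (degrees), not data from any study.
auto = [12.0, 25.5, 40.0, 33.0, 18.5]
manual = [10.0, 27.0, 46.0, 31.0, 18.0]
summary = agreement_summary(auto, manual)
print(summary)
```

Reporting the signed bias alongside the mean absolute difference separates systematic over- or under-estimation from random scatter, which a single mean error figure cannot do.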



Comparisons were made to traditional manual measurement [19,20,23-25,31,35], manual tools [19,25,27], early machine learning applications [25], and different iterations of automated algorithms [19,33]. Automated methods met or exceeded both classic and contemporary techniques. Particular benefits arose in reproducibility, efficiency, and standardization vs. manual approaches prone to subjectivity and variability [19,23,24]. Deep learning methods showed an accuracy advantage over alternative automated implementations, overcoming limitations such as occlusion. Wang et al. achieved better Cobb measurement than MVC-Net [19] (7.8° lateral error vs. 4.1°), through vertebral correlation and extrapolation augmentations [20].


Studies cited small datasets [31], external validity [19,24,31,35,36], surgical cases [19,20,23,24,33], implant handling [33,36], need for inter-rater evaluations [33], pelvic measurement gaps [27], follow-up studies [24], and real-world clinical workflow integration [24,27] as main limitations. Anonymization, reproducibility, negative societal impacts, and public data availability were generally not addressed. Small samples particularly restricted subgroup analysis; only Gami et al. reported metrics by spinal pathology [27]. Building large heterogeneous benchmark datasets could facilitate model development and address generalizability. Standardized reporting guidelines for spine AI could also benefit the field.