
The number of training epochs for the deep neural networks ranged from 30 to 6,000, but again most studies omitted these specifics [18-31,34,36]. Two reports described adaptive epoch counts based on validation improvement rather than fixed values [24,36]. Typical regimes were 30-50 epochs [31,33]. Standardized reporting of these details would aid reproducibility, and generalizability with shorter training warrants scrutiny where transfer learning was not employed.
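
Adaptive epoch counts of this kind are typically implemented as early stopping against a validation metric. The Python below is a minimal sketch of that pattern, not code from any of the reviewed studies; train_one_epoch() and evaluate() are hypothetical stubs standing in for a real training loop and validation pass.

import random

def train_one_epoch():
    pass  # placeholder: one pass over the training set

def evaluate():
    # placeholder: returns a validation error, e.g., mean absolute angle error in degrees
    return random.uniform(2.0, 6.0)

def fit(max_epochs=6000, patience=20):
    best_err, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        val_err = evaluate()
        if val_err < best_err:
            best_err, best_epoch = val_err, epoch  # in practice, checkpoint weights here
        elif epoch - best_epoch >= patience:
            break  # no validation improvement for `patience` epochs: stop early
    return best_epoch, best_err

The effective epoch count then varies per run, which is why reporting a fixed epoch number alone can understate what is needed to reproduce a result.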

The IJMEDI checklist for medical imaging AI highlighted several shortcomings (see tabulated results), particularly around enabling reproducibility. Areas such as software details, computational resource usage, model accessibility, and evaluation set specificity were poorly reported. However, studies did well in conveying overall aims, statistical and evaluation methodology, and limitations. Recent initiatives for standardizing ML reporting [39,40], together with reproducibility checklists [38], may benefit new spine AI imaging research.

Despite promising accuracy, certain limitations remain. Most studies used single-institutional data lacking sufficient diversity [19-21,23,25,28-30]. Reference standards from manual radiograph measurements intrinsically incorporate subjectivity from inter-observer variation [41]. CT imaging remains unevaluated, and studies for some parameters are still few. Real-world clinical validation is lacking [42]. Our subgroup analyses found that studies using CNN architectures demonstrated higher accuracy for parameters such as lumbar lordosis compared with other models. This highlights the importance of selecting architectures tailored to the specific radiographic quantification task. As deep learning continues to advance, further research is needed to optimize model design and determine the most effective architectures for automated spinopelvic measurement. Larger comparative studies evaluating different network architectures on common datasets would help elucidate their relative merits and guide selection.


Moving forward, larger multicenter studies should validate these models before clinical implementation [40,43]. Continued research on handling label noise and measurement uncertainty is required [13,41]. Standardized reporting guidelines could enhance reproducibility [40]. Models should be optimized across diverse settings and pathologies [42,43], and clinically meaningful accuracy metrics deserve focus beyond raw error values [41].
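
One common way to handle noisy manual labels and measurement uncertainty in regression is to have the network predict a variance alongside each angle and train with a Gaussian negative log-likelihood. This is an illustrative assumption on our part rather than an approach reported by the included studies; the PyTorch sketch below uses a placeholder encoder and dummy data.

import torch
import torch.nn as nn

class AngleRegressor(nn.Module):
    """Predicts a spinopelvic angle and a per-sample variance from image features."""
    def __init__(self, in_features=512):
        super().__init__()
        self.encoder = nn.Linear(in_features, 64)  # stand-in for a real image backbone
        self.mean_head = nn.Linear(64, 1)          # predicted angle (degrees)
        self.var_head = nn.Linear(64, 1)           # predicted log-variance

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return self.mean_head(h), self.var_head(h).exp()  # exp() keeps variance positive

model = AngleRegressor()
loss_fn = nn.GaussianNLLLoss()         # penalizes errors relative to the predicted variance
features = torch.randn(8, 512)         # dummy batch of image features
labels = torch.randn(8, 1) * 10 + 50   # dummy "manual" lumbar lordosis labels (degrees)
mean, var = model(features)
loss_fn(mean, labels, var).backward()

Samples whose manual labels are inconsistent can then be down-weighted through their predicted variance rather than treated as exact ground truth.
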
The application of deep learning models and their potential role in spine surgery has already begun to be explored. Of value to spine surgeons, models have demonstrated success in diagnosing various musculoskeletal and spinal disorders, including sarcopenia, scoliosis, and low back pain [44-47]. In regard to prognosis, deep learning models have been successful in predicting postoperative complications such as surgical site infections and 30-day readmission rates after lumbar fusion procedures [48,49]. While these initial findings are promising, further research validating the use of these models in other realms of patient care, particularly surgical planning, is needed.


Spinopelvic parameters are of great importance to the surgeon for planning, and methods of measurement have evolved significantly. Early assessments began with the Cobb angle and focused on spinal curvatures but overlooked the pelvis. In the 1980s and 1990s, the introduction of parameters such as PI, PT, and SS revolutionized the understanding of sagittal balance. These measurements linked pelvic alignment to spinal posture. By the 2000s, global spinal alignment gained attention, with the SVA and newer measures like the T1 pelvic angle (TPA) becoming essential for surgical planning in adult spinal deformity (ASD).
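
These parameters are geometrically linked: pelvic incidence is the sum of pelvic tilt and sacral slope (PI = PT + SS). As a purely illustrative sketch, with hypothetical landmark coordinates and simplified sign conventions rather than code from any reviewed study, the Python below shows how SS, PT, and PI could be derived from three landmarks on a lateral radiograph (x increasing anteriorly, y cranially).

import math

def spinopelvic_angles(s1_posterior, s1_anterior, hip_axis):
    """s1_posterior / s1_anterior: posterior and anterior corners of the S1
    superior endplate; hip_axis: midpoint of the two femoral head centers."""
    # Sacral slope (SS): angle between the S1 endplate and the horizontal.
    ss = math.degrees(math.atan2(s1_anterior[1] - s1_posterior[1],
                                 s1_anterior[0] - s1_posterior[0]))
    # Pelvic tilt (PT): angle between the vertical and the line joining the hip
    # axis to the S1 endplate midpoint (positive when the midpoint is posterior).
    mid = ((s1_posterior[0] + s1_anterior[0]) / 2,
           (s1_posterior[1] + s1_anterior[1]) / 2)
    pt = math.degrees(math.atan2(hip_axis[0] - mid[0], mid[1] - hip_axis[1]))
    # Pelvic incidence (PI) follows from the identity PI = PT + SS.
    return ss, pt, pt + ss

ss, pt, pi = spinopelvic_angles((-4.0, 9.0), (-1.0, 11.5), (0.0, 0.0))  # hypothetical landmarks
print(f"SS {ss:.1f} deg, PT {pt:.1f} deg, PI {pi:.1f} deg")

In an automated pipeline the landmarks themselves would come from a detection network; the angle arithmetic is the straightforward part.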


Up until the early 2000s, measurement of spinopelvic parameters was mostly done manually and, on average, took 3-15 min. The manual measurement process is tedious and time-consuming while also being