Glaser et al. Art Int Surg. 2025;5:1-15  https://dx.doi.org/10.20517/ais.2024.36      Page 5

               lacking, with most datasets from single institutions [Table 1].


               A range of deep learning models was applied for automated spinal measurement, including custom
               CNNs [18-21,23-25,27-31], multi-view correlation networks [19,20], and segmentation-based approaches [23,24,28-30]. For
               Cobb angle measurement, mean absolute errors ranged from 1.2° to 7.81° [18-21,23,27,28], with most studies
               achieving errors ≤ 5°. Similar trends were observed for sagittal parameters such as thoracic kyphosis,
               lumbar lordosis, and PI [18,19,22-25,29,30]. Intraclass correlation coefficients between automated and manual
               measurements exceeded 0.75, indicating strong agreement [22,25,26]. Several studies reported computational
               efficiency, with inference times ranging from 0.2 to 75 s per image [22,23,27,28], demonstrating the potential
               for faster analysis than manual methods.
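To make the two agreement metrics above concrete, the following is a minimal Python sketch of the mean absolute error and an intraclass correlation coefficient between automated and manual measurements. This section does not state which ICC form the included studies used; the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, after Shrout and Fleiss). The function names and example angles are hypothetical and are not data from the reviewed studies.

```python
from statistics import mean


def mean_absolute_error(automated, manual):
    """Mean absolute difference (in degrees) between paired measurements."""
    if len(automated) != len(manual) or not automated:
        raise ValueError("need non-empty, equal-length paired measurements")
    return mean(abs(a - m) for a, m in zip(automated, manual))


def icc_2_1(ratings):
    """ICC(2,1) after Shrout & Fleiss: two-way random effects,
    absolute agreement, single measurement.

    `ratings` holds one row per image and one column per rater/method
    (e.g. [automated, manual]).
    """
    n = len(ratings)      # number of images (subjects)
    k = len(ratings[0])   # number of raters/methods
    values = [v for row in ratings for v in row]
    grand = mean(values)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical Cobb angles (degrees) for four radiographs.
automated = [22.0, 35.5, 48.0, 15.2]
manual = [24.0, 33.0, 50.5, 14.0]
print(f"MAE = {mean_absolute_error(automated, manual):.2f} deg")
print(f"ICC(2,1) = {icc_2_1(list(zip(automated, manual))):.3f}")
```

Each row of `ratings` pairs one automated and one manual value for the same radiograph; adding a column per additional human reader would extend the same calculation to multi-rater agreement.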


               Cobb angle demonstrated a pooled mean error of 4.3° (95%CI: 3.2°-5.4°). Thoracic kyphosis and lumbar
               lordosis showed similar pooled errors of 3.9° (95%CI: 2.7°-5.1°) and 3.6° (95%CI: 2.8°-4.4°), respectively. PT
               had the lowest pooled error at 1.9° (95%CI: 1.3°-2.5°), while PI exhibited a slightly higher pooled error of
               4.1° (95%CI: 2.7°-5.5°). Sagittal vertical axis (SVA) demonstrated a pooled mean error of 1.3 cm (95%CI:
               0.9-1.7 cm). These results highlight the overall accuracy of deep learning models in automatically measuring
               key spinopelvic parameters from radiographic images [Figure 2].
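This section reports pooled estimates with 95% CIs but does not state which pooling model produced them. A common choice for combining per-study mean errors is the DerSimonian-Laird random-effects model; the Python sketch below assumes that model and assumes each study contributes a mean error together with its standard error. The function name and all input values are hypothetical illustrations, not the review's data or analysis code.

```python
import math


def dersimonian_laird(estimates, std_errors, z=1.959964):
    """Random-effects pooled mean with a 95% CI (DerSimonian-Laird).

    `estimates` are per-study mean errors and `std_errors` their
    standard errors, in the same unit (e.g. degrees).
    Returns (pooled, ci_low, ci_high, tau2).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    # Fixed-effect inverse-variance weights and estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical per-study Cobb angle mean errors and standard errors (degrees).
pooled, lo, hi, tau2 = dersimonian_laird([3.1, 4.8, 5.6, 2.9], [0.4, 0.6, 0.9, 0.5])
print(f"pooled = {pooled:.1f} deg (95% CI {lo:.1f}-{hi:.1f}), tau^2 = {tau2:.2f}")
```

A random-effects model is a usual choice when studies differ in institution, imaging source, and architecture, because the between-study variance (tau^2) widens the interval instead of assuming a single common true error.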


               Previous studies have reported inter-observer variability of 5° to 10° for manual Cobb angle measurement,
               with similar ranges for other spinopelvic parameters. The pooled AI measurement errors found here (4.3° for
               Cobb angle, 3.9° for thoracic kyphosis, and 3.6° for lumbar lordosis) indicate accuracy comparable to or
               better than manual measurement, while offering substantially improved efficiency.

               Quality assessment
               Assessed against the IJMEDI checklist, the included papers addressed most checklist items sufficiently but
               left room for improvement in enabling replicability and in reporting method and software specifics.
               Conflicts of interest and limitations were also inconsistently addressed [Supplementary Table 2].


               DISCUSSION
               This systematic review and meta-analysis demonstrates the potential of deep learning for the automated
               measurement of spinopelvic parameters from radiographs. The comprehensive literature search identified
               14 eligible studies published between 2018 and 2023, which analyzed over 10,000 radiographs with deep CNNs
               and other architectures [18-31]. The studies used various imaging sources to develop and validate automated
               methods for measuring spinal alignment, including lateral X-rays [19,20,23-25,27,32-34], biplanar
               radiographs [31,35], and CT scans [36]. Both preoperative [19,23,25,27,31-33,35,36] and postoperative [19,20,24]
               images were employed, with 6 studies incorporating cases with spinal implants [19,20,23,24,33,36] to evaluate
               performance in surgically altered anatomy. This diversity of imaging covers numerous clinically relevant
               scenarios. However, multicenter external validation was lacking, with most datasets from single institutions.
               Factors such as vendor variability could affect segmentation performance. For clinical translation, models
               must be able to analyze images acquired under all imaging protocols.


               A range of model types was applied for automated spinal measurement, from conventional machine
               learning [21,25] and rule-based systems [31] to modern deep CNNs [19,20,23-25,31,33-37]. Details for replication
               varied extensively: 4 studies provided no specific model details [19,25,27,34], while 5 reported networks and
               parameters [19,20,25,31,35]. Public availability of code and data remains limited. Custom architectures, rather
               than off-the-shelf models, were common for direct spinal measurement [19,33-35,37]. Multi-task [25,33,37],
               multi-view [19,33], and