

Glaser et al. Art Int Surg. 2025;5:1-15 https://dx.doi.org/10.20517/ais.2024.36 Page 5

lacking, with most datasets from single institutions [Table 1].

A range of deep learning models was applied for automated spinal measurement, including custom CNNs [18-21,23-25,27-31], multi-view correlation networks [19,20], and segmentation-based approaches [23,24,28-30]. For Cobb angle measurement, mean absolute errors ranged from 1.2° to 7.81° [18-21,23,27,28], with most studies achieving errors ≤ 5°. Similar trends were observed for other sagittal parameters, such as thoracic kyphosis, lumbar lordosis, and PI [18,19,22-25,29,30]. Intraclass correlation coefficients (ICCs) between automated and manual measurements exceeded 0.75, indicating strong agreement [22,25,26]. Several studies reported computational efficiency, with inference times ranging from 0.2 to 75 s per image [22,23,27,28], suggesting the potential for faster analysis than manual measurement.
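As a minimal sketch of how the two agreement metrics above are commonly computed, the Python snippet below implements mean absolute error (MAE) and an intraclass correlation coefficient. The ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater), is an assumption: the included studies may have used other ICC forms. The function names and the paired Cobb angles in the usage lines are hypothetical and are not data from any included study.

```python
import numpy as np


def mean_absolute_error(automated, manual):
    """Mean absolute difference between paired automated and manual angles."""
    a = np.asarray(automated, dtype=float)
    m = np.asarray(manual, dtype=float)
    return float(np.mean(np.abs(a - m)))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_images, k_raters), e.g. column 0 = manual and
    column 1 = automated measurement of the same image.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # mean per image
    col_means = x.mean(axis=0)  # mean per rater (manual vs automated)

    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float(
        (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    )


# Hypothetical Cobb angles (degrees) for five radiographs.
manual = [12.0, 25.5, 40.0, 33.0, 18.5]
automated = [13.1, 24.0, 41.5, 31.8, 19.0]

mae = mean_absolute_error(automated, manual)
icc = icc_2_1(np.column_stack([manual, automated]))
print(f"MAE = {mae:.2f} deg, ICC(2,1) = {icc:.3f}")
```

MAE reports the typical size of the disagreement in degrees, while the ICC reports how well the two methods rank and match images relative to the spread between patients. This is why the studies above report both.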

Cobb angle showed a pooled mean error of 4.3° (95% CI: 3.2°-5.4°). Thoracic kyphosis and lumbar lordosis showed similar pooled errors of 3.9° (95% CI: 2.7°-5.1°) and 3.6° (95% CI: 2.8°-4.4°), respectively. PT had the lowest pooled error at 1.9° (95% CI: 1.3°-2.5°), while PI showed a higher pooled error of 4.1° (95% CI: 2.7°-5.5°). Sagittal vertical axis (SVA) showed a pooled mean error of 1.3 cm (95% CI: 0.9-1.7 cm). These results highlight the overall accuracy of deep learning models in automatically measuring key spinopelvic parameters from radiographic images [Figure 2].
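The pooling method behind these estimates is not described in this section. As one common option, the sketch below assumes DerSimonian-Laird random-effects pooling of per-study mean errors. It shows how a pooled mean and its 95% CI are obtained from per-study means and standard errors. The function name and the per-study inputs are hypothetical, and the snippet does not reproduce the review's own analysis.

```python
import math


def pool_random_effects(means, std_errors, z=1.96):
    """DerSimonian-Laird random-effects pooled mean with a 95% CI.

    Returns (pooled_mean, ci_low, ci_high, tau2).
    """
    if len(means) != len(std_errors) or len(means) < 2:
        raise ValueError("need >= 2 studies with one standard error each")
    # Inverse-variance (fixed-effect) weights and estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum_w
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(means) - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical per-study mean Cobb errors (degrees) and standard errors.
study_means = [2.1, 3.8, 5.0, 6.2]
study_ses = [0.4, 0.5, 0.6, 0.8]
pooled, low, high, tau2 = pool_random_effects(study_means, study_ses)
print(f"pooled = {pooled:.1f} deg (95% CI {low:.1f}-{high:.1f}), tau^2 = {tau2:.2f}")
```

A random-effects model is a reasonable default here because the included studies differ in datasets, institutions, and architectures. A positive tau² widens the CI relative to a fixed-effect estimate.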

Previous studies have reported inter-observer variability of 5° to 10° for manual Cobb angle measurement, with similar ranges for other spinopelvic parameters. The pooled AI measurement errors found here (4.3° for Cobb angle, 3.9° for thoracic kyphosis, and 3.6° for lumbar lordosis) lie below the lower end of this range. This suggests accuracy comparable to or better than manual measurement, with substantially improved efficiency.
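Comparing the full 95% CIs, rather than only the point estimates, makes this comparison more concrete. The snippet below uses the pooled values reported in this section. Treating the 5° lower end of the manual Cobb range as the threshold for thoracic kyphosis and lumbar lordosis is an assumption, because the text reports only "similar ranges" for those parameters. The names `POOLED_ERRORS` and `compare_to_manual` are hypothetical.

```python
# Pooled AI errors as reported in this review: (mean, 95% CI low, 95% CI high), degrees.
POOLED_ERRORS = {
    "Cobb angle": (4.3, 3.2, 5.4),
    "Thoracic kyphosis": (3.9, 2.7, 5.1),
    "Lumbar lordosis": (3.6, 2.8, 4.4),
}

# Lower end of the 5-10 degree manual inter-observer range cited for Cobb angle.
# Applying the same threshold to kyphosis and lordosis is an assumption.
MANUAL_LOWER_BOUND_DEG = 5.0


def compare_to_manual(pooled, threshold=MANUAL_LOWER_BOUND_DEG):
    """Label where each parameter's pooled 95% CI lies relative to the threshold."""
    labels = {}
    for name, (_mean, low, high) in pooled.items():
        if high < threshold:
            labels[name] = "entire CI below manual range"
        elif low < threshold:
            labels[name] = "CI straddles lower end of manual range"
        else:
            labels[name] = "CI within or above manual range"
    return labels


for name, label in compare_to_manual(POOLED_ERRORS).items():
    print(f"{name}: {label}")
```

Run as written, only the lumbar lordosis CI lies entirely below 5°. The Cobb angle and thoracic kyphosis CIs extend slightly above 5°, so for those two parameters only the point estimates fall below the manual range.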

Quality assessment
Assessed against the IJMEDI checklist, the included papers addressed most items adequately but left room for improvement in enabling replicability and in reporting method and software details. Conflicts of interest and limitations were also inconsistently addressed [Supplementary Table 2].

DISCUSSION
This systematic review and meta-analysis demonstrates the potential of deep learning for the automated measurement of spinopelvic parameters from radiographs. The comprehensive literature search identified 14 eligible studies published between 2018 and 2023, which together analyzed over 10,000 radiographs with deep CNNs and other architectures [18-31]. The studies used various imaging sources to develop and validate automated methods for measuring spinal alignment, including lateral X-rays [19,20,23-25,27,32-34], biplanar radiographs [31,35], and CT scans [36]. Both preoperative [19,23,25,27,31-33,35,36] and postoperative [19,20,24] images were used, with 6 studies incorporating cases with spinal implants [19,20,23,24,33,36] to evaluate performance in surgically altered anatomy. This diversity of imaging covers many clinically relevant scenarios. However, multicenter external validation was lacking, with most datasets drawn from single institutions. Factors such as vendor variability could affect segmentation performance, so models must be able to handle the full range of imaging protocols before they can be translated into clinical practice.

A range of model types was applied for automated spinal measurement, from conventional machine learning [21,25] and rule-based systems [31] to modern deep CNNs [19,20,23-25,31,33-37]. Reporting detail for replication varied extensively: 4 studies provided no specific model details [19,25,27,34], while 5 reported network architectures and parameters [19,20,25,31,35]. Public availability of code and data remains limited. Custom architectures, rather than off-the-shelf models, were common for direct spinal measurement [19,33-35,37]. Multi-task [25,33,37], multi-view [19,33], and
