[Table continued from the previous page. Preceding row (partially recoverable): errors of 1.156°-6.318° vs. doctors' measurements (standard reference); noted as requiring separate datasets for each model order. Galbusera et al. (2019) [31]: 493 biplanar X-ray radiographs across a variety of spinal disorders and deformities; fully convolutional network with a differentiable spatial-to-numerical transform layer; standard error between DL predictions and ground truth of 2.7°-11.5°, with ground-truth parameters extracted from sterEOS 3D reconstructions; statistical analysis (linear regression, Bland-Altman) against ground truth on 50 test cases; limitations: training dataset size (n = 443 image pairs) and error introduced by polynomial interpolation; trained for 100 epochs.]
CNN: convolutional neural network; NA: not applicable; SGDM: stochastic gradient descent with momentum; AP: anteroposterior; LAT: lateral; MVC-Net: multi-view correlation network; MVE-Net: multi-view extrapolation network; MPF-Net: multi-task, proposal correlation, feature fusion network; MAE: mean absolute error; ICC: intraclass correlation coefficient; AI: artificial intelligence; YOLO: You Only Look Once.
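
The differentiable spatial-to-numerical transform used by Galbusera et al. [31] converts landmark heatmaps into coordinates while keeping the whole pipeline trainable end-to-end. As a rough illustration of that general idea (not the authors' implementation; the function name, tensor shapes, and PyTorch framing are assumptions), such a layer can be sketched as:

import torch
import torch.nn.functional as F

def dsnt(heatmap):
    """Differentiable spatial-to-numerical transform (illustrative sketch).

    Maps a batch of 2D landmark heatmaps (B, H, W) to normalized (x, y)
    coordinates in [-1, 1], computed as the probability-weighted mean of a
    coordinate grid rather than a non-differentiable argmax.
    """
    b, h, w = heatmap.shape
    # Normalize each heatmap into a spatial probability map.
    prob = F.softmax(heatmap.view(b, -1), dim=-1).view(b, h, w)
    # Coordinate grids spanning [-1, 1] along each axis.
    ys = torch.linspace(-1.0, 1.0, h, device=heatmap.device)
    xs = torch.linspace(-1.0, 1.0, w, device=heatmap.device)
    # Expected coordinate values under the probability map.
    y = (prob.sum(dim=2) * ys).sum(dim=1)  # marginal over rows
    x = (prob.sum(dim=1) * xs).sum(dim=1)  # marginal over columns
    return torch.stack([x, y], dim=-1)     # (B, 2)

Because the output is an expectation rather than an argmax, gradients flow through the coordinate estimate, which is what allows a network of this kind to be supervised directly on numerical alignment parameters.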
Vertebral correlation learning schemes showed benefits for parameter accuracy through inter-relationship modeling, overcoming imaging challenges such as occlusion [25].
Studies assessed accuracy via comparison to expert manual measurement, using metrics such as mean absolute differences (all studies) and voxel overlap measures where segmentation was evaluated [19,23,24,31,35,36]. For Cobb angle measurement, mean errors ranged from 1.7° to 8.1°, but most CNN methods achieved ≤ 5° mean difference [23-25,31,32,34,35], adequate for clinical use [38]. Similar trends held for other sagittal measurements [19,20,23,24,31,35]. Notably, Wang et al. employed extrapolation methods atop initial estimates to achieve the best overall accuracy, with Cobb angle errors of 4.0°/4.1° in AP/lateral views vs. 6.2°/7.8° for MVC-Net [20,37]. Intraclass correlation coefficients of 0.86-0.99 [19,23-25] confirmed agreement between automated and manual measurement.
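
To make these agreement statistics concrete, the following Python sketch computes a mean absolute difference, Bland-Altman bias with 95% limits of agreement, and a two-way random-effects ICC(2,1) for paired automated/manual Cobb angles. The numbers are hypothetical and purely illustrative, not drawn from any study reviewed here:

import numpy as np

# Hypothetical paired Cobb angle measurements (degrees).
auto   = np.array([23.1, 41.5, 12.8, 55.0, 31.2, 18.9])
manual = np.array([21.8, 44.0, 11.5, 52.7, 33.0, 20.1])

# Mean absolute difference, the error metric reported by all studies.
mad = np.mean(np.abs(auto - manual))

# Bland-Altman statistics: bias and 95% limits of agreement.
diff = auto - manual
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))

# ICC(2,1) (Shrout & Fleiss absolute agreement) from the two-way
# ANOVA decomposition for two raters.
n, k = len(auto), 2
ratings = np.stack([auto, manual], axis=1)  # (subjects, raters)
grand = ratings.mean()
ms_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2) / (n - 1)
ms_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2) / (k - 1)
ss_err = np.sum((ratings - ratings.mean(axis=1, keepdims=True)
                 - ratings.mean(axis=0, keepdims=True) + grand) ** 2)
ms_err = ss_err / ((n - 1) * (k - 1))
icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                            + k * (ms_cols - ms_err) / n)

print(f"MAD = {mad:.2f} deg, bias = {bias:.2f} deg, "
      f"LoA = ({loa[0]:.2f}, {loa[1]:.2f}), ICC(2,1) = {icc:.2f}")

ICC(2,1) is one common choice for automated-vs-manual agreement; individual studies may have used other ICC forms.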
Comparisons were made to traditional manual measurement [19,20,23-25,31,35], manual tools [19,25,27], early machine learning applications [25], and different iterations of automated algorithms [19,33]. Automated methods met or exceeded both classic and contemporary techniques. Particular benefits arose in reproducibility, efficiency, and standardization vs. manual approaches prone to subjectivity and variability [19,23,24]. Deep learning methods showed headroom over alternative automated implementations in accuracy [19], overcoming limitations such as occlusion. Wang et al. achieved better Cobb measurement than MVC-Net (4.1° vs. 7.8° lateral error) through vertebral correlation and extrapolation augmentations [20].
Studies cited small datasets [24], external validity [19,24,31,35,36], surgical cases [19,20,23,24,33], implant handling [33,36], the need for inter-rater evaluations [27], pelvic measurement gaps [31], follow-up studies [33], and real-world clinical workflow integration [24,27] as main limitations. Anonymization, reproducibility, negative societal impacts, and public data availability were generally not addressed. Small samples particularly restricted subgroup analysis; only Gami et al. [27] reported metrics by spinal pathology. Building large, heterogeneous benchmark datasets could facilitate model development and address generalizability. Standardized reporting guidelines for spine AI could also benefit the field.