prone to rater-dependent error[50]. Advancements in imaging techniques, including full-body electron optic system (EOS) radiographs, CT scans, and MRI, have enabled more accurate measurements of spinopelvic parameters. The development of more sophisticated software has led to accelerated measurement times via semi-automated computer-aided tools, such as SurgiMap[50]. Software tools such as SurgiMap have demonstrated a mean time efficiency of 75 ± 25 s to perform a full spinopelvic analysis, significantly reducing the burden associated with manual measurements[50]. Our review of the existing literature on deep learning models for spinopelvic parameter measurement revealed processing times ranging from 0.2 to 1 s
per image. A set of radiographs for spinopelvic parameter measurement typically involves 2-3 images: a lateral X-ray, an anteroposterior X-ray, and possibly a full-body EOS image in more complicated cases. Regarding time saved, deep learning models would require an estimated 0.6-3 s to analyze a full set of images, compared to the 75-second mean from the SurgiMap study mentioned previously. Deep learning models are, therefore, at least roughly 25× more efficient (an illustrative calculation follows this paragraph). Additionally, there were studies included in our
analysis that involved pathological images, whereas the study using SurgiMap involved images with no
pathology, further demonstrating the capability and efficiency of deep learning technology. To contextualize
these efficiency gains alongside accuracy, manual measurements typically show inter-observer variability of 5°-10° for the Cobb angle and similar ranges for other parameters. Semi-automated tools reduce this variability to 3°-7°. Our meta-analysis found AI measurement errors of 4.3° for Cobb angle, 3.9° for thoracic kyphosis, and 3.6° for lumbar lordosis, comparable to or better than both manual and semi-automated methods. This suggests that AI can dramatically improve measurement efficiency without compromising accuracy, potentially offering both time savings and improved measurement reliability in clinical practice.
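As a rough, illustrative calculation based on the figures above (assuming a three-image set and the 75 s semi-automated benchmark; a back-of-the-envelope sketch rather than a result reported by any individual study):

\[
t_{\mathrm{DL}} \approx 3 \times (0.2\text{--}1\ \mathrm{s}) = 0.6\text{--}3\ \mathrm{s},
\qquad
\frac{t_{\mathrm{semi\text{-}automated}}}{t_{\mathrm{DL}}} \approx \frac{75\ \mathrm{s}}{3\ \mathrm{s}} = 25
\quad \left(\text{up to } \frac{75\ \mathrm{s}}{0.6\ \mathrm{s}} = 125 \text{ at the faster end}\right).
\]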
No one model stood out as superior to the others. Each study and the model it used had advantages and
disadvantages that are open to interpretation. For example, the model used by Zerouali et al. was mainly
tested in a pediatric population; therefore, this model would likely only be of interest to a surgeon who
operates on this population[22]. Many studies only involved a single clinical dataset, which is a key reason why we argue for multicenter validation to demonstrate reproducibility. Additionally, some studies did not
train their models on patients who had implants. Therefore, these models would require further validation
to be usable in scenarios such as postoperative evaluation and planning for revision surgery. What was
consistent across all models was that they all were more efficient than current methods without
compromising accuracy.
Despite the demonstrated accuracy and efficiency of these models, there remains a gap in understanding
their practical utility for surgeons across various clinical contexts, including preoperative and intraoperative
stages. Theoretically, the enhancement in efficiency should offer surgeons more time to review images and
make surgical plans. Pending multicenter validation, future research should explore whether the integration of deep learning truly enhances efficiency throughout the entire perioperative continuum. For example, a surgeon may use deep learning as an adjunct for formulating a preoperative plan. Intraoperatively, X-ray image evaluation may allow synchronous measurement of spinopelvic
parameters to assess the efficacy of hardware placement. Lastly, in the postoperative phase, the technology
can be used to predict postoperative complications and 30-day readmission rates as stated earlier, with the
potential for much more.
A notable limitation in measuring PI deserves specific attention. Our meta-analysis found PI measurements
had a relatively higher pooled error of 4.1° compared to other pelvic parameters such as PT (1.9°). This
larger error can be attributed to several specific challenges: First, the presence of double-dome endplates can