Glaser et al. Art Int Surg. 2025;5:1-15 https://dx.doi.org/10.20517/ais.2024.36 Page 3

METHODS
This meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and
Meta-Analyses (PRISMA) guidelines [16] [Supplementary Table 1].
Search strategy
A comprehensive literature search was conducted using PubMed, Embase, Scopus, and Cochrane databases
from inception to December 2023 to identify relevant studies. The search strategy included a combination
of controlled vocabulary terms (e.g., MeSH) and keywords related to “artificial intelligence”, “deep
learning”, “convolutional neural network”, “spine”, “spinopelvic parameters”, and related terms. Reference
lists of included articles and relevant systematic reviews were hand-searched to identify any additional
eligible studies.

Study selection
Studies were included if they met the following criteria: (1) published in English-language, peer-reviewed
journals; (2) used deep learning models, including convolutional neural networks (CNNs), to automatically
estimate spinopelvic parameters from radiographs (X-ray); and (3) reported model performance metrics
compared with human rater measurements, including mean absolute error and correlation coefficient.
Conference abstracts, case reports, editorials, and non-peer-reviewed articles were excluded.

Two reviewers (A.K.M and J.C) independently screened the titles, abstracts, and full texts of retrieved
records against the eligibility criteria. Disagreements were resolved by consensus or consultation with a
third reviewer if needed. The study selection process was documented using a PRISMA flow diagram
[Figure 1].

Data extraction
A standardized data extraction form was created and pilot-tested on a subset of included studies. Two
reviewers (A.K.M and J.C) then independently extracted data from the full set of included studies. Extracted
information included: first author name, publication year, dataset details (number of images, resolution,
pathology), imaging modality, spinopelvic parameters analyzed, deep learning model details (architecture,
training approach, batch size, and number of epochs), accuracy metrics (mean absolute error, correlation
coefficient, and any additional reported performance metrics), computational efficiency, validation
approach, and key limitations. Any discrepancies in extracted data were resolved
through discussion and mutual consensus. Additionally, studies focusing specifically on lumbosacral
transitional vertebrae (LSTV) were excluded to maintain homogeneity in the analysis. While LSTV can
significantly impact spinopelvic measurements, the unique challenges they present in parameter assessment
warrant separate consideration from standard spinopelvic measurements. This exclusion allowed for a more
consistent comparison of measurement accuracy across included studies.
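
As an illustration only, the extraction fields listed above could be held in one record per study per reviewer, which makes disagreements between the two reviewers easy to list before the consensus discussion. This is a minimal sketch and not the authors' actual form: the class name `ExtractionRecord`, the field names, and the helper `find_discrepancies` are all hypothetical, and a single error value per study is a simplification (the studies report errors per spinopelvic parameter).

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class ExtractionRecord:
    """One reviewer's extraction for one study (hypothetical field names)."""
    first_author: str
    publication_year: int
    n_images: Optional[int] = None
    image_resolution: Optional[str] = None
    pathology: Optional[str] = None
    imaging_modality: Optional[str] = None
    parameters_analyzed: Tuple[str, ...] = ()
    architecture: Optional[str] = None
    training_approach: Optional[str] = None
    batch_size: Optional[int] = None
    epochs: Optional[int] = None
    mean_absolute_error: Optional[float] = None
    correlation_coefficient: Optional[float] = None
    validation_approach: Optional[str] = None
    key_limitation: Optional[str] = None


def find_discrepancies(a: ExtractionRecord, b: ExtractionRecord) -> list:
    """Return the names of fields on which two reviewers' records differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```

Any field name returned by `find_discrepancies` would then be settled by discussion, as described above.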

Statistical analysis
A random-effects meta-analysis was performed to pool the mean absolute errors reported by the included
studies for each spinopelvic parameter. The inverse variance method was used to calculate the weighted
mean differences and 95% confidence intervals (CIs). Heterogeneity among the studies was assessed using
the I² statistic, which represents the percentage of total variation across studies due to heterogeneity rather
than chance. An I² value of 0% indicates no observed heterogeneity, while larger values indicate increasing
heterogeneity. The pooled estimates and their 95% CIs were graphically presented using forest plots. All
statistical analyses were conducted using R software (version 4.0.3) with the “meta” package (version 4.15-1).
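
To make the pooling and heterogeneity calculations concrete, the following sketch shows inverse-variance random-effects pooling together with Cochran's Q and I². It is not the authors' R code. It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify, and it assumes each study supplies a standard error for its mean absolute error. The function name `random_effects_pool` and the example numbers are hypothetical.

```python
import math


def random_effects_pool(estimates, std_errors, z=1.96):
    """Inverse-variance random-effects pooling (DerSimonian-Laird, assumed).

    estimates  : per-study estimates (e.g., mean absolute errors)
    std_errors : per-study standard errors of those estimates
    z          : normal quantile for the confidence interval (1.96 for 95%)
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum_w_re
    se_pooled = math.sqrt(1.0 / sum_w_re)

    return {
        "pooled": pooled,
        "ci_low": pooled - z * se_pooled,
        "ci_high": pooled + z * se_pooled,
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs for illustration only; these are not data from the review.
result = random_effects_pool([2.1, 3.4, 2.8], [0.3, 0.5, 0.4])
```

When the study estimates agree closely, Q falls below its degrees of freedom, tau² and I² are truncated at zero, and the random-effects result coincides with the fixed-effect one.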
