Page 10 Shapey et al. Art Int Surg 2023;3:1-13 https://dx.doi.org/10.20517/ais.2022.31
predictor factors using classical statistical methods are reasonably robust and reliable; (c) ML might be
better suited towards appreciating the complex relationships between pre-identified predictor variables and
incorporating them into predictive models, rather than identification of predictor variables in the first
instance; (d) ML models demonstrate greater potential as dynamic tools to guide decision making, for
example, the timing of drain removal, rather than as static models that represent predicted risk at a single
point in time.
Prospective machine learning prediction of complications following pancreatic surgery
Only one study prospectively evaluated ML prediction of post-pancreatectomy complications[45]. Cos et al.
used a telemonitoring wearable device (Fitbit) to measure heart rate, step count and sleep features in 48
patients pre-pancreatectomy[45]. Combined with clinical characteristics, this activity data was used by a
gradient boosting model (GBM) to predict a textbook surgical outcome postoperatively, outperforming the
widely used ACS-NSQIP Surgical Risk Calculator (aROC: ML 0.79 vs. NSQIP 0.63).
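The aROC values used to compare models throughout this review can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The following is a minimal sketch of that calculation. The function name `auroc` and the toy scores are hypothetical, chosen for illustration only, and are not data from any of the cited studies.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 if the outcome occurred, 0 otherwise.
    scores: predicted risk for each patient (higher means riskier).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients, two competing risk models.
outcome = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b = [0.6, 0.3, 0.5, 0.7, 0.4, 0.2]

print(f"model A aROC: {auroc(outcome, model_a):.2f}")
print(f"model B aROC: {auroc(outcome, model_b):.2f}")
```

An aROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect discrimination, which is the scale on which comparisons such as ML versus NSQIP are reported.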
Machine Learning to predict postoperative complications in hepatic surgery
In a first-of-its-kind nationwide population-based analysis of 22926 Taiwanese patients, Shi et al. predicted
5-year mortality post-HCC surgery using an artificial neural network (ANN)[46]. This study reported that
surgeon volume (caseload) was the most influential factor in predicting postoperative mortality; the model
achieved an AUC of 0.89. Nonetheless, the retrospective nature of this work and the absence of clinical
parameters represent significant limitations that preclude the clinical utility of the model.
Machine Learning prediction of post-hepatectomy outcomes
ML approaches in hepatic surgery have mostly focused on predicting survival and recurrence post-
hepatectomy in hepatocellular carcinoma (HCC). Qiao et al. collected prospective data on 725 patients with
early HCC and predicted overall survival (OS) following minor hepatectomy using an ANN[47]. In this study,
linear regression analysis was used to identify significant (P < 0.05) predictors, including tumor size and
number, alpha-fetoprotein, microvascular invasion, and tumor encapsulation. An ANN was then used to
capture the inter-variable relationships and develop a predictive model (aROC 0.86 in the training cohort).
The model was then externally validated on a separate dataset, achieving an aROC of 0.83. One limitation of
ANN methodology is that the individual weightings and relationships of clinicopathological factors cannot
be reported and interpreted because of the nature of the black box algorithm utilised by ANNs[48].
Huang et al. created an XGBoost model which predicted recurrence-free survival (RFS) post-HCC
resection from retrospective data collected in 7919 patients[49]. Their XGBoost model showed modest
improvement over the Early Recurrence After Surgery for Liver tumour (ERASL) score in external
validation (aROC: ML 0.70 vs. ERASL 0.67). The modest aROCs in this model highlight both the
importance of high-quality and prospectively validated data inputs and the impact of the chosen ML
algorithm on the performance of the model. However, a unique capability reported by this study was the
ability to create individualised patient risk heatmaps of tumour recurrence over time, which could inform
personalised surveillance strategies.
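To illustrate what a per-patient recurrence risk heatmap over time might contain, the sketch below converts per-interval recurrence hazards into cumulative recurrence probabilities, giving one row per patient and one column per follow-up interval. This is an assumption-laden illustration, not the method reported by Huang et al.: the discrete-time hazard formulation, the function names and the hazard values are all hypothetical.

```python
def cumulative_risk(hazards):
    """Convert per-interval hazards into cumulative recurrence probabilities.

    Uses the discrete-time identity F(t) = 1 - prod_{k<=t} (1 - h_k),
    where h_k is the assumed probability of recurrence in interval k
    given no recurrence before it.
    """
    surviving = 1.0
    curve = []
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        surviving *= 1.0 - h
        curve.append(1.0 - surviving)
    return curve


def risk_heatmap(patient_hazards):
    """Build a {patient_id: [cumulative risk per interval]} mapping."""
    return {pid: cumulative_risk(h) for pid, h in patient_hazards.items()}


def render(heatmap, shades=" .:*#"):
    """Render the heatmap as text; denser characters mean higher risk."""
    lines = []
    for pid, row in heatmap.items():
        cells = "".join(
            shades[min(int(p * len(shades)), len(shades) - 1)] for p in row
        )
        lines.append(f"{pid} |{cells}|")
    return "\n".join(lines)


# Hypothetical per-interval hazards (e.g. one value per follow-up interval).
hypothetical_hazards = {
    "patient_A": [0.10, 0.20, 0.10, 0.05],
    "patient_B": [0.02, 0.03, 0.05, 0.05],
}

heatmap = risk_heatmap(hypothetical_hazards)
print(render(heatmap))
```

In such a representation, a patient whose row darkens early would be a candidate for more intensive early surveillance, whereas a row that stays light might support a less frequent schedule.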
Post-hepatectomy liver failure represents a significant postoperative complication that alters the trajectory
of surgical outcomes. Mai et al. developed an ANN utilizing five preoperative indicators of hepatocyte
function and volume (platelet count, prothrombin time, bilirubin, aspartate transaminase and functional
liver remnant) in 353 patients undergoing hepatic resection to predict severe post-hepatectomy liver
failure (PHLF)[50]. This model demonstrated exceptional performance (aROC 0.88) in both training and
validation cohorts and outperformed other commonly used scoring systems by considerable margins
(Child-Pugh: 0.568, Model for End-stage Liver Disease: 0.608, Albumin-bilirubin: 0.627, platelet-albumin-