Page 55 Xu et al. Art Int Surg 2023;3:48-63 https://dx.doi.org/10.20517/ais.2022.33
Ji et al. designed an ML framework that identified a three-feature radiomic signature from contrast-enhanced CT images. To further boost prediction performance, clinical factors and biochemical measures such as the serum AFP level and albumin-bilirubin grade were included. Their model achieved a C-statistic of 0.73 and outperformed conventional prognostic metrics such as BCLC staging[66]. Wang et al. devised a similar
combined model using multiphasic CT features and clinical factors, yielding promising results with an AUC
of 0.82. In a similar vein, Saillard et al. employed a DL model based on digitized histological slides that could predict post-resection survival more accurately than relevant clinical, biological, and pathological factors[67]. However, these findings were not upheld when subjected to external validation. Post-resection
features predicting survival have had limited clinical impact because adjuvant treatment options in HCC were previously lacking. With the continued expansion of adjuvant treatment trials in HCC, such features may become relevant when incorporated into post-resection survival prediction.
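The combined modelling strategy described above, joining a radiomic signature with clinical covariates in a single prognostic classifier, can be sketched as follows. All feature names and the data are synthetic placeholders, and a plain logistic regression stands in for the studies' actual models.

```python
# Sketch: combining a radiomic signature with clinical covariates in one
# prognostic classifier, evaluated by AUC. Data are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical three-feature radiomic signature from contrast-enhanced CT
radiomics = rng.normal(size=(n, 3))
# Hypothetical clinical/biochemical covariates (e.g., AFP, albumin-bilirubin)
clinical = rng.normal(size=(n, 2))
X = np.hstack([radiomics, clinical])
# Synthetic outcome driven by both feature groups plus noise
logit = 0.8 * radiomics[:, 0] + 0.6 * clinical[:, 0]
y = (logit + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC on held-out data: {auc:.2f}")
```

In practice the radiomic features would come from a segmentation-and-extraction pipeline rather than random draws, and survival models (e.g., Cox regression with a C-statistic) would replace the binary classifier.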
Liver transplantation
Recipient selection
The Model for End-Stage Liver Disease (MELD) score, originally devised to prognosticate patients after a
transjugular intrahepatic portosystemic shunt (TIPS) procedure for portal hypertension, has been used
since 2002 for prioritizing donor liver allocation in liver transplantation in a “sickest-first” approach[68]. This logarithmic score comprises biochemical factors like the International Normalized Ratio (INR), serum
creatinine, and total serum bilirubin. While regional allocation policies may differ, the final MELD score
given to a patient on the waiting list usually gives additional ‘exception points’ after considering the etiology
of cirrhosis as well[69]. This model has served patients around the world well for many years, but it is gradually being superseded by more updated listing criteria. The MELD score has been critiqued for being
disadvantageous to female patients because of its inclusion of serum creatinine (typically lower in females)
without correction for gender. While the new MELD 3.0 score promises to correct for gender bias, the
question remains: could AI-based models, whether supervised or unsupervised, outperform it?
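The logarithmic MELD calculation described above can be made concrete. The sketch below follows one widely cited UNOS formulation; allocation policies layer further adjustments (dialysis handling, exception points) on top, which are not modelled here.

```python
# Sketch of the original (pre-MELD 3.0) logarithmic MELD score, per the
# commonly cited UNOS formulation. Policy-level adjustments are omitted.
import math

def meld_score(bilirubin_mg_dl: float, inr: float, creatinine_mg_dl: float) -> int:
    # Lab values below 1.0 are floored at 1.0 so the logarithms stay non-negative
    bili = max(bilirubin_mg_dl, 1.0)
    inr = max(inr, 1.0)
    # Creatinine is capped at 4.0 mg/dL (the cap is also applied for dialysis)
    creat = min(max(creatinine_mg_dl, 1.0), 4.0)
    raw = (3.78 * math.log(bili)
           + 11.2 * math.log(inr)
           + 9.57 * math.log(creat)
           + 6.43)
    # The score is rounded to the nearest integer and bounded to [6, 40]
    return min(max(round(raw), 6), 40)

print(meld_score(bilirubin_mg_dl=2.5, inr=1.8, creatinine_mg_dl=1.2))
```

Note how the formula's log terms compress differences at the high end, one reason its discrimination degrades among the sickest candidates, as discussed below.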
The Optimized Prediction of Mortality (OPOM) model employs ML optimal classification tree models to
more accurately predict three-month mortality than the MELD score. Specifically, the model was calibrated using optimal classification trees (OCTs), an ML prediction method that combines interpretability with high predictive accuracy. The model was trained on historical data of patients in the United States from 2002 to 2016 (comprising 1,618,966 patient observations) obtained from the Scientific Registry of Transplant Recipients (SRTR) and evaluated within a Liver Simulated Allocation Model (LSAM). The end product was a classification tree that predicted the probability of a patient dying or
becoming unsuitable for transplant within 3 months (the dependent variable), given observations of certain
patient characteristics (the independent variables). Bertsimas et al. showed that OPOM allocation reduced
mortality by 417.96 deaths per year compared to MELD[70]. Indeed, although a simple method to stratify candidates awaiting liver transplantation, the MELD score is a linear regression method that does not
accurately predict mortality for all candidates who can benefit from liver transplantation. This is especially
demonstrated by the significant deterioration in MELD's predictive capability with increasing disease severity. Whereas MELD showed decreasing AUC values as sicker patient strata were considered, OPOM maintained significantly higher AUCs, especially within the sickest candidate population, allowing more accurate prediction of waitlist mortality. A recent study by
Yu et al., using ML in a Korean cohort, also showed superior performance of their random forest model (AUC 0.80-0.85) compared to the MELD score (AUC 0.70)[71].
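The core idea behind OPOM, that a tree-based classifier can capture non-linear interactions among covariates that a linear score misses, can be illustrated with synthetic data. OPOM itself uses optimal classification trees trained on SRTR data; here a standard CART tree from scikit-learn stands in, and all covariates and outcomes are simulated.

```python
# Sketch: a classification tree predicting 3-month waitlist mortality/dropout
# from patient covariates, compared against a linear model on the same data.
# CART stands in for OPOM's optimal classification trees; data are simulated.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
# Hypothetical standardized covariates (e.g., bilirubin, INR, creatinine, ...)
X = rng.normal(size=(n, 5))
# Simulated non-linear risk: an interaction plus a threshold effect that a
# purely linear score tends to under-fit
risk = X[:, 0] * X[:, 1] + 0.5 * np.maximum(X[:, 2], 0) ** 2
y = (risk + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
linear = LogisticRegression().fit(X_tr, y_tr)

auc_tree = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
auc_lin = roc_auc_score(y_te, linear.predict_proba(X_te)[:, 1])
print(f"tree AUC {auc_tree:.2f} vs linear AUC {auc_lin:.2f}")
```

A shallow depth (here 4) keeps the tree human-readable, mirroring the interpretability argument made for OCTs; real allocation models would of course be trained and validated on registry data rather than simulations.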
Unfortunately, the OPOM experimental model has yet to be validated in other centers with HCC patient
cohorts. It should be noted that LSAM analysis is also limited in that it only allows for an accurate
assessment of waitlist deaths, as waitlist removals include not only candidates with deterioration in their