Ding et al. J Transl Genet Genom 2021;5:50-61 | http://dx.doi.org/10.20517/jtgg.2020.01 Page 53

Table 2. The number of DEGs in relation to age, sample type, and Gleason score

DEGs* (|fold| > 1.5 and FDR < 0.05)          Old patients    Young patients
                                             (aged 71-75)    (aged 38-50)
Tumor versus benign tissue comparison
  Patients with Gleason sum of 6             1250            1314
  Patients with Gleason sum of 7             1443            1485
  Patients with Gleason sum of 8+            3221            1923
Low versus high Gleason score comparison^    1392            650

*A total of 5156 unique DEGs identified from 8 different comparisons.
^Low Gleason score of 6 and high Gleason score of 8+ (8 to 10).
DEGs: Differentially expressed genes.
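
The DEG thresholds in Table 2 (|fold| > 1.5 and FDR < 0.05) can be illustrated with a minimal sketch. It assumes a signed fold-change value and a raw p-value per gene, and it assumes the FDR is obtained by Benjamini-Hochberg adjustment, which the text here does not specify. The function names and example values are hypothetical and are not the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, signed_fold, pvalues, fold_cut=1.5, fdr_cut=0.05):
    """Keep genes with |fold| > fold_cut and FDR < fdr_cut.

    Assumption: signed_fold uses the signed fold-change convention
    (e.g. 2.0 for two-fold up, -2.0 for two-fold down).
    """
    fdr = benjamini_hochberg(pvalues)
    return [g for g, f, q in zip(genes, signed_fold, fdr)
            if abs(f) > fold_cut and q < fdr_cut]


# Hypothetical toy input, for illustration only.
genes = ["A", "B", "C", "D"]
folds = [2.0, -1.8, 3.0, -2.5]
pvals = [0.001, 0.02, 0.04, 0.5]
selected = select_degs(genes, folds, pvals)
```

In this toy input, gene C has a large fold change but is dropped because its adjusted p-value exceeds 0.05. Both criteria must hold for a gene to be counted as a DEG.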

Three independent validation data sets with follow-up time [Supplementary Table 1], namely the Mayo Clinic
II (MC II), the Cleveland Clinic (CC), and the Thomas Jefferson University (TJU) data sets, were used to
evaluate the performance of the iPAM classifier by the AUC of the ROC for censored survival data[19-21]. The 95%
confidence interval for AUC of survival ROC was constructed from 1000 bootstrap replications. Based
on the bimodal distribution of risk scores predicted by the iPAM classifier, two cut points were selected to
categorize patients into low-, intermediate-, and high-risk groups. The Kaplan-Meier estimator and a log-rank
test were used to evaluate the difference in time to metastasis among the risk groups. The conventional
AUC of the ROC was calculated to measure prediction accuracy for the fourth validation data set, from
the Memorial Sloan-Kettering Cancer Center (MSKCC), which had no follow-up time but did record a
categorical metastasis status for each patient.
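
As an illustration of the conventional AUC and a bootstrap confidence interval, the sketch below computes the ROC AUC from risk scores and binary metastasis labels, then forms a percentile interval from 1000 resamples. It is a simplified stand-in: it does not handle censored follow-up, so it corresponds to the MSKCC-style analysis rather than the survival ROC. The choice of a percentile interval is an assumption, as the text does not state which bootstrap interval was used, and the function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Conventional ROC AUC via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auc(scores, labels)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one of the classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Hypothetical risk scores and metastasis labels (1 = metastasis).
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
example_auc = auc(example_scores, example_labels)
```

Resampling patients, rather than scores and labels separately, keeps each score paired with its outcome, which is what makes the interval reflect sampling variability in the cohort.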

Estimation of cell-type proportion in tissue microenvironment
xCell[22] was used to estimate the abundance of 34 immune cell types for each tissue sample from
genome-wide gene expression data. The cell-type proportion in the tissue microenvironment estimated by
xCell is a rank-based enrichment score. A non-parametric analysis of variance (ANOVA), with confidence
intervals and p-values generated by a percentile bootstrap, as implemented in the “Rallfun-v35” R code from
Dr. Wilcox[23], was used to test for median differences in immune score (the average abundance of immune
cells) between sample groups defined by sample type (tumor, benign), metastasis status (yes, no), and age
group (young, middle-aged, old).
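
The percentile-bootstrap idea behind such a median comparison can be sketched for the simplest case of two independent groups. This is a generic illustration, not the Rallfun-v35 implementation, and it omits the multi-factor design described above. The function name, the number of resamples, and the input values are assumptions.

```python
import random
import statistics


def median_diff_bootstrap(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap for median(x) - median(y), two independent groups.

    Returns (observed difference, (lower, upper), p-value).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # resample each group with replacement
        yb = rng.choices(y, k=len(y))
        diffs.append(statistics.median(xb) - statistics.median(yb))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    # Two-sided p-value: twice the smaller tail proportion around zero,
    # with bootstrap differences exactly equal to zero counted as half.
    above = sum(d > 0 for d in diffs) + 0.5 * sum(d == 0 for d in diffs)
    p_star = above / n_boot
    p_value = 2 * min(p_star, 1 - p_star)
    observed = statistics.median(x) - statistics.median(y)
    return observed, (lower, upper), p_value


# Hypothetical immune scores for two sample groups.
group_a = [float(v) for v in range(10, 20)]
group_b = [float(v) for v in range(0, 10)]
observed, interval, p_value = median_diff_bootstrap(group_a, group_b)
```

Because the interval and p-value come from the resampled medians themselves, no normality assumption is required, which suits rank-based enrichment scores.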

RESULTS
Identification of age-related DEGs
We previously identified genes differentially expressed between tumor and matched benign prostatic
tissue samples for young men (≤ 45 years) and old men (71-74 years) with Gleason score 7 (3 + 4) CaP[9].
Following the same study design, we generated gene expression data for tumor and matched benign
prostatic samples from young men (≤ 50 years, n = 34) and old men (71-75 years, n = 36) with CaP Gleason
scores of 6 or 8-10. We identified 5156 unique DEGs as potential candidate genes for developing the iPAM
classifier [Table 2]. Dot plots of gene expression for two DEGs are shown in Figure 2A and Figure 2B as
examples. Details on the 5156 DEGs are available upon request.

iPAM classifier development and performance assessment
Gene expression data for the 5156 DEGs were extracted from the MC I discovery data set. Of those DEGs,
419 were differentially expressed (false discovery rate [FDR] < 0.05) between patients who did and did not
develop metastasis. The iPAM program[18] selected 36 genes [Table 3] of the 419 that predicted metastatic
CaP in the training data set and then generated an AUC of 0.75 for the test data set. We assembled those
36 genes into an iPAM classifier by fitting a logistic regression model on the training samples, and applied
the iPAM classifier to four independent validation data sets. The predicted iPAM risk scores for metastasis
showed a bimodal distribution with the score range of 0-1, where higher scores represent higher risk of
