Schaafsma et al. J Cancer Metastasis Treat 2021;7:34 https://dx.doi.org/10.20517/2394-4722.2021.72 Page 3 of 12
METHODS
Utilized data
A total of 11 publicly available gene expression datasets were utilized in this study. The Westerman[13] and
Oberthuer[14] datasets were obtained from the European Molecular Biology Laboratory (EMBL) database
under accession numbers E-TABM-38 and E-MTAB-179, respectively. The Henrich[15], SEQC[16], Kocak[17],
Wang[18], Rajbhandari[19], Lastowska[20], and Ackerman[21] datasets were obtained from the Gene Expression
Omnibus (GEO) under accession numbers GSE73517, GSE62564, GSE45547, GSE3960, GSE85047,
GSE13136, and GSE120572, respectively. The Berwanger[22] dataset was obtained from the PREdiction of Clinical
Outcomes from Genomic Profiles portal (https://precog.stanford.edu/; accession: Berwanger_NB). The
ICGC[23] dataset was obtained through the ICGC portal (https://dcc.icgc.org/). Microarray datasets were
provided as normalized expression values at the probeset level, where some genes may be represented by
multiple probesets. We converted probeset expression into gene expression values. Specifically, for one-
channel arrays, we selected the probeset with the highest hybridization intensity across all samples to
represent gene expression. For two-channel arrays, the average expression values of all probesets were
calculated to represent gene expression. Datasets from one-channel arrays were further median normalized
for each gene to transform intensities into relative expression values. Depending on availability, associated
clinical data were obtained through EMBL, GEO, or the manuscript accompanying the dataset. See
Supplementary Table 1 for detailed information and available clinical variables for each dataset.
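The probeset-to-gene conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, probe IDs and gene names are hypothetical. It assumes that "highest hybridization intensity across all samples" means the highest mean intensity, and that median normalization means subtracting each gene's median from log-scale values.

```python
from statistics import mean, median

def collapse_one_channel(probe_expr, probe_to_gene):
    """Per gene, keep the probeset with the highest mean intensity (assumed criterion)."""
    best = {}  # gene -> (mean intensity, expression values)
    for probe, values in probe_expr.items():
        gene = probe_to_gene[probe]
        score = mean(values)
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, values)
    return {gene: list(values) for gene, (_, values) in best.items()}

def collapse_two_channel(probe_expr, probe_to_gene):
    """Per gene, average all of its probesets sample by sample."""
    grouped = {}
    for probe, values in probe_expr.items():
        grouped.setdefault(probe_to_gene[probe], []).append(values)
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

def median_normalize(gene_expr):
    """Subtract each gene's median across samples (assumes log-scale values)."""
    normalized = {}
    for gene, values in gene_expr.items():
        center = median(values)
        normalized[gene] = [v - center for v in values]
    return normalized

# Hypothetical toy data: two probesets map to GENE_A, one to GENE_B.
probe_expr = {"p1": [5.0, 7.0, 6.0], "p2": [8.0, 9.0, 10.0], "p3": [2.0, 4.0, 3.0]}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
relative_expr = median_normalize(collapse_one_channel(probe_expr, probe_to_gene))
```

After median normalization, each gene's values are centered on zero within the dataset, which is what makes them relative rather than absolute expression values.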
Immune cell inference
A detailed description of immune cell inference can be found in [24,25]. Briefly, patient-specific immune cell
type inference was determined by evaluating the similarity between six predefined gene expression weight
profiles (one for each immune cell type) and patient gene expression profiles using BASE[26], a rank-based
gene set enrichment method. High similarity between a patient’s gene expression profile and an immune
cell weight profile resulted in high enrichment scores for that immune cell type for that particular patient.
Due to the scale-free nature of resulting infiltration scores, immune cell infiltration scores are only
comparable within each dataset and within an individual immune cell type.
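To make the idea of a rank-based similarity score concrete, the sketch below computes a simplified weighted running-sum statistic. It is not the BASE implementation: the function name and toy data are hypothetical, the weights are assumed to lie in [0, 1], and the published method may differ in its details, for which [26] should be consulted.

```python
def running_sum_score(expression, weights):
    """Largest deviation between weighted foreground and background running sums.

    expression: gene -> relative expression value for one patient
    weights:    gene -> weight in [0, 1] for one immune cell type (assumed scale)
    Assumes both totals are non-zero, i.e. the weights are not all 0 or all 1.
    """
    genes = sorted(expression, key=expression.get, reverse=True)  # high to low
    foreground = [abs(expression[g]) * weights[g] for g in genes]
    background = [abs(expression[g]) * (1.0 - weights[g]) for g in genes]
    fg_total, bg_total = sum(foreground), sum(background)

    fg_cum = bg_cum = 0.0
    best = 0.0
    for fg, bg in zip(foreground, background):
        fg_cum += fg / fg_total
        bg_cum += bg / bg_total
        deviation = fg_cum - bg_cum
        if abs(deviation) > abs(best):
            best = deviation
    return best

# Hypothetical patient profile and two hypothetical weight profiles.
patient = {"g1": 3.0, "g2": 2.0, "g3": -1.0, "g4": -2.0}
matching = {"g1": 1.0, "g2": 0.8, "g3": 0.1, "g4": 0.0}     # weights track expression
mismatched = {"g1": 0.0, "g2": 0.1, "g3": 0.8, "g4": 1.0}   # weights oppose expression
score_matching = running_sum_score(patient, matching)
score_mismatched = running_sum_score(patient, mismatched)
```

In this toy setting the score is positive when highly weighted genes sit at the top of the patient's ranked profile and negative when they sit at the bottom. Such raw scores have no natural scale, which is consistent with the restriction to comparisons within a dataset and within a cell type.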
Survival analysis
Survival analyses were performed using the R survival package (version 3.1-8). Log-rank tests were
performed to compare overall survival between two groups using the survdiff function. Kaplan-Meier (KM)
plots were generated using the survfit function. Cox proportional hazards (Coxph) results shown in KM
plots were obtained by fitting univariate regression models to continuous immune infiltration scores
using the coxph function from the survival package. The P-values shown were
obtained from two-sided Wald tests. Forest plots were based on the results of multivariate Coxph models in
which all variables specified in the figure panels were included and immune cell infiltration was
dichotomized based on the median infiltration score.
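The analyses above were run with the R survival package. As a language-neutral illustration of what a two-group log-rank test computes, here is a minimal Python sketch using only the standard library. The function names and toy data are hypothetical, and the median split is used here only to form the two groups for illustration.

```python
import math
from statistics import median

def dichotomize_by_median(scores):
    """Label a score 'high' if it exceeds the median, otherwise 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, P-value).

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    groups: group label per patient (exactly two distinct labels)
    Assumes the variance is non-zero, i.e. both groups are at risk at some event time.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    first = labels[0]
    event_times = sorted({u for u, e in zip(times, events) if e == 1})

    observed = expected = variance = 0.0
    for t in event_times:
        at_risk = [(g, e, u) for g, e, u in zip(groups, events, times) if u >= t]
        n = len(at_risk)                                   # patients at risk
        n1 = sum(1 for g, _, _ in at_risk if g == first)   # ... in the first group
        d = sum(1 for _, e, u in at_risk if e == 1 and u == t)  # deaths at t
        d1 = sum(1 for g, e, u in at_risk if e == 1 and u == t and g == first)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail probability, 1 df
    return chi2, p_value

# Hypothetical cohort: higher infiltration scores go with longer survival.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
chi2, p_value = logrank_test(times, events, dichotomize_by_median(scores))
```

The statistic compares the observed number of deaths in one group with the number expected if both groups shared the same hazard at every event time. The Cox models described above are not reproduced here.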
Statistical methods
The Spearman correlation coefficient (SCC) was reported for all correlation analyses as the assumptions
underlying the Pearson correlation (i.e., normal distribution, homoscedasticity or linearity) were not met.
SCC was calculated using the R function cor and significance was assessed using cor.test. Immune cell
infiltration variance explained by different chromosomal abnormalities was calculated using multivariate
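linear regression models fitted with the lm function and summarized with the anova function. Because the
sequential sums of squares reported by anova depend on the order in which predictors enter the model, the
order of the four chromosomal abnormalities was randomly shuffled 100 times to obtain the mean and
standard deviation of the variance explained. P-values smaller than 0.05 were considered significant. All
analyses were conducted in R (version 3.6.2).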
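The order-shuffling procedure can be illustrated with a small, self-contained Python sketch. It assumes that the variance explained by each abnormality is its sequential sum of squares divided by the total sum of squares, which is how R's anova reports terms for a single lm fit. All function names and the toy data are hypothetical, and the least-squares solver is a plain Gaussian elimination that fails on collinear predictors.

```python
import random
from statistics import mean, stdev

def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def residual_ss(y, columns):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def sequential_variance_explained(y, predictors, order):
    """Fraction of total variance attributed to each predictor entered in `order`."""
    total = residual_ss(y, [])  # intercept-only model gives the total sum of squares
    explained, included, previous = {}, [], total
    for name in order:
        included.append(predictors[name])
        current = residual_ss(y, included)
        explained[name] = (previous - current) / total
        previous = current
    return explained

def shuffled_variance_explained(y, predictors, n_shuffles=100, seed=0):
    """Mean and standard deviation of variance explained over random entry orders."""
    rng = random.Random(seed)
    names = list(predictors)
    runs = {name: [] for name in names}
    for _ in range(n_shuffles):
        order = names[:]
        rng.shuffle(order)
        for name, value in sequential_variance_explained(y, predictors, order).items():
            runs[name].append(value)
    return {name: (mean(values), stdev(values)) for name, values in runs.items()}
```

Whatever the order, the per-predictor fractions sum to the R-squared of the full model. Only the split between correlated predictors changes, which is why the spread across shuffles is informative.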