Schaafsma et al. J Cancer Metastasis Treat 2021;7:34  https://dx.doi.org/10.20517/2394-4722.2021.72  Page 3 of 12

               METHODS
               Utilized data
               A total of 11 publicly available gene expression datasets were utilized in this study. The Westerman [13] and
               Oberthuer [14] datasets were obtained from the European Molecular Biology Laboratory (EMBL) database
               under accession numbers E-TABM-38 and E-MTAB-179, respectively. The Henrich [15], SEQC [16], Kocak [17],
               Wang [18], Rajbhandari [19], Lastowska [20], and Ackerman [21] datasets were obtained from the Gene Expression
               Omnibus (GEO) under accession numbers GSE73517, GSE62564, GSE45547, GSE3960, GSE85047,
               GSE13136, GSE120572, respectively. The Berwanger [22] dataset was obtained from the PREdiction of Clinical
               Outcomes from Genomic Profiles portal (https://precog.stanford.edu/; accession: Berwanger_NB). The
               ICGC [23] dataset was obtained through the ICGC portal (https://dcc.icgc.org/). Microarray datasets were
               provided as normalized expression at the probeset level in which some genes might be represented by
               multiple probesets. We converted probeset expression into gene expression values. Specifically, for one-
               channel arrays, we selected the probeset with the highest hybridization intensity across all samples to
               represent gene expression. For two-channel arrays, the average expression values of all probesets were
               calculated to represent gene expression. Datasets from one-channel arrays were further median normalized
               for each gene to transform intensities into relative expression values. Depending on availability, associated
               clinical data were obtained through EMBL, GEO, or the manuscript accompanying the dataset. See
               Supplementary Table 1 for detailed information and available clinical variables for each dataset.
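
The probeset-to-gene conversion described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study was conducted in R). The function names, the `probe_to_gene` mapping, and the dictionary layout are hypothetical. Two assumptions are made here: "highest hybridization intensity across all samples" is read as the highest mean intensity, and median normalization is implemented as subtracting each gene's median, which presumes log-scale values.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_probesets(expr, probe_to_gene, one_channel):
    """Collapse {probeset: [value per sample]} into {gene: [value per sample]}."""
    rows_by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probesets without a gene annotation are skipped
            rows_by_gene[gene].append(values)

    collapsed = {}
    for gene, rows in rows_by_gene.items():
        if one_channel:
            # Assumption: keep the probeset with the highest mean intensity.
            collapsed[gene] = list(max(rows, key=mean))
        else:
            # Two-channel arrays: average all probesets, sample by sample.
            collapsed[gene] = [mean(col) for col in zip(*rows)]
    return collapsed


def median_normalize(gene_expr):
    """Assumption: values are log-scale, so the gene median is subtracted."""
    normalized = {}
    for gene, values in gene_expr.items():
        center = median(values)
        normalized[gene] = [v - center for v in values]
    return normalized
```

For a one-channel dataset, the two steps would be chained as `median_normalize(collapse_probesets(expr, probe_to_gene, one_channel=True))`.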

               Immune cell inference
               A detailed description of immune cell inference can be found in [24,25]. Briefly, patient-specific immune cell
               type inference was determined by evaluating the similarity between six predefined gene expression weight
               profiles (one for each immune cell type) and patient gene expression profiles using BASE [26], a rank-based
               gene set enrichment method. High similarity between a patient’s gene expression profile and an immune
               cell weight profile resulted in high enrichment scores for that immune cell type for that particular patient.
               Due to the scale-free nature of resulting infiltration scores, immune cell infiltration scores are only
               comparable within each dataset and within an individual immune cell type.
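
To make the idea of a rank-based comparison between a weight profile and an expression profile concrete, the sketch below computes a simplified running-sum score in Python. It is not the BASE implementation used in the study. The function name and the choice of weights in [0, 1] are assumptions, and any permutation-based normalization of the score is omitted.

```python
def enrichment_score(expression, weights):
    """Simplified rank-based enrichment score (illustrative, not BASE itself).

    expression: {gene: relative expression in one patient}
    weights:    {gene: weight in [0, 1] for one immune cell type}; genes
                missing from the weight profile are treated as weight 0.
    """
    # Rank genes from highest to lowest expression in this patient.
    genes = sorted(expression, key=expression.get, reverse=True)

    foreground = [abs(expression[g]) * weights.get(g, 0.0) for g in genes]
    background = [abs(expression[g]) * (1.0 - weights.get(g, 0.0)) for g in genes]
    fg_total, bg_total = sum(foreground), sum(background)
    if fg_total == 0 or bg_total == 0:
        return 0.0

    # Walk down the ranking and track the largest signed gap between the
    # cumulative foreground and background distributions.
    best = 0.0
    fg_cum = bg_cum = 0.0
    for f, b in zip(foreground, background):
        fg_cum += f / fg_total
        bg_cum += b / bg_total
        deviation = fg_cum - bg_cum
        if abs(deviation) > abs(best):
            best = deviation
    return best
```

The score is positive when highly weighted genes sit near the top of the patient's ranking and negative when they sit near the bottom. Because it has no absolute scale, such a score is only meaningful relative to other samples scored with the same weight profile.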


               Survival analysis
               Survival analyses were performed using the R survival package (version 3.1-8). Log-rank tests comparing
               overall survival between two groups were performed using the survdiff function. Kaplan-Meier (KM) plots
               were generated using the survfit function. Cox proportional hazards (Coxph) results shown in KM plots
               were obtained from univariate regression models fitted to continuous immune infiltration scores using
               the coxph function from the survival package. The P-values shown were obtained from a two-sided Wald
               test. Forest plots were based on multivariate Coxph models that included all variables specified in the
               figure panels, with immune cell infiltration dichotomized at the median infiltration score.
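
As an illustration of the quantities behind the KM plots and log-rank tests, the following Python sketch dichotomizes scores at the median, computes a Kaplan-Meier curve, and computes the two-group log-rank chi-square statistic. It is a from-scratch sketch, not the R survival package code used in the study. All function names are hypothetical, and treating scores strictly above the median as the "high" group is an assumption.

```python
from statistics import median


def dichotomize(scores):
    """Assumption: samples strictly above the median form the 'high' group."""
    cut = median(scores)
    return [s > cut for s in scores]


def kaplan_meier(times, events):
    """Return [(event time, survival probability)]; events[i] is True for a death."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_chisq(times, events, high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                      # at risk, both groups
        n1 = sum(1 for x, g in zip(times, high) if x >= t and g)  # at risk, high group
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, high) if x == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in the high group.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance
```

A typical call would pass the output of `dichotomize(scores)` as the `high` argument of `logrank_chisq`, and plot `kaplan_meier` separately for each group.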


               Statistical methods
               The Spearman correlation coefficient (SCC) was reported for all correlation analyses as the assumptions
               underlying the Pearson correlation (i.e., normal distribution, homoscedasticity, or linearity) were not met.
               SCC was calculated using the R function cor and significance was assessed using cor.test. Immune cell
               infiltration variance explained by different chromosomal abnormalities was calculated using multivariate
               linear regression models using the lm and anova functions. The order in which the four chromosomal
               abnormalities entered the model was randomly shuffled 100 times to obtain the mean and standard
               deviation of the variance explained. P-values smaller than 0.05 were considered significant. All analyses
               were conducted in R (version 3.6.2).
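
The shuffling step matters because, in a sequential analysis of variance, the variance credited to a predictor depends on which predictors were entered before it. The Python sketch below illustrates this under stated assumptions; it is not the authors' R code. It assumes the variance explained is the sequential sum of squares divided by the total sum of squares, and the function names and the fixed random seed are hypothetical.

```python
import random
from statistics import mean, stdev


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _center(values):
    m = mean(values)
    return [v - m for v in values]


def sequential_r2(y, predictors, order):
    """Fraction of variance in y credited to each predictor, entered in `order`."""
    resid_y = _center(y)
    total_ss = _dot(resid_y, resid_y)
    if total_ss == 0:
        return {name: 0.0 for name in order}
    cols = {name: _center(predictors[name]) for name in order}
    explained = {}
    for i, name in enumerate(order):
        q = cols[name]
        qq = _dot(q, q)
        if qq < 1e-12:  # nothing left after adjusting for earlier predictors
            explained[name] = 0.0
            continue
        beta = _dot(q, resid_y) / qq
        explained[name] = beta * beta * qq / total_ss
        # Remove this predictor's contribution from y and from later predictors.
        resid_y = [r - beta * x for r, x in zip(resid_y, q)]
        for later in order[i + 1:]:
            b = _dot(q, cols[later]) / qq
            cols[later] = [c - b * x for c, x in zip(cols[later], q)]
    return explained


def shuffled_variance_explained(y, predictors, n_shuffles=100, seed=0):
    """Mean and standard deviation of variance explained over random orders."""
    rng = random.Random(seed)
    names = list(predictors)
    runs = {name: [] for name in names}
    for _ in range(n_shuffles):
        order = names[:]
        rng.shuffle(order)
        for name, r2 in sequential_r2(y, predictors, order).items():
            runs[name].append(r2)
    return {name: (mean(vals), stdev(vals)) for name, vals in runs.items()}
```

When predictors are uncorrelated, the order has no effect and the standard deviation is zero. Correlated predictors, such as co-occurring chromosomal abnormalities, produce order-dependent values, which is what the reported standard deviation captures.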