Li et al. Cancer Drug Resist. 2025;8:31 Page 3 of 26
METHODS
Data collection and preprocessing
Count expression profiles and corresponding clinical data for prostate adenocarcinoma (PRAD) were
downloaded from the TCGA (https://www.cancer.gov)[8]. This dataset contained 554 samples, including 52
PRAD-adjacent “normal” and 502 PRAD cancer samples. In addition, PCa tissue transcriptome sequencing
data were collected from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo)[9],
including three datasets with accession numbers GSE46602[10], GSE70769[11], and GSE116918[12]. All these
datasets were derived from Homo sapiens and were related to PCa. GSE46602 contained 50 samples,
including 36 cancer samples and 14 adjacent “normal” samples. Only the cancer samples were included in the
analysis, and the sequencing platform used was GPL570. GSE70769 included 94 cancer samples with no
adjacent “normal” samples, of which 45 had matching survival data; the sequencing platform used
was GPL10558. GSE116918 contained 248 cancer samples with no adjacent “normal” samples, all of which had
survival data; the sequencing platform was GPL25318. These datasets were neither merged nor subjected to
batch effect correction. Instead, the prognostic model was applied to each dataset separately for external
validation. This approach allows model performance to be evaluated independently on each
platform and avoids the additional technical variation that data integration can introduce. In total, 329 cancer
samples with survival data were used for model validation.
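The sample counts reported above can be cross-checked with a short tally. The sketch below is illustrative only and is written in Python rather than the R environment used in the study. The dictionary layout and the function name total_validation_samples are hypothetical. It assumes that all 36 GSE46602 cancer samples carry survival data, which the stated total of 329 implies but the text does not say outright.

```python
# Cohort sizes as reported in the text; field names are illustrative.
tcga_prad = {"adjacent_normal": 52, "tumor": 502}

# Assumption: every one of the 36 GSE46602 cancer samples has survival data
# (implied by the reported total of 329, not stated explicitly).
validation_cohorts = {
    "GSE46602": {"platform": "GPL570", "cancer_with_survival": 36},
    "GSE70769": {"platform": "GPL10558", "cancer_with_survival": 45},
    "GSE116918": {"platform": "GPL25318", "cancer_with_survival": 248},
}


def total_validation_samples(cohorts):
    """Sum the cancer samples with survival data across separate cohorts."""
    return sum(c["cancer_with_survival"] for c in cohorts.values())


tcga_total = sum(tcga_prad.values())
validation_total = total_validation_samples(validation_cohorts)
print("TCGA samples:", tcga_total)
print("External validation samples:", validation_total)
```

Because the cohorts are kept separate rather than merged, each entry would be scored by the model on its own; the sum is only a bookkeeping check.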
In addition, single-cell RNA sequencing (scRNA-seq) data for PCa were obtained from the literature[13], and
included 2,170 cells sequenced on the Illumina NexteraXT platform. Based on the previously reported cell
labels[13], 835 PCa cells were extracted for single-cell level validation.
Tumor immune infiltration analysis
The immune scores and infiltration levels of 22 immune cell types in PCa samples from TCGA were
calculated using the R packages estimate[14] and CIBERSORT[15]. The functions filterCommonGenes and
estimateScore in the estimate package were used with the default parameters. In the CIBERSORT analysis,
we determined the infiltration scores of 22 immune cell types for each sample. These scores were calculated
using the LM22 background gene set provided by CIBERSORT[15], with the perm parameter set to 500 and
the other parameters set to default values.
Subsequently, unsupervised clustering was performed on all PCa samples based on the immune scores and the
immune cell infiltration matrix. To determine the optimal number of clusters, the fviz_nbclust function in
the R package factoextra was used, with the average silhouette width as the evaluation metric[16]. The optimal
cluster number (k = 2) was determined by maximizing the average silhouette width across candidate cluster
numbers. Samples were subsequently partitioned using k-means clustering and visualized via principal
component analysis (PCA).
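The silhouette-based choice of k described above can be illustrated with a minimal sketch. It is written in plain Python rather than R, does not call factoextra, and is not the authors' code. The toy two-feature data, the farthest-first initialisation, and all function names are assumptions made for the illustration.

```python
import math


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        def mean_dist_to(c):
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else None
        a = mean_dist_to(labels[i])  # cohesion within the point's own cluster
        if a is None:
            scores.append(0.0)
            continue
        b = min(mean_dist_to(c) for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, already-scaled features for eight samples (not study data).
samples = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
           (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2)]

silhouette_by_k = {k: mean_silhouette(samples, kmeans(samples, k)) for k in range(2, 5)}
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(silhouette_by_k, "selected k =", best_k)
```

On real data the feature matrix would be the per-sample immune scores and infiltration estimates, and a library implementation with multiple random starts would normally be preferred over this fixed initialisation.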
The infiltration levels of the 22 immune cell types in the different sample groups were visualized in box plots
using the ggplot2[17] and ggpubr[18] R packages. Statistical significance was assessed using the Wilcoxon rank-sum
test. Heatmaps and stacked bar plots were used to visualize the immune scores and the distribution of
immune cell infiltration levels, respectively.
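For readers unfamiliar with the Wilcoxon rank-sum test, the following minimal sketch shows how a two-group comparison of one cell type's infiltration level could be computed. It is plain Python, not the R routine used in the study, and it uses the normal approximation with tie and continuity corrections, so small-sample P-values may differ from an exact test. The function name rank_sum_test and the example values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, approximate P-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        t = j - i                   # size of this group of tied values
        midrank = (i + 1 + j) / 2   # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # all values tied: no evidence of a difference
    diff = u - mu
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - correction) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, min(p, 1.0)


# Hypothetical infiltration fractions of one cell type in two sample groups.
group_1 = [0.02, 0.05, 0.04, 0.03, 0.06]
group_2 = [0.10, 0.12, 0.09, 0.11, 0.15]
u_stat, p_value = rank_sum_test(group_1, group_2)
print("U =", u_stat, "approximate P =", p_value)
```

In a full analysis the test would be repeated for each of the 22 cell types, which raises a multiple-testing question the sketch does not address.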
Comparisons of immune features
To compare the immune characteristics between the groups, we performed a correlation analysis of the 22 immune
cell types in the TCGA samples using the corrplot R package[19]. PD-1 and PD-L1 are two prominent
immune checkpoint genes targeted in immunotherapy, and we used violin and box plots to examine
their expression levels in the different sample groups. Subsequently, differential gene expression analysis
between the groups was conducted using the DESeq2 package[20], with adjusted P-values of < 0.05 as a