Page 4 of 26 Li et al. Cancer Drug Resist. 2025;8:31
significance threshold, and an absolute log2-fold change (log2FC) greater than 1. Significantly differentially
expressed genes were visualized using volcano plots and heatmaps with the R packages ggplot2 and pheatmap [17,21].
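A minimal Python sketch of the filtering rule above follows. It assumes per-gene fold changes and P-values have already been computed upstream. The record type `GeneResult` and the function `select_degs` are hypothetical names. The P-value cutoff is a required argument because its value is stated before this excerpt. The sign convention (positive log2FC means up-regulated in the comparison group) is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2fc: float   # log2 fold change; assumption: positive = up in comparison group
    p_value: float  # P-value from the upstream differential test (adjusted or raw)


def select_degs(results, p_cutoff, lfc_cutoff=1.0):
    """Return (up, down) gene lists passing p < p_cutoff and |log2FC| > lfc_cutoff.

    p_cutoff has no default on purpose: the study's exact significance
    threshold is not restated in this section (hypothetical helper).
    """
    up, down = [], []
    for r in results:
        if r.p_value < p_cutoff and abs(r.log2fc) > lfc_cutoff:
            (up if r.log2fc > 0 else down).append(r.gene)
    return up, down
```

For example, with made-up values, `select_degs([GeneResult("A", 2.0, 0.001), GeneResult("B", -1.5, 0.01)], p_cutoff=0.05)` puts "A" in the up-regulated list and "B" in the down-regulated list.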
Based on previous studies, CTLA4, HAVCR2, IDO1, and LAG3 were categorized as immune
checkpoint-related markers, whereas CD8A, CXCL10, CXCL9, GZMA, GZMB, IFNG, PRF1, TBX2, and TNF
were designated as immune-related markers [22]. The expression levels of these two categories of genes were
compared between sample groups, and statistical significance was assessed using the Wilcoxon
rank-sum test.
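The Wilcoxon rank-sum comparison can be sketched in plain Python as below. This is an illustrative implementation using the normal approximation with a tie correction. It is not necessarily the software the authors used. R's `wilcox.test`, for instance, computes exact P-values for small samples without ties and applies a continuity correction otherwise, so P-values may differ slightly. The function names are hypothetical.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum (Mann-Whitney U) test for group x vs. group y.

    Normal approximation with tie correction and no continuity correction
    (an illustrative choice, not a reproduction of any specific R call).
    Returns (U statistic for x, two-sided P-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum over groups of tied values of (t^3 - t).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Applied to one marker, `x` and `y` would be that gene's expression values in the two sample groups being compared.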
Weighted gene co-expression network analysis
Weighted gene co-expression network analysis (WGCNA) was applied to the differentially expressed genes to identify
gene modules associated with the sample grouping labels. This approach groups genes with similar
expression patterns into modules and examines their associations with particular traits or phenotypes [23]. The
pickSoftThreshold function from the R package WGCNA [24] was used to calculate the scale-free topology fit index
and mean connectivity across a range of candidate soft-thresholding powers and to select the optimal power.
A power of 3 was chosen as the point at which the scale-free topology fit index first exceeded 0.8 and the
corresponding curve reached its inflection point, thereby balancing the network's scale-free property against
preservation of mean connectivity. Gene modules were visualized using the plotDendroAndColors function, and
correlations between gene modules were visualized using a correlation heatmap. To identify the gene modules
associated with the TCGA sample grouping labels, module-trait associations were visualized using the
labeledHeatmap function.
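The sketch below makes the soft-threshold rule concrete. It computes mean connectivity under an unsigned adjacency |cor|^β, and it picks the smallest power whose scale-free fit index exceeds 0.8. It assumes the fit indices have already been computed, as `pickSoftThreshold` reports them. The helper names and the example fit values are hypothetical. The inflection-point criterion that the authors also weighed is not modeled here.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def mean_connectivity(expr, beta):
    """Mean of k_i = sum_{j != i} |cor(gene_i, gene_j)| ** beta (unsigned network).

    expr maps gene name -> expression profile (hypothetical input layout).
    """
    genes = list(expr)
    k = [sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
         for g in genes]
    return sum(k) / len(k)


def pick_power(fit_by_power, r2_cutoff=0.8):
    """Smallest power whose scale-free fit index exceeds r2_cutoff, else None."""
    for power in sorted(fit_by_power):
        if fit_by_power[power] > r2_cutoff:
            return power
    return None
```

With illustrative fit values `{1: 0.2, 2: 0.6, 3: 0.83, 4: 0.88}`, `pick_power` returns 3. Raising β shrinks every |cor| < 1, so mean connectivity falls as the power grows. This is the trade-off that the selection rule balances.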
Moreover, to examine the biological functions linked to the correlated gene modules, functional
enrichment analysis was conducted using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and
Genomes (KEGG) annotations. GO enrichment analysis is commonly used to study the large-scale
functional enrichment of genes across different dimensions and levels [25]. The KEGG database stores
information related to genomes, pathways, diseases, and drugs [26]. The R package clusterProfiler [27,28]
was used to annotate gene functions and to perform KEGG pathway enrichment analysis on genes in the
associated gene modules, with a significance threshold of corrected P < 0.05.
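Over-representation analysis of the kind clusterProfiler performs rests on a one-sided hypergeometric test followed by multiple-testing correction. The sketch below assumes Benjamini-Hochberg correction, which is clusterProfiler's default; the section itself only says "corrected P-values". It uses toy gene sets. The names `enrich`, `hypergeom_sf`, and `benjamini_hochberg` are hypothetical and are not clusterProfiler's API.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M background genes, n annotated to
    the term, N module genes drawn, k of them annotated to the term."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals):
    """BH-adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def enrich(module_genes, term_to_genes, background, cutoff=0.05):
    """Return {term: adjusted P} for terms over-represented in the module.

    Assumption: BH correction and a gene universe equal to `background`.
    """
    module = set(module_genes) & background
    terms = list(term_to_genes)
    pvals = []
    for t in terms:
        term = term_to_genes[t] & background
        k = len(module & term)
        pvals.append(hypergeom_sf(k, len(background), len(term), len(module)))
    padj = benjamini_hochberg(pvals)
    return {t: q for t, q in zip(terms, padj) if q < cutoff}
```

In this toy setting, a term whose genes largely coincide with the module receives a tiny adjusted P-value. A term with no overlap gets P = 1 and is dropped.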
Construction and validation of prognostic prediction models
A prognostic model was constructed from the genes in the gene modules identified above using univariate Cox
and least absolute shrinkage and selection operator (LASSO) regression analyses, and its validity and
predictive performance were verified using three external datasets.
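The univariate Cox screen named above can be sketched as a one-covariate Cox model fitted by Newton-Raphson on the partial likelihood. Genes whose Wald P-value falls below 0.05 are kept. This is an illustration, not the `survival` package's implementation. Ties are handled with Breslow risk sets, whereas R's `coxph` defaults to the Efron method; the two agree only when there are no tied event times. The function names are hypothetical.

```python
import math


def _score_info(times, events, x, beta):
    """Score U(beta) and information I(beta) of the one-covariate Cox partial
    log-likelihood (Breslow risk sets for tied event times)."""
    score, info = 0.0, 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored samples enter only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # sample j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean * mean
    return score, info


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Newton-Raphson fit; returns (beta, hazard ratio, two-sided Wald P)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(times, events, x, beta)
        if info <= 0:
            raise ValueError("covariate has no variation within risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(times, events, x, beta)
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(I(beta))
    return beta, math.exp(beta), math.erfc(abs(z) / math.sqrt(2))


def univariate_screen(times, events, expr_by_gene, p_cutoff=0.05):
    """Keep genes whose univariate Cox Wald P-value is below p_cutoff."""
    kept = {}
    for gene, values in expr_by_gene.items():
        _, hr, p = cox_univariate(times, events, values)
        if p < p_cutoff:
            kept[gene] = (hr, p)
    return kept
```

The genes returned by the screen would then form the candidate set passed on to LASSO Cox regression.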
Univariate Cox regression was performed using the R package survival [29], with a significance threshold of
P < 0.05. Genes meeting this threshold were then used as inputs for LASSO Cox regression with the R package
glmnet [30]. The cv.glmnet function was used for model construction, with the family parameter set to "cox"
and alpha set to 1. Genes
with zero coefficients were removed to establish the final model. Using this model, a risk score was calculated
for each sample in the TCGA PRAD dataset. The surv_cutpoint function from the R package survminer [31] was
then used to determine the optimal cutoff for dividing all samples into high- and low-risk groups. Survival
was compared between the two groups, and Kaplan-Meier (KM) survival curves were generated to evaluate the
model's predictive performance. Time-dependent receiver operating characteristic (ROC) curves were plotted
using the R package timeROC [32] to assess the model's accuracy in predicting 1-, 3-, and 5-year survival in
the TCGA PRAD dataset. In addition, the same risk-scoring process was applied to
three external datasets, GSE46602 [11], GSE70769 [10], and GSE116918 [12], to assess the model's predictive
performance by plotting KM survival curves.
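The risk-scoring and grouping steps can be sketched as follows, under stated assumptions:

- **Risk score.** It is taken to be the usual LASSO Cox linear predictor (sum of coefficient × expression over genes with nonzero coefficients). This section does not spell out that formula.
- **Cutoff search.** The search scans candidate thresholds and keeps the one that maximizes the two-group log-rank chi-square. Here `min_prop` plays the role of survminer's `minprop`. survminer itself relies on maximally selected rank statistics, whose standardized statistic differs in detail.
- **Kaplan-Meier estimator.** A plain KM estimator supplies the survival curve for each group.

All names are hypothetical.

```python
def risk_scores(coefs, expr_by_gene, n_samples):
    """Assumed linear predictor: score_s = sum_g coef_g * expr_g[s] (nonzero coefs)."""
    return [sum(c * expr_by_gene[g][s] for g, c in coefs.items() if c != 0)
            for s in range(n_samples)]


def logrank_chi2(times, events, group):
    """Two-group log-rank chi-square; group[i] is True for the high-risk group."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i in range(len(times)) if times[i] >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutpoint(scores, times, events, min_prop=0.1):
    """Cutoff maximizing the log-rank statistic; samples above it are 'high'.

    Splits leaving fewer than min_prop * n samples in either group are skipped.
    Returns (cutoff, statistic), or (None, -1.0) if no split qualifies.
    """
    n = len(scores)
    best = (None, -1.0)
    for cut in sorted(set(scores)):
        high = [s > cut for s in scores]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_prop * n:
            continue
        stat = logrank_chi2(times, events, high)
        if stat > best[1]:
            best = (cut, stat)
    return best


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability)."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - d / n
        curve.append((t, surv))
    return curve
```

Applying `kaplan_meier` separately to the high- and low-risk samples yields the two curves compared in a KM plot. A maximally selected statistic is optimistically biased, so the log-rank P-value at the chosen cutoff should not be read at face value. This is one reason external validation sets matter.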