
Page 4 of 26                                                    Li et al. Cancer Drug Resist. 2025;8:31





               significance threshold, and an absolute log2-fold change (log2FC) greater than 1. Significantly differentially
               expressed genes were visualized using volcano plots and heatmaps with the ggplot2 [17] and pheatmap [21] R
               packages.
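
The filtering rule described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used in the study; the record type `GeneResult`, the function `select_degs`, the gene names, and all numeric values are hypothetical, and the significance cutoff is passed in as a parameter because its value is not stated in this passage.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log2fc: float
    p_value: float


def select_degs(results, p_cutoff, lfc_cutoff=1.0):
    """Keep genes with p_value below p_cutoff and |log2FC| above lfc_cutoff."""
    return [
        r for r in results
        if r.p_value < p_cutoff and abs(r.log2fc) > lfc_cutoff
    ]


# Toy values for illustration only (not data from the study).
toy_results = [
    GeneResult("GENE_A", 2.3, 0.001),
    GeneResult("GENE_B", -1.7, 0.020),
    GeneResult("GENE_C", 0.4, 0.0005),  # significant, but fold change too small
    GeneResult("GENE_D", 3.1, 0.300),   # large fold change, but not significant
]

# 0.05 is a placeholder cutoff chosen for this example.
degs = select_degs(toy_results, p_cutoff=0.05)
print([r.gene for r in degs])
```

Both up- and down-regulated genes pass the filter because the fold-change condition is applied to the absolute value.
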
               Based on previous studies, CTLA4, HAVCR2, IDO1, and LAG3 are categorized as immune
               checkpoint-related markers, whereas CD8A, CXCL10, CXCL9, GZMA, GZMB, IFNG, PRF1, TBX2, and TNF
               are designated as immune-related markers [22]. The expression levels of these two categories of genes were
               examined in different sample groups, and their statistical significance was ascertained using the Wilcoxon
               rank-sum test.
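
For readers unfamiliar with the test, the following is a minimal Python sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test. It assumes a normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from those of standard statistical software on small samples; the function name `rank_sum_test` and the sample values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value) via normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p_value


# Toy expression values for two sample groups (illustration only).
u_stat, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u_stat, round(p, 4))
```
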


               Weighted gene co-expression network analysis
               Weighted gene co-expression network analysis (WGCNA) was used to identify modules of differentially
               expressed genes associated with the sample grouping labels. This approach groups genes with similar
               expression patterns and relates the resulting modules to particular traits or phenotypes [23]. The
               pickSoftThreshold function from the R package WGCNA [24] was used to calculate both the scale-free
               topology fit index and mean connectivity across a range of candidate soft-thresholding powers and to
               select the optimal soft threshold. The optimal power of 3 was chosen as the point where the scale-free
               topology fit index first exceeded 0.8, taking into account the inflection point of the corresponding
               curve, thereby balancing the network’s scale-free property with mean connectivity preservation. Gene
               modules were visualized using the plotDendroAndColors function, and correlations between gene modules
               were visualized using a correlation heatmap. Associations between gene modules and the TCGA sample
               grouping labels were visualized using the labeledHeatmap function.
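
The selection rule for the soft-thresholding power can be sketched as follows. This Python fragment is only an illustration of the "first power whose fit index exceeds 0.8" criterion; it does not reproduce pickSoftThreshold, and it ignores the inflection-point judgement mentioned above. The function names and fit-index values are hypothetical, and the adjacency function assumes an unsigned network, which this passage does not specify.

```python
def pick_soft_power(fit_by_power, r2_cutoff=0.8):
    """Return the smallest candidate power whose fit index exceeds r2_cutoff.

    fit_by_power maps each candidate power to its scale-free topology fit
    index. Returns None if no candidate passes the cutoff.
    """
    for power in sorted(fit_by_power):
        if fit_by_power[power] > r2_cutoff:
            return power
    return None


def soft_threshold_adjacency(correlations, power):
    """Unsigned soft-threshold adjacency: a_ij = |cor_ij| ** power."""
    return [[abs(c) ** power for c in row] for row in correlations]


# Made-up fit indices for illustration (not the values obtained in the study).
toy_fit = {1: 0.35, 2: 0.68, 3: 0.84, 4: 0.90, 5: 0.93}
chosen_power = pick_soft_power(toy_fit)

# Toy 2 x 2 correlation matrix raised to the chosen power.
toy_adjacency = soft_threshold_adjacency([[1.0, 0.5], [0.5, 1.0]], chosen_power)
print(chosen_power, toy_adjacency)
```

Raising correlations to a power suppresses weak correlations far more than strong ones, which is why higher powers improve the scale-free fit while lowering mean connectivity.
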


               To examine the biological functions linked to the correlated gene modules, gene enrichment analysis was
               conducted using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations.
               GO enrichment analysis is commonly used to study the large-scale functional enrichment of genes in
               different dimensions and at different levels [25]. The KEGG database stores information related to
               genomes, pathways, diseases, and drugs [26]. The R package clusterProfiler [27,28] was used to annotate
               gene functions and perform KEGG pathway enrichment on genes in the associated gene modules, with a
               significance threshold of corrected P < 0.05.
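
Over-representation analyses of this kind are commonly based on a hypergeometric tail probability. The Python sketch below shows that calculation for a single term; it is an assumption about the general technique, not a description of the exact settings used in the study, and it omits the multiple-testing correction because the correction method is not stated here. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_module, n_overlap):
    """P(X >= n_overlap) when n_module genes are drawn without replacement
    from n_background genes, of which n_in_term are annotated to the term."""
    upper = min(n_in_term, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts for illustration: 10 background genes, 5 annotated to the term,
# a module of 5 genes, all 5 of which carry the annotation.
p_enriched = enrichment_p_value(10, 5, 5, 5)
print(p_enriched)
```

In practice one such p-value is computed per GO term or KEGG pathway, and the full set is then corrected for multiple testing before the 0.05 threshold is applied.
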


               Construction and validation of prognostic prediction models
               A prognostic model was constructed using univariate Cox and least absolute shrinkage and selection
               operator (LASSO) regression analyses based on the gene modules mentioned above, and the validity and
               predictive efficacy of the model were verified using three external datasets.

               The survival R package was used to perform univariate Cox regression [29], with a significance threshold
               of P < 0.05. Subsequently, the genes that met this threshold were used as inputs for LASSO regression
               with the glmnet [30] R package. The cv.glmnet function was used for model construction, with the family
               parameter set to "cox" and alpha set to 1. Genes with coefficient values of zero were removed to
               establish the final model. Using the constructed model, risk scores were calculated for the samples in
               the TCGA PRAD dataset. The surv_cutpoint function from the survminer [31] R package was used to select
               the optimal threshold for dividing all samples into high- and low-risk groups. Subsequently, stratified
               survival analysis was conducted on these groups, and Kaplan-Meier (KM) survival curves were generated to
               evaluate the model’s predictive performance. Receiver operating characteristic (ROC) curves for 1-, 3-,
               and 5-year survival were plotted for the original dataset using the R package timeROC [32]. In addition,
               the same risk-scoring process was applied to three external datasets, GSE46602 [10], GSE70769 [11], and
               GSE116918 [12], to assess the model’s predictive performance by plotting KM survival curves.
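
The scoring and grouping steps above can be illustrated with a short Python sketch. It assumes that the risk score is the usual linear predictor of a Cox model (the sum of each retained gene's coefficient times its expression) and that samples scoring above the cutoff are labeled high-risk; neither detail is spelled out in this passage. The cutoff is taken as given rather than optimized, and all function names, gene names, coefficients, and survival values are hypothetical.

```python
def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the genes kept in the model."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_cutoff(scores, cutoff):
    """Label each sample 'high' if its score exceeds the cutoff, else 'low'."""
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each event time.

    events uses 1 for an observed event and 0 for a censored observation.
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        n_events = 0
        n_removed = 0
        while i < len(ordered) and ordered[i][0] == t:
            n_events += ordered[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored samples both leave the risk set
    return curve


# Hypothetical non-zero LASSO coefficients and expression values.
toy_coefficients = {"GENE_X": 0.5, "GENE_Y": -0.25}
toy_scores = {
    "sample_1": risk_score({"GENE_X": 4.0, "GENE_Y": 2.0}, toy_coefficients),
    "sample_2": risk_score({"GENE_X": 1.0, "GENE_Y": 4.0}, toy_coefficients),
}
toy_groups = split_by_cutoff(toy_scores, cutoff=1.0)

# Toy follow-up times with one censored observation at t = 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_scores, toy_groups, toy_curve)
```

Applying the same coefficients and the same scoring function to an external cohort, then plotting one such curve per risk group, corresponds to the validation step described for the three GEO datasets.
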