
Page 87               Comertpay et al. J Transl Genet Genom 2022;6:84-94  https://dx.doi.org/10.20517/jtgg.2021.44

               Genes and Genomes (KEGG)[22] and Reactome[23] were used as pathway databases for the analyses. The
               significance of enrichment was assessed with Fisher's exact test, and terms with P < 0.05 were
               considered statistically significant.
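               The enrichment test can be illustrated with a minimal sketch. It assumes the common
               over-representation setup, in which a 2x2 table is formed from four counts: the input gene list, a
               pathway, and a background gene universe. The source does not give the actual tables. The one-sided
               Fisher's exact P-value is then the hypergeometric upper tail. The function name
               `fisher_enrichment_p` and the example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided (over-representation) Fisher's exact test.

    k: genes in the input list that belong to the pathway
    n: size of the input gene list
    K: genes in the background that belong to the pathway
    N: size of the background gene universe
    Returns P(X >= k), where X follows the hypergeometric distribution
    implied by the fixed margins of the 2x2 contingency table.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts: 4 of 23 input genes fall in a 100-gene pathway
# drawn from a 20,000-gene background.
p = fisher_enrichment_p(k=4, n=23, K=100, N=20_000)
print(f"P = {p:.2e}, significant at 0.05: {p < 0.05}")
```

               Under the same assumptions, `scipy.stats.fisher_exact(table, alternative="greater")` gives the same
               one-sided P-value for the equivalent 2x2 table.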

               Performance evaluation of the seed genes with a classification algorithm
               Ridge regression was used to assess the importance of the seed genes in obese patients with breast
               cancer. The method models the outcome as a linear weighted sum of biomarkers. It applies a
               regularization penalty that limits the magnitude of the regression coefficients, shrinking them toward
               zero relative to the ordinary least-squares (maximum likelihood) estimates, and this penalty reduces
               overfitting. Unlike the Lasso, which penalizes the sum of the absolute values of the coefficients,
               Ridge regression penalizes the sum of their squares. As a result, Ridge coefficients are shrunk but
               not set exactly to zero. All seed genes (i.e., biomarkers) therefore remain in the model, each
               weighted by its contribution to predicting disease. The Ridge function is:
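
$$\hat{\beta}^{\mathrm{ridge}} = \underset{\beta_0,\,\beta}{\arg\min} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \alpha \sum_{j=1}^{p} \beta_j^2 \right\}$$

               where $y_i$ is the outcome of sample $i$, $x_{ij}$ is the value of biomarker (seed gene) $j$ in
               sample $i$, $\beta_j$ is its coefficient, $\beta_0$ is the unpenalized intercept, and $\alpha \ge 0$
               is the penalty parameter.
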
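               The Ridge objective combines a squared-error loss with α times the sum of squared coefficients. With
               centered data the intercept is not penalized, and the objective has the closed-form solution
               β = (XcᵀXc + αI)⁻¹ Xcᵀyc. The following minimal sketch solves it in pure Python so that it is
               self-contained. With NumPy, `np.linalg.solve` would replace the hand-written solver. The helper names
               (`solve_linear`, `ridge_fit`) and the data are hypothetical. Only α = 0.1077 comes from the text.

```python
def solve_linear(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - acc) / M[r][r]
    return x


def ridge_fit(X, y, alpha):
    """Return (intercept, coefficients) minimizing the Ridge objective.

    Columns are centered so the intercept is not penalized; then
    beta = (Xc^T Xc + alpha * I)^-1 Xc^T yc and b0 = mean(y) - mean(X) . beta.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (alpha if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve_linear(A, b)
    intercept = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return intercept, beta


# Hypothetical values of three biomarkers in six samples, with a noise-free
# outcome y = 1 + 2*g1 - g2 + 0.5*g3.
X = [[1, 2, 0], [2, 1, 1], [3, 4, 1], [4, 3, 2], [5, 6, 0], [6, 5, 3]]
y = [1 + 2 * g1 - g2 + 0.5 * g3 for g1, g2, g3 in X]

for alpha in (0.0, 0.1077, 1.0, 10.0):
    b0, beta = ridge_fit(X, y, alpha)
    norm = sum(v * v for v in beta) ** 0.5
    print(f"alpha={alpha:<7} beta={[round(v, 3) for v in beta]} ||beta||={norm:.3f}")
```

               As α grows, the norm of the coefficient vector shrinks. No coefficient becomes exactly zero, which is
               the behavior that distinguishes Ridge regression from the Lasso.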
               This machine learning algorithm was used to check the path validity of the identified common seed
               genes. The regression algorithm was implemented with the NumPy[24] and Pandas[25] packages of the
               Python[26] platform. Genotyped data on obese patients with breast cancer are too scarce to train a
               high-performance risk-prediction model for this group. To address this, we evaluated the proposed
               method with 10 repetitions of five-fold cross-validation. Ridge regression is governed by a single
               penalty parameter, α[27], and a value of α = 0.1077 was used. A larger α imposes a heavier penalty
               and therefore shrinks the coefficients more strongly toward zero. In addition, the test-set size and
               random state were set to 0.25 (25% of samples held out for testing) and 42, respectively.
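               The following minimal sketch shows the evaluation scheme: a 25% hold-out test set with seed 42, and
               10 repetitions of five-fold cross-validation. The source does not say how the hold-out split and the
               cross-validation were combined. Here it is assumed that the cross-validation runs on the remaining
               training samples. The helper names and the 40-sample cohort are hypothetical. In scikit-learn,
               `train_test_split(..., test_size=0.25, random_state=42)` and
               `RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)` play these roles. Their random streams
               differ from Python's `random`, however, so the indices would not match. The source names only NumPy
               and Pandas, so the use of scikit-learn is itself an assumption.

```python
import math
import random


def holdout_split(n_samples, test_size=0.25, seed=42):
    """Shuffle sample indices and hold out a fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_samples)
    return indices[n_test:], indices[:n_test]  # (train, test)


def repeated_kfold(indices, n_splits=5, n_repeats=10, seed=42):
    """Yield (repeat, fold, train, validation) index lists.

    Each repeat reshuffles the indices and partitions them into n_splits
    folds, so every index appears in exactly one validation fold per repeat.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        base, extra = divmod(len(shuffled), n_splits)
        start = 0
        for fold in range(n_splits):
            size = base + (1 if fold < extra else 0)  # spread any remainder
            validation = shuffled[start:start + size]
            train = shuffled[:start] + shuffled[start + size:]
            start += size
            yield repeat, fold, train, validation


# Hypothetical cohort of 40 samples; a model would be fit on each `train`
# list and scored on the matching `validation` list.
train_idx, test_idx = holdout_split(40)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))  # 30 10 50
```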


               RESULTS
               Transcriptome profiling of obese patients with breast cancer
               Statistical analysis of the gene expression dataset identified up- and downregulated differentially
               expressed genes (DEGs) using the thresholds P < 0.05 and fold change (FC) > 1.5 or FC < 0.67.
               Nineteen downregulated and four upregulated genes were identified as DEGs: 4-aminobutyrate
               aminotransferase (ABAT), alcohol dehydrogenase 1B (class I), beta polypeptide (ADH1B), angiotensin II
               receptor type 1 (AGTR1), cyclin D1 (CCND1), dual specificity phosphatase 4 (DUSP4), flavin containing
               dimethylaniline monooxygenase 2 (FMO2), FRY microtubule binding protein (FRY), polypeptide
               N-acetylgalactosaminyltransferase 7 (GALNT7), glutamate ionotropic receptor AMPA type subunit 2
               (GRIA2), glycogenin 2 (GYG2), interleukin 6 cytokine family signal transducer (IL6ST), keratin 6B
               (KRT6B), mesoderm specific transcript (MEST), matrix metallopeptidase 12 (MMP12), matrix
               metallopeptidase 9 (MMP9), phospholamban (PLN), protein kinase cAMP-dependent type II regulatory
               subunit beta (PRKAR2B), ribonuclease A family member 4 (RNASE4), S100 calcium binding protein A2
               (S100A2), signal peptide, CUB domain and EGF-like domain containing 2 (SCUBE2), semaphorin 3C
               (SEMA3C), tissue factor pathway inhibitor (TFPI), and transforming growth factor beta receptor 3
               (TGFBR3). The small number of DEGs may reflect the study design, which isolated the effect of obesity
               on tumor tissues.
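               The DEG selection rule can be sketched as a simple filter. The sketch assumes that FC is a linear
               (not log2) ratio, so that 0.67 ≈ 1/1.5 and the two cutoffs are symmetric on a log scale. The class and
               function names, the gene labels, and all values below are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    symbol: str
    fold_change: float  # linear ratio between groups (assumed, not log2)
    p_value: float


def classify_degs(stats, p_cutoff=0.05, up_fc=1.5, down_fc=0.67):
    """Split genes into up- and downregulated DEGs using the stated thresholds."""
    up, down = [], []
    for g in stats:
        if g.p_value >= p_cutoff:
            continue  # not statistically significant
        if g.fold_change > up_fc:
            up.append(g.symbol)
        elif g.fold_change < down_fc:
            down.append(g.symbol)
    return up, down


# Hypothetical genes and values, for illustration only.
stats = [
    GeneStat("GENE_A", 2.10, 0.010),
    GeneStat("GENE_B", 0.40, 0.003),
    GeneStat("GENE_C", 1.20, 0.020),  # significant but inside the FC band
    GeneStat("GENE_D", 3.00, 0.200),  # large FC but not significant
]
up, down = classify_degs(stats)
print(up, down)  # ['GENE_A'] ['GENE_B']
```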


               Biological and clinical features of seed genes
               Co-expression network analyses were performed on the 23 DEGs identified in samples from obese
               patients with breast cancer in the GSE24185 dataset [Figure 1]. Co-expressed genes were identified as