Page 87 Comertpay et al. J Transl Genet Genom 2022;6:84-94 https://dx.doi.org/10.20517/jtgg.2021.44

Genes and Genomes (KEGG)[22] and Reactome[23] were used as pathway databases for the analyses.
The significance of each enrichment result was assessed by Fisher’s exact test, with P < 0.05 considered
statistically significant.
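
As an illustration of the enrichment test described above, the following minimal sketch computes a one-sided Fisher's exact P-value (the hypergeometric upper tail) for a single pathway. The function name and all counts are hypothetical placeholders and are not taken from the study; the one-sided form is an assumption, since the text does not state which alternative was used.

```python
from math import comb

P_CUTOFF = 0.05  # significance threshold stated in the text


def fisher_enrichment_p(hits, query_size, pathway_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns the probability of seeing at least `hits` pathway genes in a
    query list of `query_size` genes drawn from `background_size` genes,
    of which `pathway_size` belong to the pathway.
    """
    max_hits = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts for illustration only (not results from the study).
p_value = fisher_enrichment_p(hits=5, query_size=23, pathway_size=150, background_size=20000)
print(f"enrichment P = {p_value:.3g}; significant: {p_value < P_CUTOFF}")
```

In practice, the P-values of all tested pathways would usually also be corrected for multiple testing; the text does not say whether this was done.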

Performance evaluation of the seed genes with a classification algorithm
Ridge regression was used to assess the importance of the seed genes in obese patients with breast cancer.
The method models the outcome as a linear weighted sum of biomarkers and applies a regularization
penalty that limits the magnitude of the regression coefficients. Relative to the maximum likelihood
estimates, the coefficient estimates are shrunk towards zero, which reduces overfitting. Instead of
penalizing the sum of the absolute values of the coefficients, Ridge regression penalizes the sum of their
squares. As a result, under Ridge regression the coefficients are shrunk but are not set to zero, so every
gene (i.e., biomarker) retains a weight in the disease predictor. The Ridge function is:
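
In standard textbook notation, the Ridge estimate can be written as the minimizer of a penalized residual sum of squares. The symbols below are assumed for illustration (n samples, p genes, outcome y_i, expression value x_ij, intercept β_0, coefficients β_j); only the penalty parameter α is named in the text.

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \underset{\beta}{\arg\min}
    \left\{
      \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
      + \alpha \sum_{j=1}^{p} \beta_j^{2}
    \right\}
```

The first term measures the fit to the data and the second is the squared-coefficient penalty; setting α = 0 recovers the ordinary least-squares fit.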

The machine learning algorithm was used to check the path validity of the identified common seed genes.
To execute the regression algorithm, the NumPy[24] and Pandas[25] packages of the Python[26] platform
were used. Furthermore, to overcome the scarcity of genotype data from obese patients with breast cancer
for training a high-performance risk prediction model, we evaluated our method with 10 replicates of
five-fold cross-validation. Mathematically, Ridge regression can be defined by using a single penalty
parameter “α”[27]. A penalty parameter of α = 0.1077 was used in the Ridge algorithm. A high value of the
penalty parameter (α) results in a heavy penalty, which shrinks the regression coefficients more strongly.
In addition, the test size and random state were set to 0.25 and 42, respectively.
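
The evaluation settings above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the class labels are encoded as -1/+1 and predicted from the sign of the Ridge output, and the helper names (`fit_ridge`, `accuracy`, `repeated_kfold_scores`) are hypothetical. Only α = 0.1077, the test size of 0.25, the random state of 42, and the 10 x five-fold scheme come from the text.

```python
import numpy as np

ALPHA = 0.1077                # penalty parameter reported in the text
TEST_SIZE = 0.25              # hold-out fraction reported in the text
SEED = 42                     # random state reported in the text
N_SPLITS, N_REPEATS = 5, 10   # five-fold cross-validation, ten replicates


def fit_ridge(X, y, alpha):
    """Closed-form Ridge fit; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def accuracy(X_train, y_train, X_test, y_test, alpha):
    """Fit on the training part and score sign predictions on the test part."""
    coef, intercept = fit_ridge(X_train, y_train, alpha)
    predicted = np.where(X_test @ coef + intercept > 0, 1, -1)
    return float(np.mean(predicted == y_test))


def repeated_kfold_scores(X, y, alpha, n_splits, n_repeats, rng):
    """Accuracy for every fold of every replicate (n_splits * n_repeats values)."""
    scores = []
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(len(y)), n_splits)
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate(folds[:i] + folds[i + 1:])
            scores.append(
                accuracy(X[train_idx], y[train_idx], X[test_idx], y[test_idx], alpha)
            )
    return np.array(scores)


# Synthetic stand-in data (assumption): 120 samples, 23 expression features.
rng = np.random.default_rng(SEED)
X = rng.normal(size=(120, 23))
true_weights = rng.normal(size=23)
y = np.where(X @ true_weights + rng.normal(scale=0.5, size=120) > 0, 1, -1)

# Single hold-out split with 25% of the samples reserved for testing.
order = rng.permutation(len(y))
n_test = int(round(TEST_SIZE * len(y)))
test_idx, train_idx = order[:n_test], order[n_test:]
holdout_acc = accuracy(X[train_idx], y[train_idx], X[test_idx], y[test_idx], ALPHA)

# Ten replicates of five-fold cross-validation.
cv_scores = repeated_kfold_scores(X, y, ALPHA, N_SPLITS, N_REPEATS, rng)
print(f"hold-out accuracy: {holdout_acc:.3f}")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Reshuffling the samples before each replicate is what makes the ten replicates differ; averaging over all 50 folds gives a more stable estimate than a single split when the number of samples is small.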

RESULTS
Transcriptome profiling of obese patients with breast cancer
The statistical analyses of the gene expression dataset resulted in the identification of up- and
downregulated DEGs with P < 0.05 and FC > 1.5 or FC < 0.67. Nineteen downregulated and four
upregulated genes were identified. 4-Aminobutyrate aminotransferase (ABAT), alcohol dehydrogenase 1B
(class I), beta polypeptide (ADH1B), angiotensin II receptor type 1 (AGTR1), cyclin D1 (CCND1), dual
specificity phosphatase 4 (DUSP4), flavin containing dimethylaniline monooxygenase 2 (FMO2), FRY
microtubule binding protein (FRY), polypeptide N-acetylgalactosaminyltransferase 7 (GALNT7), glutamate
ionotropic receptor AMPA type subunit 2 (GRIA2), glycogenin 2 (GYG2), interleukin-6 cytokine family
signal transducer (IL6ST), keratin 6B (KRT6B), mesoderm specific transcript (MEST), matrix
metallopeptidase 12 (MMP12), matrix metallopeptidase 9 (MMP9), phospholamban (PLN), protein kinase
cAMP-dependent type II regulatory subunit beta (PRKAR2B), ribonuclease A family member 4 (RNASE4),
S100 calcium binding protein A2 (S100A2), signal peptide, CUB domain and EGF-like domain containing 2
(SCUBE2), semaphorin 3C (SEMA3C), tissue factor pathway inhibitor (TFPI), and transforming growth
factor beta receptor 3 (TGFBR3) were identified as DEGs. The small number of DEGs may be because the
study examined the effect of obesity within tumor tissues.
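
The selection rule stated above (P < 0.05 together with FC > 1.5 or FC < 0.67) can be expressed as a small filter. The sketch below is illustrative only: the function name, the gene labels, and every numeric value in the example records are hypothetical and are not results from the study.

```python
P_CUTOFF = 0.05   # significance threshold from the text
FC_UP = 1.5       # fold-change threshold for upregulation
FC_DOWN = 0.67    # fold-change threshold for downregulation


def classify_deg(p_value, fold_change):
    """Return 'up', 'down', or 'none' according to the stated thresholds."""
    if p_value >= P_CUTOFF:
        return "none"
    if fold_change > FC_UP:
        return "up"
    if fold_change < FC_DOWN:
        return "down"
    return "none"


# Hypothetical records: (label, P-value, fold change). Not data from the study.
records = [
    ("gene_A", 0.001, 2.10),   # significant and above 1.5
    ("gene_B", 0.030, 0.45),   # significant and below 0.67
    ("gene_C", 0.200, 3.00),   # large change but not significant
    ("gene_D", 0.010, 1.10),   # significant but change too small
]
calls = {label: classify_deg(p, fc) for label, p, fc in records}
print(calls)
```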

Biological and clinical features of seed genes
Co-expression network analyses were performed on the 23 DEGs identified in breast cancer samples from
obese patients in the GSE24185 dataset [Figure 1]. Co-expressed genes were identified as
