Comertpay et al. J Transl Genet Genom 2022;6:84-94  https://dx.doi.org/10.20517/jtgg.2021.44  Page 86

               Although various studies are being conducted to better understand the association between obesity
               and breast cancer, integrative analysis is still needed to detect novel molecular signatures and
               pathways and thereby identify obesity-related breast cancer risk biomarkers.


               In the present study, a gene expression dataset was analyzed to compare obesity-associated and
               non-obesity-associated breast cancer samples. The co-expression network and protein-protein
               interaction (PPI) network of differentially expressed genes (DEGs) were determined. Seed genes,
               defined as the DEGs common to the co-expression network and the hub genes of the PPI network, were
               then identified. Next, to examine the molecular mechanisms of obesity-associated breast cancer,
               statistically significant pathways were determined. A Ridge penalty regression model was fitted
               using the p-values of enriched pathways and the seed gene-pathway association scores to assess the
               potential of each seed gene as a molecular signature in obese patients with breast cancer and to
               obtain the most relevant molecular signatures. Finally, we identified several candidate genes and
               pathways in obese patients with breast cancer.

               METHODS
               Gene expression datasets and identification of differentially expressed genes
               To characterize gene expression profiles of obesity in breast cancer, raw data of the obesity-related
               high-throughput gene expression dataset GSE24185 [15] in breast cancer were obtained from the Gene
               Expression Omnibus [16]. In total, 74 samples were analyzed, including 36 from normal-weight
               (BMI ≤ 24.9) breast cancer patients, used as controls, and 38 from obese patients with breast cancer
               (BMI ≥ 30). The affy package of the R/Bioconductor platform (version 3.6) was used. Each dataset was
               normalized with robust multiarray techniques [17]. Normalized log-expression values were used in the
               statistical analysis of each dataset to contrast obese vs. non-obese breast cancer patients, and DEGs
               were defined using the multiple-testing options of linear models for microarray data [18]. Genes were
               selected as DEGs when their computed P-value was below the significance level (P-value < 0.05) and
               their fold change met the threshold of 1.5.
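
               The selection rule above can be illustrated with a minimal Python sketch. It is not the authors'
               R pipeline: the gene names and numbers are made up, and it assumes that the fold change of 1.5 is
               applied in both directions on log2-scale expression values, which the text does not state.

```python
import math

# Hypothetical per-gene summary: (gene, mean log2 expression in obese samples,
# mean log2 expression in non-obese samples, P-value). Illustrative values only.
stats = [
    ("GENE_A", 9.10, 8.20, 0.003),
    ("GENE_B", 7.00, 6.90, 0.001),
    ("GENE_C", 5.20, 6.10, 0.020),
    ("GENE_D", 8.50, 7.40, 0.200),
]

P_CUTOFF = 0.05   # significance level reported in the text
FC_CUTOFF = 1.5   # fold-change threshold reported in the text


def select_degs(rows, p_cutoff=P_CUTOFF, fc_cutoff=FC_CUTOFF):
    """Keep genes with P < p_cutoff and |log2 fold change| >= log2(fc_cutoff)."""
    log2_fc_cutoff = math.log2(fc_cutoff)
    selected = []
    for gene, obese_mean, non_obese_mean, p_value in rows:
        # On the log2 scale, the fold change is a difference of means.
        log2_fc = obese_mean - non_obese_mean
        if p_value < p_cutoff and abs(log2_fc) >= log2_fc_cutoff:
            direction = "up" if log2_fc > 0 else "down"
            selected.append((gene, direction))
    return selected


degs = select_degs(stats)
print(degs)
```

               In this toy input, GENE_B fails the fold-change criterion and GENE_D fails the P-value criterion,
               so only GENE_A and GENE_C are retained.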


               Construction of co-expression networks in breast cancer and obese states
               By separating the expression profiles of non-obesity-associated and obesity-associated breast cancer
               samples, two new data subsets were generated from the expression profiles of the resultant DEGs. The
               co-expression network of DEGs was reconstructed by calculating the Pearson correlation coefficients of
               the mean expression values of DEGs in samples from obese and non-obese patients with breast cancer. To
               assess the statistical significance of pairwise gene correlations, the obtained correlation
               coefficients were taken to be normally distributed (P-value < 0.05), and positive and negative
               correlation cutoffs (> 0.47 and ≤ -0.47, respectively) were selected. An obesity-associated breast
               cancer-specific co-expression network, comprising 15 nodes and 17 edges, was reconstructed from the
               significant pairwise gene correlations.
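
               As a rough illustration of this step, the sketch below computes pairwise Pearson correlations
               and keeps gene pairs beyond the reported cutoff of 0.47 in either direction. The expression
               values and gene names are hypothetical, and the sketch applies the cutoff directly rather than
               deriving it from a P-value, so it should be read as an assumption-laden example and not as the
               authors' procedure.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, cutoff=0.47):
    """Return (gene1, gene2, r) for pairs with r > cutoff or r <= -cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r > cutoff or r <= -cutoff:
            edges.append((g1, g2, r))
    return edges


# Hypothetical DEG expression profiles across five samples of one subset.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "GENE_C": [5.0, 4.1, 2.9, 2.2, 1.0],
    "GENE_D": [3.0, 1.0, 3.0, 1.0, 3.0],
}

edges = coexpression_edges(expr)
for g1, g2, r in edges:
    print(f"{g1} -- {g2}: r = {r:.2f}")
```

               Running the same function separately on the obese and non-obese subsets, then comparing the two
               edge lists, is one way to obtain a condition-specific network.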

               PPI network reconstruction and identification of seed genes
               The physical protein-protein interaction information was obtained from the BioGRID database [19],
               which includes 43,219 physical interactions associated with proteins. PPI networks of the resultant
               DEGs were reconstructed using Cytoscape [20]. Seed genes were obtained from the intersection of DEGs,
               co-expressed genes, and hub genes of the PPI network.
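
               The intersection that defines seed genes can be sketched as follows. The text does not say how
               hub genes were chosen, so the degree threshold used here (`min_degree`) is purely an assumption,
               as are all gene and protein names.

```python
from collections import Counter


def hub_genes(ppi_edges, min_degree):
    """Nodes whose degree in an undirected PPI edge list is at least min_degree.

    The degree-based definition of a hub is an assumption of this sketch.
    """
    degree = Counter()
    for a, b in ppi_edges:
        degree[a] += 1
        degree[b] += 1
    return {gene for gene, d in degree.items() if d >= min_degree}


def seed_genes(degs, coexpressed, hubs):
    """Genes present in all three sets: DEGs, co-expressed genes, PPI hubs."""
    return set(degs) & set(coexpressed) & set(hubs)


# Hypothetical inputs.
degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
coexpressed = {"GENE_A", "GENE_B", "GENE_C"}
ppi_edges = [
    ("GENE_A", "P1"), ("GENE_A", "P2"), ("GENE_A", "P3"),
    ("GENE_B", "P1"),
    ("GENE_C", "P2"), ("GENE_C", "P3"), ("GENE_C", "P4"),
]

hubs = hub_genes(ppi_edges, min_degree=3)
seeds = seed_genes(degs, coexpressed, hubs)
print(sorted(seeds))
```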

               Gene set overrepresentation analyses
               Overrepresentation analyses were performed using the ConsensusPathDB [21] bioinformatics tool to
               determine biological processes, molecular functions, metabolic pathways, and signaling information
               crucially associated with DEGs of obese patients with breast cancer and seed genes. The Kyoto
               Encyclopedia of