
Page 8 of 18                Fabbrini et al. Microbiome Res Rep 2023;2:25  https://dx.doi.org/10.20517/mrr.2023.25

               In addition to the tools mentioned above, others estimate separate networks for groups defined by a
               binary variable, so that the differences between the networks can be recovered, while also providing
               interval estimates for each parameter and evaluating the impact of covariates on network properties.
               Examples of such tools are MDiNE[50], which uses a Bayesian graphical model fit with Markov Chain Monte
               Carlo (MCMC) methods, and NetCoMi[51], an all-around tool for single and differential network
               construction, analysis and comparison that incorporates most of the aforementioned methods (e.g.,
               Pearson correlation, Spearman correlation, SparCC, CCLasso, SPIEC-EASI, SPRING) as well as association
               and dissimilarity methods, combined in a modular and supervised fashion.
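
               As a rough illustration of what a differential network comparison involves, the sketch below builds one
               thresholded Pearson correlation network per group and reports the edges found in only one of them. It
               is not the procedure used by MDiNE or NetCoMi: it ignores compositionality, gives no interval
               estimates, and the function names, the 0.8 threshold and the toy abundances are all assumptions made
               for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no defined correlation; treat it as "no edge".
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_network(abundance, threshold=0.8):
    """Return the set of taxon pairs whose |r| meets the threshold.

    `abundance` maps a taxon name to its per-sample values.
    The threshold is an arbitrary choice for this sketch.
    """
    edges = set()
    for a, b in combinations(sorted(abundance), 2):
        if abs(pearson(abundance[a], abundance[b])) >= threshold:
            edges.add((a, b))
    return edges


def differential_edges(group_a, group_b, threshold=0.8):
    """Edges present in only one of the two group-specific networks."""
    net_a = correlation_network(group_a, threshold)
    net_b = correlation_network(group_b, threshold)
    return net_a - net_b, net_b - net_a


# Toy data invented for illustration: 3 taxa, 5 samples per group.
healthy = {
    "taxonA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "taxonB": [2.0, 4.1, 6.0, 8.2, 10.0],  # rises with taxonA
    "taxonC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
disease = {
    "taxonA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "taxonB": [3.0, 1.0, 5.0, 2.0, 4.0],   # no longer follows taxonA
    "taxonC": [5.0, 4.1, 3.0, 2.2, 1.0],   # falls as taxonA rises
}

only_healthy, only_disease = differential_edges(healthy, disease)
print("edges only in healthy:", only_healthy)
print("edges only in disease:", only_disease)
```

               In practice, a compositionality-aware association measure and a formal test of the differences between
               groups would replace the fixed threshold used here.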

               Multi-omics data integration
               Network analysis has thus emerged as a powerful approach for modeling microbiome data, often by
               integrating these data with other omics data to evaluate functional linkages. Microbiome multi-omics
               requires collecting multiple types of high-dimensional biological data, including those from amplicon
               (e.g., 16S rRNA) sequencing, shotgun metagenomics, metatranscriptomics, and metabolomics, from a
               microbiome sample and its environment or host. This kind of integration holds the potential to resolve
               functional mechanisms of the microbiome[52]; consequently, tools and methods have been developed to
               support it.

               Multi-omics integration mostly exploits correlation-based methods, such as Patient Similarity Networks
               (PSN) and Weighted Gene Co-expression Network Analysis (WGCNA)[53], and dimension reduction methods
               such as Principal Component Analysis (PCA), Partial Least Squares regression (PLS) or Co-inertia Analysis
               (CIA). Dimension reduction techniques aim to reduce the high dimensionality of multi-omics datasets
               while preserving as much relevant information as possible. By reducing dimensionality, these methods
               facilitate the visualization, interpretation, and analysis of integrated multi-omics data. Canonical correlation
               analysis can identify linear relationships between multi-omics datasets by finding the canonical variates that
               maximize the correlation between datasets. It is often used to reveal shared biological signals across different
               omics layers. Network-based integration, on the other hand, combines multi-omics data by constructing
               and analyzing molecular networks. Network-based methods utilize graph theory and network analysis
               techniques to identify modules or communities of interconnected genes, proteins, or metabolites that are
               functionally related. Packages providing this type of analysis have been released and allow for easy
               implementation of such approaches. Examples include DIABLO[54] (part of mixOmics[55]) and MiBiOmics[56],
               both available as R packages.

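
               The network-based integration described above can be sketched in a few lines. The example assumes that
               cross-omics associations have already been filtered into an edge list, and it uses connected components
               as a deliberately simple stand-in for a community-detection algorithm. The node names, the layer
               prefixes and the `find_modules` helper are hypothetical and do not come from DIABLO, mixOmics or
               MiBiOmics.

```python
from collections import defaultdict


def find_modules(edges):
    """Group nodes into connected components of an undirected graph.

    Connected components are the simplest notion of a module; they
    stand in here for a proper community-detection algorithm.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, modules = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects every node reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules


def layer(node):
    """Omics layer of a node, taken from its 'layer:name' prefix."""
    return node.split(":")[0]


# Hypothetical edges, e.g. associations that passed some threshold.
# The prefix marks the omics layer each node belongs to.
edges = [
    ("taxon:T1", "metabolite:M1"),
    ("metabolite:M1", "gene:G1"),
    ("taxon:T2", "metabolite:M2"),
    ("taxon:T3", "taxon:T4"),
]

modules = find_modules(edges)
# Keep only modules that span more than one omics layer.
cross_layer = [m for m in modules if len({layer(n) for n in m}) > 1]
for module in cross_layer:
    print(module)
```

               Filtering for modules that span several layers is one way to focus on candidate functional links
               between taxa, genes and metabolites rather than on associations within a single data type.
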
               In recent years, machine learning has become increasingly relevant across data science, including
               multi-omics data integration. Machine learning algorithms, such as random forests, support
               vector machines, or deep learning models, can be used to integrate and analyze multi-omics data. These
               algorithms can capture complex relationships and patterns across multiple omics layers, enabling predictive
               modeling or classification tasks.

               ASPECTS TO CONSIDER WHEN CONSTRUCTING NETWORKS FROM MICROBIOME DATA
               The first aspect to consider before starting a network analysis of microbiome data is the sample size.
               Sample size in network analysis refers to the number of individual entities (i.e., nodes representing variables
               or taxa) for which data are available. In network analysis, the sample size can be determined based on
               various considerations, including the number of samples that will be included in the smallest network, to
               ensure that even the smallest network computed in the study is statistically robust and reliable.
               Determining a priori the minimum sample size required for co-occurrence microbiome network analysis