Page 6 of 18                Fabbrini et al. Microbiome Res Rep 2023;2:25  https://dx.doi.org/10.20517/mrr.2023.25

               Table 1 lists the most common tools used for microbiome networking, which are described hereafter. For a
               clear and thorough explanation of each tool, please use the "source" information in the table and refer
               to the developers' feature descriptions.


               Correlation methods
               To cope with the aforementioned challenges, several approaches have been developed to estimate
               correlation or covariance matrices under compositional constraints. For example, SparCC [38] estimates
               linear Pearson correlations between log-transformed components; its approximation of the correlation
               coefficients assumes that the number of components is large and that the correlation network is sparse.
               CCLasso [39] was developed to address the limitations of SparCC, namely its approximate assumptions
               and the resulting accuracy. The tool also uses log-ratio transformed abundances but implements a
               latent variable model with an L1-norm shrinkage method (also known as ‘LASSO’). This addresses the
               constant-sum constraint problem, i.e., the requirement that the proportions or abundances of the
               components within a sample sum to a constant value (usually 1 or 100). In the L1-norm shrinkage
               method, the goal is to estimate the coefficients of a linear regression model while simultaneously performing
               variable selection by imposing a penalty on the absolute values of the coefficients. This penalty
               shrinks some coefficients towards zero, which reduces the impact of irrelevant variables, potentially
               overcoming the constant-sum constraint problem and yielding meaningful results in the analysis of
               microbiome data. While CCLasso performs better than SparCC, it shares a difficulty common to all
               correlation-based networking methods: the inability to detect nonlinear relationships among taxa.
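
The two ingredients named above, a log-ratio transform and L1-norm shrinkage, can be sketched in a few lines. This is a minimal illustration and not the CCLasso algorithm itself: the centered log-ratio (CLR) is assumed here as one common log-ratio transform, and the sample values, the unpenalized estimates, and the penalty strength `lam` are all hypothetical.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one sample (all parts must be > 0)."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def soft_threshold(z, lam):
    """Closed-form minimizer of 0.5 * (b - z)**2 + lam * abs(b) over b.

    This is the one-dimensional building block of L1-norm (LASSO) shrinkage:
    values within lam of zero are set exactly to zero, others move toward zero.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical relative abundances of four taxa in one sample (they sum to 1).
sample = [0.5, 0.3, 0.15, 0.05]
transformed = clr(sample)  # CLR values sum to zero by construction

# Hypothetical unpenalized estimates and an assumed penalty strength.
estimates = [0.9, -0.4, 0.05, -0.02]
shrunk = [soft_threshold(z, lam=0.1) for z in estimates]

print(transformed)
print(shrunk)
```

The CLR values do not change when the sample is rescaled (for example from proportions to percentages), which is why log-ratio transforms are used under a constant-sum constraint. The small estimates are set exactly to zero by the shrinkage step, which is the variable-selection effect described in the text.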


               Nonetheless, among the possibilities to tackle the problems inherent to microbiome data, custom
               multiple-comparison adjustment and strict thresholds might be applied to correlation approaches to derive
               correlation matrices of significant correlations, representative of the interactions between taxa, which can
               be used for network construction [40]. In addition, easy-to-use though less precise methods are available,
               such as the Cytoscape app CoNet [41,42]. The main strength of such an app is the possibility of computing a
               number of different correlations, similarities or dissimilarities to score the association strength between
               taxa, all within one of the most used platforms for network visualization, along with esyN [43].
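
As a rough sketch of what a multiple-comparison adjustment combined with a strict threshold could look like, the example below applies the Benjamini-Hochberg procedure to a list of p-values and keeps only taxon pairs that pass both a correlation cutoff and an adjusted significance cutoff. The choice of Benjamini-Hochberg, the cutoffs `r_min` and `alpha`, and the taxon names, correlations, and p-values are assumptions made for illustration; the source does not prescribe a specific adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pairs(pairs, r_min=0.6, alpha=0.05):
    """Keep pairs with |r| >= r_min and adjusted p-value <= alpha.

    pairs: list of (taxon_a, taxon_b, r, p) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, _, p in pairs])
    return [
        (a, b, r)
        for (a, b, r, _), q in zip(pairs, adjusted)
        if abs(r) >= r_min and q <= alpha
    ]


# Hypothetical pairwise correlations and raw p-values.
pairs = [
    ("Taxon_A", "Taxon_B", 0.82, 0.001),
    ("Taxon_A", "Taxon_C", -0.71, 0.004),
    ("Taxon_B", "Taxon_C", 0.35, 0.030),
    ("Taxon_C", "Taxon_D", 0.65, 0.200),
]
kept = significant_pairs(pairs)
print(kept)
```

In this toy input, the third pair is dropped because its correlation is too weak and the fourth because it is not significant after adjustment, so only the first two pairs would be carried forward to network construction.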

               These methods can be generally referred to as co-occurrence networking, where a network is constructed
               representing microbial variables (taxa) as nodes, and their co-occurrence or co-exclusion associative
               relationships as edges. Yet, this approach may miss causal relationships.
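
The node-and-edge construction described above can be sketched as follows. This is a simplified illustration under stated assumptions: plain Pearson correlation on already-transformed abundance profiles, a hypothetical cutoff `r_min`, and made-up taxon names and values. It does not reproduce any specific tool.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def build_network(profiles, r_min=0.8):
    """Return (nodes, edges); each edge is (taxon_a, taxon_b, kind)."""
    nodes = sorted(profiles)
    edges = []
    for a, b in combinations(nodes, 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= r_min:
            # Positive association: co-occurrence; negative: co-exclusion.
            kind = "co-occurrence" if r > 0 else "co-exclusion"
            edges.append((a, b, kind))
    return nodes, edges


# Hypothetical (already transformed) abundance profiles across five samples.
profiles = {
    "Taxon_A": [1, 2, 3, 4, 5],
    "Taxon_B": [2, 4, 6, 8, 10],
    "Taxon_C": [5, 4, 3, 2, 1],
    "Taxon_D": [1, 3, 2, 3, 1],
}
nodes, edges = build_network(profiles)
print(nodes)
print(edges)
```

Every edge here reflects an association only. Nothing in the construction distinguishes a direct interaction from one mediated by a third taxon, which is the limitation noted in the text.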


               Graphical models
               Both correlation methods and graphical models are used to analyze the relationships between variables in a
               dataset, but they differ in approach and assumptions. Correlation methods assume that relationships
               between variables are linear and do not account for nonlinear relationships or other types of dependencies,
               while graphical models provide a way to represent conditional dependencies to obtain sparse networks
               reflecting direct relationships. Graphical models are typically constructed using probabilistic models such as
               Bayesian networks or Markov random fields, representing the probability distribution of the data and the
               relationships between variables as a graph, where the nodes depict the variables (in the microbiome field,
               taxa or functions) and the edges represent conditional dependencies between such variables. The use of
               probability theory to model the relationships between variables is one of the main advantages of graphical
               models, allowing for the estimation of causal relationships, including nonlinear relationships. To date,
               graphical models appear to be the best option for evaluating microbiome properties via networking
               approaches.
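
To make the idea of conditional dependence concrete, the sketch below uses the first-order partial correlation, the quantity that underlies Gaussian graphical models in the three-variable case. It is an illustration under a linear-Gaussian assumption and so covers only a special case of the graphical models discussed above; the taxon names and correlation values are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical chain: Taxon_A relates to Taxon_B, and Taxon_B to Taxon_C.
r_ab = 0.8
r_bc = 0.8
# In a linear-Gaussian chain, the A-C correlation arises only through B.
r_ac = r_ab * r_bc

# A and B remain associated after conditioning on C (a direct edge).
direct_ab = partial_correlation(r_ab, r_ac, r_bc)
# A and C are no longer associated after conditioning on B (no edge).
indirect_ac = partial_correlation(r_ac, r_ab, r_bc)

print(direct_ab)
print(indirect_ac)
```

A correlation network would connect all three taxa, since the A-C correlation of 0.64 is substantial. Conditioning on Taxon_B removes that edge, which gives the sparser network of direct relationships that graphical models aim for.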