Page 6 of 18 Fabbrini et al. Microbiome Res Rep 2023;2:25 https://dx.doi.org/10.20517/mrr.2023.25
Table 1 lists the most common tools used for microbiome networking, which are described hereafter. Please use
the "source" information in this table and refer to the developers' feature descriptions for a clear and
thorough explanation of these tools.
Correlation methods
To cope with the aforementioned challenges, several approaches have been developed to estimate
correlation or covariance matrices under compositional constraints. For example, SparCC[38] estimates
linear Pearson correlations between log-transformed components; its approximation of the correlation
coefficients assumes that the number of components is large and that the correlation network is sparse.
CCLasso[39] has been developed to address the limitations of SparCC, namely its approximate assumptions
and the resulting accuracy. The tool also uses log-ratio transformed abundances but implements a
latent variable model with an L1-norm shrinkage method (also known as ‘LASSO’). This addresses the constant
sum constraint problem, that is, the requirement that the proportions or abundances of the different
components within a sample must sum to a constant value (usually 1 or 100). In the L1-norm shrinkage
method, the goal is to estimate the coefficients of a linear regression model while simultaneously performing
variable selection by imposing a penalty term on the absolute values of the coefficients. This penalty term
shrinks some coefficients towards zero, which performs variable selection and reduces the impact of
irrelevant variables, potentially overcoming the constant sum constraint problem and yielding
meaningful results in the analysis of microbiome data. While CCLasso performs better than SparCC, it
shares the difficulties common to all correlation-based networking methods, mainly the inability to detect
nonlinear relationships among taxa.
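As a minimal sketch of two building blocks mentioned above, the following Python snippet shows a log-ratio transformation of compositional data and the soft-thresholding operator that underlies L1-norm shrinkage. It is not the SparCC or CCLasso algorithm; the centered log-ratio (CLR) is chosen here only as one common log-ratio transform, and the function names and toy counts are assumptions made for illustration.

```python
import math


def closure(counts):
    """Rescale counts to proportions that sum to 1 (the constant sum constraint)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(proportions):
    """Centered log-ratio transform; assumes all proportions are strictly positive."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def soft_threshold(value, lam):
    """Shrink a coefficient towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Toy sample with three taxa (made-up counts, for illustration only).
proportions = closure([10, 30, 60])
transformed = clr(proportions)

# Small coefficients are set exactly to zero, larger ones are shrunk.
shrunk = [soft_threshold(c, 0.2) for c in (0.5, -0.1, -0.7)]
```

The proportions sum to 1 by construction, whereas the CLR values sum to 0. In real data, zero counts would need handling (for example with a pseudocount) before taking logarithms, which this sketch leaves out.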
Nonetheless, among the options for tackling the problems inherent to microbiome data, custom multiple
comparison adjustments and strict thresholds can be applied to correlation approaches to derive
correlation matrices of significant correlations, representative of the interactions between taxa, which can
be used for network construction[40]. In addition, easy-to-use though less precise methods are available, such
as the Cytoscape app CoNet[41,42]. The main strength of such an app is the possibility of computing a number
of different correlations, similarities, or dissimilarities to score the association strength between taxa, all
within one of the most used platforms for network visualization, along with esyN[43].
These methods can generally be referred to as co-occurrence networking, in which a network is constructed
with microbial variables (taxa) as nodes and their co-occurrence or co-exclusion associations
as edges. Yet, this approach may miss causal relationships.
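The workflow described above (multiple comparison adjustment, a strict threshold, then network construction) can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure of any tool in Table 1: the Benjamini-Hochberg adjustment is one possible correction, the cut-offs `alpha` and `min_abs_r` are arbitrary, and the taxon names, correlations, and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def build_edges(pairs, alpha=0.05, min_abs_r=0.6):
    """Keep pairs passing both cut-offs and label each edge by the sign of r.

    pairs: list of (taxon_a, taxon_b, correlation, raw_p_value) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, _, p in pairs])
    edges = []
    for (taxon_a, taxon_b, r, _), q in zip(pairs, adjusted):
        if q < alpha and abs(r) >= min_abs_r:
            kind = "co-occurrence" if r > 0 else "co-exclusion"
            edges.append((taxon_a, taxon_b, kind))
    return edges


# Toy input: made-up correlations and p-values, for illustration only.
toy_pairs = [
    ("taxon_A", "taxon_B", 0.82, 0.005),
    ("taxon_A", "taxon_C", -0.71, 0.010),
    ("taxon_B", "taxon_C", 0.30, 0.030),
    ("taxon_C", "taxon_D", 0.65, 0.040),
]
edges = build_edges(toy_pairs)
```

In this toy input, the weak taxon_B/taxon_C correlation is dropped by the strength threshold, and the remaining pairs become edges labeled as co-occurrence (positive) or co-exclusion (negative). The resulting edge list could then be imported into a visualization platform.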
Graphical models
Both correlation methods and graphical models are used to analyze the relationships between variables in a
dataset, but they differ in approach and assumptions. Correlation methods assume that relationships
between variables are linear and do not account for nonlinear relationships or other types of dependencies,
while graphical models provide a way to represent conditional dependencies to obtain sparse networks
reflecting direct relationships. Graphical models are typically constructed using probabilistic models such as
Bayesian networks or Markov random fields, representing the probability distribution of the data and the
relationships between variables as a graph, where the nodes depict the variables (in the microbiome field,
taxa or functions) and the edges represent conditional dependencies between such variables. The use of
probability theory to model the relationships between variables is one of the main advantages of graphical
models, allowing for the estimation of causal relationships, including nonlinear relationships. To date,
graphical models appear to be the best option for evaluating microbiome properties via networking
approaches.
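To illustrate what a conditional dependency means in practice, the sketch below uses the first-order partial correlation, which is the notion of direct association used in Gaussian graphical models. This is only one simple, linear instance of a graphical model, chosen here as an assumption for illustration; the function name and the toy correlation values are hypothetical and are not taken from any tool in Table 1.

```python
import math


def partial_correlation(r_xz, r_xy, r_yz):
    """Correlation between X and Z after conditioning on Y (first-order formula)."""
    denominator = math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_yz ** 2))
    return (r_xz - r_xy * r_yz) / denominator


# Toy chain X - Y - Z: X and Z are related only through Y,
# so their marginal correlation equals r_xy * r_yz.
r_xy, r_yz = 0.8, 0.8
r_xz = r_xy * r_yz

# A correlation network would draw an X-Z edge (marginal r is about 0.64),
# whereas the partial correlation given Y is zero, so no direct edge is drawn.
indirect_only = partial_correlation(r_xz, r_xy, r_yz)

# If X and Z are more strongly correlated than Y alone explains,
# a direct association remains after conditioning on Y.
with_direct_link = partial_correlation(0.9, r_xy, r_yz)
```

This is why graphical models tend to yield sparser networks than correlation methods: associations that are fully explained by a third variable are removed, and only the direct ones remain as edges.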