Fabbrini et al. Microbiome Res Rep 2023;2:25  https://dx.doi.org/10.20517/mrr.2023.25  Page 9 of 18

               is difficult, as it depends on several factors, such as the complexity of the data and the strength of the
               relationships between the variables. In general, the larger the sample size, the higher the statistical power
               and thus the likelihood of detecting meaningful relationships between variables. No single minimum sample
               size applies universally to co-occurrence network analysis, since the appropriate number depends on the
               specific context and research objectives. However, some studies suggest that 25-30 samples per group may be
               considered reasonable for this kind of analysis [45-57]. The quality of the data, the accuracy of the
               sequencing technology, and the statistical methods used to infer the network can also influence the minimum
               sample size required for a robust analysis.
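As a rough illustration of how sample size, relationship strength, and statistical power interact, the sketch below uses the textbook Fisher z-transformation approximation for a single pairwise Pearson correlation. It is not the method of the studies cited above and is not specific to network inference: it assumes one test per feature pair and approximately normal data, and it ignores compositionality, sparsity, and the multiple-testing burden of evaluating many pairs, all of which typically raise the required sample size. The function names `correlation_power` and `min_samples` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a true correlation r with n samples.

    Uses the Fisher z-transformation, atanh(r) * sqrt(n - 3) ~ N(0, 1) under H0,
    and ignores the (negligible) rejection probability in the opposite tail.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    effect = math.atanh(abs(r)) * math.sqrt(n - 3)
    return _STD_NORMAL.cdf(effect - z_crit)


def min_samples(r: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n reaching the target power for a single correlation (same approximation)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    # Power grows with sample size for a fixed correlation strength.
    for n in (15, 30, 60):
        print(n, round(correlation_power(0.5, n), 3))
    print(min_samples(0.5))  # 30
    print(min_samples(0.3))  # 85
```

Under these simplifying assumptions, a correlation of about 0.5 needs roughly 30 samples for 80% power at α = 0.05, whereas a weaker correlation of 0.3 needs about 85. This is one reason why a single universal minimum is hard to state.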


               The second aspect to consider is the structure of the dataset, including the choice of using compositional
               (taxonomic) and/or functional data. In both cases, the data are generally provided as tables reporting, for
               each sample, a value (relative abundance, count, or normalized count) for each of n observed features (e.g.,
               taxa or pathways). Compositional information can be obtained from: (i) 16S rRNA amplicon sequencing, often
               followed by processing with the QIIME 2 [58] bioinformatic pipeline; or (ii) shotgun metagenomic sequencing,
               followed by read-based profiling tools such as MetaPhlAn 4 [59], Kraken2 [60], and METAnnotatorX2 [61],
               which ultimately produce the compositional table. From the functional standpoint, tools that infer
               functional profiles from 16S rRNA data, such as PICRUSt2 [62], can be used, although shotgun metagenomics is
               strongly preferred. This is because 16S rRNA amplicon sequencing relies on reference sequences to analyze
               small amplicons derived from metagenomes, rather than examining the entire metagenome. This limitation can
               lead to an inaccurate assessment of metabolic capabilities and to taxonomic assignments that cannot resolve
               compositional data down to the species level. A wide range of tools is available for the functional
               annotation of metagenomic samples, including both read-mapping and assembly-based approaches. Among
               read-mapping approaches, the most commonly used tools include HUMAnN3 [63], MetaCV [64], and EggNOG [65],
               together with other methods that apply Hidden Markov Models to tailored databases. Assembly-based approaches,
               in turn, include tools for defining species-level genome bins, such as MetaWRAP [66], and tools for
               functional annotation, such as Prokka [67] and EggNOG [65,86]. Metagenomic approaches often yield multiple
               layers of information, such as taxonomic composition and functional profile, which require multi-omic
               integration to properly address their relationships. Multi-omic integration is arguably the most complex
               scenario and, probably for this reason, has received the least coverage to date.
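To make the tabular layout concrete, the sketch below represents a small count table as a mapping from sample identifiers to per-feature counts and converts it to relative abundances by dividing each count by its sample total (total-sum scaling). The sample and feature names are hypothetical placeholders, and the tools cited above write their own output formats, which would first need to be parsed into a structure like this one.

```python
from typing import Dict

# Hypothetical layout: sample ID -> {feature name -> raw count}.
# Features could equally be taxa or functional pathways.
CountTable = Dict[str, Dict[str, float]]


def to_relative_abundance(table: CountTable) -> CountTable:
    """Total-sum scaling: divide each count by its sample's total so every sample sums to 1."""
    relative: CountTable = {}
    for sample, features in table.items():
        total = sum(features.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has no counts")
        relative[sample] = {name: count / total for name, count in features.items()}
    return relative


counts: CountTable = {
    "S1": {"Taxon_A": 120, "Taxon_B": 30, "Taxon_C": 50},
    "S2": {"Taxon_A": 10, "Taxon_B": 0, "Taxon_C": 90},
}

relative = to_relative_abundance(counts)
print(relative["S1"])  # {'Taxon_A': 0.6, 'Taxon_B': 0.15, 'Taxon_C': 0.25}
```

Total-sum scaling is only one possible normalization. Whether raw counts, relative abundances, or another normalized form is appropriate depends on the network inference method that will consume the table.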

               When considering the structure of the dataset, another important aspect is whether or not to filter the
               data. Including all variables, even low-abundance ones, may provide a more comprehensive view of the
               relationships between the detected microbial taxa or functions, possibly revealing previously unknown
               associations. Nevertheless, this may also increase network complexity (with greater computational resource
               and runtime requirements) and introduce weak or spurious associations that could obscure the real
               relationships. Conversely, including only the most abundant variables can simplify the network, possibly
               highlighting the most prominent relationships between microbial taxa or functions. This approach is
               particularly useful for studying the composition or function of a “core” microbiome, focusing attention on
               relevant microbial taxa and functional pathways while limiting the computational load. Accordingly, the
               choice between including all variables and considering only the most abundant ones (e.g., taxa or functions
               present in the majority of samples, filtering out zero values) should be based on the research question and
               the available computational resources, also taking the related limitations into account [69-71]. Typically,
               filtering procedures reduce the complexity of microbiome data while providing more reproducible and
               comparable results. However, studies tend to