
Page 10 of 18               Fabbrini et al. Microbiome Res Rep 2023;2:25  https://dx.doi.org/10.20517/mrr.2023.25

               use in-house arbitrary thresholds and no comparative studies on this topic are available to date. Given that
               the choice may affect the biological interpretation of the results, proceeding in both ways (i.e., including
               everything and filtering something out) is not to be excluded. In this regard, providing the deepest and most
               accurate data possible is crucial for obtaining sound results. Consequently, deep shotgun metagenomics is
               generally preferred over 16S rRNA amplicon sequencing for the generation of compositional data for
               networking analysis.

               Once tables of filtered data representing compositional or functional aspects of the microbiome
               have been obtained, they can generally be fed directly to the previously reported tools to retrieve
               the edge list table, which represents the pairwise connections between all nodes (microbial variables),
               each with a corresponding strength computed according to the chosen model.
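
The structure of such an edge list can be illustrated with a minimal Python sketch. It assumes a small hypothetical abundance table (`toy_table`) and uses Spearman correlation purely as a stand-in association measure; it is not the model of any specific tool discussed here, and the function names and the `min_abs_weight` cut-off are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def edge_list(table, min_abs_weight=0.0):
    """Build (node_a, node_b, weight) rows for every pair of taxa in `table`.

    `table` maps a taxon name to its abundances across the same ordered samples.
    """
    edges = []
    for a, b in combinations(sorted(table), 2):
        weight = spearman(table[a], table[b])
        if abs(weight) >= min_abs_weight:
            edges.append((a, b, round(weight, 3)))
    return edges


# Hypothetical relative abundances (%) for three taxa across five samples.
toy_table = {
    "taxon_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "taxon_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "taxon_C": [5.0, 4.0, 3.0, 2.0, 1.0],
}

edges = edge_list(toy_table, min_abs_weight=0.5)
for row in edges:
    print(*row, sep="\t")
```

Each printed row corresponds to one edge: two node names and a signed weight. Dedicated network inference methods replace the correlation step with their own model, but the resulting table has the same shape.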


               CASE STUDY
               In order to highlight the potential of networking analysis with microbiome data, we gathered shotgun
               metagenomic data from a recent work by Thomas et al. on a total of 201 subjects divided into 85 colorectal
               cancer (CRC) patients and 116 age-matched healthy controls (HC)[72], with negative colonoscopy and no
               relevant gastrointestinal disorders [Supplementary Data 1]. The choice of this work is based on the high
               sequencing depth of the raw sequences produced by the authors and the simple clustering of the samples in
               the dataset (i.e., CRC vs. HC). We decided to subsample the cohorts documented by Thomas et al.[72], using
               a group sample size that is likely to be the one most commonly used by other research groups at this time. Quality- and
               human-filtered sequences showed an average depth of 9.94 Gb (± 0.31 SEM). We decided to limit the focus
               of this case study to species-level compositional networks in order to provide an example of a simple
               procedure, as detailed in Figure 3. The proposed procedure could be exploited by less experienced readers
               as well, without the need for complex and resource-demanding functional annotation pipelines. Sequences
               were processed via MetaPhlAn 4[59], allowing for unclassified estimation, using the latest database available
               (vJan21-202103), and the analysis required around 3 TB of storage, max 100 GB of RAM and less than 3
               days using 20 threads on an Intel Xeon Platinum 8260 Processor server. Compositional tables were then
               merged and processed through a coupled local-to-global networking analysis using NetCoMi with the
               SPRING method and adaptive Benjamini-Hochberg method for multiple test adjustments. The local
               networking approach consists in computing a separate network for each study group, so that pairwise
               network comparison techniques can then be used; the global networking approach, on the other hand,
               requires the construction of a single inferred network considering all the samples together and allows for the
               reconstruction of all the possible interactions in the dataset. The focus, in this case, is on identifying
               interaction modules and evaluating how such modules are populated by each group in terms of the
               overabundance of microbial components.
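
The difference between the local and global approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are hypothetical, Pearson correlation with a hard threshold stands in for the SPRING method, and the names `infer_edges`, `local`, and `global_net` are invented for the example. It covers network construction only, not module detection or the assessment of overabundance.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def infer_edges(samples, taxa, threshold):
    """Return {(taxon_a, taxon_b): weight} for pairs with |weight| >= threshold."""
    edges = {}
    for a, b in combinations(taxa, 2):
        w = pearson([s[a] for s in samples], [s[b] for s in samples])
        if abs(w) >= threshold:
            edges[(a, b)] = round(w, 3)
    return edges


taxa = ["taxon_A", "taxon_B", "taxon_C"]

# Hypothetical abundances: (group, taxon_A, taxon_B, taxon_C) per sample.
raw = [
    ("CRC", 1, 2, 1), ("CRC", 2, 4, 2), ("CRC", 3, 6, 3), ("CRC", 4, 8, 4),
    ("HC", 1, 2, 3), ("HC", 2, 4, 1), ("HC", 3, 6, 4), ("HC", 4, 8, 2),
]
samples = [dict(zip(["group"] + taxa, row)) for row in raw]

# Local approach: one network per study group, then pairwise comparison.
local = {
    group: infer_edges([s for s in samples if s["group"] == group], taxa, 0.8)
    for group in ("CRC", "HC")
}
crc_only = set(local["CRC"]) - set(local["HC"])

# Global approach: a single network inferred from all samples together.
global_net = infer_edges(samples, taxa, 0.8)

print("CRC network:", local["CRC"])
print("HC network:", local["HC"])
print("Edges found only in CRC:", sorted(crc_only))
print("Global network:", global_net)
```

In this toy example, some associations are present in only one group's local network and are not recovered when the samples are pooled, which is why the two approaches are best read as complementary.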

               In order to compare the results obtained from standard approaches focusing on relative abundance and
               networking workflows, we first investigated the significant differences in relative abundance at the species
               level. To do so, Wilcoxon rank sum tests with false discovery rate (FDR) multiple comparison correction
               were used, and the results confirmed what was previously reported by Thomas et al.[72]: CRC patients
               showed higher relative abundances of the species Clostridium symbiosum, Fusobacterium nucleatum,
               Gemella morbillorum, Peptostreptococcus stomatis, Porphyromonas somerae, Prevotella intermedia, and
               Parvimonas micra, while for HC we detected higher proportions of Roseburia intestinalis [Supplementary
               Figure 1]. Given the subsampling of the original cohorts[69], the concordance of the compositional findings
               with the original study is considered satisfactory.
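
The per-species testing step can be sketched as follows. The sketch makes several assumptions: the abundances are hypothetical, the rank-sum p-value uses the normal approximation without tie or continuity correction, and the FDR correction is taken to be the standard Benjamini-Hochberg step-up procedure, which the text does not specify for this step. The function names are illustrative.

```python
from math import erfc, sqrt


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # average rank for ties
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return erfc(abs(z) / sqrt(2))             # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical relative abundances (%) of two taxa in each group.
crc = {"taxon_1": [6.0, 7.0, 8.0, 9.0, 10.0], "taxon_2": [1.0, 3.0, 5.0, 7.0, 9.0]}
hc = {"taxon_1": [1.0, 2.0, 3.0, 4.0, 5.0], "taxon_2": [2.0, 4.0, 6.0, 8.0, 10.0]}

taxa = sorted(crc)
pvals = [rank_sum_pvalue(crc[t], hc[t]) for t in taxa]
qvals = benjamini_hochberg(pvals)
for t, p, q in zip(taxa, pvals, qvals):
    print(f"{t}\tp={p:.4f}\tq={q:.4f}")
```

With real cohorts, a library routine such as `scipy.stats.mannwhitneyu` would normally replace the hand-written test, since it also handles ties and exact p-values for small samples.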