
Page 16 of 21               Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26

                Figure 9. Comparison of taxonomic and k-mer-based approaches on a large dataset of infant fecal metagenomes. (A) Average
                composition of the samples grouped by sample origin at the family level; taxonomic families with a prevalence below 10% and a
                relative abundance below 5% were grouped as "Other"; (B) PCoA of the samples on the species-level taxonomic profiles using
                Bray-Curtis distance; (C) PCoA of the samples on the species-level taxonomic profiles using presence/absence Jaccard distance;
                (D) PCoA of the samples on the k-mer spectra using Bray-Curtis distance; (E) PCoA of the samples on the k-mer spectra using
                presence/absence Jaccard distance.
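The panels contrast two kinds of dissimilarity: Bray-Curtis, which weights shared features by their abundance, and presence/absence Jaccard, which only asks whether a feature occurs at all. As a minimal sketch of how these apply to k-mer spectra, the code below counts k-mers per sample and builds both distance matrices. It assumes exact counting of forward-strand k-mers (no canonical k-mers, no sketching, no abundance filtering), so it is not the implementation used by any of the benchmarked tools; the names `kmer_spectrum`, `bray_curtis`, `jaccard`, and `distance_matrix` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_spectrum(reads, k):
    """Hypothetical helper: exact counts of forward-strand k-mers over all reads of one sample."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def bray_curtis(a, b):
    """Abundance-weighted dissimilarity: 1 - 2 * sum(min(a_x, b_x)) / (sum(a) + sum(b))."""
    shared = sum(min(a[x], b[x]) for x in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def jaccard(a, b):
    """Presence/absence dissimilarity: 1 - |A ∩ B| / |A ∪ B|, ignoring counts."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def distance_matrix(spectra, metric):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    n = len(spectra)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = metric(spectra[i], spectra[j])
    return d


# Toy samples, one read each (hypothetical data).
samples = [["ACGTACGT"], ["ACGTTTTT"], ["GGGGCCCC"]]
spectra = [kmer_spectrum(s, k=3) for s in samples]
print(distance_matrix(spectra, bray_curtis)[0][1])  # about 0.667: ACG and CGT are shared
print(distance_matrix(spectra, jaccard)[0][1])      # about 0.667: 2 shared of 6 distinct 3-mers
```

In this toy case the two metrics happen to agree; they diverge when shared k-mers differ strongly in abundance, which is why panels D and E can show different structure from the same spectra. The resulting square matrix is what the PCoA and PERMANOVA steps take as input.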


               Finally, we compared the distances produced by the tools on a large dataset of 224 infant fecal microbiota
               samples collected at 3 weeks, 6 months, and 12 months of age. These fecal samples were selected because
               they allow the tools to be assessed along a gradient of microbiota change [Figure 9A]. The tools' ability to
               recapitulate this gradient structure was assessed using PCoA visualizations and PERMANOVA testing.
               Using taxonomic annotation, both Bray-Curtis and presence/absence Jaccard distances distinguished the
               sampling time points (PERMANOVA P < 0.001, permutations = 999) [Figure 9B and C]. k-mer-based
               distances were computed for these samples using the same procedure, except for Commet, which could not
               scale to this larger dataset. As in the small benchmark, the data structure obtained with k-mer-based
               Bray-Curtis or presence/absence Jaccard distances closely recapitulates the taxonomic data structure
               [Figure 9D and E]. With default parameter settings, most tools separated the samples by age (PERMANOVA
               P < 0.001, permutations = 999); the exception was SimkaMin with the presence/absence Jaccard distance,
               whose default sketch size was not appropriate for this dataset. Additionally, CAFE did not recapitulate the
               expected data structure with any of the tested distances when using a k-mer size of 5 bp (PERMANOVA
               P > 0.001, permutations = 999) [Supplementary Figure 9].
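PERMANOVA asks whether distances between groups (here, ages at sampling) are larger relative to distances within groups than would be expected if the group labels were assigned at random. A minimal sketch of Anderson's pseudo-F statistic with a label-permutation p-value is shown below, assuming a precomputed square distance matrix (for example, Bray-Curtis or Jaccard) and one label per sample. The function names are hypothetical and this is not the code used in the study; established implementations such as `adonis2` in the R package vegan also handle multiple factors and restricted permutation designs.

```python
import random


def pseudo_f(d, groups):
    """Hypothetical helper: PERMANOVA pseudo-F from a square distance matrix.

    Assumes at least two groups and more samples than groups.
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_t = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same sum restricted to pairs inside a group,
    # divided by that group's size.
    ss_w = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_w += sum(d[i][j] ** 2 for x, i in enumerate(idx) for j in idx[x + 1:]) / len(idx)
    ss_a = ss_t - ss_w  # among-group sum of squares
    if ss_w == 0:
        return float("inf")
    return (ss_a / (a - 1)) / (ss_w / (n - a))


def permanova(d, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the given grouping."""
    rng = random.Random(seed)
    observed = pseudo_f(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation, so the smallest
    # attainable p-value is 1 / (permutations + 1), i.e. 0.001 for 999.
    return observed, (hits + 1) / (permutations + 1)


# Toy data (hypothetical): 10 samples per age group, far apart along one axis.
points = [0.1 * k for k in range(10)] + [10 + 0.1 * k for k in range(10)]
dist = [[abs(p - q) for q in points] for p in points]
ages = ["3 weeks"] * 10 + ["6 months"] * 10
f_stat, p_value = permanova(dist, ages)
print(f_stat, p_value)
```

With well-separated groups, almost no random relabeling reaches the observed pseudo-F, so the p-value approaches its floor of 1/(permutations + 1); with overlapping groups, many relabelings match or exceed it and the p-value grows accordingly.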

               DISCUSSION
               A central task in the analysis of metagenomic samples is comparing microbial communities across
               samples. Comparative metagenomic analysis typically involves measuring a distance between pairs of
               metagenomes, often an ecological beta-diversity distance; the resulting distance matrix can then be
               used for various tasks, such as visualization, clustering, or retrieval. De novo comparative metagenomic