
Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26  Page 15 of 21

                Figure 8. Comparison of taxonomic and k-mer-based approaches on a small dataset of infant and maternal fecal metagenomes.
                (A) Average composition of the samples grouped by sample origin at the family level; taxonomic families with a prevalence below 10%
                and a relative abundance below 5% were grouped as “Other”; (B) PCoA of the samples on the taxonomic profiles at the species level
                using a Bray-Curtis distance; (C) PCoA of the samples on the taxonomic profiles at the species level using a presence/absence Jaccard
                distance; (D) PCoA of the samples on the k-mer spectra using a Bray-Curtis distance; (E) PCoA of the samples on the k-mer spectra
                using a presence/absence Jaccard distance. PCoA: principal coordinates analysis; VD infants: vaginally delivered infants.

               We next compared the k-mer-based tools on a clustering task using a real metagenomic dataset of 30
               metagenomes from 3-week-old infant and adult fecal samples. The samples’ taxonomic profiles were
               obtained using a read classifier, and the dataset was visualized using a PCoA on Bray-Curtis or
               presence/absence Jaccard distances. At the taxonomic level, the dataset was composed of three distinct
               sample clusters: mother samples, infants born by C-section, and infants born vaginally [Figure 8A].
               Hierarchical clustering was performed on the computed distances using the Ward linkage method, and the purity of the obtained clusters
               was calculated. The taxonomic Bray-Curtis distance allowed for a clear separation between the three types
               of samples (cluster purity = 1), while the presence/absence Jaccard distance separated only infants from
               mother samples but did not allow for a clear separation of the samples according to delivery mode (cluster
               purity = 0.67) [Figure 8B and C]. K-mer-based distances were computed for these samples using Simka,
               SimkaMin, Mash, HULK, Metafast, kWIP, and Sourmash with the same k-mer size (k = 31 bp). With
               complete k-mer spectra, using Simka, the data structure observed was well conserved, and samples were
               clearly separated as expected (cluster purity = 0.97 for Bray-Curtis, cluster purity = 0.9 for presence/absence
               Jaccard) [Figure 8D and E]. Using the default parameter settings, most tools were able to cluster the
               samples as expected (cluster purity > 0.8), with the exception of Sourmash (cluster purity = 0), as its default
               sketch size was too small to correctly approximate the distances between samples.
               Additionally, CAFE was not able to recapitulate the expected data structure using the Cosine or D2Star
               distance metric and a k-mer size of 5 bp (cluster purity < 0.5 for all conditions tested) [Supplementary Figure
               8]. The cluster purity metrics obtained for all tools are available in Supplementary Table 1.
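
               The quantities used in this comparison can be illustrated with a minimal Python sketch. It is not the
               implementation of any of the tools above. It assumes exact k-mer counting without canonical k-mers, and it
               assumes the standard definition of cluster purity (the fraction of samples that belong to the majority class
               of their cluster), since the exact computation is not detailed in this section. All function names are
               hypothetical, and the hierarchical clustering step is omitted: cluster assignments are taken as input.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers of a sequence (no canonical k-mers; an assumption)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bray_curtis(a, b):
    """Abundance-based Bray-Curtis dissimilarity between two non-empty k-mer count profiles."""
    shared = sum(min(a[key], b[key]) for key in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance between two non-empty k-mer profiles."""
    keys_a, keys_b = set(a), set(b)
    return 1.0 - len(keys_a & keys_b) / len(keys_a | keys_b)


def cluster_purity(cluster_ids, true_labels):
    """Fraction of samples falling in the majority class of their cluster
    (standard purity definition; assumed, not taken from the article)."""
    per_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cid, Counter())[label] += 1
    majority = sum(max(counts.values()) for counts in per_cluster.values())
    return majority / len(true_labels)


# Toy sequences, for illustration only
a = kmer_counts("ACGTACGT", 3)
b = kmer_counts("ACGTTTTT", 3)
print(round(bray_curtis(a, b), 3), round(jaccard_distance(a, b), 3))

# Hypothetical group sizes: two clusters over three equally sized groups
labels = ["mother"] * 10 + ["CS infant"] * 10 + ["VD infant"] * 10
clusters = [0] * 10 + [1] * 20
print(round(cluster_purity(clusters, labels), 2))
```

               In the second toy case, one cluster contains only mother samples while the other mixes both infant groups,
               so 20 of 30 samples fall in the majority class of their cluster and the purity is about 0.67. The group sizes
               here are hypothetical and chosen only to show how a distance that separates mothers from infants, but not
               delivery modes, is penalized by the metric.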