
Page 6 of 21                Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26

               Taxonomic profiling of the metagenomic samples was performed using Kraken2 v2.1.1[15] against the
               HumGut database[19], and Bracken v2.6.1 was run on Kraken2 outputs[16]. PCoA visualization of the
               distances computed between sample pairs was generated using the ecodist R package v2.0.9.
               Before hierarchical clustering of the samples, low-abundance species (< 0.01% relative abundance and
               < 0.1% prevalence) were filtered out. The dataset was then transformed into relative abundances, and a
               distance matrix was calculated from the transformed data using either the Bray-Curtis or the
               presence/absence distance with the ecodist package. Hierarchical clustering was performed with the
               hclust function using the ward.D2 method. Cluster purity was calculated as follows: (1) each cluster
               was assigned to the sample group that is most frequent in the cluster; (2) the number of correctly
               assigned samples was counted; and (3) this count was divided by the total number of samples.
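
The three purity steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the function name `cluster_purity` and the toy cluster and group labels are assumptions made for the example.

```python
from collections import Counter


def cluster_purity(cluster_ids, group_labels):
    """Fraction of samples whose group is the majority group of their cluster."""
    if len(cluster_ids) != len(group_labels):
        raise ValueError("cluster_ids and group_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Collect the group labels of the samples that fall in each cluster.
    members = {}
    for cluster, group in zip(cluster_ids, group_labels):
        members.setdefault(cluster, []).append(group)

    # Steps 1 and 2: assign each cluster to its most frequent group and
    # count the samples that agree with that assignment.
    correct = sum(Counter(groups).most_common(1)[0][1] for groups in members.values())

    # Step 3: divide by the total number of samples.
    return correct / len(cluster_ids)


# Hypothetical toy data: 8 samples, 3 clusters, 3 sample groups.
clusters = [1, 1, 1, 2, 2, 2, 3, 3]
groups = ["A", "A", "B", "B", "B", "B", "A", "C"]
purity = cluster_purity(clusters, groups)
print(purity)
```

In this toy case, the majority groups account for 2, 3, and 1 samples in clusters 1, 2, and 3, so 6 of 8 samples are correctly assigned.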


               PERMANOVA testing was performed using the adonis2 function from the vegan R package with 999
               permutations.
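
To make the permutation test concrete, the sketch below computes a one-way PERMANOVA pseudo-F statistic from a distance matrix and estimates a p-value by shuffling group labels. It is a simplified Python illustration of the general technique, not the adonis2 implementation; the function names, the toy distance matrix, and the fixed random seed are assumptions made for the example.

```python
import random


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity computed per group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pair_sum = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling is counted as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: six samples on a line, two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(p - q) for q in positions] for p in positions]
observed_f, p_value = permanova(dist, labels, permutations=999)
print(observed_f, p_value)
```

With only six samples, there are few distinct label arrangements, so the p-value cannot become very small even though the groups are clearly separated.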


               RESULTS
               Comparing k-mer-based and taxonomy-based analysis
               To assess and compare beta-diversity distances obtained using reference-based and k-mer-based
               approaches, four simulated short-read metagenomic datasets were generated. Each dataset was composed
               of 100 metagenomes, and each sample had a known taxonomic composition and relative abundance profile.
               Pairwise beta-diversity metrics were computed between all pairs of samples in the dataset using the true
               taxonomic profile at the species level and are referred to as the “expected taxonomic-based” beta-diversity
               metrics. Using the generated sample taxonomic composition and profiles, simulated metagenomic reads
               were generated with a given sequencing depth and sequencing error model. The k-mer-based beta-diversity
               distances between each pair of simulated metagenomes were assessed using Simka[10] and are referred to as
               “k-mer-based” beta-diversity metrics. Finally, the simulated metagenomes were profiled using the read
               classifier Kraken2 and Bracken. The read counts obtained were used to compute a “read-based taxonomic”
               beta-diversity metric at the species level. It is important to note that because all genomes used to generate
               the mock communities are present in the Kraken2 database, the impact of unknown taxa in metagenomes is
               not investigated in this experiment. The correlation between the beta-diversity metrics for the same sample
               pairs was measured using a Spearman correlation. Figure 1 provides an overview of the simulated
               experiment.
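
The comparison between two sets of distances for the same sample pairs can be illustrated with a small Spearman correlation routine. This is a minimal Python sketch with average ranks for ties; the function names and the distance values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical distances for the same five sample pairs under two methods.
expected_taxonomic = [0.1, 0.2, 0.3, 0.4, 0.5]
kmer_based = [0.15, 0.10, 0.40, 0.50, 0.90]
rho = spearman(expected_taxonomic, kmer_based)
print(rho)
```

Because Spearman correlation depends only on ranks, it measures whether the two methods order the sample pairs similarly, regardless of the absolute scale of the distances.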

               Technical effects
               We first evaluated the correlation between taxonomic-based beta-diversity and k-mer-based metrics in
               simple simulated metagenomes and assessed the potential impact of technical variables such as sequencing
               technology and sequencing depth. A simulated dataset (SimSet 1) of 100 simulated metagenomes composed
               of 25 bacterial species was generated for three different sequencing technologies (HiSeq, MiSeq, and
               NovaSeq) and at different sequencing depths (50K, 100K, 500K, 1M, 5M, 10M, and 50M paired reads). The
               “expected taxonomic-based” beta-diversity distances (Bray-Curtis and presence/absence Jaccard distance)
               were computed at the species level between each pair of samples using the true taxonomic profiles used to
               generate the simulated metagenomes. The same beta-diversity distances were computed on the simulated
               metagenomes’ k-mer composition using Simka at different k-mer lengths (10, 15, 20, 25, and 30)[10]. The
               correlation between expected taxonomic and k-mer-based beta-diversity distances was assessed for each
               setting using Spearman correlations.
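
As an illustration of how the same two distances can be computed either on species profiles or on k-mer counts, the sketch below implements Bray-Curtis and presence/absence Jaccard distances on dictionaries of counts. It is a toy Python example and does not reproduce Simka's actual k-mer counting or normalization; the function names, reads, and profiles are assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count all overlapping k-mers in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dictionaries."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    shared = sum(min(a.get(key, 0), b.get(key, 0)) for key in set(a) | set(b))
    return 1 - 2 * shared / total


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance between two abundance dictionaries."""
    present_a = {key for key, value in a.items() if value > 0}
    present_b = {key for key, value in b.items() if value > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both profiles are empty")
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical species-level relative abundance profiles.
profile_a = {"species_1": 0.5, "species_2": 0.5}
profile_b = {"species_1": 0.5, "species_3": 0.5}
tax_bc = bray_curtis(profile_a, profile_b)
tax_jac = jaccard_distance(profile_a, profile_b)

# Hypothetical reads, compared through their 3-mer content.
kmers_a = kmer_counts(["ACGTAC"], k=3)
kmers_b = kmer_counts(["ACGTTT"], k=3)
kmer_bc = bray_curtis(kmers_a, kmers_b)
kmer_jac = jaccard_distance(kmers_a, kmers_b)
print(tax_bc, tax_jac, kmer_bc, kmer_jac)
```

The same two functions serve both cases because each distance only needs a mapping from features (species or k-mers) to abundances.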