
Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26  Page 5 of 21

               Beta-diversity distances between simulated metagenomes
               Three distinct types of distance metrics were computed on the simulated metagenomes:


               The expected taxonomic beta-diversity distances (Bray-Curtis and presence/absence Jaccard distances) were
               computed on the simulated samples’ taxonomic abundance profiles using the Vegan R package [14].

               Read-based taxonomic profiles were obtained using Kraken2 [15] and Bracken [16] on the simulated
               metagenomes using the “Standard plus protozoa & fungi database” (from https://benlangmead.github.io/
               aws-indexes/k2 on 05.2021). The read-based taxonomic beta-diversity distances (Bray-Curtis and presence/
               absence Jaccard distances) were computed on the simulated samples’ taxonomic abundance profiles using
               the Vegan R package.
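
               The two taxonomic distances named above can be illustrated with a minimal Python sketch. It is not
               the authors' code: the study used the Vegan R package, where these metrics are available through
               `vegdist` (`method = "bray"`, and `method = "jaccard"` with `binary = TRUE`); whether exactly those
               calls were used is an assumption. The abundance vectors below are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def jaccard_presence_absence(u, v):
    """Jaccard distance on presence/absence: 1 - |shared taxa| / |all taxa seen|."""
    in_u = {i for i, a in enumerate(u) if a > 0}
    in_v = {i for i, b in enumerate(v) if b > 0}
    union = in_u | in_v
    return 1 - len(in_u & in_v) / len(union) if union else 0.0


# Hypothetical abundances of four taxa in two samples (not data from the study).
sample_a = [10, 0, 5, 5]
sample_b = [5, 5, 0, 10]

bc = bray_curtis(sample_a, sample_b)
jac = jaccard_presence_absence(sample_a, sample_b)
print(bc, jac)
```

               Bray-Curtis weighs abundance differences, whereas the presence/absence Jaccard distance only reflects
               which taxa are shared.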


               k-mer-based beta-diversity distances were computed using Simka (Bray-Curtis and presence/absence
               Jaccard distances), with controlled k-mer length. The minimum abundance k-mer filter was set to 2 and the
               maximum abundance k-mer filter to 999999999 [10].

               Spearman correlations between the different types of beta-diversity distances were assessed using the Stats R
               package v3.6.2.
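
               As an illustration of how two types of distance can be compared, the sketch below computes a
               Spearman correlation over the unique sample pairs (upper triangle) of two distance matrices. It is a
               plain-Python stand-in for R's `cor(x, y, method = "spearman")`; the exact call used by the authors is
               not stated, and the two 3-sample matrices are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix):
    """Flatten the unique sample pairs (i < j) of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical distance matrices for three samples (not data from the study).
expected = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.6], [0.9, 0.6, 0.0]]
kmer_based = [[0.0, 0.5, 0.95], [0.5, 0.0, 0.8], [0.95, 0.8, 0.0]]

rho = spearman(upper_triangle(expected), upper_triangle(kmer_based))
print(rho)
```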


               Effect of sketched k-mer distances
               A dataset of 100 simulated metagenomes, each composed of 25 organisms, was generated using
               InSilicoSeq for a sequencing depth of 5 million reads and with the HiSeq error model. The exact k-mer-
               based Bray-Curtis and presence/absence Jaccard distances were obtained for determined k-mer lengths
               using Simka with the default filtering parameter. Sketched k-mer profiles and distances were obtained using
               SimkaMin [17] at determined k-mer and sketch sizes.
               The absolute difference between the exact and sketched k-mer distance was calculated for each sample pair
               comparison. The correlation between the expected Bray-Curtis distances on the simulated taxonomic
               profiles and the sketched k-mer-based distances was calculated using a Spearman correlation.
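
               The comparison between exact and sketched distances can be sketched in Python as follows. This is a
               generic illustration only: it subsamples k-mers by keeping those with the smallest hash values,
               which is an assumed stand-in and not SimkaMin's actual algorithm. The reads, the k-mer length, the
               sketch size, and all function names are hypothetical.

```python
import hashlib
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (independent of Python's hash seed)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(count_tables, size):
    """Keep only the `size` k-mers with the smallest hash across all samples."""
    all_kmers = set().union(*count_tables)
    kept = set(sorted(all_kmers, key=stable_hash)[:size])
    return [Counter({km: c for km, c in t.items() if km in kept}) for t in count_tables]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count tables."""
    shared = sum(min(a[km], b[km]) for km in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total if total else 0.0


# Hypothetical toy reads for two samples (not data from the study).
reads_a = ["ACGTACGTGA", "TTGACCA"]
reads_b = ["ACGTACGTTT", "GGGACCA"]
tables = [kmer_counts(reads_a, k=4), kmer_counts(reads_b, k=4)]

exact = bray_curtis(tables[0], tables[1])
sketched = sketch(tables, size=5)
approx = bray_curtis(sketched[0], sketched[1])

# Absolute difference between exact and sketched distance for this sample pair.
abs_diff = abs(exact - approx)
print(exact, approx, abs_diff)
```

               Because every sample is restricted to the same subset of k-mers, a sketch large enough to hold all
               distinct k-mers reproduces the exact distance.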

               Minimum and maximum abundance k-mer filter effects
               For this experiment, a dataset of 100 simulated metagenomes, each composed of 25 organisms, was
               generated using InSilicoSeq for a sequencing depth of 5 million reads and using a HiSeq error model.
               K-mer-based Bray-Curtis and presence/absence Jaccard distances were obtained for a determined k-mer
               length using Simka without a k-mer filter. Distances were also computed on the same simulated
               metagenome dataset using the minimum k-mer abundance or maximum k-mer abundance parameter from
               Simka. The absolute difference between the unfiltered and filtered k-mer distance was calculated for each
               sample pair comparison. The correlation between the expected Bray-Curtis distances on the simulated
               taxonomic profiles and the filtered k-mer-based distances was calculated using a Spearman correlation.
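
               The effect of an abundance filter on a k-mer-based distance can be illustrated with the Python sketch
               below. It applies the filter independently within each sample, which is a simplifying assumption;
               Simka's own rule for deciding which k-mers to keep may differ. Reverse complements are ignored, and
               the reads, k-mer length, and function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_filter(counts, min_abundance=1, max_abundance=None):
    """Keep k-mers whose count lies within [min_abundance, max_abundance]."""
    return Counter({
        km: c for km, c in counts.items()
        if c >= min_abundance and (max_abundance is None or c <= max_abundance)
    })


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count tables."""
    shared = sum(min(a[km], b[km]) for km in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total if total else 0.0


# Hypothetical toy reads for two samples (not data from the study).
counts_a = kmer_counts(["ACGTACGT", "ACGTTT"], k=4)
counts_b = kmer_counts(["ACGTACGA", "TTTTTT"], k=4)

unfiltered = bray_curtis(counts_a, counts_b)
filtered = bray_curtis(
    abundance_filter(counts_a, min_abundance=2),
    abundance_filter(counts_b, min_abundance=2),
)

# Absolute difference between unfiltered and filtered distance for this pair.
abs_diff = abs(unfiltered - filtered)
print(unfiltered, filtered, abs_diff)
```

               In this toy case, removing k-mers seen only once leaves the two samples with no k-mer in common, so
               the filtered distance rises to its maximum.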


               Benchmark on infant and mother metagenomic dataset
               Publicly available fecal metagenomes from infants and pregnant mothers were retrieved from the European
               Nucleotide Archive (ENA Bioproject ID: PRJEB52774). The sample collection and sequencing are described
               in a previously published study [18]. Sequences were trimmed and quality filtered using FastQC v0.11.9 and
               Trim Galore v0.6.6 with default parameters. Quality-filtered sequences were screened to remove human
               read sequences using Bowtie2 v2.4.2 against the human genome (Human Build 38, patch release 7). After
               quality control and human read filtering, infant fecal metagenomes containing fewer than 10 million
               paired-end reads and mother fecal metagenomes with fewer than 20 million paired-end reads were discarded.
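
               The read-depth criterion above amounts to a per-group threshold, sketched below in Python. The
               thresholds come from the text; the sample identifiers, read counts, and names in the code are
               hypothetical.

```python
# Minimum paired-end read counts after quality control and human read removal.
MIN_READS = {"infant": 10_000_000, "mother": 20_000_000}


def passes_depth_filter(group, n_reads):
    """Return True if a sample has at least the minimum read count for its group."""
    return n_reads >= MIN_READS[group]


# Hypothetical samples: (sample ID, group, paired-end reads after filtering).
samples = [
    ("S1", "infant", 12_500_000),
    ("S2", "infant", 9_800_000),
    ("S3", "mother", 15_000_000),
    ("S4", "mother", 24_000_000),
]

kept = [sid for sid, group, n in samples if passes_depth_filter(group, n)]
print(kept)
```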