Ponsero et al. Microbiome Res Rep 2023;2:27 https://dx.doi.org/10.20517/mrr.2023.26 Page 7 of 21

Figure 1. Overview of simulated experiments. Simulated metagenomic reads were generated using InSilicoSeq. The k-mer spectra were
obtained using Simka and read-based profiles using Kraken2 and Bracken.

On simple communities of only 25 organisms, the expected taxonomic and k-mer-based Bray-Curtis
distances are overall well correlated (rho estimate > 0.75 in most tested conditions) [Figure 2A]. The
correlation is linear [Figure 2B], and both Spearman and Pearson correlations give consistent results (not
shown). The correlation between expected taxonomic and k-mer-based Bray-Curtis distances is affected by
both k-mer size and sequencing depth, with the strongest correlations observed for k-mer sizes above
20 bp and sequencing depths above 1 million reads [Figure 2A]. In contrast, the sequencing
technology had only a minimal impact on the observed correlations [Supplementary Figure 1].
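
As a minimal illustration of the comparison described above, the following Python sketch computes Bray-Curtis distances for two toy profile tables (one taxonomic, one k-mer-based) and the Spearman correlation between the resulting pairwise distances. All abundance values, variable names and helper functions are hypothetical and are not taken from the study, which obtained k-mer spectra with Simka and read-based profiles with Kraken2 and Bracken; this is only a sketch of the calculation under those assumptions.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pairwise(profiles, dist):
    """Distances for every sample pair (i < j), in a fixed order."""
    idx = range(len(profiles))
    return [dist(profiles[i], profiles[j]) for i, j in combinations(idx, 2)]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: 3 samples, rows are samples.
taxa_profiles = [[10, 0, 5, 5], [8, 2, 5, 5], [0, 10, 0, 10]]            # taxon abundances
kmer_profiles = [[100, 0, 40, 60, 0], [90, 10, 45, 55, 0], [5, 95, 0, 80, 20]]  # k-mer counts

taxa_dist = pairwise(taxa_profiles, bray_curtis)
kmer_dist = pairwise(kmer_profiles, bray_curtis)
rho = spearman(taxa_dist, kmer_dist)
print(taxa_dist, kmer_dist, rho)
```

With real data, the two profile tables would come from the same set of samples, so that each entry of `taxa_dist` and `kmer_dist` refers to the same sample pair.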

The correlations between expected taxonomic and k-mer-based presence/absence Jaccard distances were
globally poor, with a rho estimate below 0.5 in most tested conditions [Figure 2C and D]. As with the
Bray-Curtis distances, larger k-mer sizes (> 15 bp) and higher sequencing depths (> 1M reads)
improved the correlations with the expected Jaccard distances, while the choice of sequencing technology
had only a minimal impact [Supplementary Figure 1].
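
For reference, a presence/absence Jaccard distance can be sketched in a few lines of Python. The sequences, taxon labels and function names below are hypothetical; the k-mer extraction is deliberately naive (no canonical k-mers, no abundance filtering) and is not meant to reproduce how Simka processes reads.

```python
def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence (presence/absence only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return 1 - len(a & b) / len(union)


# Hypothetical taxon lists for two samples.
taxa_d = jaccard_distance({"A", "B", "C"}, {"B", "C", "D"})

# Hypothetical short sequences standing in for two read sets.
kmer_d = jaccard_distance(kmers("ACGTAC", 3), kmers("ACGTTC", 3))
print(taxa_d, kmer_d)
```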

Notably, in all tested conditions, the correlations between expected taxonomic and k-mer-based distances
were poor at shallow sequencing depths below 1M reads. Given the simple composition of
the mock communities, composed of only 25 organisms each, read-based classifiers such as Kraken2 allow
for a complete description of the total community richness even at the shallowest sequencing depth (50k
reads). However, k-mer-based beta-diversity distances computed on shallow datasets are overestimated,
with most sample-to-sample k-mer-based distances close to or equal to 1 [Supplementary Figure 2].
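
One possible intuition for this overestimation can be illustrated with a toy simulation, which is an assumption of this sketch and not an analysis from the study: two read sets are drawn from the same random "genome", so their taxonomic distance is 0, yet at low depth they share few k-mers and their k-mer-based Jaccard distance approaches 1. Genome length, read length, read counts and all function names are hypothetical, and the reads are error-free substrings rather than InSilicoSeq output.

```python
import random


def sample_reads(genome, n_reads, read_len, rng):
    """Draw error-free reads at uniformly random start positions."""
    last_start = len(genome) - read_len
    return [genome[s:s + read_len]
            for s in (rng.randrange(last_start + 1) for _ in range(n_reads))]


def kmer_set(reads, k):
    """Set of all k-mers observed in a list of reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance between two k-mer sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def replicate_distance(genome, n_reads, rng, read_len=100, k=21):
    """k-mer Jaccard distance between two independent read sets of one genome."""
    a = kmer_set(sample_reads(genome, n_reads, read_len, rng), k)
    b = kmer_set(sample_reads(genome, n_reads, read_len, rng), k)
    return jaccard_distance(a, b)


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(20000))  # toy 20 kb genome

shallow = replicate_distance(genome, 5, rng)     # far below 1x coverage
deep = replicate_distance(genome, 2000, rng)     # roughly 10x coverage
print(f"shallow: {shallow:.3f}  deep: {deep:.3f}")
```

In this sketch the shallow pair yields a distance near 1 and the deep pair a distance near 0, even though both read sets come from an identical community.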
