Ponsero et al. Microbiome Res Rep 2023;2:27
DOI: 10.20517/mrr.2023.26
Microbiome Research Reports
Original Article Open Access
Comparison of k-mer-based de novo comparative
metagenomic tools and approaches
Alise Jany Ponsero 1,2,3, Matthew Miller 2, Bonnie Louise Hurwitz 2,3
1 Human Microbiome Research Program, University of Helsinki, Helsinki 00290, Finland.
2 Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA.
3 BIO5 Institute, The University of Arizona, Tucson, AZ 85721, USA.
Correspondence to: Dr. Alise Jany Ponsero, Human Microbiome Research Program, Faculty of Medicine, University of Helsinki,
Haartmaninkatu 3, Helsinki 00290, Finland. E-mail: alise.ponsero@helsinki.fi; Dr. Bonnie Louise Hurwitz, Department of
Biosystems Engineering, The University of Arizona, 1657 East Helen Street, AZ 85721, USA. E-mail: bhurwitz@arizona.edu
How to cite this article: Ponsero AJ, Miller M, Hurwitz BL. Comparison of k-mer-based de novo comparative metagenomic tools
and approaches. Microbiome Res Rep 2023;2:27. https://dx.doi.org/10.20517/mrr.2023.26
Received: 4 Apr 2023 First decision: 31 May 2023 Revised: 28 Jun 2023 Accepted: 12 Jul 2023 Published: 20 Jul 2023
Academic Editor: Leonardo Mancabelli Copy Editor: Dong-Li Li Production Editor: Dong-Li Li
Abstract
Aim: Comparative metagenomic analysis requires measuring the pairwise similarity between the metagenomes in a
dataset. Reference-based methods that compute a beta-diversity distance between two metagenomes depend heavily
on the quality and completeness of the reference database, and applying them to less-studied
microbiota can be challenging. In contrast, de novo comparative metagenomic methods rely only on the
sequence composition of metagenomes to compare datasets. While each of these approaches has its
strengths and limitations, direct comparisons between them are currently limited.
Methods: We developed sets of simulated short-read metagenomes to (1) compare k-mer-based and taxonomy-based
distances and evaluate the impact of technical and biological variables on these metrics and (2) evaluate the
effect of k-mer sketching and filtering. We used a real-world metagenomic dataset to provide an overview of the
currently available tools for de novo metagenomic comparative analysis.
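To make the terms above concrete, the following is a minimal Python sketch of a k-mer-based comparison of two read sets, assuming non-empty samples. It is an illustration only and does not reproduce the implementation of any tool evaluated in the article. All function names, the toy reads, and the value of k are hypothetical. Bray-Curtis stands in for a quantitative distance, Jaccard for a presence/absence distance, and a bottom-s hash sketch for k-mer sketching.

```python
import hashlib
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in a list of reads (no reverse-complement handling)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_kmers(counts, min_count):
    """Filtering: drop k-mers seen fewer than min_count times."""
    return Counter({kmer: n for kmer, n in counts.items() if n >= min_count})


def bray_curtis(a, b):
    """Quantitative distance computed on relative k-mer abundances."""
    total_a, total_b = sum(a.values()), sum(b.values())
    shared = sum(min(a[kmer] / total_a, b[kmer] / total_b)
                 for kmer in a.keys() & b.keys())
    return 1.0 - shared


def jaccard(a, b):
    """Presence/absence distance computed on the sets of distinct k-mers."""
    set_a, set_b = set(a), set(b)
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def hash_kmer(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(counts, size):
    """Sketching: keep only the `size` smallest k-mer hash values."""
    return set(sorted(hash_kmer(kmer) for kmer in counts)[:size])


def sketch_jaccard(sketch_a, sketch_b, size):
    """Estimate the Jaccard distance from two bottom-`size` sketches."""
    merged = sorted(sketch_a | sketch_b)[:size]
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return 1.0 - shared / len(merged)


# Toy "metagenomes": one short read each (hypothetical data).
sample_a = kmer_counts(["ACGTACGT"], k=4)
sample_b = kmer_counts(["ACGTACGA"], k=4)
print(bray_curtis(sample_a, sample_b), jaccard(sample_a, sample_b))
print(sketch_jaccard(sketch(sample_a, 100), sketch(sample_b, 100), 100))
```

When the sketch size exceeds the number of distinct k-mers, as in this toy example, the sketch-based estimate coincides with the exact Jaccard distance. With smaller sketches it becomes an approximation that trades accuracy for memory and speed.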
Results: Using simulated metagenomes of known composition and controlled error rate, we showed that k-mer-based
distance metrics were well correlated with the taxonomic distance metric for quantitative beta-diversity
metrics, but the correlation was low for presence/absence distances. The community complexity in terms of taxa
richness and the sequencing depth significantly affected the quality of the k-mer-based distances, while the impact
© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0
International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing,
adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as
long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.