Ponsero et al. Microbiome Res Rep 2023;2:27 https://dx.doi.org/10.20517/mrr.2023.26 Page 19 of 21

Limitations and future directions
This study aimed to provide an overview of the current k-mer-based de novo comparative approaches,
evaluating their strengths and current limitations. Here, we highlight additional limitations and future
research directions that are particularly interesting, although outside the scope of this current study. In
particular, our study focused on k-mer-based tools and showed their applicability to short-read sequencing
metagenomes. Importantly, because k-mers are matched exactly, they cannot recognize as related the
near-identical sequences that arise from a high sequencing error rate. Error tolerance is particularly
important for long-read technologies (Oxford Nanopore Technologies or Pacific Biosciences of California
sequencing platforms). To allow a deterministic
level of tolerance for base mismatches, several authors have proposed to replace k-mers with spaced
seeds [28,29]. Additionally, we focused here on tools performing comparative metagenomic tasks, but further
studies should also include additional k-mer-based tools, such as KmerGo, which capture group-specific
k-mers between groups of metagenomic sequencing datasets [30].
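
As a minimal illustration of why spaced seeds tolerate base mismatches, the sketch below compares contiguous k-mers with spaced words on a toy pair of sequences. The sequences, the mask `1110111`, and the helper names are hypothetical choices made for this example and are not taken from the cited works [28,29]. Positions marked `0` in the mask are ignored during matching.

```python
def contiguous_kmers(seq, k):
    """Return every contiguous k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def spaced_words(seq, mask):
    """Return every spaced word of seq for a binary mask.

    '1' marks a position that must match; '0' marks a position that is
    skipped, so a mismatch there does not change the extracted word.
    """
    keep = [j for j, c in enumerate(mask) if c == "1"]
    span = len(mask)
    return ["".join(seq[i + j] for j in keep)
            for i in range(len(seq) - span + 1)]


def shared(words_a, words_b):
    """Count the distinct words present in both collections."""
    return len(set(words_a) & set(words_b))


# Toy data (assumption): a reference and a read carrying one substitution.
reference = "GATTACAGGCT"
read = reference[:5] + "G" + reference[6:]  # C -> G at index 5

k = 6               # contiguous k-mer length
mask = "1110111"    # hypothetical spaced seed: six match positions, one ignored

print(shared(contiguous_kmers(reference, k), contiguous_kmers(read, k)))  # 0
print(shared(spaced_words(reference, mask), spaced_words(read, mask)))    # 1
```

In this toy case every contiguous 6-mer of the read overlaps the substitution, so none is shared with the reference, whereas the one spaced word whose ignored position falls on the substitution is still shared. Both seed types compare six bases per word, so the difference comes only from the arrangement of the compared positions.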

DECLARATIONS
Acknowledgments
We thank all the members of the Hurwitz Lab for fruitful discussions and their scientific support. We thank
Dr. Anne Salonen and all the members of the ‘Microbes inside’ lab for their support and discussions. We
thank the Finnish IT Center for Science and the UA High Performance Computing research center for
providing the computational resources for this project.

Authors’ contributions
Designed the research: Ponsero AJ, Hurwitz BL
Analyzed the data: Ponsero AJ, Miller M
Interpreted the results: Ponsero AJ, Hurwitz BL
Wrote the manuscript: Ponsero AJ, Miller M, Hurwitz BL

Availability of data and materials
The compositions of each sample of the simulated datasets are available in Supplementary File 1. The
taxonomic classification and NCBI ID of the genomes used are available in Supplementary File 2. The
pipeline used to generate the mock metagenomes is available on GitHub at:
https://github.com/hurwitzlab/Simulated_metagenomes_generator. The list of publicly available metagenomic
samples included in the benchmark is available in Supplementary File 3. The code used to run the tools is
available on GitHub at: https://github.com/mattmiller899/de_novo_metagenomic_tools. The parameters used to
run each tool for the benchmark experiment are listed in Supplementary Table 2.

Financial support and sponsorship
This work was supported by grants from the Academy of Finland (339172 to AP) and the Gordon and Betty
Moore Foundation (GBMF 8751 to BH).

Conflicts of interest
BH holds concurrent appointments as an Associate Professor of Biosystems Engineering at the University of
Arizona and as an Amazon Scholar. This publication describes work performed at the University of Arizona
and is not associated with Amazon.
The remaining authors declare that the research was conducted in the absence of any commercial or
financial relationships that could be construed as a potential conflict of interest.
