
Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26  Page 19 of 21


               Limitations and future directions
               This study aimed to provide an overview of the current k-mer-based de novo comparative approaches,
               evaluating their strengths and current limitations. Here, we highlight additional limitations and future
               research directions that are particularly interesting, although outside the scope of this current study. In
               particular, our study focused on k-mer-based tools and showed their applicability to short-read sequencing
               metagenomes. Importantly, exact k-mer matching cannot recognize sequences that are nearly identical but
               differ because of sequencing errors, a problem that grows with the error rate. Error tolerance is therefore
               particularly important for long-read technologies (Oxford Nanopore Technologies or Pacific Biosciences of
               California sequencing platforms). To allow a deterministic
               level of tolerance for base mismatches, several authors have proposed to replace k-mers with spaced
               seeds [28,29]. Additionally, we focused here on tools performing comparative metagenomic tasks, but further
               studies should also include additional k-mer-based tools, such as KmerGo, which capture group-specific
               k-mers between groups of metagenomic sequencing datasets [30].
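
               To illustrate why spaced seeds tolerate base mismatches where contiguous k-mers do not, the following
               minimal Python sketch compares the two on a toy pair of sequences. The function names, the toy
               sequences, and the mask "11011" are assumptions chosen for illustration; they are not taken from the
               tools cited in [28,29].

```python
def kmers(seq, k):
    """Return the set of contiguous k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def spaced_words(seq, mask):
    """Return the set of words sampled with a spaced-seed mask.

    mask is a string of '1' (position is kept) and '0' (position is ignored).
    The mask here is a hypothetical example, not one used by a specific tool.
    """
    keep = [j for j, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    return {
        "".join(seq[i + j] for j in keep)
        for i in range(len(seq) - span + 1)
    }


ref = "ACGTA"
read = "ACCTA"  # same sequence with one substitution at the middle base

# Contiguous 5-mers: the single mismatch breaks the only 5-mer, nothing is shared.
print(kmers(ref, 5) & kmers(read, 5))  # set()

# Mask "11011" ignores the middle base, so both sequences yield the same word.
print(spaced_words(ref, "11011") & spaced_words(read, "11011"))  # {'ACTA'}
```

               With a contiguous k-mer, a single error invalidates every k-mer overlapping that position. With a
               spaced seed, any window whose ignored position falls on the error still produces a matching word.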

               DECLARATIONS
               Acknowledgments
               We thank all the members of the Hurwitz Lab for fruitful discussions and their scientific support. We thank
               Dr. Anne Salonen and all the members of the ‘Microbes inside’ lab for their support and discussions. We
               thank the Finnish IT Center for Science and the UA High Performance Computing research center for
               providing the computational resources for this project.

               Authors’ contributions
               Designed the research: Ponsero AJ, Hurwitz BL
               Analyzed the data: Ponsero AJ, Miller M
               Interpreted the results: Ponsero AJ, Hurwitz BL
               Wrote the manuscript: Ponsero AJ, Miller M, Hurwitz BL

               Availability of data and materials
               The compositions of each sample of the simulated datasets are available in Supplementary File 1. The
               taxonomic classification and NCBI ID of the genomes used are available in Supplementary File 2. The
               pipeline used to generate the mock metagenomes is available on GitHub at:
               https://github.com/hurwitzlab/Simulated_metagenomes_generator. The list of publicly available
               metagenomic samples included in the benchmark is available in Supplementary File 3. The code used to run
               the tools is available on GitHub at: https://github.com/mattmiller899/de_novo_metagenomic_tools. The
               parameters used to run each tool for the benchmark experiment are listed in Supplementary Table 2.


               Financial support and sponsorship
               This work was supported by grants from the Academy of Finland (339172 to AP) and the Gordon and Betty
               Moore Foundation (GBMF 8751 to BH).


               Conflicts of interest
               BH holds concurrent appointments as an Associate Professor of Biosystems Engineering at the University of
               Arizona and as an Amazon Scholar. This publication describes work performed at the University of Arizona
               and is not associated with Amazon.
               The remaining authors declare that the research was conducted in the absence of any commercial or
               financial relationships that could be construed as a potential conflict of interest.