Pham et al. Microbiome Res Rep 2024;3:25  https://dx.doi.org/10.20517/mrr.2024.01  Page 13 of 16

               Table 3. Prediction performance with different k-mer quality thresholds
                Dataset   Sample        Quality threshold   Precision   Recall   F1-score
                Mende     10 species    33                  0.024       1.000    0.047
                Mende     10 species    49                  0.024       1.000    0.047
                Mende     100 species   33                  0.203       1.000    0.337
                Mende     100 species   49                  0.203       1.000    0.337
                Mende     400 species   33                  0.945       1.000    0.972
                Mende     400 species   49                  0.945       1.000    0.972
                CAMI      RH_S001       18                  0.261       0.970    0.411
                CAMI      RH_S001       30                  0.532       0.850    0.654
                CAMI      RH_S002       18                  0.245       0.970    0.411
                CAMI      RH_S002       30                  0.549       0.880    0.676
                CAMI      RH_S003       18                  0.244       0.982    0.390
                CAMI      RH_S003       30                  0.527       0.868    0.656
                CAMI      RH_S004       18                  0.250       0.970    0.398
                CAMI      RH_S004       30                  0.535       0.868    0.662
                CAMI      RH_S005       18                  0.242       0.970    0.398
                CAMI      RH_S005       30                  0.500       0.886    0.639
                CAMI      RM_S001       18                  0.159       0.914    0.270
                CAMI      RM_S001       30                  0.402       0.707    0.513
                CAMI      RM_S002       18                  0.161       0.914    0.273
                CAMI      RM_S002       30                  0.441       0.845    0.580
                CAMI      RL_S001       18                  0.082       1.000    0.151
                CAMI      RL_S001       30                  0.367       0.947    0.529
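
The F1-score column can be read as the harmonic mean of precision and recall. The sketch below assumes that standard definition, which reproduces the Mende rows of Table 3. The function name `f1_score` is ours for illustration and is not part of MetaBIDx.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the Mende rows of Table 3
rows = {
    "Mende, 10 species": (0.024, 1.000),
    "Mende, 100 species": (0.203, 1.000),
    "Mende, 400 species": (0.945, 1.000),
}

for name, (p, r) in rows.items():
    # Prints 0.047, 0.337, and 0.972, matching the table
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

When recall is 1.000, as in the Mende rows, F1 reduces to 2P / (1 + P), so the score is driven entirely by precision.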


               The proposed approach has wide-ranging implications for diverse metagenomic applications, such as
               environmental monitoring, human microbiome research, and disease diagnostics. Its capability to
               accurately detect low-abundance species and differentiate closely related species is particularly valuable in
               these fields. The integration of MetaBIDx with other bioinformatics tools could lead to a more robust and
               comprehensive workflow in metagenomic analysis.

               This study has limitations, chiefly the computational demands of MetaBIDx on exceptionally large
               datasets. The current version runs more slowly than established methods largely because its
               implementation prioritized correctness and bug reduction over optimization. Future versions will
               prioritize code optimization to improve running times and scalability on more complex microbiome
               datasets while maintaining high species-prediction performance. Further development will also expand
               the microbial database for broader coverage and refine the algorithm for increased accuracy.


               Currently, MetaBIDx can identify and output the unique k-mers of each reference genome in the database.
               This functionality can be extended to output unique regions, which would provide more precise genomic
               signatures for each genome and allow better discrimination between closely related organisms or strains.
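
To illustrate the idea, the following minimal sketch finds k-mers that occur in exactly one genome of a toy reference set, then merges overlapping unique k-mers into contiguous unique regions. It is not the MetaBIDx implementation: it uses plain Python sets, ignores reverse complements, and all names (`kmers`, `unique_kmers`, `unique_regions`) and the toy sequences are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield (start, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def unique_kmers(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each genome name to the k-mers found in that genome only."""
    owners = defaultdict(set)  # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for _, kmer in kmers(seq, k):
            owners[kmer].add(name)
    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def unique_regions(seq: str, uniq: set[str], k: int) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent unique k-mers into half-open (start, end) intervals."""
    regions: list[tuple[int, int]] = []
    for i, kmer in kmers(seq, k):
        if kmer not in uniq:
            continue
        if regions and i <= regions[-1][1]:
            # Extend the previous region to cover this k-mer
            regions[-1] = (regions[-1][0], i + k)
        else:
            regions.append((i, i + k))
    return regions


# Hypothetical toy genomes that share a prefix and differ at the 3' end
genomes = {
    "genome_A": "ACGTACGTTT",
    "genome_B": "ACGTACGAAA",
}
k = 4
uniq = unique_kmers(genomes, k)
for name, seq in genomes.items():
    print(name, sorted(uniq[name]), unique_regions(seq, uniq[name], k))
```

In this toy input, the unique 4-mers of `genome_A` (`CGTT`, `GTTT`) collapse into the single region (5, 10), and those of `genome_B` (`ACGA`, `CGAA`, `GAAA`) into (4, 10), so each genome is summarized by one interval rather than a list of overlapping k-mers.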

               While we currently focus on species-level prediction, there is potential to explore higher taxonomic
               levels, such as genus, family, or class. Although the current version does not exploit taxonomic tree
               structures, MetaBIDx can identify uncatalogued species not yet included in existing databases. This