Pham et al. Microbiome Res Rep 2024;3:25  https://dx.doi.org/10.20517/mrr.2024.01  Page 9 of 16

               Sample     Tool          Precision    Recall    F1-score
                          Centrifuge    0.058        0.536     0.104
                          Sourmash      0.273        0.550     0.365
               RM_S001    MetaBIDx      0.821        0.397     0.535
                          CLARK         0.020        1.000     0.040
                          KrakenUniq    0.019        0.573     0.036
                          Kraken2       0.023        0.393     0.044
                          Centrifuge    0.020        0.629     0.039
                          Sourmash      0.400        0.607     0.482
               RM_S002    MetaBIDx      0.941        0.276     0.427
                          CLARK         0.020        1.000     0.040
                          KrakenUniq    0.019        0.573     0.036
                          Kraken2       0.023        0.393     0.044
                          Centrifuge    0.020        0.629     0.039
                          Sourmash      0.519        0.629     0.569
               RL_S001    MetaBIDx      1.000        0.421     0.593
                          CLARK         0.007        1.000     0.013
                          KrakenUniq    0.006        0.654     0.013
                          Kraken2       0.010        0.577     0.020
                          Centrifuge    0.007        0.731     0.013
                          Sourmash      0.388        0.731     0.569
               Bolded numbers in the table are the best scores in the comparison.
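               Low-complexity sample (RL_S001): MetaBIDx achieved perfect precision (1.000) and a recall of 0.421,
               giving an F1-score of 0.593, the highest among the tools compared. The read-classification tools
               (CLARK, KrakenUniq, Kraken2, and Centrifuge) had higher recall (0.577 to 1.000) but precision of at
               most 0.010 and F1-scores of at most 0.020. Sourmash came closest, with a precision of 0.388 and an
               F1-score of 0.569.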
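The relationship between the three reported scores can be checked directly. The sketch below assumes the standard definition of the F1-score as the harmonic mean of precision and recall; the function name `f1_score` is ours, not part of any tool, and the inputs are the MetaBIDx values reported in the table above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) reported for MetaBIDx in the table above.
reported = {
    "RM_S001": (0.821, 0.397),
    "RM_S002": (0.941, 0.276),
    "RL_S001": (1.000, 0.421),
}

for sample, (p, r) in reported.items():
    print(f"{sample}: F1 = {f1_score(p, r):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a tool with a recall of 1.000 but a precision near 0.02 still ends up with an F1-score near 0.04.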


               In summary, MetaBIDx consistently outperformed the other tools across the CAMI dataset, particularly in
               terms of precision and F1-score. This suggests that MetaBIDx identifies the species present in a sample
               more accurately, with fewer false positives than the other methods. Its performance is particularly
               notable in complex samples, where accurate species identification is more challenging.

               Enhancing precision with clustering of “approximate” coverages
               Here, we aim to present a more equitable comparison between our tool and the others. An inherent
               limitation of read-classification methods is their tendency to generate many false positives, which
               lowers precision in species prediction, as observed in the previous section.


               We applied the false-positive-reduction strategy to all methods. This strategy groups species with similar
               coverages into the same clusters. The rationale is that true positives (species actually present in the
               sample) will typically show higher coverage than false positives (species incorrectly identified due to
               random read matches). By filtering out species with low coverage, we aim to reduce the number of false
               positives, thereby increasing the precision of species identification across all methods.
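The idea of separating species by coverage can be illustrated with a minimal sketch. It is not the clustering procedure used in the paper, which is not specified in this section; as a stand-in, it assumes exactly two clusters and cuts the sorted coverage values at their largest gap. The function names, species names, and coverage values are all hypothetical.

```python
from __future__ import annotations


def filter_low_coverage(coverages: dict[str, float]) -> set[str]:
    """Return the species that fall in the high-coverage cluster.

    Assumption (not from the paper): two clusters, found by cutting the
    sorted coverage values at the largest gap between neighbours.
    """
    if len(coverages) < 2:
        return set(coverages)
    ranked = sorted(coverages.items(), key=lambda kv: kv[1])
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)  # index of largest gap
    return {name for name, _ in ranked[cut + 1:]}


def precision(predicted: set[str], truth: set[str]) -> float:
    """Fraction of predicted species that are truly present."""
    return len(predicted & truth) / len(predicted) if predicted else 0.0


# Hypothetical toy data: three true species and three spurious hits.
coverages = {
    "species_A": 0.92, "species_B": 0.88, "species_C": 0.81,
    "species_D": 0.04, "species_E": 0.03, "species_F": 0.01,
}
truth = {"species_A", "species_B", "species_C"}

before = precision(set(coverages), truth)
kept = filter_low_coverage(coverages)
after = precision(kept, truth)
print(f"precision before: {before:.3f}, after: {after:.3f}")
```

In this toy case, the filter removes the three low-coverage species and precision rises from 0.500 to 1.000. A largest-gap cut can also discard a true species that happens to have low coverage, which would lower recall.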

               In this experiment, we excluded Sourmash because it is not a read-classification tool. The results are
               summarized in Table 2.

               Mende Dataset: Applying the clustering strategy significantly improved the precision of all methods,
               particularly in the 10-species sample, where MetaBIDx, CLARK, KrakenUniq, and Kraken2 all achieved
               perfect precision (1.000). MetaBIDx maintained its high performance across all samples, consistently