
Such methods rely on accurate read classification to determine the presence or absence of species in a sample.
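
To make the evaluation concrete, the sketch below (an illustration of this style of evaluation, not the authors' implementation; all names are made up) derives a presence/absence call for each species from per-read assignments and scores it with the precision, recall, and F1 metrics used throughout this section:

    # Minimal sketch (not the authors' code): a species is called "present"
    # if at least one read is assigned to it; the prediction is then scored
    # against the known sample composition.
    def species_metrics(read_assignments, truth):
        predicted = set(read_assignments)      # presence/absence call per species
        tp = len(predicted & truth)            # species correctly reported
        fp = len(predicted - truth)            # false positives lower precision
        fn = len(truth - predicted)            # missed species lower recall
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Toy run: reads cover only 8 of 10 true species, with no spurious calls;
    # this reproduces the 10-species MetaBIDx row (P=1.000, R=0.800, F1=0.889).
    truth = {f"sp{i}" for i in range(10)}
    reads = [f"sp{i}" for i in range(8)] * 5
    print(species_metrics(reads, truth))

Under this presence/absence definition, a single misassigned read is enough to introduce a false positive, which is why read-level classification accuracy drives the species-level precision discussed below.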

Mende dataset. Table 1 reports the performance of our method versus the others on the Mende dataset. The comparison demonstrates that MetaBIDx outperforms the other tools in species prediction across most sample complexities.

In the 10-species sample, MetaBIDx achieved perfect precision (1.000) and high recall (0.800), resulting in an F1-score of 0.889. Sourmash achieved competitive precision (0.900) and high recall (0.900), giving it the highest F1-score, 0.900. Both tools were markedly superior to the remaining classifiers, which, despite high recall values (0.800 for KrakenUniq, Kraken2, and Centrifuge, and 1.000 for CLARK), had very low precision, leading to considerably lower F1-scores (0.041 to 0.075 for the two Kraken tools and Centrifuge, and 0.053 for CLARK).
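
As a consistency check, each reported F1-score is the harmonic mean of the corresponding precision and recall; for MetaBIDx on the 10-species sample:

$F_1 = \frac{2PR}{P + R} = \frac{2 \times 1.000 \times 0.800}{1.000 + 0.800} = \frac{1.600}{1.800} \approx 0.889$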

In the 100-species sample, MetaBIDx again maintained perfect precision (1.000) with an increased recall (0.976), resulting in an F1-score of 0.988. Sourmash remained competitive, with a precision of 0.852, a recall of 0.750, and an F1-score of 0.800. The other tools improved in precision relative to the 10-species sample but remained far below MetaBIDx; their recall values ranged from 0.713 to 1.000, and their F1-scores varied between 0.345 and 0.382, still considerably lower than MetaBIDx's.


For the most complex sample, with 400 species, MetaBIDx maintained its high performance, with perfect precision (1.000) and a recall of 0.970, leading to an F1-score of 0.985. The other tools improved notably on this sample, with CLARK reaching an F1-score of 0.933. However, MetaBIDx still outperformed them all: the F1-scores for KrakenUniq, Kraken2, Centrifuge, and Sourmash were 0.773, 0.807, 0.879, and 0.843, respectively.


Overall, MetaBIDx consistently exhibited superior performance in species prediction across various sample complexities, particularly in maintaining high precision without sacrificing recall, leading to significantly higher F1-scores than the other tools.


CAMI dataset. Table 1 reports the performance of our method versus the others on the CAMI dataset. The results highlight the comparative efficacy of these tools across samples of varying complexity.


High complexity samples (RH_S001 to RH_S005): In these samples, MetaBIDx consistently demonstrated superior precision, ranging from 0.839 to 0.885, with recall values varying from 0.449 to 0.778. This resulted in F1-scores between 0.591 and 0.807, significantly higher than those of the other tools. In contrast, CLARK, KrakenUniq, Kraken2, and Centrifuge exhibited much lower precision (consistently below 0.1) and F1-scores despite their high recall values, indicating that while these tools identified a broad range of species (high recall), they also misidentified many (low precision), reducing their overall accuracy. Sourmash's F1-scores varied from 0.365 to 0.385, slightly higher than those of the other classifiers but well below MetaBIDx's.
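
This precision ceiling alone bounds the achievable F1-score: with precision below 0.1, even perfect recall cannot lift F1 above roughly 0.18,

$F_1 = \frac{2PR}{P + R} < \frac{2 \times 0.1 \times 1.0}{0.1 + 1.0} \approx 0.182,$

which explains why these tools score poorly despite their broad species coverage.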


Medium complexity samples (RM_S001 and RM_S002): MetaBIDx again delivered higher precision and F1-scores than most other tools. In RM_S001, it achieved a high precision of 0.821, albeit with a lower recall of 0.397, resulting in an F1-score of 0.535. In RM_S002, Sourmash achieved the highest F1-score (0.569), followed by MetaBIDx (0.427). The remaining tools, while maintaining perfect or near-perfect recall, had very low precision and F1-scores, indicating a high rate of false positives in species identification.
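
The RM_S001 figures illustrate the converse trade-off, where low recall rather than low precision limits the score:

$F_1 = \frac{2 \times 0.821 \times 0.397}{0.821 + 0.397} = \frac{0.652}{1.218} \approx 0.535$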