
Page 14 of 21               Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26

Table 1. Summary of tools, overview of the algorithm, reference to the publication and computational resources for the pairwise comparison of 30 metagenomes

Name        Algorithm summary              Output distance                                 Publication                 Resources
MASH        Sketched k-mer spectra         Presence/absence Jaccard                        Ondov et al. 2016 [8]       CPU: 4; Mem: 4.21 MB; Runtime: 01:04:36
kWIP        Sketched k-mer spectra         Other                                           Murray et al. 2017 [22]     CPU: 4; Mem: 150.58 GB; Runtime: 03:24:55
SourMash    Sketched k-mer spectra         Other                                           Pierce et al. 2019 [21]     CPU: 4; Mem: 88.63 MB; Runtime: 02:26:23
HULK        Sketched k-mer spectra         Presence/absence Jaccard, Bray-Curtis & others  Rowe et al. 2019 [20]       CPU: 4; Mem: 2.2 GB; Runtime: 04:10:51
Simka-min   Sketched k-mer spectra         Presence/absence Jaccard, Bray-Curtis           Benoit et al. 2020 [17]     CPU: 4; Mem: 24 GB; Runtime: 00:20:08
Simka       Complete k-mer spectra         Presence/absence Jaccard, Bray-Curtis & others  Benoit et al. 2016 [10]     CPU: 4; Mem: 4.10 GB; Runtime*: 02:37:27
Metafast    k-mer based, de Bruijn graphs  Bray-Curtis                                     Ulyantsev et al. 2016 [24]  CPU: 4; Mem: 66.98 GB; Runtime: 03:26:11
LIBRA       Complete k-mer spectra         Bray-Curtis & others                            Choi et al. 2019 [9]        Not computed (requires Hadoop cluster)
Triagetool  Read-based k-mer comparison    Other                                           Fimerelli et al. 2013 [23]  Not computed (not updated for recent systems)
Compareads  Read-based k-mer comparison    Other                                           Maillet et al. 2012 [3]     Not computed (tool deprecated)
Commet      Read-based k-mer comparison    Other                                           Maillet et al. 2014 [4]     CPU: 4; Mem: 1.07 GB; Runtime: 17:07:14
Cafe**      Sketched k-mer spectra         Other                                           Lu et al. 2017 [26]         CPU: 4; Mem: 6.39 GB; Runtime: 00:03:53

*For tools with cluster commands, the subjobs also received four cores and 24 GB of memory. Runtime was calculated by summing the runtime of
the main job and all subjobs. Each counting subjob for Simka averaged 4 GB of memory utilized; **Cafe was run using a k-mer size of 5 bp.
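
The two named output distances in Table 1 can be illustrated on exact k-mer spectra. The following is a minimal sketch, not the code of any listed tool: the function names and the toy reads are hypothetical, and it ignores reverse complements (canonical k-mers) for brevity.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer of length k across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def jaccard_distance(spec_a, spec_b):
    """Presence/absence Jaccard distance: 1 - |A & B| / |A | B|."""
    set_a, set_b = set(spec_a), set(spec_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def bray_curtis_distance(spec_a, spec_b):
    """Abundance-based Bray-Curtis: 1 - 2 * sum(min counts) / (total_a + total_b)."""
    total = sum(spec_a.values()) + sum(spec_b.values())
    if total == 0:
        return 0.0
    common = spec_a.keys() & spec_b.keys()
    shared = sum(min(spec_a[kmer], spec_b[kmer]) for kmer in common)
    return 1.0 - 2.0 * shared / total


# Toy "metagenomes" (hypothetical reads, for illustration only).
sample_a = ["ACGTACGT", "ACGTTT"]
sample_b = ["ACGTACGA", "TTTTTT"]
spec_a = kmer_spectrum(sample_a, k=4)
spec_b = kmer_spectrum(sample_b, k=4)
print(jaccard_distance(spec_a, spec_b), bray_curtis_distance(spec_a, spec_b))
```

The Jaccard variant only asks whether a k-mer occurs in each sample, whereas Bray-Curtis also weighs how often it occurs, which is why the table lists them as separate outputs.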


tool requires a Hadoop cluster to be run [9], Compareads [3] was excluded since the tool was deprecated when
Commet was published [4], and TriageTool [23] could not be installed and run as the tool was not updated for
current systems. As expected, Commet, which uses a read-based comparison, required a significantly longer
computational time than the other k-mer-based tools. Tools relying on a sketching approach to accelerate
their computation finished the pairwise comparison quickly, in between 20 min and 4 h. Strikingly, Simka,
which computes complete k-mer spectra, finished the comparison in a comparable time with the same
resources. Finally, Metafast [24] required a larger memory allocation to perform this comparison, and kWIP's [22]
memory requirements depended on the size of the k-mer countgraph generated by khmer [25]: a maximum
sketch size of 1e09 used 17 GB of memory, while a maximum sketch size of 1e10 used 150.58 GB.
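
To make the sketching idea concrete, here is a minimal bottom-k MinHash sketch in the spirit of MinHash-based tools such as MASH. It is an illustration under stated assumptions, not the implementation of any benchmarked tool: the hash function (BLAKE2b truncated to 64 bits), the function names and the toy sequences are all hypothetical choices.

```python
import hashlib


def kmers(sequence, k):
    """Return the set of distinct k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def hash_kmer(kmer):
    """Deterministic 64-bit hash of a k-mer (the hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(kmer_set, sketch_size):
    """Keep only the sketch_size smallest hash values of the k-mer set."""
    return sorted(hash_kmer(km) for km in kmer_set)[:sketch_size]


def estimate_jaccard(sketch_a, sketch_b, sketch_size):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    # The smallest hashes of the merged sketches form a sketch of the union.
    merged = sorted(set(sketch_a) | set(sketch_b))[:sketch_size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Toy sequences (hypothetical, for illustration only).
set_a = kmers("ACGTACGTTAGCCGATAGCTAGGCTA", 4)
set_b = kmers("ACGTACGTTAGCCGTTAGCTAGGCAA", 4)
sketch_a = bottom_sketch(set_a, 10)
sketch_b = bottom_sketch(set_b, 10)
print(estimate_jaccard(sketch_a, sketch_b, 10))
```

Only the sketches need to be kept and compared, so memory and comparison time scale with the sketch size rather than with the number of distinct k-mers, at the cost of an approximate distance.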
Cafe was originally designed to perform k-mer-based comparisons using a small k-mer size, and the authors
previously demonstrated the tool's performance on metagenomic datasets using a k-mer size of 5 bp [26]. In
order to fairly assess the computational requirements of all tools on a similar task, we attempted to run the
tool using a k-mer size of 31 bp. However, this setting required more memory than could be accommodated.
The computational requirements presented for Cafe are therefore those obtained using a k-mer size of 5 bp.
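
A rough back-of-the-envelope calculation shows how quickly the k-mer space grows between these two settings. The sketch below assumes a dense table with one 4-byte counter per possible k-mer; this storage model is an assumption made for illustration and is not stated in the text as how Cafe stores its counts.

```python
def kmer_space_size(k, alphabet_size=4):
    """Number of possible k-mers over a DNA alphabet (A, C, G, T)."""
    return alphabet_size ** k


def dense_table_bytes(k, bytes_per_count=4):
    """Bytes needed for one counter per possible k-mer (assumed dense storage)."""
    return kmer_space_size(k) * bytes_per_count


for k in (5, 31):
    print(k, kmer_space_size(k), dense_table_bytes(k))
```

With k = 5 there are 4^5 = 1,024 possible k-mers, whereas k = 31 gives 4^31 = 2^62 (about 4.6e18), so any approach whose memory scales with the full k-mer space becomes impractical at the larger size.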