Page 21 - Read Online
P. 21

Lugli et al. Microbiome Res Rep 2023;2:15  https://dx.doi.org/10.20517/mrr.2022.21  Page 9 of 16



 Table 1. MEGAnnotator2 report of 10 sequentially processed microbial genomes using short-read technology

 High                                  Number                           Number      Number
 Sequencing   Filtered  16S rRNA gene   Genome   Genome   Average   Genome  Number
 SRA  quality   ANI screening          of                               of rRNA     of tRNA
 output  reads  identity  completeness  contamination  coverage  length  of genes
 reads                                 contigs                          genes       genes
 SRR11910208 1000000  630615  630615  Streptococcus   Streptococcus   99.89%  11.19%  51  913  2,904,834 2,934  7  50
 salivarius subsp.   thermophilus
 thermophilus 100%  99.2%
 SRR14415532 1000000  998717  998716  Leuconostoc   Leuconostoc   100%  0.18%  154  13  2,110,850  2,089  5  55
 mesenteroides 100% suionicum 93.9%

 SRR15311866  1000000  999947  999947  Bifidobacterium breve  Bifidobacterium   100%  0.12%  126  35  2,374,842  2,011  3  55
 JCM 7019 99.7%  breve 98.3%
 SRR16352010 1000000  997594  996662  Bifidobacterium   Bifidobacterium   100%  0%  254  17  2,365,405  1,959  3  57
 longum 100%  longum 98.7%
 SRR18214268 1000000  726379  726379  Lactobacillus   Lacticaseibacillus   99.46%  0%  67  90  3,055,144  2,903  5  54
 paracasei 100%  paracasei 99.0%
 SRR22378037 1000000  892220  892220  Lactococcus lactis   Lactococcus   100%  0%  72  63  2,460,545 2,462  4  58
 99.9%  cremoris 88.0%

 SRR22543247 1000000  998973  998973  Enterococcus faecium  Enterococcus   99.63%  0.50%  104  147  3,1005,07  3,022  8  59
 100%  faecium 94.6%
 SRR22666477 1000000  986064  986062  Shigella sonnei   Shigella boydii   99.93%  0.33%  58  76  5,089,127  4,834  9  86
 99.9%  98.7%
 SRR8981643  1000000  997089  997089  Clostridium botulinum  Clostridium cagae   100%  0%  140  47  3,825,030  3,529  14  77
 100%  97.6%

 SRR9222459  1000000  841930  841930  Faecalibacterium   Faecalibacterium   100%  0.14%  72  87  3,356,538  3,213  9  63
 prausnitzii 99.9%  duncaniae 85.8%




 using 500,000 long reads coupled with one million short reads for the hybrid approach. The average execution of the complete pipeline using long reads was

 56.5 min, with the assembly step managed by CANU representing the most time-consuming (median of 2,761 sec) [Figure 2B]. Instead, by using a
 combination of different sequences, MEGAnnotator2 takes an average of 53.5 min, validating the assembly step of long reads to be the most complex

 procedure to date [Figure 2C]. Furthermore, using a hybrid approach, we highlighted the impact of long read filtering using the information of short reads that
 takes approximately five times more than the long read filtering alone, while the polishing of the assembled data takes additional 3.4 min [Supplementary
 Tables 2 and 3].



 Thus, based on the achieved results, MEGAnnotator2 can manage all its functions in approximately 14.5 min for short reads, 56.5 min for long reads, and 53.5

 min using hybrid reads. Even if the hybrid pipeline introduces two additional analyses represented by long-read filtering by short-read data and genome
 sequence polishing, the average computing time of the pipeline is the same, highlighting high variability in the capability of the assembler to manage long reads
   16   17   18   19   20   21   22   23   24   25   26