Renzi et al. Microbiome Res Rep 2024;3:2 https://dx.doi.org/10.20517/mrr.2023.27 Page 9 of 16
sequence with a single SNP with respect to another) from sequencing errors [60]. Since these processes rely on
the single nucleotide variation of amplicons for defining taxonomy, they usually lead to an increased
estimation of alpha diversity, mainly due to their higher sensitivity with respect to identity-based
approaches. One of the main assumptions of these methods is that the amplicon sequence does not
vary in length, an assumption that fungal ITS sequences do not satisfy. This may bias the
discriminatory potential of these methods, although, at present, no extensive survey has been performed [122].
To reduce these biases, a number of ITS sequencing-based systems have been created to identify different
fungal species. Some of them can analyze both 16S rRNA (from bacteria) and ITS (from fungi), such
as Kraken, Mothur, Qiime [119,124], Vsearch, and DADA2; others are specialized for fungi only, such
as Plutof, Clotu, PIPITS, CloVR-ITS, MICCA, and BioMaS [60,115-117,123,125-129]. Despite these well-known
issues, standardized pipelines are still to come, leaving the choice of the analysis method in the hands of
researchers. Researchers are therefore responsible for the pipeline
they use (which, in most cases, is published and freely available), and this choice may alter the research
outcomes [130], paving the way for contrasting conclusions. Although pipelines based on the bacterial 16S rRNA
gene (or part of it) have been extensively used in the last three decades, the “yeast world” remains largely
unexplored, and the effect of choosing one pipeline over another is unpredictable. A summary of the main
pipelines available is reported in Table 3 [60,115-117,119,123,124,126-129,131,132].
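The point made above, that single-nucleotide resolution tends to raise richness estimates relative to identity-based clustering, can be illustrated with a toy sketch. It is not the algorithm of DADA2, Vsearch, or any other pipeline named here: the sequences, the `mutate` helper, the position-wise `identity` function, and the 97% threshold are all assumptions chosen for illustration, and no error model or alignment is involved.

```python
from typing import Iterable, List


def mutate(seq: str, positions: Iterable[int]) -> str:
    """Return a copy of seq with a guaranteed substitution at each position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy simplification: no alignment,
    so both sequences must have the same length)."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> List[str]:
    """Greedy centroid clustering: a sequence joins the first centroid it
    matches at >= threshold; otherwise it founds a new cluster."""
    centroids: List[str] = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids


def exact_variants(seqs: List[str]) -> List[str]:
    """Treat every distinct sequence as its own unit (single-SNP resolution)."""
    return sorted(set(seqs))


# Hypothetical 100-nt amplicons (not real ITS or 16S data).
base = "ACGT" * 25
one_snp = mutate(base, [10])                 # 1 mismatch  -> 99% identity
divergent = mutate(base, range(0, 100, 10))  # 10 mismatches -> 90% identity
reads = [base, base, one_snp, divergent]

print("identity-based units:", len(greedy_otus(reads)))
print("exact-sequence units:", len(exact_variants(reads)))
```

In this toy data set the single-SNP variant is absorbed into the cluster of its parent sequence at 97% identity but counted separately under exact-sequence resolution. The equal-length requirement of `identity` is a simplification of the sketch; real tools align sequences, and length-variable markers such as ITS make that step less trivial.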
In the context of metagenomic WGS, two primary strategies are commonly employed to analyze raw data:
the alignment-based approach and the assembly-based approach. The first one involves mapping individual
sequencing reads to a reference database or a reference genome. On the other hand, the second approach
assembles reads de novo to form contigs, which are then clustered into so-called genome bins during a
binning process. Combining both approaches is frequently advocated for result accuracy [84]. By now, many
bioinformatic tools are available. Alignment-based tools are strong in taxonomic profiling and identifying
known microorganisms. They include a step of fragment recruitment in order to map all the reads to one or
more selected references. Among taxonomic profilers, MetaPhlAn2, Kraken2, and DIAMOND [133-135]
stand out for different strengths. When high specificity and rapid analysis are needed, MetaPhlAn2 may be a good
choice. For comprehensive database coverage and strain-level resolution, Kraken2 is valuable. DIAMOND
allows customization and offers fast alignment capabilities, but it requires additional steps for taxonomic
profiling. Assembly-based tools, instead, are essential for discovering novel organisms and in-depth
functional analysis within metagenomic communities. Their workflow includes an assembler that is well
suited for the reconstruction of long contigs and a genome binner to cluster sequences from the same
organism [136,137]. When selecting an assembler for WGS data, the type of sequencing technology used, the
genome size, the desired level of assembly completeness, and the availability of computational resources
should be taken into consideration. MetaSPAdes, MegaHit, and IDBA-UD [138-140] are the most popular
metagenome assemblers, including for fungal genomes. As with assemblers, there is no binning tool
designed exclusively for fungal sequences, so general metagenomic binners are used, such as
METABAT2, CONCOCT, MaxBin 2.0, and MetaWrap [141-144], to name a few of the most efficient. Many
researchers also employ hybrid assembly strategies that combine short-read and long-read data to achieve
more accurate and complete genome assemblies [95]. To delve deeper into the metagenomic data beyond
taxonomic composition, functional annotation becomes necessary. Fragment recruitment, as previously
described, involves leveraging a database of functionally annotated genes or proteins. This approach
provides a straightforward means to achieve functional annotation. Subsequently, annotations showing a
specific level of coverage can be linked to various aspects, such as metabolic pathways, with tools like
KEGG [145]. Metagenomic WGS of fungi offers valuable insights into complex fungal communities, but it also
comes with several drawbacks and challenges. Bioinformatic complexity, functional annotation, short-read
sequencing, non-standardized pipelines, and data volume and processing are probably the main ones. Addressing
these drawbacks often requires a combination of improved sequencing technologies, more comprehensive