Page 2 of 5 Shetty et al. Microbiome Res Rep 2023;2:14 https://dx.doi.org/10.20517/mrr.2022.18
especially mock communities with known microbial composition, is suggested to help identify technical
variability and improve protocols if required [9]. Mock communities with known composition can be
included at the step of DNA extraction (mixture of different cells) or at the PCR step (mixture of DNA from
different cells). This allows for evaluating where technical variation is introduced. For example, it is known
that DNA extraction methods can differently bias certain cell types, e.g., Gram-positive and Gram-negative
bacteria, and that primer choice at the PCR step can neglect or favor some organisms [5,8]. In addition, these
mock communities allow for identifying potential reagent contamination, well-to-well contamination, and
to some extent, cross-sample contamination [10-13]. Therefore, every microbiota profiling study should include
both positive and negative controls during sample processing.
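As a minimal sketch of why a known composition is useful, the per-taxon bias of a positive control can be expressed as the log2 ratio of observed to expected relative abundance. The taxon names, proportions, and read counts below are hypothetical values invented for illustration, and the function `log2_bias` is not part of any published tool. Python is used only to illustrate the arithmetic.

```python
import math

# Hypothetical four-member mock community with equal expected proportions.
EXPECTED = {
    "gram_pos_A": 0.25,
    "gram_pos_B": 0.25,
    "gram_neg_A": 0.25,
    "gram_neg_B": 0.25,
}

# Hypothetical read counts from one sequenced positive control.
OBSERVED_COUNTS = {
    "gram_pos_A": 100,
    "gram_pos_B": 150,
    "gram_neg_A": 400,
    "gram_neg_B": 350,
}


def log2_bias(observed_counts, expected):
    """Return log2(observed / expected) relative abundance per expected taxon."""
    total = sum(observed_counts.values())
    bias = {}
    for taxon, expected_fraction in expected.items():
        observed_fraction = observed_counts.get(taxon, 0) / total
        # An expected taxon with zero reads is reported as -inf (complete dropout).
        bias[taxon] = (
            math.log2(observed_fraction / expected_fraction)
            if observed_fraction > 0
            else float("-inf")
        )
    return bias


bias = log2_bias(OBSERVED_COUNTS, EXPECTED)
for taxon, value in sorted(bias.items(), key=lambda item: item[1]):
    print(f"{taxon}: {value:+.2f}")
```

Negative values indicate under-representation relative to the theoretical composition and positive values indicate over-representation. Comparing these values between a cell-based mock (added before DNA extraction) and a DNA-based mock (added before PCR) is one way to judge at which step the bias arises.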
Analyzing the mock community profiles and comparing them to the theoretical composition is, however,
not straightforward, especially for novice microbiome scientists. A very limited number of tools are
available for analyzing and comparing mock communities. QIIME2 includes a plugin called
q2-quality-control [14,15]. The ZymoBIOMICS research team provides a tool called FIGARO for the
ZymoBIOMICS™ Microbial Community Standard [16]. Here, we present an R-based tool, chkMocks,
specifically designed for outputs from the R-based dada2 pipeline. The chkMocks R package provides a
slightly different approach for investigating mock communities (see below). This tool supports the
ZymoBIOMICS™ Microbial Community Standard and can also be used with custom mock
communities.
IMPLEMENTATION AND FEATURES
The chkMocks tool is implemented in R and depends on the following R packages/tools: dada2,
DECIPHER, tidyverse tools, microbiome, phyloseq and patchwork [17-22]. An overview of the workflow/steps is
depicted in Figure 1. The chkMocks tool requires data that have been processed through the complete dada2
workflow, from raw reads to a taxonomy-assigned phyloseq object. The phyloseq object should retain the variant
sequences as taxa names rather than converted text IDs such as ASV:1. The chkMocks tool can be used in two
different approaches, distinguished by the type of mock sample that is used. If users have sequenced the
ZymoBIOMICS™ Microbial Community Standard (Catalog No. D6300), they can use the default
checkZymoBiomics. For this, we have created a taxonomic training set using the FASTA files for full-length
16S rRNA gene sequences of expected microbes provided by ZymoBIOMICS. To demonstrate the chkMocks
utility, we used data from a study investigating reagent contamination using the ZymoBIOMICS™ Microbial
Community Standard [10]. Here, the Microbial Community Standard was subjected to eight rounds of 3-fold
serial dilution (D0 to D8) and processed for 16S rRNA gene-based microbiota profiling. The outputs of
checkZymoBiomics are (a) a phyloseq object with input ASVs, their abundances and taxonomic
assignments; (b) a phyloseq object with input ASVs aggregated to species level and their abundances; and
(c) a correlation table with Spearman’s correlation (rho) values of positive controls compared to theoretical
composition. The user can simply plot the results with plotZymoDefault; this function visualizes the
composition of positive controls and theoretical composition as a stacked bar plot [Figure 2A]. This is
accompanied by a bar plot of Spearman’s correlation (rho) between positive controls and theoretical
composition [Figure 2B]. The user can also compare the abundances of individual taxa for a clearer
understanding of biases towards specific taxa [Figure 2C and D]. Here, the percentage of ‘unknown’ taxa,
i.e., not matching any of the expected taxa included in the mock community, increases as dilution increases
and is in agreement with values reported by the original study. Together, these plots give the user
first-hand insight into the quality of their sample processing by directly comparing positive controls with
the expected composition.
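The two summary statistics described above, Spearman's rho against the theoretical composition and the fraction of 'unknown' reads, can be sketched as follows. This is an illustration of the calculations only, not the chkMocks implementation: the species labels, theoretical proportions, and read counts are hypothetical, and restricting the correlation to the expected taxa is an assumption made here. Python is used only to show the arithmetic.

```python
from math import sqrt

# Hypothetical theoretical proportions for a five-species mock community.
THEORETICAL = {"sp1": 0.30, "sp2": 0.25, "sp3": 0.20, "sp4": 0.15, "sp5": 0.10}

# Hypothetical species-level read counts for an undiluted (D0) and a highly
# diluted (D8) positive control; "unknown" collects reads matching no expected taxon.
SAMPLES = {
    "D0": {"sp1": 290, "sp2": 260, "sp3": 210, "sp4": 140, "sp5": 95, "unknown": 5},
    "D8": {"sp1": 120, "sp2": 200, "sp3": 90, "sp4": 150, "sp5": 40, "unknown": 400},
}


def average_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def summarize(counts, theoretical):
    """Return (rho against theoretical composition, fraction of unknown reads)."""
    total = sum(counts.values())
    taxa = sorted(theoretical)
    observed = [counts.get(t, 0) / total for t in taxa]
    expected = [theoretical[t] for t in taxa]
    return spearman_rho(observed, expected), counts.get("unknown", 0) / total


results = {name: summarize(counts, THEORETICAL) for name, counts in SAMPLES.items()}
for name, (rho, unknown_fraction) in results.items():
    print(f"{name}: rho={rho:.2f}, unknown={unknown_fraction:.1%}")
```

In this made-up example, the diluted sample shows both a lower rho and a larger unknown fraction, which is the kind of pattern the correlation table and composition plots are meant to make visible.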