
Page 2 of 5                  Shetty et al. Microbiome Res Rep 2023;2:14  https://dx.doi.org/10.20517/mrr.2022.18

               especially mock communities with known microbial composition, is suggested to help identify technical
               variability and improve protocols if required [9]. Mock communities with known composition can be
               included at the step of DNA extraction (mixture of different cells) or at the PCR step (mixture of DNA from
               different cells). This allows for evaluating where technical variation is introduced. For example, it is known
               that DNA extraction methods can bias certain cell types differently, e.g., Gram-positive and Gram-negative
               bacteria, and that primer choice at the PCR step can miss or favor some organisms [5,8]. In addition, these
               mock communities allow for identifying potential reagent contamination, well-to-well contamination, and
               to some extent, cross-sample contamination [10-13]. Therefore, every microbiota profiling study should include
               both positive and negative controls during sample processing.


               Analyzing the mock community profiles and comparing them to the theoretical composition is, however,
               not straightforward, especially for novice microbiome scientists. Only a few tools are
               available for analyzing and comparing mock communities. QIIME 2 includes a plugin called
               q2-quality-control [14,15]. The ZymoBIOMICS research team provides a tool called FIGARO for the
               ZymoBIOMICS™ Microbial Community Standard [16]. Here, we present an R-based tool, chkMocks,
               specifically designed for outputs from the R-based dada2 pipeline. The chkMocks R package takes a
               slightly different approach to investigating mock communities (see below). It supports the
               ZymoBIOMICS™ Microbial Community Standard and can also be used with custom mock
               communities.

               IMPLEMENTATION AND FEATURES
               The chkMocks tool is implemented in R and depends on the following R packages/tools: dada2,
               DECIPHER, tidyverse tools, microbiome, phyloseq and patchwork [17-22]. An overview of the workflow is
               depicted in Figure 1. The chkMocks tool requires data that have completed the dada2 workflow, from raw
               reads to a taxonomy-assigned phyloseq object. The phyloseq object should have the variant sequences
               as taxa names; these must not be converted to text IDs such as ASV:1. The chkMocks tool can be used in
               two different ways, distinguished by the type of mock sample that is used. If users have sequenced the
               ZymoBIOMICS™ Microbial Community Standard (Catalog No. D6300), they can use the default
               checkZymoBiomics function. For this, we have created a taxonomic training set using the FASTA files for
               full-length 16S rRNA gene sequences of the expected microbes provided by ZymoBIOMICS. To demonstrate the chkMocks
               utility, we used data from a study investigating reagent contamination using the ZymoBIOMICS™ Microbial
               Community Standard [10]. Here, the Microbial Community Standard was subjected to a series of eight 3-fold
               dilutions (D0 to D8) and processed for 16S rRNA gene-based microbiota profiling. The outputs of
               checkZymoBiomics are (a) A phyloseq object with input ASVs, their abundances and taxonomic
               assignments; (b) A phyloseq object with input ASVs aggregated to species level and their abundances; and
               (c) A correlation table with Spearman’s correlation (rho) values of positive controls compared to theoretical
               composition. The user can simply plot the results with plotZymoDefault; this function visualizes the
               composition of positive controls and theoretical composition as a stacked bar plot [Figure 2A]. This is
               accompanied by a bar plot of Spearman’s correlation (rho) between positive controls and theoretical
               composition [Figure 2B]. The user can also compare the abundances of individual taxa for a clearer
               understanding of biases towards specific taxa [Figure 2C and D]. Here, the percentage of ‘unknown’ taxa,
               i.e., those not matching any of the expected taxa in the mock community, increases with increasing dilution,
               in agreement with the values reported by the original study. Together, these plots give the user
               first-hand insight into the quality of their sample processing by directly comparing positive controls with
               expected observations.
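
The comparison described above (aggregating ASVs to species level, converting to relative abundance, and computing Spearman's rho against the theoretical composition) can be illustrated with a minimal, language-agnostic sketch. This is not the chkMocks implementation or its API: the function names, the toy sequences, the taxon labels and all abundance values below are hypothetical. It also assumes that rho is computed over the expected taxa only and that unassigned ASVs are pooled as 'Unknown'; the package may handle both differently.

```python
from collections import defaultdict


def aggregate_to_species(asv_counts, asv_species):
    """Sum ASV counts per assigned species; unassigned ASVs become 'Unknown'."""
    totals = defaultdict(float)
    for asv, count in asv_counts.items():
        totals[asv_species.get(asv, "Unknown")] += count
    return dict(totals)


def relative_abundance(counts):
    """Convert counts to percentages that sum to 100."""
    total = sum(counts.values())
    return {taxon: 100.0 * c / total for taxon, c in counts.items()}


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / (var_x * var_y) ** 0.5


def compare_to_theoretical(observed_pct, theoretical_pct):
    """Return (rho over expected taxa, summed % of taxa not in the mock)."""
    taxa = sorted(theoretical_pct)
    obs = [observed_pct.get(t, 0.0) for t in taxa]
    exp = [theoretical_pct[t] for t in taxa]
    unknown = sum(v for t, v in observed_pct.items() if t not in theoretical_pct)
    return spearman_rho(obs, exp), unknown


# Hypothetical toy data: ASV sequences as keys, mirroring the requirement that
# taxa names are the variant sequences themselves. All values are made up.
theoretical_pct = {"Species A": 40.0, "Species B": 30.0,
                   "Species C": 20.0, "Species D": 10.0}
asv_counts = {"ACGTAC": 300, "ACGTAT": 100, "GGCATT": 250,
              "TTAGCC": 250, "CCGATA": 50, "GATTCA": 50}
asv_species = {"ACGTAC": "Species A", "ACGTAT": "Species A",
               "GGCATT": "Species B", "TTAGCC": "Species C",
               "CCGATA": "Species D"}  # "GATTCA" has no species assignment

observed_pct = relative_abundance(aggregate_to_species(asv_counts, asv_species))
rho, unknown_pct = compare_to_theoretical(observed_pct, theoretical_pct)
print(f"rho = {rho:.3f}, unknown = {unknown_pct:.1f}%")  # rho = 0.949, unknown = 5.0%
```

In this toy case the observed profile ties two taxa that differ in the theoretical composition, so rho falls below 1, and the single unassigned ASV accounts for the 'unknown' fraction. A lower rho or a larger unknown fraction in a positive control would be the kind of signal the plots described above are meant to surface.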