
Page 4 of 21                Ponsero et al. Microbiome Res Rep 2023;2:27  https://dx.doi.org/10.20517/mrr.2023.26

               SimSet 1: technical effects
               The simulated dataset 1 (SimSet 1) addresses technical variation between metagenomic datasets,
               specifically differences in sequencing technology and sequencing depth. The dataset is composed of 100
               simulated metagenomes, each containing 25 bacterial species picked randomly from a list of 40 possible
               organisms that were themselves randomly selected from the complete genomes available in the RefSeq
               database[13]. The relative abundance of each of the 25 organisms in each simulated metagenome was
               drawn from a log-normal distribution.
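
               The profile construction described above can be sketched as follows. This is a minimal illustration,
               not the authors' code: the genome identifiers, the log-normal parameters (mu, sigma), and the seed
               are assumptions, since the text does not state them.

```python
import random

def simulate_profile(pool, n_species=25, mu=0.0, sigma=1.0, rng=None):
    """Pick n_species from pool without replacement and assign
    log-normal abundances normalised to sum to 1."""
    rng = rng or random.Random()
    chosen = rng.sample(pool, n_species)
    # mu and sigma are assumed values; the source does not report them.
    raw = [rng.lognormvariate(mu, sigma) for _ in chosen]
    total = sum(raw)
    return {name: value / total for name, value in zip(chosen, raw)}

rng = random.Random(42)                          # assumed seed, for reproducibility
pool = [f"genome_{i:02d}" for i in range(40)]    # hypothetical genome IDs
profiles = [simulate_profile(pool, rng=rng) for _ in range(100)]
```

               Each entry of profiles maps 25 genome identifiers to relative abundances that sum to 1.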


               From the generated relative abundance profiles, InSilicoSeq was used to simulate metagenomes of
               increasing sequencing depth (50K, 100K, 500K, 1M, 5M, 10M, and 50M paired reads), using MiSeq, HiSeq,
               and NovaSeq error profiles.
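
               The depth-by-platform grid above could be driven by a small script such as the one below. It only
               builds the command lines and does not run them. The file names and sample label are hypothetical, and
               whether --n_reads counts individual reads or read pairs should be checked against the InSilicoSeq
               documentation, because the text specifies depths in paired reads.

```python
from itertools import product

# Sequencing depths listed in the text.
DEPTHS = {
    "50K": 50_000, "100K": 100_000, "500K": 500_000, "1M": 1_000_000,
    "5M": 5_000_000, "10M": 10_000_000, "50M": 50_000_000,
}
# Built-in InSilicoSeq error models matching the three platforms.
MODELS = ["miseq", "hiseq", "novaseq"]

def iss_commands(sample, genomes_fasta, abundance_file):
    """Return one `iss generate` argument list per (depth, model) pair."""
    commands = []
    for (label, n_reads), model in product(DEPTHS.items(), MODELS):
        commands.append([
            "iss", "generate",
            "--genomes", genomes_fasta,
            "--abundance_file", abundance_file,
            "--model", model,
            "--n_reads", str(n_reads),
            "--output", f"{sample}_{model}_{label}",
        ])
    return commands

# Hypothetical inputs for a single simulated metagenome.
commands = iss_commands("sample_001", "genomes.fasta", "abundance.txt")
```

               Each argument list can then be passed to subprocess.run on a machine where InSilicoSeq is installed.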


               SimSet 2: human/PhiX contamination effect
               The simulated dataset 2 (SimSet 2) aims to evaluate the impact of low and high levels of human DNA
               contamination, and of low-level Phi X174 phage contamination, in metagenomes. The low human
               contamination experiment reuses the relative abundance profiles of SimSet 1, with the random addition
               of 0 to 2% human reads. The high human contamination experiment uses the same SimSet 1 profiles, with
               the random addition of 10% to 25% human reads. The PhiX contamination experiment also uses the same
               relative abundance profiles, with the random addition of 0 to 2% Phi X174 reads.
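
               One way to picture the contamination step is as a rescaling of an existing profile. The sketch below
               assumes the contaminant fraction is drawn uniformly from the stated range and is expressed as a share
               of the total; the text says only that the addition is random, so both points are assumptions, as are
               the genome names.

```python
import random

def add_contaminant(profile, name, low, high, rng=None):
    """Scale a relative abundance profile down and add a contaminant so
    that the result still sums to 1."""
    rng = rng or random.Random()
    fraction = rng.uniform(low, high)   # assumed uniform draw
    contaminated = {k: v * (1.0 - fraction) for k, v in profile.items()}
    contaminated[name] = fraction
    return contaminated

rng = random.Random(7)                  # assumed seed
base = {"genome_a": 0.5, "genome_b": 0.3, "genome_c": 0.2}  # toy profile

low_human = add_contaminant(base, "human", 0.00, 0.02, rng)
high_human = add_contaminant(base, "human", 0.10, 0.25, rng)
phix = add_contaminant(base, "phiX174", 0.00, 0.02, rng)
```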

               From these contaminated relative abundance profiles, InSilicoSeq was used to simulate metagenomes of
               increasing sequencing depth (50K, 100K, 500K, 1M, 5M, 10M, and 50M paired reads), using HiSeq error
               profiles.

               SimSet 3: community richness effect
               The simulated dataset 3 (SimSet 3) aims to evaluate the impact of increasing species richness. The dataset is
               composed of 5 sets of 100 simulated metagenomes each, containing 5, 25, 50, 100, or 500 bacterial species
               picked randomly from a list of 10, 40, 80, 130 or 530 possible organisms, respectively. The relative
               abundance of each organism in each simulated metagenome was obtained from a log-normal distribution.
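
               The pairing of richness levels and candidate pool sizes can be written down as a lookup table. The
               sketch below covers only species membership. It assumes that each pool is drawn independently from a
               larger genome catalogue, which the text does not specify, and the catalogue itself is hypothetical.

```python
import random

# Species per metagenome -> size of the candidate pool, as listed in the text.
RICHNESS_TO_POOL = {5: 10, 25: 40, 50: 80, 100: 130, 500: 530}

def build_richness_sets(all_genomes, n_metagenomes=100, seed=0):
    """Return, for each richness level, the species lists of its metagenomes."""
    rng = random.Random(seed)
    sets = {}
    for richness, pool_size in RICHNESS_TO_POOL.items():
        # Assumption: every pool is an independent draw from the catalogue.
        pool = rng.sample(all_genomes, pool_size)
        sets[richness] = [
            sorted(rng.sample(pool, richness)) for _ in range(n_metagenomes)
        ]
    return sets

catalogue = [f"genome_{i:03d}" for i in range(600)]   # hypothetical catalogue
richness_sets = build_richness_sets(catalogue)
```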

               From the generated relative abundance profiles, InSilicoSeq was used to simulate metagenomes of
               increasing sequencing depth (50K, 100K, 500K, 1M, 5M, 10M, and 50M paired reads), using HiSeq error
               profiles.

               SimSet 4: community taxonomic richness effect
               The simulated dataset 4 (SimSet 4) aims to evaluate the impact of increasing taxonomic diversity. The
               dataset is composed of 3 sets of 100 simulated metagenomes each, with every metagenome containing 50
               bacterial species picked randomly from a list of 80 possible organisms. In the first two sets, the 80
               organisms belong to the same taxonomic class (Actinomyces) or to the same taxonomic family
               (Mycobacteriaceae), respectively. The third set was generated in the same way but draws from all
               possible taxonomic classes. The relative abundance of each organism in each simulated metagenome was
               obtained from a log-normal distribution.
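
               Restricting the candidate list to one taxon amounts to filtering a genome-to-lineage table before
               sampling. The sketch below is illustrative only: the taxonomy table, rank keys, taxon labels, and
               genome names are made up, and the demonstration uses a pool of 2 rather than 80 to stay small.

```python
import random

def restricted_pool(taxonomy, rank, taxon, pool_size=80, rng=None):
    """Sample pool_size genomes whose lineage matches taxon at the given rank."""
    rng = rng or random.Random()
    candidates = [g for g, lineage in taxonomy.items() if lineage.get(rank) == taxon]
    if len(candidates) < pool_size:
        raise ValueError(f"only {len(candidates)} genomes available for {taxon}")
    return rng.sample(candidates, pool_size)

# Toy lineage table; labels are placeholders, not real annotations.
taxonomy = {
    "g1": {"class": "class_A", "family": "family_X"},
    "g2": {"class": "class_A", "family": "family_X"},
    "g3": {"class": "class_A", "family": "family_Y"},
    "g4": {"class": "class_B", "family": "family_Z"},
}

rng = random.Random(1)   # assumed seed
same_class = restricted_pool(taxonomy, "class", "class_A", pool_size=2, rng=rng)
same_family = restricted_pool(taxonomy, "family", "family_X", pool_size=2, rng=rng)
```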

               From the generated relative abundance profiles, InSilicoSeq was used to simulate metagenomes of
               increasing sequencing depth (50K, 100K, 500K, 1M, 5M, 10M, and 50M paired reads), using HiSeq error
               profiles.