
Page 105                 Chu et al. J Transl Genet Genom 2023;7:196-212  https://dx.doi.org/10.20517/jtgg.2023.22

               Table 1. Summary of quality metrics for genome sequencing (GS)
                Metric                         Threshold or expected value [45,48]  Mean of 240 GS data performed by HKGI
                Yield of data ≥ Q30 a          ≥ 80 Gb                              162 Gb
                Mean coverage b                ≥ 30X                                41.0X
                Base ≥ Q30 % c                 ≥ 85%                                90.0%
                Clusters passing filter % d    ≥ 70%                                80.9%
                Sample identity e              Match/not match                      All match
                Contamination % f              ≤ 2%                                 0.0055%
                Mapping rate % g               > 95%                                99.9%
                10X percentage (%) h           ≥ 95%                                95.7%
                Gene passed 15X % i            ≥ 90%                                99.3%
                Adapter-dimer % j              < 0.2%                               0.0014%
                Duplication % k                < 15%                                14.4%
                Mean insert size l             > 300 bp                             496 bp
               a Total yield of data with ≥ Q30 scores; b Mean coverage across the human reference genome, after all filters
               are applied; c Percentage of bases that meet Q30 scores; d A cluster is considered to pass the filter when its
               chastity value is below 0.6 in the first 25 cycles. Cluster passing filter % is the percentage of clusters with
               passing filter; e Concordance with genotype when family structure is available; f Estimated level of sample
               cross-individual contamination based on a genotype-free estimation; g Percentage of unique reads that mapped
               to the human reference genome; h Percentage of bases in human genome with sequencing depth of ≥ 10X;
               i A measure of completeness. Percentage of genes with sequencing depth of ≥ 15X; j Fraction of pass-filtered
               reads that are unaligned and match to a known adapter sequence; k Percentage of mapped sequence that is
               marked as duplicate; l Median insert size for all paired-end reads where both ends map to the same chromosome.

               sequencing workload in the next few years. In the 24 months from the launch of the HKGP to the end of
               July 2023, 12,937 participants and their family members (6,680 genomes) were recruited and sequenced.
               As expected, the majority of the cases are from the undiagnosed disease category, with a smaller cohort
               (~12.1%) from hereditary cancer. To illustrate the performance of the GS workflow, ten sequencing runs
               comprising 240 samples, carried out at the Laboratory, are presented in the following sections.
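
The acceptance criteria in Table 1 lend themselves to an automated per-sample check. The following is a minimal sketch in Python, not the Laboratory's actual pipeline: the metric keys and the function name `failed_metrics` are hypothetical names chosen for illustration, while the thresholds and the mean values are transcribed from Table 1. Sample identity is a categorical match/not-match result and is left out of the numeric comparison.

```python
import operator

# Thresholds transcribed from Table 1. The dictionary keys are hypothetical
# names chosen for this sketch, not identifiers used by HKGI.
THRESHOLDS = {
    "yield_q30_gb": (operator.ge, 80.0),
    "mean_coverage_x": (operator.ge, 30.0),
    "bases_q30_pct": (operator.ge, 85.0),
    "clusters_pf_pct": (operator.ge, 70.0),
    "contamination_pct": (operator.le, 2.0),
    "mapping_rate_pct": (operator.gt, 95.0),
    "bases_10x_pct": (operator.ge, 95.0),
    "genes_15x_pct": (operator.ge, 90.0),
    "adapter_dimer_pct": (operator.lt, 0.2),
    "duplication_pct": (operator.lt, 15.0),
    "mean_insert_size_bp": (operator.gt, 300.0),
}


def failed_metrics(metrics):
    """Return the sorted names of metrics that are missing or miss their threshold."""
    failures = []
    for name, (compare, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None or not compare(value, limit):
            failures.append(name)
    return sorted(failures)


# Mean values of the 240 GS data sets, as reported in Table 1.
HKGI_MEANS = {
    "yield_q30_gb": 162.0,
    "mean_coverage_x": 41.0,
    "bases_q30_pct": 90.0,
    "clusters_pf_pct": 80.9,
    "contamination_pct": 0.0055,
    "mapping_rate_pct": 99.9,
    "bases_10x_pct": 95.7,
    "genes_15x_pct": 99.3,
    "adapter_dimer_pct": 0.0014,
    "duplication_pct": 14.4,
    "mean_insert_size_bp": 496.0,
}

if __name__ == "__main__":
    print(failed_metrics(HKGI_MEANS))  # an empty list means every threshold is met
```

Note that the strict and non-strict comparisons follow the table as printed, so a duplication rate of exactly 15% would be flagged whereas a contamination level of exactly 2% would not.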

               After a series of pilot studies on the method of genomic DNA (gDNA) extraction, two automated magnetic
               bead-based protocols were established for the extraction of DNA: a high-throughput system for whole
               blood samples, and a medium-throughput system for saliva, buccal swab, and tissue samples. The extracted
               DNA is eluted in a slightly alkaline buffer (10 mM Tris-HCl, pH 8.0); EDTA is omitted because it interferes
               with enzymatic reactions in the downstream sequencing library preparation. The extracted gDNA is
               assessed for degradation using agarose gel electrophoresis. Each DNA sample migrates as a high-molecular
               weight band without any smearing or signs of degradation, indicative of intact gDNA of high quality and
               integrity [Supplementary Figure 1]. The purity of extracted DNA is evaluated using the NanoDrop
               spectrophotometer. Pure gDNA typically has an A260/A280 absorbance ratio of 1.7-2.0 and an A260/A230
               ratio of 1.8-2.5. All extracted gDNA samples showed A260/A280 and A260/A230 ratios within these
               ranges, indicating the absence of protein, carbohydrate, salt, and other contaminants.
               In addition to UV absorbance, gDNA is quantified using the Qubit fluorometer. Figure 4B shows the
               concentration and total yield for the 240 gDNA samples. On average, 400 µL of whole blood yields ~6 µg of
               gDNA, at a concentration of ~122 ng/µL. As indicated, all 240 blood samples yielded high-quality gDNA
               in quantities sufficient for GS at 30X coverage.
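
The purity ranges and the yield figures above reduce to simple arithmetic, sketched below in Python. This is an illustration only: the function names and the absorbance readings in the usage lines are hypothetical, while the acceptance ranges (A260/A280 of 1.7-2.0, A260/A230 of 1.8-2.5) and the average yield and concentration come from the text. The elution volume is not stated in the text; it is derived here from the reported averages.

```python
def is_pure(a260, a280, a230):
    """Check both absorbance ratios against the ranges quoted in the text."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbance readings must be positive")
    ratio_280 = a260 / a280  # low values suggest protein carry-over
    ratio_230 = a260 / a230  # low values suggest salt or carbohydrate carry-over
    return 1.7 <= ratio_280 <= 2.0 and 1.8 <= ratio_230 <= 2.5


def total_yield_ug(conc_ng_per_ul, volume_ul):
    """Total DNA mass in micrograms from concentration (ng/µL) and volume (µL)."""
    return conc_ng_per_ul * volume_ul / 1000.0


def implied_volume_ul(yield_ug, conc_ng_per_ul):
    """Volume in microlitres implied by a total yield (µg) and a concentration (ng/µL)."""
    return yield_ug * 1000.0 / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical absorbance readings, for illustration only.
    print(is_pure(a260=1.00, a280=0.55, a230=0.45))
    # Reported averages: ~6 µg of gDNA at ~122 ng/µL.
    print(round(implied_volume_ul(6.0, 122.0), 1))
```

With the reported averages, the implied elution volume comes out at roughly 49 µL. Since 1 µg of gDNA is used as library input, an average yield of ~6 µg leaves material for several library preparations.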

               The GS library preparation protocol has been optimised for both manual and automated operations.
               Figure 4C shows the quality metrics for the 240 GS libraries. Using 1 µg of gDNA as input, the average final
               library concentration is about 15.5 nM (in 25 µL volume), which is more than sufficient to reach 30-100X
               genome coverage. The insert size of the GS libraries is analysed on an automated electrophoresis system