
Chu et al. J Transl Genet Genom 2023;7:196-212  https://dx.doi.org/10.20517/jtgg.2023.22  Page 108

                Figure 5. Comparison of exome sequencing (ES), short-read genome sequencing (GS), and long-read GS in resolving complex regions
                (“dark regions”) of the human genome. An example of such a region is the PKD1 (polycystic kidney disease 1) gene, whose first 32
                exons lie in a segmentally duplicated region on chromosome 16p13, with six pseudogenes located 13 Mb proximal to the PKD1
                locus. The region has high GC content, and the six pseudogenes are highly homologous to PKD1, sharing 97% sequence
                similarity, which makes amplification- and capture-based approaches challenging. The PKD1 region is visualised with Integrative Genomics
                Viewer (IGV) for each sequencing approach. Despite improvements in capture probe design, ES showed lower coverage of exons 1 to 14 of
                PKD1, while GS achieved more uniform coverage across the entire locus, including the duplicated region. Long-read GS
                enables unambiguous alignment of reads, complementing short-read GS and enhancing disease diagnosis. The orange double arrow
                indicates the “dark region”. The red dotted box and arrow indicate regions that short-read GS covers poorly.

               quantification, and only libraries that pass all quality indicators are sequenced. Twenty-four dual-indexed
               GS libraries are combined into an equimolar pool prior to sequencing on the Illumina NovaSeq 6000
               sequencer using S4 flow cells with 300 cycles (2 × 150 bases).
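The arithmetic behind an equimolar pool can be sketched as follows. This is a minimal illustration, not the laboratory's actual procedure: the function names, the library concentrations, the fragment sizes, and the target amount per library are all hypothetical, and the conversion assumes the usual approximation of 660 g/mol per base pair of double-stranded DNA.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM).

    Assumes an average mass of 660 g/mol per base pair (approximation).
    """
    return conc_ng_per_ul / (660.0 * mean_fragment_bp) * 1e6


def equimolar_volumes_ul(libraries: dict, fmol_per_library: float) -> dict:
    """Return the volume (uL) of each library that contributes the same
    molar amount (fmol) to the pool.

    `libraries` maps a library name to (concentration in ng/uL,
    mean fragment length in bp). Note that 1 nM equals 1 fmol/uL.
    """
    volumes = {}
    for name, (conc_ng_per_ul, mean_fragment_bp) in libraries.items():
        molarity_nm = library_molarity_nm(conc_ng_per_ul, mean_fragment_bp)
        volumes[name] = fmol_per_library / molarity_nm
    return volumes


if __name__ == "__main__":
    # Hypothetical values for two of the 24 libraries in a pool.
    example = {"lib01": (6.6, 500.0), "lib02": (3.3, 500.0)}
    for name, vol in equimolar_volumes_ul(example, fmol_per_library=100.0).items():
        print(f"{name}: {vol:.2f} uL")
```

A more dilute library contributes a proportionally larger volume, so every library ends up represented by the same number of molecules in the pool.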


               As the number of nanowells is fixed in patterned flow cells, the optimal loading concentration is
               determined by comparing the nanowell occupancy rate (%Occupied) with the pass filter rate (%PF). The optimal
               cluster density was attained after several rounds of optimisation on the S4 flow cell. The overall
               performance of 10 sequencing runs was consistent and of high quality, as shown in Figure 4D. Over 81% of
               clusters passed the chastity filter, more than 90% of bases had a quality score of Q30 or higher, and the error
               rate measured with the spike-in PhiX control was less than 0.25%. On average, each GS library yielded 162 Gb of
               data with 41X depth and over 516 million reads [Figure 4C and D]. Nearly all reads (99.9%) could be mapped to
               the human reference genome (GRCh38), while cross-individual contamination and adapter dimers were barely
               detectable. In summary, the overall statistics indicate that the established GS workflow is robust and that the
               data generated by the HKGI laboratory and CPOS are comparable and on par with international standards.
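The relationship between sequencing yield and depth can be illustrated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 3.1 Gb genome size is an assumption not given in the text. Dividing raw yield by genome size gives a ceiling on depth; a mean depth reported after alignment is generally lower, because duplicate, filtered, or unaligned bases do not count toward it.

```python
def raw_depth_ceiling(yield_bases: float, genome_size_bases: float) -> float:
    """Upper bound on mean depth: total sequenced bases divided by genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome_size_bases must be positive")
    return yield_bases / genome_size_bases


if __name__ == "__main__":
    yield_bases = 162e9   # average yield per GS library reported in the text
    genome_size = 3.1e9   # assumed approximate human genome size (not from the text)
    ceiling = raw_depth_ceiling(yield_bases, genome_size)
    print(f"Raw depth ceiling: {ceiling:.1f}X")  # roughly 52X
```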

               The performance of GS in detecting variants in complex regions (“dark regions”) of the human genome is
               illustrated by the example of the PKD1 (polycystic kidney disease 1) gene. Mutations in the PKD1 gene
               account for 80%-85% of autosomal dominant polycystic kidney disease (ADPKD) cases. ADPKD is an
               inherited renal disease characterised by numerous fluid-filled cysts in the kidneys that progressively impair
               kidney function and eventually result in end-stage renal disease. PKD1 lies in a segmental duplication