Page 114 - Read Online
P. 114
Chu et al. J Transl Genet Genom 2023;7:196-212 https://dx.doi.org/10.20517/jtgg.2023.22 Page 108
Figure 5. Comparison of exome sequencing (ES), short-read and long-read genome sequencing (GS) in resolving complex regions
(“dark regions”) of the human genome. An example of such regions is the PKD1 (polycystic kidney disease 1) gene, where the first 32
exons are located in a segmental duplicated region on chromosome 16p13, with six pseudogenes located 13 Mb proximal to the PKD1
locus. In addition to high GC content, the sequences of these six pseudogenes are highly homologous to PKD1 and share 97% sequence
similarity, making amplification- and capture-based approaches challenging. The PKD1 region is visualised with Integrative Genomics
Viewer (IGV) using different sequencing approaches. Despite improvements in the capture probe design, ES of exons 1 to 14 of PKD1
showed lower coverage, while GS achieved a more uniform coverage for the entire locus, including the duplicated region. Long-read GS
enables unambiguous alignment of reads, complementing short-read GS, and enhances disease diagnosis. The orange double arrow
indicates the “dark region”. The red dotted box and arrow indicate regions where short-read GS covers poorly.
quantification, and only libraries that pass all quality indicators are sequenced. An equimolar library pool
containing 24 dual-indexed GS libraries is combined prior to sequencing on the Illumina NovaSeq 6000
sequencer using S4 flow cells with 300 cycles (2 × 150 bases).
As the number of nanowells is fixed in the patterned flow cells, the optimal loading concentration is
determined by comparing the nanowell occupied rate (%Occupied) and pass filter rate (%PF). The optimal
cluster density was attained after several rounds of optimisation on the S4 flow cell. The overall
performance of 10 sequencing runs is consistent and of high quality, as shown in Figure 4D. Over 81% of
clusters passed the chastity filter; more than 90% of bases had Q30, while the error rate using spike-in PhiX
control was less than 0.25%. On average, each GS library yielded 162 Gb of data with 41X depth and over
516 million reads [Figure 4C and D]. Nearly all reads (99.9%) can be mapped to the human reference
genome (GRCh38), while cross-individual contamination and adapter-dimer were merely detected. In
summary, the overall statistics indicate that the established GS workflow is robust, and the data generated
from the HKGI laboratory and CPOS are comparable, and on par with international standards.
The performance of GS in detecting variants in complex regions (“dark regions”) of the human genome is
illustrated in the example of the PKD1 (polycystic kidney disease 1) gene. Mutations in the PKD1 gene
contribute to 80%-85% of autosomal dominant polycystic kidney disease (ADPKD) cases. ADPKD is an
inherited renal disease characterised by many fluid-filled cysts in the kidneys that progressively impair
kidney functions and eventually result in end-stage renal disease. PKD1 lies in a segmental duplication