Feusier et al. J Transl Genet Genom 2021;5:189-99  https://dx.doi.org/10.20517/jtgg.2021.05  Page 191

               The scarcity of CLL family resources, heterogeneity across families, and the likely complexity of the disease
               mechanism (multiple genes, multiple alleles, incomplete penetrance, and sporadic cases) make it challenging
               to uncover heritable genetic abnormalities. Our goal was to identify CLL risk loci by combining two resources:
               the Utah Population Database (UPDB), a unique Utah resource for identifying large, extended, high-risk
               pedigrees, and a powerful new method designed specifically for large pedigrees and for addressing
               heterogeneity.

               The UPDB includes a 16-generation genealogy of approximately 5 million people with at least one event in
               Utah, record-linked to statewide cancer records since 1966 from the Utah Cancer Registry (UCR), part of the
               NCI Surveillance, Epidemiology, and End Results (SEER) Program, and to state vital records [14]. Within the
               UPDB, ancestors whose descendants have an increased incidence of malignancies, compared with expected rates
               derived from internal cancer rate controls and years at risk, can be identified and studied as high-risk
               pedigrees.
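               One way to make the excess-incidence comparison concrete is a one-sided Poisson test of observed versus
               expected case counts. The sketch below assumes that the expected count is the sum, over strata such as
               birth cohort, sex, and birthplace, of an internal rate multiplied by the descendants' person-years at risk.
               The stratum labels, rates, and counts are hypothetical, and the UPDB's actual procedure may differ.

```python
import math
from typing import Dict, Tuple

# Hypothetical stratum key: (birth_cohort, sex, born_in_utah).
Stratum = Tuple[str, str, bool]


def expected_cases(person_years: Dict[Stratum, float],
                   rates: Dict[Stratum, float]) -> float:
    """Expected case count: sum over strata of internal rate x person-years at risk."""
    return sum(py * rates[stratum] for stratum, py in person_years.items())


def poisson_upper_tail(observed: int, expected: float) -> float:
    """One-sided P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):   # accumulate P(X <= observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical descendants of one ancestor, split into two strata.
    py = {("1900-1919", "F", True): 10000.0, ("1900-1919", "M", True): 5000.0}
    rates = {("1900-1919", "F", True): 1e-4, ("1900-1919", "M", True): 2e-4}
    exp_n = expected_cases(py, rates)   # 1.0 + 1.0 = 2.0 expected cases
    p = poisson_upper_tail(6, exp_n)    # 6 observed cases
    print(f"expected={exp_n:.2f}, P={p:.4f}, high-risk={p < 0.05}")
```

               With 6 observed cases against 2 expected, the one-sided P value is about 0.0166, so this hypothetical
               ancestor would pass a P < 0.05 threshold.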

               Shared genomic segment (SGS) analysis is a recombinant-based (“linkage-like”) family analysis developed to
               identify regions that segregate to cases in an extended high-risk pedigree [15]. When available for study, a
               single large pedigree can increase homogeneity, provide power equivalent to that of many small pedigrees, and
               be sufficient on its own to declare genome-wide significance. However, full likelihood-based linkage
               approaches are intractable in very large pedigrees. Furthermore, traditional linkage methods are not robust
               to substantial intra-familial heterogeneity (sporadic cases), which must be accounted for in very large
               pedigrees. To address these limitations, SGS identifies long stretches of consecutive identity-by-state (IBS)
               alleles to infer shared, inherited identity-by-descent (IBD) haplotypes. The algorithm iterates over subsets
               of cases, and corrects for these multiple assessments, to account for possible sporadic cases. Overall, SGS
               is the ideal method for investigating disease risk loci shared by a common founder in large pedigrees.
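               The core idea of finding long runs of consecutive SNPs at which a set of cases is identical by state, and
               repeating the search over subsets of cases, can be sketched as below. This is a simplified illustration, not
               the published SGS implementation: genotypes are assumed to be coded as 0/1/2 minor-allele counts, a SNP is
               treated as shared when the subset contains no pair of opposite homozygotes (0 and 2), and the function
               names are hypothetical. Significance assessment and the correction across subsets are not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def shared_ibs(genotypes: Sequence[Sequence[int]], subset: Sequence[int]) -> List[bool]:
    """Per-SNP flag: True if all cases in `subset` carry at least one common allele.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of case i at SNP j.
    A common allele exists unless the subset has both homozygotes (0 and 2).
    """
    flags = []
    for j in range(len(genotypes[0])):
        calls = {genotypes[i][j] for i in subset}
        flags.append(not (0 in calls and 2 in calls))
    return flags


def longest_run(flags: Sequence[bool]) -> Tuple[int, int, int]:
    """(length, start, end_exclusive) of the longest run of consecutive True flags."""
    best = (0, 0, 0)
    start = None
    for j, ok in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = j
        elif not ok and start is not None:
            if j - start > best[0]:
                best = (j - start, start, j)
            start = None
    return best


def scan_subsets(genotypes: Sequence[Sequence[int]],
                 min_cases: int) -> Dict[Tuple[int, ...], Tuple[int, int, int]]:
    """Longest shared IBS run for every subset of at least `min_cases` cases.

    Scanning subsets allows for sporadic cases that do not carry the shared
    haplotype; a real analysis must also correct for these multiple assessments.
    """
    n = len(genotypes)
    return {
        subset: longest_run(shared_ibs(genotypes, subset))
        for k in range(min_cases, n + 1)
        for subset in combinations(range(n), k)
    }


if __name__ == "__main__":
    # Three hypothetical cases genotyped at eight consecutive SNPs.
    g = [[0, 1, 2, 1, 0, 1, 2, 2],
         [0, 1, 2, 1, 2, 1, 2, 0],
         [1, 1, 1, 1, 2, 0, 2, 0]]
    for subset, (length, start, end) in scan_subsets(g, 2).items():
        print(subset, length, start, end)
```

               In this toy example all three cases share a 4-SNP run, while cases 1 and 2 alone share all 8 SNPs, the
               pattern one would expect if case 0 were sporadic at this locus.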


               Here, we used the UPDB to identify a six-generation high-risk CLL pedigree, the largest CLL family studied
               to date, and performed SGS to identify inherited risk loci likely to harbor CLL disease genes.

               METHODS
               Identification and ascertainment of the high-risk pedigree
               The UPDB was used to identify ancestors whose descendants showed a statistical excess of CLL (P < 0.05).
               Expected numbers of cases were based on internal disease rates stratified by birth cohort, sex, and birthplace
               (in/outside Utah), and on years at risk. The pedigrees descending from these ancestors were considered
               high-risk CLL pedigrees. Once these pedigrees were identified, representatives of the UCR made living CLL
               cases within them aware of the study, and those interested were invited to participate. Cases and family
               members wishing to take part were subsequently enrolled by the study team, a process that included informed
               consent, questionnaires, and biospecimen collection. Individuals in 23 high-risk CLL pedigrees were enrolled
               as previously described [16]. Only one of these, a six-generation pedigree with 24 CLL cases, contained
               sufficient meioses (m ≥ 15) between sampled CLL cases for SGS analysis [17]. Figure 1A illustrates all 24
               cases in the pedigree. Figure 1B shows the reduced structure containing only the eight sampled CLL cases
               included in the SGS analysis.
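               The meiosis criterion (m ≥ 15) can be illustrated with a small counting sketch. We assume here that
               "meioses between sampled cases" means the number of parent-to-child transmissions in the smallest
               sub-pedigree connecting them. The function name and the single-parent tree model are simplifying
               assumptions (real pedigrees record two parents and can contain loops), so this is illustrative rather than
               the counting procedure used in the study.

```python
from typing import Dict, Iterable, List, Set, Tuple


def meioses_between(cases: Iterable[str], parent: Dict[str, str]) -> int:
    """Count parent-to-child transmissions in the smallest sub-pedigree linking `cases`.

    Simplifying assumption: `parent` maps each person to the one parent through
    whom they descend from the founder, so the pedigree is a tree.
    """
    case_set = set(cases)
    edges: Set[Tuple[str, str]] = set()
    for person in case_set:
        node = person
        while node in parent:  # walk up the line of descent to the founder
            edges.add((node, parent[node]))
            node = parent[node]
    if not edges:
        return 0
    children: Dict[str, List[str]] = {}
    for child, par in edges:
        children.setdefault(par, []).append(child)
    child_nodes = {child for child, _ in edges}
    roots = [p for p in children if p not in child_nodes]
    if len(roots) != 1:
        raise ValueError("cases do not trace back to a single founder")
    top = roots[0]
    # Drop the unbranched chain above the cases' most recent common ancestor.
    while top not in case_set and len(children.get(top, [])) == 1:
        only_child = children[top][0]
        edges.discard((only_child, top))
        top = only_child
    return len(edges)


if __name__ == "__main__":
    # Hypothetical pedigree: founder F with children A and B; A has C and D; B has E.
    parent = {"A": "F", "B": "F", "C": "A", "D": "A", "E": "B"}
    m = meioses_between(["C", "D", "E"], parent)
    print(m, m >= 15)  # compare with a threshold such as m >= 15
```

               For the hypothetical cases C, D, and E, the count is 5 (C and D are linked through A, and A and E through
               the founder F), well below a threshold of 15; trimming the chain above the most recent common ancestor
               keeps, for example, two siblings at 2 meioses.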

               Acquisition of materials and genotyping
               Peripheral blood samples were processed to extract DNA. Individuals in the pedigrees were genotyped using the
               Illumina Human 610Q high-density single nucleotide polymorphism (SNP) array, and genotypes were called using
               standard Illumina protocols. Alleles were re-oriented to align with 1000 Genomes Project sequence data [18].
               SNP quality control was performed alongside other project data using PLINK and included SNP call rate (95%)
               and sample call rate (90%) thresholds, removal of monomorphic SNPs, and removal of SNPs failing
               Hardy-Weinberg equilibrium (P < 1.0 × 10⁻⁵) [19-22]. After quality control, 555,091 autosomal SNPs were
               available for SGS. The