
Feusier et al. J Transl Genet Genom 2021;5:189-99  https://dx.doi.org/10.20517/jtgg.2021.05  Page 193

               segments at the same position that are the same or longer than that observed. At one million simulations, a
               distribution is fit, based on the set of genome-wide empirical P-values (under the assumption that the
               majority of segments across the genome represent the null). This distribution is used to establish the
               pedigree-specific genome-wide threshold, corresponding to a false-positive rate of μ = 0.05 per genome,
               based on the Theory of Large Deviations [25]. Simulations then continue, as necessary, until all P-values are
               estimated to resolution.
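The empirical P-value described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `empirical_p_value` and the toy segment lengths are assumptions introduced here, and the gene-drop simulation that would produce the null lengths is not shown.

```python
def empirical_p_value(observed_length, simulated_lengths):
    """Fraction of null simulations whose shared segment at this position
    is the same length as, or longer than, the observed segment."""
    if not simulated_lengths:
        raise ValueError("need at least one simulation")
    hits = sum(1 for length in simulated_lengths if length >= observed_length)
    return hits / len(simulated_lengths)


# Toy values only (hypothetical, not data from the study): lengths in SNPs.
toy_null_lengths = [0, 3, 12, 0, 7, 40, 1, 0, 5, 2]
p = empirical_p_value(12, toy_null_lengths)
```

With zero hits the estimate is 0, which only says that P is probably below 1/n for n simulations. That is one plausible reason to keep simulating until each P-value is resolved.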

               Establishing germline sharing
               The DNA studied was derived from whole blood lymphocytes and therefore may be contaminated with
               malignant CLL cells. To delineate possible contamination, we obtained second blood draws for two of the
               CLL cases and used flow cytometry to cell-sort CD19+/CD5+ cells (malignant CLL cells) and non-
               malignant cells (reflective of germline). Genotypes from these sorted cells were used to confirm that alleles
               shared in SGS regions were germline in origin.


               Haplotype estimation
               At a locus, SGS analysis identifies the region with the best statistical evidence (lowest P-value) and defines
               the subset of cases that share it (the sharing group). By definition, all cases in the sharing group can share a
               haplotype across the best region. Subsets of the sharing group may, however, share longer regions (with less
               significant P-values). We followed the pattern of P-values as they iteratively diminished to identify the
               longer segments shared by fewer cases in the sharing group. Cases who are removed from subsequent
               longer regions indicate loss of the ancestral haplotype, i.e., a recombinant event. In this way, the haplotypes
               for each individual of the sharing group can be estimated surrounding the SGS region.
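The nesting logic above can be illustrated with a simplified sketch. It assumes the nested segments have already been identified and ordered from the best region outward; the real procedure follows the pattern of P-values, which is not modelled here. The function names, case labels and longer segments are hypothetical; only the 136.1-137.0 Mb core region comes from the Results.

```python
def haplotype_extents(segments):
    """segments: list of (start_mb, end_mb, case_ids), ordered from the best
    region (most cases) to progressively longer regions (fewer cases).
    Returns the widest interval each case is still part of."""
    extents = {}
    for start, end, cases in segments:
        for case in cases:
            lo, hi = extents.get(case, (start, end))
            extents[case] = (min(lo, start), max(hi, end))
    return extents


def dropped_cases(segments):
    """Cases lost at each widening step, i.e., candidate recombinants."""
    return [sorted(prev[2] - cur[2]) for prev, cur in zip(segments, segments[1:])]


# Hypothetical example: three cases, with labels and outer segments invented.
toy_segments = [
    (136.1, 137.0, {"A", "B", "C"}),  # best region, shared by all
    (135.5, 137.0, {"A", "B"}),       # longer region, C drops out
    (135.5, 138.2, {"A"}),            # longest region, B drops out
]
extents = haplotype_extents(toy_segments)
dropped = dropped_cases(toy_segments)
```

Here case C keeps only the core region, while A carries the longest estimated haplotype; each entry in `dropped` marks where a recombinant event would be inferred.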


               Human Protein Atlas transcriptome analysis
               We  used  three  publicly  available  datasets  from  the  Human  Protein  Atlas  (HPA)  version  20.0
               (https://v20.proteinatlas.org/) to examine the expression for genes in an SGS region in the most relevant
               tissues, cell-lines and cell types from peripheral blood mononuclear cells [26-29]. Expression data for 37 tissues,
               69 cell lines (no CLL) and 18 blood cell types were available. Normalized expression values for five lymph
               tissues (B-cells, bone marrow, lymph node, spleen and T-cells), seven cell-lines [Daudi (human Burkitt
               lymphoma), Karpas-707 (multiple myeloma), REH (pre-B cell leukemia), RPMI-8226 (multiple myeloma),
               U-266/70 (multiple myeloma, IL-6-dependent), U-266/84 (multiple myeloma), U-698 (lymphoblastic
               lymphosarcoma)], and two blood cell types (memory B-cells, naïve B-cells) were selected as most relevant.
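As a quick consistency check, the selection above can be tallied. The grouping keys and the variable names are assumptions made for this sketch; the sample names are taken from the text. The total matches the 14 relevant tissues, cells and cell-lines referred to in the Results.

```python
# Samples selected as most relevant, grouped as in the text.
SELECTED = {
    "lymph tissues": ["B-cells", "bone marrow", "lymph node", "spleen", "T-cells"],
    "cell lines": [
        "Daudi", "Karpas-707", "REH", "RPMI-8226",
        "U-266/70", "U-266/84", "U-698",
    ],
    "blood cell types": ["memory B-cells", "naïve B-cells"],
}

# Count per group and overall.
counts = {group: len(names) for group, names in SELECTED.items()}
total_selected = sum(counts.values())
```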

               RESULTS
               All eight sampled CLL individuals in the pedigree passed genotyping quality control. The final pedigree for
               analysis included the eight CLL cases separated by 28 meioses. A genome-wide significance threshold of α =
               3.94 × 10⁻⁷ was established for the pedigree.

               One genome-wide significant SGS region was identified at chromosome 2q22.1 (P = 1.9 × 10⁻⁷,
               LOD-equivalent 5.6) [Figure 2A and B]. This 2q22.1 locus is inherited through 26 meioses to seven of the eight
               studied CLL cases [Figure 1B]. Two additional obligate carriers (parents) with hematological malignancies
               also shared the segregating region: non-Hodgkin lymphoma, NOS and leukemia, NOS [Figure 1B]. The
               region shared by all seven CLL cases contains 204 consecutive SNPs and is 0.9 Mb in length, from 136.1-
               137.0 Mb (GRCh38). Alleles in the sorted cells confirmed the shared region was germline. Figure 3A
               illustrates the SGS region and each of the seven estimated haplotypes in the case-sharers at this locus. The
               shared region encompasses the entire CXCR4 gene, part of the THSD7B gene (thrombospondin type 1
               domain containing 7B), and two unstudied non-coding genes (AC112255.1 and RN7SKP141). The mRNA
               expression of CXCR4 and THSD7B in 14 relevant tissues, cells and cell-lines from the HPA is shown below