Feusier et al. J Transl Genet Genom 2021;5:189-99 https://dx.doi.org/10.20517/jtgg.2021.05 Page 193
segments at the same position that are the same or longer than that observed. At one million simulations, a
distribution is fit, based on the set of genome-wide empirical P-values (under the assumption that the
majority of segments across the genome represent the null). This distribution is used to establish the
pedigree-specific genome-wide threshold, corresponding to a false-positive rate of μ = 0.05 per genome,
based on the Theory of Large Deviations [25]. Simulations then continue, as necessary, until all P-values are
estimated to resolution.
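The empirical P-value idea described above can be sketched as follows. This is a minimal illustration, not the SGS software: the function name `empirical_p_value`, the input layout (a list of shared-segment lengths from null simulations at one position), and the toy numbers are all assumptions made for this example.

```python
def empirical_p_value(observed_length, simulated_lengths):
    """Fraction of null simulations whose shared segment at this position
    is the same length or longer than the observed segment.

    Hypothetical helper for illustration only."""
    if not simulated_lengths:
        raise ValueError("at least one simulation is required")
    hits = sum(1 for length in simulated_lengths if length >= observed_length)
    return hits / len(simulated_lengths)


# Hypothetical toy input: shared-segment lengths (in SNPs) from 10 null simulations.
simulated = [3, 12, 7, 40, 5, 9, 21, 2, 15, 8]
p = empirical_p_value(observed_length=20, simulated_lengths=simulated)
```

With n simulations, the smallest nonzero value this estimator can return is 1/n, which is why very small P-values require simulations to continue beyond the initial batch.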
Establishing germline sharing
The DNA studied was derived from whole blood lymphocytes and therefore may be contaminated with
malignant CLL cells. To delineate possible contamination, we obtained second blood draws for two of the
CLL cases and used flow cytometry to cell-sort CD19+/CD5+ cells (malignant CLL cells) and non-
malignant cells (reflective of germline). Genotypes from these sorted cells were used to confirm that alleles
shared in SGS regions were germline in origin.
Haplotype estimation
At a locus, SGS analysis identifies the region with the best statistical evidence (lowest P-value) and defines
the subset of cases that share it (the sharing group). By definition, all cases in the sharing group can share a
haplotype across the best region. Subsets of the sharing group may, however, share longer regions (with less
significant P-values). We followed the pattern of P-values as they iteratively diminished to identify the
longer segments shared by fewer cases in the sharing group. A case dropping out of a subsequent
longer region indicates loss of the ancestral haplotype, i.e., a recombination event. In this way, the haplotypes
for each individual of the sharing group can be estimated surrounding the SGS region.
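The bookkeeping behind this step can be sketched in a simplified form. The example assumes the nested regions and their sharing subsets have already been identified by following the P-values; the function name `estimate_haplotype_extents`, the tuple layout, the case labels, and the coordinates are hypothetical and chosen only for illustration.

```python
def estimate_haplotype_extents(nested_regions):
    """Return {case: (start, end)}, the widest region in which each case
    is still part of the sharing subset.

    nested_regions: list of (start, end, cases) tuples, ordered from the
    best region (shared by the whole sharing group) to longer regions
    shared by fewer cases. Hypothetical input layout."""
    extents = {}
    for start, end, cases in nested_regions:
        for case in cases:
            if case in extents:
                prev_start, prev_end = extents[case]
                extents[case] = (min(prev_start, start), max(prev_end, end))
            else:
                extents[case] = (start, end)
    return extents


# Hypothetical toy data: positions in arbitrary units, three cases A, B, C.
regions = [
    (100, 200, {"A", "B", "C"}),  # best region, shared by all
    (80, 200, {"A", "B"}),        # C drops out on the left side
    (80, 260, {"A"}),             # B drops out on the right side
]
extents = estimate_haplotype_extents(regions)
```

In this toy example, the point at which a case drops out (C at the left boundary, B at the right) marks where that case's haplotype is inferred to have recombined away from the ancestral one.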
Human Protein Atlas transcriptome analysis
We used three publicly available datasets from the Human Protein Atlas (HPA) version 20.0
(https://v20.proteinatlas.org/) to examine the expression for genes in an SGS region in the most relevant
tissues, cell-lines and cell types from peripheral blood mononuclear cells [26-29]. Expression data for 37 tissues,
69 cell lines (no CLL) and 18 blood cell types were available. Normalized expression values for five lymph
tissues (B-cells, bone marrow, lymph node, spleen and T-cells), seven cell-lines [Daudi (human Burkitt
lymphoma), Karpas-707 (multiple myeloma), REH (pre-B cell leukemia), RPMI-8226 (multiple myeloma),
U-266/70 (multiple myeloma, IL-6-dependent), U-266/84 (multiple myeloma), U-698 (lymphoblastic
lymphosarcoma)], and two blood cell types (memory B-cells, naïve B-cells) were selected as most relevant.
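As a quick consistency check on the selection just described, the chosen samples can be written out and counted. The dictionary name and grouping keys are illustrative assumptions; the sample names are those listed in the text.

```python
# Samples selected as most relevant, grouped as in the text above.
SELECTED = {
    "lymph tissues": ["B-cells", "bone marrow", "lymph node", "spleen", "T-cells"],
    "cell lines": ["Daudi", "Karpas-707", "REH", "RPMI-8226",
                   "U-266/70", "U-266/84", "U-698"],
    "blood cell types": ["memory B-cells", "naïve B-cells"],
}

# Count per group and in total.
counts = {group: len(names) for group, names in SELECTED.items()}
total_selected = sum(counts.values())
```

The total of five tissues, seven cell lines and two blood cell types is 14, which agrees with the 14 relevant tissues, cells and cell-lines referred to in the Results.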
RESULTS
All eight sampled CLL individuals in the pedigree passed genotyping quality control. The final pedigree for
analysis included the eight CLL cases separated by 28 meioses. A genome-wide significance threshold of α =
3.94 × 10⁻⁷ was established for the pedigree.
One genome-wide significant SGS region was identified at chromosome 2q22.1 (P = 1.9 × 10⁻⁷, LOD-equivalent
5.6) [Figure 2A and B]. This 2q22.1 locus is inherited through 26 meioses to seven of the eight
studied CLL cases [Figure 1B]. Two additional obligate carriers (parents) with hematological malignancies
also shared the segregating region: non-Hodgkin lymphoma, NOS and leukemia, NOS [Figure 1B]. The
region shared by all seven CLL cases contains 204 consecutive SNPs and is 0.9 Mb in length, from 136.1-
137.0 Mb (GRCh38). Alleles in the sorted cells confirmed the shared region was germline. Figure 3A
illustrates the SGS region and each of the seven estimated haplotypes in the case-sharers at this locus. The
shared region encompasses the entire CXCR4 gene, part of the THSD7B gene (thrombospondin type 1
domain containing 7B), and two unstudied non-coding genes (AC112255.1 and RN7SKP141). The mRNA
expression of CXCR4 and THSD7B in 14 relevant tissues, cells and cell-lines from the HPA is shown below