Moore et al. J Transl Genet Genom 2021;5:200-217  https://dx.doi.org/10.20517/jtgg.2021.08  Page 5

               the European Prospective Investigation into Cancer, Chronic Diseases, Nutrition, and Lifestyles (EPIC), the
               Mayo Clinic Case-Control Study of Diffuse Large B-cell Lymphoma (Mayo), the Genetic Epidemiology of
               CLL Consortium (GEC), and the Utah Chronic Lymphocytic Leukemia Study (Utah). Genotyping was
               performed on commercially available Illumina and Affymetrix platforms [Supplementary Table 1]. Details,
               including information on quality control and data cleaning, have been previously reported [6,8,10,11]. All studies
               obtained informed consent from participants and were approved by their appropriate Institutional Review
               Boards.


               Prior to analysis, additional quality control and filtering were applied to each GWAS separately, including
               removal of SNPs with a minor allele frequency < 0.05, > 3% missing, or Hardy-Weinberg P-value < 1 × 10⁻⁶
               among controls, and removal of subjects with call rates < 97%. After quality control, genotype data
               were available for 10,467 NHL cases, including 3061 CLL, 3814 DLBCL, 2784 FL, and 808 MZL cases, as well
               as 9374 controls [Supplementary Table 2].
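
               The filtering thresholds above can be expressed as simple predicates. The following Python sketch is
               illustrative only: the names SnpStats, snp_passes_qc, and subject_passes_qc are hypothetical, the
               per-SNP statistics are assumed to have been computed beforehand, and the original analysis may have
               applied these filters with different software.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_MAF = 0.05          # SNPs with minor allele frequency < 0.05 are removed
MAX_SNP_MISSING = 0.03  # SNPs with > 3% missing calls are removed
MIN_HWE_P = 1e-6        # SNPs with Hardy-Weinberg P < 1e-6 among controls are removed
MIN_CALL_RATE = 0.97    # subjects with call rate < 97% are removed


@dataclass
class SnpStats:
    """Precomputed per-SNP summary statistics (hypothetical container)."""
    maf: float             # minor allele frequency
    missing_rate: float    # fraction of subjects without a call at this SNP
    hwe_p_controls: float  # Hardy-Weinberg P-value computed among controls


def snp_passes_qc(s: SnpStats) -> bool:
    """Return True if the SNP survives all three SNP-level filters."""
    return (
        s.maf >= MIN_MAF
        and s.missing_rate <= MAX_SNP_MISSING
        and s.hwe_p_controls >= MIN_HWE_P
    )


def subject_passes_qc(call_rate: float) -> bool:
    """Return True if the subject's call rate is at least 97%."""
    return call_rate >= MIN_CALL_RATE
```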


               We used PLINK1.9 [26,27] to identify ROH; specifically, we used the two-step command --homozyg. In the first
               step, PLINK1.9 identifies directly genotyped SNPs that are possibly within an ROH by sliding a 50-SNP
               window across the genome and flagging each SNP for which at least 5% of the windows covering it are
               homozygous. For this step, we allowed one heterozygous SNP and up to five SNPs with no calls
               within each window to account for a small amount of possible genotyping error and loss. In the second step,
               ROH are identified from these sliding windows by requiring a minimum number of consecutive
               homozygous SNPs. We required each ROH to contain at least 100 consecutive homozygous SNPs that
               span at least 1500 kilobases (kb), with at least one SNP every 50 kb and a maximum gap between
               SNPs of 5000 kb. These parameters were selected with reference to the “ROH_1.5Mb” ROH calling
               parameters used by Gazal et al. [28]. We restricted analyses to the autosomal chromosomes.
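
               One possible mapping of the ROH settings above onto PLINK 1.9 options is sketched below in Python.
               The code only assembles an argument list and does not run PLINK. The file prefixes, the executable
               name "plink", and the function name build_roh_command are assumptions; the exact command line used
               by the authors is not given in the text.

```python
from typing import List


def build_roh_command(bfile: str, out: str) -> List[str]:
    """Assemble a PLINK 1.9 --homozyg call reflecting the parameters in the text.

    `bfile` and `out` are hypothetical file prefixes supplied by the caller.
    """
    return [
        "plink",                               # assumed executable name
        "--bfile", bfile,
        "--autosome",                          # restrict to autosomal chromosomes
        "--homozyg",
        # Step 1: sliding-window scan
        "--homozyg-window-snp", "50",          # 50-SNP window
        "--homozyg-window-het", "1",           # allow one heterozygous SNP per window
        "--homozyg-window-missing", "5",       # allow up to five missing calls per window
        "--homozyg-window-threshold", "0.05",  # 5% of overlapping windows homozygous
        # Step 2: ROH definition
        "--homozyg-snp", "100",                # at least 100 SNPs per ROH
        "--homozyg-kb", "1500",                # at least 1500 kb long
        "--homozyg-density", "50",             # at least one SNP per 50 kb
        "--homozyg-gap", "5000",               # maximum gap of 5000 kb between SNPs
        "--out", out,
    ]


if __name__ == "__main__":
    # Print the command for inspection; "study_gwas" and "study_roh" are placeholders.
    print(" ".join(build_roh_command("study_gwas", "study_roh")))
```

               Keeping the parameters in one list makes it easy to apply identical settings to each GWAS separately.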

               To estimate the extent of homozygosity across the genome, we calculated the fraction of the autosome
               covered by ROH (FROH) by summing the lengths of ROH and dividing by 3 × 10⁹ base pairs as the
               approximate size of the autosome for all GWAS. As another measure to assess homozygosity, we also
               quantified and tested differences in relatedness across the genome in our study using a variant of the
               inbreeding coefficient, F3 [29]. F3, which estimates the correlation between uniting gametes, is an alternative
               to ROH-based estimates with potentially reduced bias and standard errors [30]. We estimated F3 using the
               --ibc command in PLINK1.9. To estimate the association of FROH and F3 with NHL, we then estimated beta
               coefficients and standard errors for each GWAS using logistic regression, adjusting for age, sex (except in
               the UCSF1/NHS study, where all controls were female), fraction of missing SNPs, and the ten principal
               components of ancestry to account for population stratification. The fraction of missing SNPs was
               calculated for each participant as the number of SNPs without calls divided by the total number of SNPs
               genotyped on the array that passed quality control metrics. Associations were combined across GWAS for
               each subtype of NHL using random-effects meta-analysis implemented with the command “metan” in
               STATA v15.
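
               The FROH calculation and the pooling of per-GWAS estimates described above can be sketched in Python
               as follows. This is a minimal illustration, not the authors' code: ROH lengths are assumed to be
               given in kb, the function names are hypothetical, and the pooling uses the DerSimonian-Laird
               estimator, a common random-effects method. Whether it matches the estimator applied in the original
               analysis is an assumption.

```python
import math
from typing import Sequence, Tuple

AUTOSOME_BP = 3e9  # approximate autosome size used as the denominator in the text


def froh(roh_lengths_kb: Sequence[float]) -> float:
    """Fraction of the autosome covered by ROH, given ROH lengths in kb."""
    total_bp = sum(roh_lengths_kb) * 1000.0
    return total_bp / AUTOSOME_BP


def random_effects_pool(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling of per-study betas.

    Returns (pooled beta, pooled standard error, between-study variance tau^2).
    """
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    beta_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sw_star = sum(w_star)
    beta_re = sum(wi * bi for wi, bi in zip(w_star, betas)) / sw_star
    se_re = math.sqrt(1.0 / sw_star)
    return beta_re, se_re, tau2
```

               When the per-study estimates agree closely, tau^2 is truncated at zero and the result coincides with
               the fixed-effect inverse-variance estimate.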

               After determining ROH as described above, we also tested whether specific genomic regions encompassed
               by ROH were associated with risk of each of the four NHL subtypes. We divided each autosomal
               chromosome into “bins” of 500 kb in length. We then calculated the midpoint of each identified ROH and
               assigned it to the corresponding bin. Each study participant in the analysis was therefore categorized as
               either homozygous (exposed) or heterozygous (unexposed) at each bin across the autosome. We calculated
               beta coefficients and standard errors for the association between presence of an ROH in each bin and risk of
               NHL subtype within each GWAS using logistic regression, adjusting for age, sex (except in the UCSF1/NHS