
Page 115                                       Waller et al. J Transl Genet Genom 2021;5:112-23  I  http://dx.doi.org/10.20517/jtgg.2021.09

               at every position across the genome, the best evidence (lowest empirical P-value) for an excessive length
               of sharing is established (Figure 1, Step 4). This process results in a final optimized set of shared segments
               for a single pedigree. Each optimal segment corresponds to a specific subset of cases and has a nominal
               empirical P-value.

               For two pedigrees, the duo-SGS evidence is the combination of the nominal empirical P-values for the
               optimal segments at the same genome position in the two pedigrees. Specifically, the Fisher method to
               combine P-values was used. All possible pedigree pairs could be considered as separate analyses, but there
               are n(n − 1)/2 pedigree pairs (ways to select 2 pedigrees from n total pedigrees), and hence multiple testing can
               rapidly become an issue. Alternatively, a single analysis comprising optimization across all pedigree pairs
               could be considered, but this global approach may cloud individual pedigree-pair findings. To balance these
               two extremes, we propose a fixed-pedigree duo-SGS strategy (Figure 1, Step 5). The procedure is as follows:
               (1) fix a pedigree of interest; (2) calculate genome-wide duo-SGS evidence for the fixed pedigree with each
               of the other pedigrees; and (3) optimize across the duo-SGS findings to identify the most significant duo-
               SGS result at each point across the genome. The optimized findings over pedigree pairs are the duo-SGS
               results for the fixed pedigree. In this approach, we identify the best two-pedigree results that include the
               fixed pedigree. The procedure is then repeated for each pedigree, thus producing duo-SGS results for each
               pedigree.
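The fixed-pedigree procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `fisher_combine` and `fixed_pedigree_duo`, the dictionary layout (pedigree name mapped to a list of per-position P-values), and the toy values are all assumptions made for the example. It also assumes every P-value is strictly positive and that the two pedigrees are independent. For two P-values, the Fisher statistic has 4 degrees of freedom, so the chi-square tail probability has a closed form and no statistics library is needed.

```python
import math


def fisher_combine(p1: float, p2: float) -> float:
    """Combine two independent P-values with the Fisher method.

    The statistic x = -2 * (ln p1 + ln p2) is chi-square distributed with
    4 degrees of freedom under the null; its upper tail probability is
    exp(-x / 2) * (1 + x / 2). Both inputs must be > 0.
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def fixed_pedigree_duo(pvals: dict, fixed: str) -> list:
    """Return, per genome position, (best duo P-value, partner pedigree).

    `pvals` maps a pedigree name to its list of optimal single-pedigree
    empirical P-values, one per position (hypothetical layout).
    """
    results = []
    for i in range(len(pvals[fixed])):
        # Step 2: duo evidence for the fixed pedigree with each other pedigree.
        candidates = [
            (fisher_combine(pvals[fixed][i], other[i]), name)
            for name, other in pvals.items()
            if name != fixed
        ]
        # Step 3: keep the most significant pairing at this position.
        results.append(min(candidates))
    return results


# Toy data (invented for illustration): three pedigrees, two positions.
toy = {"A": [0.01, 0.5], "B": [0.02, 0.5], "C": [0.5, 0.001]}
duo_for_A = fixed_pedigree_duo(toy, "A")
n_pairs = math.comb(len(toy), 2)  # number of distinct pedigree pairs
```

Repeating the call with each pedigree as `fixed` gives one set of duo-SGS results per pedigree, so the number of optimized result sets grows with n rather than with the n(n − 1)/2 possible pairs.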


               Genome-wide thresholds for duo-SGS
               Critical to interpreting the observed duo-SGS results are genome-wide significance duo-SGS thresholds
               for each pedigree (Figure 1, Step 6). To establish these, we echo the same optimization process in null data.
               Establishing these thresholds is similar to the calculation described for the single pedigree SGS method [20].
               Under the reasonable assumption that the vast majority of the genome represents chance sharing (i.e.,
               most of the genome does not contain a disease risk gene) we model the distribution for null sharing on the
               distribution of the empirical P-values for each pedigree. To avoid comparing the findings to themselves
               or skewing to possible true-positives, the empirical P-values are perturbed, and the distribution-fitting
               is performed at a resolution of 1 million simulations. The latter avoids inappropriate distribution-fitting
               to the extreme outliers that would arise if the few results from the alternative hypothesis were included
               at their final resolution. To perturb an
               empirical P-value we determine its Wilson score 95% confidence interval (CI) (Equation 1) and randomly
               sample a value from within it.



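               \left( \hat{p} + \frac{z^2}{2n} \pm z \sqrt{ \frac{\hat{p}(1 - \hat{p})}{n} + \frac{z^2}{4n^2} } \right) \Big/ \left( 1 + \frac{z^2}{n} \right)          Equation 1

               where p̂ is the empirical P-value, z is 1.96 (for the 95% CI), and n is the number of simulations (here,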
               1,000,000). The Wilson interval was selected because it always produces non-negative confidence bounds
               for the P-values. The genome-wide set of perturbed empirical P-values for a pedigree is considered the
               “null” P-values for that single pedigree. The duo-SGS procedure (described above) is performed using
               the single pedigree genome-wide null P-values. The result of this process is a set of optimal duo-SGS null
               P-values.
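The perturbation step can be illustrated with a short Python sketch. It uses the standard Wilson score interval with z = 1.96 and n = 1,000,000 as stated in the text. The text does not say which distribution is used to sample within the interval, so uniform sampling is an assumption here, and the function names `wilson_interval` and `perturb` are hypothetical.

```python
import math
import random


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Wilson score confidence interval for a proportion p_hat from n trials."""
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    # Clamp to [0, 1] only to guard against floating-point round-off;
    # the exact Wilson bounds already lie in this range.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def perturb(p_hat: float, n: int, rng: random.Random, z: float = 1.96) -> float:
    """Draw a perturbed P-value from within the Wilson interval.

    Uniform sampling is an assumption of this sketch.
    """
    lower, upper = wilson_interval(p_hat, n, z)
    return rng.uniform(lower, upper)


rng = random.Random(0)  # fixed seed so the example is reproducible
n_sims = 1_000_000
lower, upper = wilson_interval(0.001, n_sims)
null_p = perturb(0.001, n_sims, rng)
zero_lower, zero_upper = wilson_interval(0.0, n_sims)
```

For an empirical P-value of 0, the lower bound is 0 and the upper bound is small but positive, which shows the non-negativity property that motivated the choice of the Wilson interval.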

               Genome-wide significant and suggestive thresholds are determined following our previously described
               method for single pedigree SGS [20]. Briefly, the null duo-SGS P-values are log-transformed and fitted to a
               gamma distribution. The shape (k) and rate (σ) parameters of the fitted distribution are applied using the
               Theory of Large Deviations to calculate the significance thresholds by solving:


               µ(X) = [C + 2GX]α(X)                                                                                                              Equation 2