
Rhoades et al. J Transl Genet Genom 2019;3:1. | https://doi.org/10.20517/jtgg.2018.26    Page 5 of 20

               marker testing evaluates the effects of multiple rare variants to determine whether they are associated with
               the disease or trait under investigation. The outcome of either test will depend greatly on whether the disease
               is the result of a single common variant or of multiple rare variants. Here we review the approaches that
               have been applied to the investigation of rare variants in genetic disorders in general, and how these tools
               have been employed for analyzing NGS data [Table 1].


               Approaches for case-control studies
               RVAS approaches use statistical methods to combine the effects of rare variants to strengthen signals. The
               burden test is carried out by collapsing variants in a gene or functional region into a single score; the
               association between the collapsed score and the phenotype is then computed. The collapsing of the variants is
               accomplished either through the selection of a threshold or by the use of permutation tests, which require
               substantial computational power [41]. Collapsing of variants into a single score results in each variant being treated
               as though it has the same effect on the phenotype. One way to test for the varying effects of rare variants on
               phenotype is to use the Multi-phenotype Analysis of Rare Variants (MARV) test. This is a type of burden
               analysis that utilizes multi-phenotype analysis. MARV calculates MAF at rare variants within a region of the
               genome. It then performs a linear regression to associate phenotype or combinations of phenotypes with the
               MAF for each variant [44,45]. The cohort allelic sums test is a burden test that collapses variants into a single
               score and can identify genes that carry one or more risk alleles. The score indicates the presence or absence
               of a minor allele, which is then tested for its association with a phenotype using univariate analysis [46].
               Meanwhile, the combined multivariate and collapsing method combines the collapsing of rare variants with
               multivariate analysis of both collapsed rare variants and uncollapsed common variants [47].
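
               The collapsing step described above can be illustrated with a minimal Python sketch. It collapses
               variants below an MAF threshold into a per-individual burden score, reduces that score to a
               presence/absence indicator in the style of the cohort allelic sums test, and compares carrier
               frequencies between cases and controls. The function names, the 1% threshold, the toy genotypes,
               and the choice of a two-proportion z-test as the univariate test are assumptions made for
               illustration only; they are not taken from the cited methods.

```python
from math import erfc, sqrt


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Sum minor-allele counts over rare variants, one score per individual.

    genotypes: list of rows (one per individual) of minor-allele counts (0/1/2).
    mafs: minor allele frequency of each variant (same order as the columns).
    maf_threshold: assumed cut-off below which a variant counts as rare.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def carrier_indicator(scores):
    """Presence (1) or absence (0) of any rare minor allele."""
    return [1 if s > 0 else 0 for s in scores]


def two_proportion_z(carrier, is_case):
    """Two-sided z-test on carrier frequency in cases versus controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    x_case = sum(c for c, k in zip(carrier, is_case) if k)
    x_ctrl = sum(c for c, k in zip(carrier, is_case) if not k)
    pooled = (x_case + x_ctrl) / (n_case + n_ctrl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl))
    if se == 0:
        raise ValueError("carrier status does not vary in this sample")
    z = (x_case / n_case - x_ctrl / n_ctrl) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Toy data (hypothetical): 4 cases followed by 4 controls, 3 variants.
mafs = [0.004, 0.002, 0.20]  # the third variant is common and is not collapsed
genotypes = [
    [1, 1, 0], [0, 1, 0], [0, 0, 2], [1, 0, 0],  # cases
    [0, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1],  # controls
]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]

scores = burden_scores(genotypes, mafs)
carriers = carrier_indicator(scores)
z, p_value = two_proportion_z(carriers, is_case)
print(scores, carriers, round(z, 3), round(p_value, 3))
```

               Because every rare allele adds the same amount to the score, this sketch makes the equal-effect
               assumption of collapsing explicit.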

               Note that the burden test assumes that all rare variants contribute to the disease in the same direction.
               However, most rare variants have small effects on disease. In addition, some variants are disease-causing
               mutations, and some are protective variants. To address this scenario, variance-component tests were
               developed. The C-alpha test can be used to determine the directionality of an effect of multiple variants on a
               phenotype. The generalized C-alpha test generates scores, based on summary statistics, which evaluate the
               increasing and/or decreasing effects of multiple variants on a phenotype based on the Gaussian distribution.
               The phenotype can be binary or quantitative. The C-alpha test can be used on large population datasets
               or on smaller samples, such as familial studies. The C-alpha test rests on three basic assumptions, which
               concern the number of variants, the strength and independence of their effects, and the normal
               distribution of the variants [48].
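
               To make the contrast with the burden test concrete, the following minimal Python sketch computes a
               C-alpha-style statistic for a binary phenotype. It assumes the commonly described form in which,
               for each variant, the number of copies observed in cases is compared with its binomial expectation
               and the excess variance is summed over variants, then standardized and referred to a normal
               distribution. The formula, the function name, and the toy counts are assumptions not given in the
               text above, and published versions treat singleton variants separately, which this sketch does not.

```python
from math import comb, erfc, sqrt


def c_alpha(case_counts, total_counts, p0):
    """C-alpha-style test of excess variance in case/control allele counts.

    case_counts[i]: copies of variant i observed in cases.
    total_counts[i]: copies of variant i observed in cases plus controls.
    p0: expected share of copies in cases under the null
        (for example, the proportion of cases in the sample).
    """
    t = 0.0         # summed excess variance over variants
    null_var = 0.0  # variance of t under the binomial null
    for y, n in zip(case_counts, total_counts):
        binom_var = n * p0 * (1 - p0)
        t += (y - n * p0) ** 2 - binom_var
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            null_var += ((u - n * p0) ** 2 - binom_var) ** 2 * prob
    if null_var == 0:
        raise ValueError("null variance is zero; no informative variants")
    z = t / sqrt(null_var)
    p_value = 0.5 * erfc(z / sqrt(2))  # one-sided upper tail of the normal
    return t, z, p_value


# Toy counts (hypothetical): one risk, one protective, one neutral variant,
# each seen 6 times in a sample with equal numbers of cases and controls.
case_counts = [6, 0, 3]
total_counts = [6, 6, 6]
t, z, p_value = c_alpha(case_counts, total_counts, p0=0.5)
print(t, round(z, 3), p_value)
```

               In this toy example, 9 of the 18 rare alleles fall in cases, exactly as expected under the null,
               so a collapsed score would show no excess; the statistic is nevertheless large because the risk and
               protective variants both deviate from expectation.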
               The Sequence Kernel Association Test (SKAT) is a type of supervised machine learning that evaluates each
               variant based on a P value and then weights each variant based on a linear or logistic regression model [49].
               One advantage of SKAT is that it allows for the detection of interactions between variants [50]. This method
               also allows epistatic effects to be revealed. However, when many rare variants lie in a
               particular region and have a similar effect on the phenotypes, the SKAT-Optimal
               Unified Test (SKAT-O) can be used. Rather than using collapsed scores, SKAT-O utilizes the minimum P
               values from different kernels, which include correlation effects [50]. The SKAT-O program, due to its power.
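
               As a rough illustration of a kernel-type statistic, the following Python sketch computes a
               SKAT-style weighted sum of squared per-variant score statistics under an intercept-only null model
               and obtains an empirical P value by permuting the phenotype. It assumes the commonly described
               linear weighted kernel with Beta(1, 25) weights on MAF; the formula, the function names, the
               default number of permutations, and the toy data are assumptions not given in the text above, and
               the SKAT software itself computes P values analytically rather than by this simple permutation.

```python
import random


def skat_style_q(genotypes, phenotype, mafs):
    """Weighted sum of squared per-variant score statistics (no covariates)."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    resid = [y - mean_y for y in phenotype]  # residuals under the null
    q = 0.0
    for j, maf in enumerate(mafs):
        weight = 25.0 * (1.0 - maf) ** 24  # Beta(1, 25) density at the MAF
        score = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += (weight * score) ** 2
    return q


def permutation_p(genotypes, phenotype, mafs, n_perm=1000, seed=0):
    """Empirical P value: share of permuted statistics at least as large."""
    rng = random.Random(seed)
    observed = skat_style_q(genotypes, phenotype, mafs)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_style_q(genotypes, shuffled, mafs) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): variant 0 is seen only in cases and
# variant 1 only in controls, so their effects point in opposite directions.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
phenotype = [1, 1, 0, 0]
mafs = [0.01, 0.01]

q = skat_style_q(genotypes, phenotype, mafs)
p_value = permutation_p(genotypes, phenotype, mafs, n_perm=200, seed=1)
print(q, p_value)
```

               Because each score is squared before summing, variants with opposite directions of effect add to
               the statistic instead of cancelling, as they would in a collapsed score.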
               When sample sizes are insufficient to generate accurate P values, the adaptive procedure-SKAT (AP-SKAT)
               can be used. This software is similar to SKAT, except that it “adaptively stops” the permutation test when
               the P value is outside of the confidence interval of the P value that would be predicted using a binomial
               distribution. This procedure can be used to reduce the risk of obtaining a type I error [51]. AP-SKAT is more
               efficient in terms of computation than either SKAT or SKAT-O [47]. To investigate datasets with fewer
               samples than required by SKAT, exact variance component tests can be used [52]. These tests minimize type I
               errors and, in small samples, identify more genes associated with polygenic traits than are
               identified by SKAT. Gene association with multiple traits is a variation of the sequence kernel association
               test that uses kernel distance covariance. This approach uses non-parametric tests to assess the association between rare