Rhoades et al. J Transl Genet Genom 2019;3:1. I https://doi.org/10.20517/jtgg.2018.26 Page 5 of 20
marker testing evaluates the effects of multiple rare variants to determine whether they are associated with
the disease or trait under investigation. The outcome of either test will depend greatly on whether the disease
is the result of a single common variant or of multiple rare variants. Here we review the approaches that
have been applied to the investigation of rare variants in genetic disorders in general, and how these tools
have been employed for analyzing NGS data [Table 1].
Approaches for case-control studies
RVAS approaches use statistical methods to combine the effects of rare variants to strengthen signals. The
burden test is carried out by collapsing variants in a gene or functional region into a single score; the
association between the collapsed score and the phenotype is then computed. The collapsing of the variants is
accomplished either through the selection of a threshold or by the use of permutation tests, which require a
lot of computational power [41]. Collapsing of variants into a single score results in each variant being treated
as though it has the same effect on the phenotype. One way to test for the varying effects of rare variants on
phenotype is to use the Multi-phenotype Analysis of Rare Variants (MARV) test. This is a type of burden
analysis that utilizes multi-phenotype analysis. MARV calculates MAF at rare variants within a region of the
genome. It then performs a linear regression to associate phenotype or combinations of phenotypes with the
MAF for each variant [44,45]. The cohort allelic sums test is a burden test that collapses variants into a single
score and can identify genes that carry one or more risk alleles. The score indicates the presence or absence
of a minor allele, which is then tested for its association with a phenotype using univariate analysis [46].
Meanwhile, the combined multivariate and collapsing method combines the collapsing of rare variants with
multivariate analysis of both collapsed rare variants and uncollapsed common variants [47].
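The collapsing step described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools: the function names (`cast_indicator`, `burden_score`, `fisher_exact_two_sided`, `cast_test`) are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1 or 2) per rare variant, and Fisher's exact test is assumed here as one possible choice of univariate test on the presence/absence score.

```python
from math import comb


def cast_indicator(genotypes):
    """Presence/absence score: 1 if the individual carries at least one
    rare minor allele in the region, otherwise 0."""
    return int(any(g > 0 for g in genotypes))


def burden_score(genotypes, weights=None):
    """Weighted sum of minor-allele counts across the region.
    With no weights, every variant is treated as having the same effect."""
    if weights is None:
        weights = [1.0] * len(genotypes)
    return sum(w * g for w, g in zip(weights, genotypes))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def cast_test(case_genotypes, control_genotypes):
    """Compare carrier counts between cases and controls (assumed design)."""
    case_carriers = sum(cast_indicator(g) for g in case_genotypes)
    control_carriers = sum(cast_indicator(g) for g in control_genotypes)
    return fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        control_carriers, len(control_genotypes) - control_carriers)


# Toy data (invented for illustration): rows are individuals, columns are rare variants.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
p_value = cast_test(cases, controls)
```

Because the presence/absence score discards both the number of alleles carried and the direction of each variant's effect, a sketch like this shares the limitation of burden tests in general.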
Note that the burden test assumes that all rare variants contribute to the disease in the same direction.
However, most rare variants have small effects on disease. In addition, some variants are disease-causing
mutations, and some are protective variants. To address this scenario, variance-component tests were
developed. The C-alpha test can be used to determine the directionality of an effect of multiple variants on a
phenotype. The generalized C-alpha test generates scores, based on summary statistics, which evaluate the
increasing and/or decreasing effects of multiple variants on a phenotype based on the Gaussian distribution.
The phenotype can be binary or quantitative. The C-alpha test can be used on large population datasets
or on smaller samples, such as familial studies. The C-alpha test rests on three basic assumptions, concerning
the number of variants, the strength and independence of their effects, and the normal distribution of the
variants [48].
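As a rough illustration of how a C-alpha-type statistic can be computed for a binary phenotype, the following Python sketch compares the number of copies of each variant seen in cases with the number expected under a binomial model, and refers the standardized sum to a normal distribution. It is a simplified sketch under stated assumptions, not the implementation of the cited method: the function name `c_alpha` and its arguments are hypothetical, `p0` is assumed to be the proportion of cases in the sample, and the pooling of singleton variants that a full analysis would normally apply is omitted.

```python
from math import comb, sqrt, erfc


def c_alpha(case_counts, total_counts, p0):
    """Simplified C-alpha-type statistic (illustrative assumption).

    case_counts[i]  -- copies of variant i observed in cases
    total_counts[i] -- copies of variant i observed in cases and controls
    p0              -- expected share of copies in cases under the null
    Returns (T, Z, one-sided P value).
    """
    t = 0.0
    variance = 0.0
    for y, n in zip(case_counts, total_counts):
        # Excess of the squared deviation over the binomial variance.
        t += (y - n * p0) ** 2 - n * p0 * (1 - p0)
        # Null variance of that term, summed over all possible case counts u.
        for u in range(n + 1):
            dev = (u - n * p0) ** 2 - n * p0 * (1 - p0)
            variance += dev ** 2 * comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
    z = t / sqrt(variance)
    # Upper-tail probability of a standard normal distribution.
    p_value = 0.5 * erfc(z / sqrt(2))
    return t, z, p_value


# Toy example (invented): one variant seen only in cases, one only in controls.
t_stat, z_stat, p_val = c_alpha(case_counts=[4, 0], total_counts=[4, 4], p0=0.5)
```

In the toy example the two variants deviate in opposite directions, yet both add to the statistic, which is the property that distinguishes this class of test from a burden test.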
The Sequence Kernel Association Test (SKAT) is a type of supervised machine learning that evaluates each
variant based on a P value and then weights each variant based on a linear or logistic regression model [49].
One advantage of SKAT is that it allows for the detection of interactions between variants [50]. This method
also allows epistatic effects to be revealed. However, in the case where many rare variants in a
particular region have a similar effect on the phenotype, the optimal unified test, called SKAT-Optimal
Unified Test (SKAT-O), can be used. Rather than using collapsed scores, SKAT-O utilizes the minimum P
values from different kernels, which include correlation effects [50]. The SKAT-O program, due to its power.
When sample sizes are insufficient to generate accurate P values, the adaptive procedure-SKAT (AP-SKAT)
can be used. This software is similar to SKAT, except that it “adaptively stops” the permutation test when
the P value is outside of the confidence interval of the P value that would be predicted using a binomial
distribution. This procedure can be used to reduce the risk of obtaining a type I error [51]. AP-SKAT is more
efficient in terms of computation than either SKAT or SKAT-O [47]. To investigate datasets with fewer
samples than required by SKAT, exact variance component tests can be used [52]. These tests minimize type I
errors and, in small samples, identify more genes associated with polygenic traits than are
identified by SKAT. Gene association with multiple traits can be tested with a variation of the sequence
kernel association approach, called kernel distance covariance.
called kernel distance covariance. This test uses non-parametric tests to test the association between rare