
Rhoades et al. J Transl Genet Genom 2019;3:1. I  https://doi.org/10.20517/jtgg.2018.26                                           Page 7 of 20
variants and multiple phenotypes [53]. Similarity and dissimilarity are assessed for both genotype and
phenotype and a matrix is formed for each variable. Then, the similarity or dissimilarity matrices for each
variable are tested for independence. The calculation of P-values does not require any permutations and the
method can be utilized on WES or WGS [53].
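
The matrix-based comparison described above can be illustrated with a minimal sketch. It assumes linear-kernel similarity matrices and a trace statistic on the double-centered matrices, which is one common way to quantify dependence between two similarity matrices; the source does not specify the kernel or the statistic, and the function names are hypothetical. The sketch computes only the statistic and does not reproduce the permutation-free P-value calculation.

```python
def linear_kernel(rows):
    """Similarity matrix K[i][j] = dot(rows[i], rows[j]) (assumed kernel choice)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]


def double_center(K):
    """Subtract row and column means and add back the grand mean (H K H)."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)]
            for i in range(n)]


def dependence_statistic(genotypes, phenotypes):
    """trace(Kg_centered @ Kp_centered) / n; larger values suggest dependence."""
    kg = double_center(linear_kernel(genotypes))
    kp = double_center(linear_kernel(phenotypes))
    n = len(kg)
    # Both matrices are symmetric, so the trace of the product is the
    # sum of elementwise products.
    return sum(kg[i][j] * kp[i][j] for i in range(n) for j in range(n)) / n


# Hypothetical toy data: 4 subjects, 2 variants, 1 phenotype.
genotypes = [[0, 1], [1, 0], [2, 1], [0, 0]]
phenotype_related = [[2.0 * g[0]] for g in genotypes]
phenotype_constant = [[1.0] for _ in genotypes]
stat_related = dependence_statistic(genotypes, phenotype_related)
stat_constant = dependence_statistic(genotypes, phenotype_constant)
```

A phenotype that carries no variation across subjects yields a statistic of zero, whereas a phenotype that tracks the genotypes yields a positive value.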
               Tools for family studies
Family-based study designs are highly advantageous for the study of rare variants because the frequency
of rare alleles for a particular illness or disorder will be higher in a pedigree than among unrelated
individuals [54]. Currently, very few rare variant analysis tools are designed to find associations
within sequences from family studies. The sampling of relatives in sequencing studies can also help to avoid
sequencing errors in the analysis [55]. The Minimum P-value Optimized Nuisance parameter Score
Test Extended to Relatives (MONSTER) was developed for this setting. MONSTER is an extension of SKAT-O that tests for
association between rare variants and a phenotype while accounting for correlation among samples based on kinship [55]. It
combines the SKAT model with a burden test model through a parameter ρ: depending on the dataset, ρ is either
equal to zero, as in family-based SKAT (famSKAT), or equal to 1, as in the family-based
burden test (famBT). famSKAT is a statistical strategy that uses sequence kernel association to evaluate
rare variants in samples that contain related individuals [56]. famBT is a burden analysis that can be used to
evaluate associations between rare variants and phenotypes when samples contain kin. MONSTER, in contrast,
is capable of adaptively switching between models, performing like either famSKAT or famBT depending on
the data imported [55].
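
The role of ρ can be sketched with the standard SKAT-O form of the combined statistic, Q_ρ = (1 − ρ)·Q_SKAT + ρ·Q_burden. The code below is a simplified illustration for unrelated samples with an intercept-only null model: it omits the kinship adjustment that distinguishes MONSTER, famSKAT, and famBT, and it does not compute P-values. The function names and toy data are hypothetical.

```python
def variant_scores(genotypes, phenotype):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)).

    Assumes an intercept-only null model and unrelated samples
    (no kinship adjustment, unlike the family-based tests).
    """
    mean_y = sum(phenotype) / len(phenotype)
    residuals = [y - mean_y for y in phenotype]
    n_variants = len(genotypes[0])
    return [sum(genotypes[i][j] * residuals[i] for i in range(len(residuals)))
            for j in range(n_variants)]


def q_rho(genotypes, phenotype, weights, rho):
    """Combined statistic: rho = 0 gives the SKAT form, rho = 1 the burden form."""
    weighted = [w * s for w, s in zip(weights, variant_scores(genotypes, phenotype))]
    q_skat = sum(s * s for s in weighted)   # sum of squared weighted scores
    q_burden = sum(weighted) ** 2           # square of the summed weighted scores
    return (1.0 - rho) * q_skat + rho * q_burden


# Hypothetical toy data: 4 subjects, 2 rare variants, equal weights.
genotypes = [[1, 0], [0, 1], [0, 0], [1, 1]]
phenotype = [1.0, 1.0, 0.0, 1.0]
weights = [1.0, 1.0]
q_at_zero = q_rho(genotypes, phenotype, weights, 0.0)
q_at_one = q_rho(genotypes, phenotype, weights, 1.0)
```

When all variants act in the same direction, as in this toy example, the burden form (ρ = 1) yields the larger statistic; when effects point in opposite directions the summed scores cancel and the SKAT form (ρ = 0) retains more signal.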

A particular challenge in conducting a rare variant analysis of pedigree sequencing data is identifying de
novo mutations [57]. The Pedigree Variant Annotation, Analysis, and Search Tool is one of the tools available
for rare variant analysis of familial data; it uses both association testing and logarithm of the odds (LOD)
scores to identify rare causal variants [57]. Fampipe is a pipeline that can be used to analyze
rare variant data from association studies; it can calculate identity-by-descent scores as well as
LOD scores to identify regions that demonstrate association [54]. The pipeline has several modules capable of
calculating allelic frequency, family-specific mutations, and more. To analyze binary traits in family-based
studies, the Kernel Machine Generalized Estimating Equations model (GEE-KM) was developed [58]. The
Rare Variant association analysis with Family data (RVFam) package for R analyzes SNPs for associations
with continuous, binary, or survival phenotypes in familial sequencing studies [59]. The family-based
association tests (FBAT) [60] collapse variants using the sums of allele frequencies to generate weighted
test statistics. These weighted sum statistics are then tested for association with phenotypes using either
multiple regression, linear regression, or linear combination analyses. The Family-based Rare Variant Association
Test [61] is an extension of FBAT; it is a burden test with a variance component that can be used for rare variant
association testing within extended families.

The RVAS approaches can also be used to investigate rare and
de novo noncoding variants in family studies. An analytical framework has been developed to investigate
de novo variation from WGS data in autism spectrum disorder (ASD) families [62]. The SNVs and indels
are annotated and grouped by variant type, gene, species conservation, gene set, and regulatory region.
The number of de novo mutations located in these regions in cases is compared to the number in sibling
controls. Burden analyses are then performed to compute the significance of these comparisons. A similar
procedure was used to detect associations of de novo structural variants in different annotation groups.
The authors analyzed rare variants in 519 ASD families and did not detect a significant association
between rare de novo mutations in non-coding regions and ASD. However, they observed some biologically
plausible associations that might warrant further investigation [62].
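
The case versus sibling-control comparison within one annotation category can be sketched as a simple count-based burden test. The example below assumes equal numbers of cases and sibling controls and uses a one-sided exact binomial test, under which each de novo mutation is equally likely to fall in a case or a control; the source does not state which test the framework uses, and the counts shown are hypothetical.

```python
from math import comb


def binomial_burden_pvalue(case_count, control_count):
    """One-sided exact binomial P-value for an excess of mutations in cases.

    Assumes equal numbers of cases and sibling controls, so that under the
    null each mutation lands in a case with probability 0.5.
    """
    total = case_count + control_count
    if total == 0:
        return 1.0  # no mutations observed, so no evidence either way
    # P(X >= case_count) for X ~ Binomial(total, 0.5)
    upper_tail = sum(comb(total, k) for k in range(case_count, total + 1))
    return upper_tail / 2 ** total


# Hypothetical counts of de novo mutations per annotation category:
# (mutations in cases, mutations in sibling controls).
category_counts = {
    "conserved_noncoding": (10, 0),
    "promoter": (5, 5),
}
p_values = {name: binomial_burden_pvalue(cases, controls)
            for name, (cases, controls) in category_counts.items()}
```

Because many annotation categories are tested in such a framework, the resulting P-values would need a multiple-testing correction before any category is declared significant.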

TARGET RESEQUENCING OF CANDIDATE GENES
Targeted resequencing was developed to sequence the target genes or regions of interest [63]. The primary
advantage of the technology is that it allows for more targeted sequencing of specific portions of the