
Page 148                                   Goodman et al. J Transl Genet Genom 2020;4:144-58  I  http://dx.doi.org/10.20517/jtgg.2020.23

               (TCAG), Hospital for Sick Children Research Institute, Toronto, Ontario, Canada in accordance with
               the manufacturer’s protocols. Samples were randomly stratified across chips and run in two batches but
               balanced for case/control proportions and sex.

               Raw data were then processed in R statistical software, using the package minfi [27]. Quality control measures
               included removing probes that failed detection P-value, meaning the signal was not significantly above
               background noise, as well as probes mapping to X and Y chromosomes, cross-reactive probes and SNP
               probes [28,29]. All criteria and methods for pre-processing are fully described in Chater-Diehl et al. [11].
               Following these steps, data underwent background signal subtraction and control normalization, also using
               minfi [29]. The normalized data consisted of 774,583 methylation sites or CpGs for each sample. DNAm is
               measured in β values, which range from 0 to 1 and represent percent methylation (0-100%).
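The filtering and the 0-1 methylation scale described above can be illustrated with a minimal Python sketch. It is not the authors' minfi pipeline. The offset of 100 in the methylation ratio is a commonly used stabilising constant and is an assumption here, as are the detection P-value cut-off of 0.01, the rule that a probe must pass in every sample, and all probe names and intensities. Cross-reactive and SNP probes, which are removed using published probe lists, are not covered.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Methylation level M / (M + U + offset); always in [0, 1)."""
    return meth / (meth + unmeth + offset)


def keep_probe(chrom, detection_pvals, alpha=0.01):
    """Keep autosomal probes whose signal is above background in every sample.

    alpha and the 'every sample' rule are assumptions for illustration.
    """
    if chrom in ("chrX", "chrY"):
        return False
    return all(p < alpha for p in detection_pvals)


# Hypothetical probes: chromosome, per-sample detection P-values, and
# methylated / unmethylated intensities for a single sample.
probes = {
    "cg_example_1": ("chr1", [1e-8, 1e-6], 900.0, 0.0),
    "cg_example_2": ("chrX", [1e-8, 1e-7], 450.0, 450.0),  # sex chromosome
    "cg_example_3": ("chr2", [1e-8, 0.20], 300.0, 600.0),  # fails detection
    "cg_example_4": ("chr9", [1e-9, 1e-9], 450.0, 450.0),
}

# Compute methylation levels only for probes that survive quality control.
betas = {
    probe_id: beta_value(meth, unmeth)
    for probe_id, (chrom, pvals, meth, unmeth) in probes.items()
    if keep_probe(chrom, pvals)
}
print(betas)
```

Only the two autosomal probes with clean detection P-values remain, and their values fall between 0 (unmethylated) and 1 (fully methylated).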

               DNAm signature derivation
               Prior to statistical analysis, underlying proportions of monocytes, neutrophils, CD4T, CD8T, natural killer
               cells and B cells were estimated from the DNAm data using the Houseman algorithm [30]. At each CpG
               site, a two-group comparison of KS discovery cases vs. controls was performed using limma regression,
               accounting for sex, age, batch and estimated blood cell proportion covariates [31]. CpG sites found to be
               differentially methylated between cases and controls were reported if they met both a statistical significance
               threshold [false discovery rate (FDR)-corrected P-value < 0.01] and a minimum effect size threshold
               (absolute Δβ > 10%). Δβ represents the difference in average DNAm (β) between groups. Principal component
               analysis (PCA) and hierarchical clustering were performed using Qlucore Omics Explorer (QOE, www.qlucore.com).
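The two reporting criteria (FDR-corrected P-value below 0.01 and an absolute between-group difference in mean DNAm above 10%) can be sketched as follows. This is an illustration in Python, not the limma analysis itself: the per-site P-values are taken as given, the Benjamini-Hochberg procedure is assumed as the FDR correction, and the site names and numbers are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted
    # values never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signature_sites(site_ids, pvals, case_means, control_means,
                    fdr=0.01, min_delta=0.10):
    """Select sites passing both the FDR and the effect-size threshold."""
    selected = []
    for sid, q, case, ctrl in zip(site_ids, benjamini_hochberg(pvals),
                                  case_means, control_means):
        delta = case - ctrl  # difference in mean methylation between groups
        if q < fdr and abs(delta) > min_delta:
            selected.append((sid, q, delta))
    return selected


# Hypothetical inputs for four CpG sites.
site_ids = ["cg_1", "cg_2", "cg_3", "cg_4"]
pvals = [0.0001, 0.0002, 0.5, 0.03]
case_means = [0.80, 0.55, 0.30, 0.70]
control_means = [0.60, 0.50, 0.10, 0.40]

selected = signature_sites(site_ids, pvals, case_means, control_means)
print(selected)
```

In this toy input only `cg_1` is retained: `cg_2` is significant but its difference is too small, while `cg_3` and `cg_4` have large differences but fail the FDR threshold. Requiring both criteria is what keeps small but significant shifts, and large but noisy ones, out of the signature.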


               SVM model classification
               Statistically significant CpG sites, i.e., the DNAm signature, were used as input into a machine-learning
               algorithm, support vector machine (SVM), to generate a predictive classification model. To remove noise
               and to filter out information that did not improve the efficacy of the model, we first removed redundant
               sites. Any methylation site that was highly correlated (r > 0.9) with any other site was removed, leaving
               429 CpG sites. We then built an SVM model using the R package caret (for details of model training and
               validation, see Butcher et al. [10]) [32]. The classification model generated by SVM was then applied to all
               remaining samples. The output of this model was a probability score indicating likelihood of having KS or a
               genomic alteration that causes KS.
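The redundancy filter applied before model training can be sketched in Python as below. The text does not say how pairs of correlated sites were resolved, so this sketch makes an assumption: it uses a greedy pass that keeps the first site of each correlated group and drops later sites whose Pearson correlation with an already-kept site exceeds 0.9. The site names and values are hypothetical, and this is not the caret implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant site carries no correlation information
    return sxy / sqrt(sxx * syy)


def drop_redundant_sites(betas, threshold=0.9):
    """Greedy filter: keep a site only if r <= threshold with all kept sites.

    betas maps a site ID to its methylation values across samples.
    Keeping the first member of each correlated group is an assumption.
    """
    kept = []
    for site_id, values in betas.items():
        if all(pearson(values, betas[k]) <= threshold for k in kept):
            kept.append(site_id)
    return kept


# Hypothetical methylation values for three sites across four samples.
betas = {
    "cg_a": [0.10, 0.20, 0.30, 0.40],
    "cg_b": [0.12, 0.21, 0.33, 0.41],  # nearly identical profile to cg_a
    "cg_c": [0.40, 0.10, 0.35, 0.15],
}

kept_sites = drop_redundant_sites(betas)
print(kept_sites)
```

Here `cg_b` is dropped because it tracks `cg_a` almost perfectly, so it would add little independent information to the classifier.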


               GO analysis
               Gene ontology (GO) enrichment analysis was performed on the KS signature sites using GREAT (Genomic
               Regions Enrichment of Annotations Tool) [33]. We used a custom “background” that included all 774,583
               CpG sites that passed quality control. “Basal+extension” was used to identify associated genes, using
               the following modified parameters: constitutive 5.0 kb upstream and 1.0 kb downstream, up to 10.0 kb
               maximum extension. We also refined the output by requiring that significant terms contain two or more
               gene hits.
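To make the "Basal+extension" parameters concrete, the following Python sketch assigns each gene a basal regulatory domain (5.0 kb upstream and 1.0 kb downstream of its transcription start site, TSS) and extends it toward the nearest neighbouring basal domain, capped at 10.0 kb. It is a simplified illustration under stated assumptions, not GREAT's code: genes sit on a single chromosome, the 10.0 kb cap is measured from the TSS (the text does not specify the reference point), and gene names and coordinates are hypothetical.

```python
def basal_domain(tss, strand, upstream=5_000, downstream=1_000):
    """Basal domain around a TSS; 'upstream' depends on the strand."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def regulatory_domains(genes, upstream=5_000, downstream=1_000,
                       max_extension=10_000):
    """genes: list of (name, tss, strand) tuples on one chromosome."""
    ordered = sorted(genes, key=lambda g: g[1])
    basal = [basal_domain(tss, strand, upstream, downstream)
             for _, tss, strand in ordered]
    domains = {}
    for i, (name, tss, _) in enumerate(ordered):
        start, end = basal[i]
        # Extend left until a neighbour's basal domain or the cap is reached.
        left_limit = max([tss - max_extension] + [b[1] for b in basal[:i]])
        # Extend right in the same way.
        right_limit = min([tss + max_extension] + [b[0] for b in basal[i + 1:]])
        # Extension may enlarge the basal domain but never shrinks it.
        domains[name] = (max(0, min(start, left_limit)), max(end, right_limit))
    return domains


def genes_for_site(position, domains):
    """Genes whose regulatory domain contains the CpG position."""
    return sorted(name for name, (start, end) in domains.items()
                  if start <= position <= end)


# Hypothetical genes: (name, TSS coordinate, strand).
genes = [("GENE_A", 20_000, "+"), ("GENE_B", 32_000, "-"),
         ("GENE_C", 100_000, "+")]
domains = regulatory_domains(genes)
print(domains)
print(genes_for_site(25_000, domains), genes_for_site(60_000, domains))
```

A CpG lying between two nearby genes can be associated with both, while a CpG more than 10 kb from any TSS is associated with none. This is why the short extension used here gives a more conservative gene list than a longer one would.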

               RESULTS
               Identifying a DNA methylation signature for Kleefstra syndrome
               To define a DNAm signature associated with KS, DNA from KS patients and neurotypical controls was
               extracted from blood and assayed using the EPIC array, generating high-quality measurements at 774,583
               CpG sites. Ten unrelated individuals (samples KS1_T - KS10_T; n = 6 females; age 1-25 years) with a
               confirmed clinical diagnosis of KS and either pathogenic variants in EHMT1 (n = 3) or microdeletions of
               9q34.3 that included partial or full deletions of EHMT1 (n = 7) were compared to 42 neurotypical
               controls (n = 21 females; age 1-28 years). Because we combined data from patients with pathogenic variants in
               EHMT1 and those with 9q34.3 microdeletions, our analyses identified DNAm changes common to