Quantifying functional impact of non-coding variants with multi-task Bayesian neural network

Abstract

Advances in high-throughput genotyping and sequencing technologies in recent years have revealed essential roles of non-coding regions in gene regulation. Genome-wide association studies (GWAS) suggest that a large proportion of risk variants are located in non-coding regions and remain unexplained by current expression quantitative trait loci catalogs. Interpreting the causal effects of these genetic variants is crucial but difficult owing to our limited knowledge of how regulatory elements function. Although several computational methods have been designed to prioritize regulatory variants that substantially impact human phenotypes, few of them achieve consistently high performance, even when large-scale multi-omic data are integrated. We propose MtBNN, a novel multi-task framework based on Bayesian deep neural networks, to quantify the deleterious impact of single-nucleotide polymorphisms (SNPs) in non-coding genomic regions. Because the multi-task Bayesian framework integrates information from different sources efficiently, MtBNN can extract features from the genomic sequences of large-scale chromatin-profiling data, such as chromatin accessibility and transcription factor binding affinities, and calculate the distribution of the probability that a non-coding variant disrupts regulatory activities. A series of comprehensive experiments shows that MtBNN quantifies the functional impact of cis-regulatory variants with high accuracy, including expression quantitative trait loci, DNase I sensitivity quantitative trait loci and functional genetic variants located within ATAC-seq peaks that affect the accessibility of the corresponding peak, and that it achieves significantly better performance than existing methods. Moreover, MtBNN can be applied to the discovery of potentially causal disease-associated SNPs, thus helping to fine-map GWAS SNPs.

Publication
Bioinformatics, 36(5), 1397-1404, 2020