A hybrid approach to building gene regulatory networks with Bayesian inference

Sari, Alparslan
University of Delaware
Gene regulation plays a central role in cell biology. High-throughput technologies, such as DNA microarrays and next-generation sequencing, enable large-scale measurement of gene expression, which makes it possible to study gene regulation at the network level. Over the past decade, many computational methods have been developed to analyze the vast amounts of gene expression data and infer gene regulatory networks. The recent discovery of microRNAs as regulatory elements presents both opportunities and challenges for re-examining gene regulation. In this study, we developed a hybrid approach to constructing gene regulatory networks that incorporates prior knowledge, including microRNAs as regulatory elements. Existing methods either learn a network from scratch or use a predefined, complete network only to learn the network's parameters. Our hybrid approach, based on Bayesian networks, combines learning without prior knowledge with learning that starts from a predefined partial network, in order to build a well-defined, more complete regulatory network. We used predefined partial networks and other prior knowledge (protein-protein interactions and transcription factor information) as constraints, and used a Bayesian network to infer new edges for a more complete network. Specifically, we used KEGG pathways as our initial networks. With this approach, we generated gene networks from raw expression data based on the initial network, and we expect to uncover potential cross-talk between pathways. We implemented the system pipeline in Python; it consists of four major components: a parser for KEGG pathways, a network initializer, Banjo for Bayesian network learning, and an evaluator. The gene expression data were collected from 10 published breast cancer studies involving estrogen receptor-positive (ER+) and estrogen receptor-negative (ER-) samples, and the KEGG pathways are signaling pathways downloaded from the KEGG database.
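The parser and network-initializer steps can be sketched in Python under several assumptions the abstract does not state: pathways are read from KEGG's KGML (XML) files, edges are directed gene pairs, the seed partial network is a random sample of KEGG edges, and "edge restriction" is read as forbidding a fraction of candidate edges that have no prior-knowledge support (for example, PPI or TF evidence). The names `parse_kgml` and `initialize_network`, and all of their parameters, are hypothetical; converting the outputs into Banjo's input files is not shown.

```python
import random
import xml.etree.ElementTree as ET
from itertools import permutations

Edge = tuple[str, str]  # (regulator, target) gene identifiers


def parse_kgml(xml_text: str) -> set[Edge]:
    """Hypothetical parser: directed gene-gene edges from KGML <relation> elements."""
    root = ET.fromstring(xml_text)
    # A KGML gene entry can list several gene IDs in its name, e.g. "hsa:5594 hsa:5595".
    genes_by_id = {
        entry.get("id"): entry.get("name", "").split()
        for entry in root.iter("entry")
        if entry.get("type") == "gene"
    }
    edges: set[Edge] = set()
    for rel in root.iter("relation"):
        # Relations touching non-gene entries (compounds, groups) yield no edges here.
        for src in genes_by_id.get(rel.get("entry1"), []):
            for dst in genes_by_id.get(rel.get("entry2"), []):
                if src != dst:
                    edges.add((src, dst))
    return edges


def initialize_network(
    kegg_edges: set[Edge],
    genes: list[str],
    supported: set[Edge],
    seed_fraction: float = 0.2,
    restrict_fraction: float = 0.9,
    rng_seed: int = 0,
) -> tuple[set[Edge], set[Edge], set[Edge]]:
    """Hypothetical initializer returning (seed, held_out, forbidden) edge sets."""
    rng = random.Random(rng_seed)
    kegg = sorted(kegg_edges)  # sorted so sampling is reproducible for a given rng_seed
    seed = set(rng.sample(kegg, round(seed_fraction * len(kegg))))
    held_out = set(kegg) - seed  # KEGG edges the learner must rediscover on its own
    # Candidate directed edges between distinct genes, minus the seed and any pair
    # backed by prior knowledge. Held-out KEGG edges are NOT protected, since the
    # learner is not supposed to see them.
    unsupported = [
        e for e in permutations(sorted(genes), 2)
        if e not in seed and e not in supported
    ]
    forbidden = set(rng.sample(unsupported, round(restrict_fraction * len(unsupported))))
    return seed, held_out, forbidden


kegg = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")}
seed, held_out, forbidden = initialize_network(kegg, ["A", "B", "C", "D"], supported={("D", "A")})
print(len(seed), len(held_out), len(forbidden))  # 1 4 9
```

The seed and forbidden sets would presumably be handed to the Bayesian network learner as must-have and must-not-have edge constraints; the exact mapping used in this work is not described in the abstract.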
The results from multiple experiments under various conditions validate our hypothesis regarding the use of prior knowledge. The performance of the generated networks, measured by precision and recall with the KEGG networks as ground truth, increases as the amount of prior knowledge increases and then either peaks or plateaus. The best performance, 65% recall and 95% precision, was obtained when 20% of the KEGG edges were used for the initial partial network, with about 90% edge restriction. Although de novo edges are counted as false positives in the current evaluation scheme, they may represent potential cross-talk between known pathways, which needs to be validated experimentally. As future work, we will develop weighting mechanisms to generate a consensus network from the top-ranking predicted networks, and refine literature searches for prior knowledge to complement the KEGG database.
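The evaluation scheme described above, with KEGG edges as ground truth and de novo edges counted as false positives, reduces to edge-level set arithmetic. The sketch below also illustrates one possible form of the proposed weighting mechanism for a consensus network. It assumes each top-ranking network comes with a log-scale score, such as a Bayesian network search score, and weights networks by exponentiated score differences. That weighting scheme, the 0.5 threshold, and the names `precision_recall` and `consensus` are hypothetical, not the method developed in this work.

```python
import math

Edge = tuple[str, str]  # (regulator, target) gene identifiers


def precision_recall(predicted: set[Edge], truth: set[Edge]) -> tuple[float, float]:
    """Edge-level precision and recall against a reference network (e.g. KEGG)."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)  # de novo edges land here, even if they are real cross-talk
    fn = len(truth - predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


def consensus(networks: list[tuple[float, set[Edge]]], threshold: float = 0.5) -> set[Edge]:
    """Hypothetical weighted consensus over (log_score, edges) pairs."""
    best = max(score for score, _ in networks)
    # Weight each network by exp(log_score - best); subtracting the max avoids overflow.
    weights = [math.exp(score - best) for score, _ in networks]
    total = sum(weights)
    support: dict[Edge, float] = {}
    for weight, (_, edges) in zip(weights, networks):
        for edge in edges:
            support[edge] = support.get(edge, 0.0) + weight / total
    # Keep edges whose normalized support reaches the threshold.
    return {edge for edge, s in support.items() if s >= threshold}


truth = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
predicted = {("A", "B"), ("B", "C"), ("X", "Y")}
print(precision_recall(predicted, truth))  # (0.6666666666666666, 0.5)
```

Raising `threshold` makes the consensus more conservative, keeping only edges that nearly all high-scoring networks agree on.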