Data and -omics-driven approaches to understand the heat stress response: the development of scalable tools and methods to drive hypothesis generation
University of Delaware
This dissertation addresses the broad computational challenges that labs face in the -omics era, in service of a major agricultural goal: adapting the broiler chicken to heat stress. Its contributions range from scalable tools that process raw sequencing reads to statistical methods that integrate multi-omics data and produce novel biological insight. I present an architectural paradigm for powered-by-CyVerse tools and use it to build the tool fRNAkenseq. CyVerse is a pioneering cyberinfrastructure project that makes large-scale computing and storage resources accessible to domain scientists and gives tools a way to share data with one another. fRNAkenseq is a platform for comprehensive RNA-seq analysis, from FASTQ files to differential expression. It relies on CyVerse for cloud-based storage, for grid computing, and for access to the 30,000 reference genomes curated by the powered-by-CyVerse tool CoGe. fRNAkenseq is among the first third-party tools to leverage CyVerse in this fashion. To move from data to insight, we developed pipelines and strategies that integrate the complex, tissue-rich datasets produced by fRNAkenseq with supplementary metabolomics data. From these data, we generate biological hypotheses and models that extend our understanding of how the heat stress response is regulated. In particular, these hypotheses provide context for the co-regulation of sulfur, lipid, and sugar metabolism, which is essential to maintaining homeostasis under heat challenge.
Biological sciences, Data, Drive hypothesis generation, Heat stress response, Methods, Omics, Scalable tools