RNAVLab 2.0: combining web applications, grid computing, and dynamic programming to overcome resource limitations in RNA secondary structure analysis
University of Delaware
Because ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation, their secondary structures have been the focus of many recent studies. Even with the computing power of supercomputers, predicting secondary structures with thermodynamic methods is still not feasible when the RNA molecules have long nucleotide sequences and include complex motifs such as pseudoknots. Furthermore, there is no consolidated environment for accessing the several available prediction and analysis tools. In this thesis we address both problems. First, we extend a virtual laboratory for studying RNA secondary structures, called RNA Virtual Laboratory (RNAVLab 2.0), with a Web application that gives scientists easy and effective access to a set of heterogeneous tools for the study of secondary structures, supported by heterogeneous computational resources. Second, we design a dynamic programming algorithm that finds the optimal, non-overlapping segmentation of a long RNA sequence into segments (chunks), given a scoring function based on energy values. We integrate this algorithm into RNAVLab 2.0 so that the chunks can be predicted independently and a complete secondary structure prediction can be generated from the combined local energy minima. We measure the prediction accuracy of RNAVLab 2.0 for the 14 longest sequences in Group A in CONTRAfold and show that in 12 of the 14 cases our virtual environment outperforms other methods based on global energy minima, while in the other two cases it achieves similar accuracy.
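The segmentation step can be illustrated with a minimal sketch of a generic dynamic program over cut positions. It is not the thesis's algorithm: it assumes the chunks must tile the whole sequence, that each chunk length lies between hypothetical bounds `min_len` and `max_len`, and that lower scores are better, as with free energies. The function name `segment` and the scoring function `toy_score` are illustrative placeholders; a real scoring function would be derived from energy values.

```python
from typing import Callable, List, Tuple


def segment(
    n: int,
    score: Callable[[int, int], float],
    min_len: int,
    max_len: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split positions 0..n into contiguous, non-overlapping chunks [i, j)
    that minimize the summed score (assumption: lower score is better)."""
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal total score for the prefix [0, j)
    back = [-1] * (n + 1)    # back[j]: start of the last chunk in that optimum
    best[0] = 0.0

    for j in range(1, n + 1):
        # Try every admissible start i of a final chunk [i, j).
        for i in range(max(0, j - max_len), j - min_len + 1):
            if best[i] == inf:
                continue  # prefix [0, i) cannot be segmented at all
            candidate = best[i] + score(i, j)
            if candidate < best[j]:
                best[j] = candidate
                back[j] = i

    if best[n] == inf:
        raise ValueError("no segmentation satisfies the length bounds")

    # Walk the back-pointers to recover the chunk boundaries.
    chunks: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = back[j]
        chunks.append((i, j))
        j = i
    chunks.reverse()
    return best[n], chunks


def toy_score(i: int, j: int) -> float:
    """Hypothetical stand-in for an energy-based score: it simply
    penalizes chunks whose length differs from 3."""
    return float((j - i - 3) ** 2)


if __name__ == "__main__":
    total, chunks = segment(10, toy_score, min_len=2, max_len=4)
    print(total, chunks)
```

With chunk lengths bounded by `max_len`, the table is filled in O(n · max_len) score evaluations, and each chunk returned can then be handed to a prediction tool independently of the others.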