RNAVLab 2.0: combining web applications, grid computing, and dynamic programming to overcome resource limitations in RNA secondary structure analysis
Date
2010
Publisher
University of Delaware
Abstract
As ribonucleic acid (RNA) molecules play important roles in many biological
processes including gene expression and regulation, their secondary structures have
been the focus of many recent studies. Despite the computing power of supercomputers,
computationally predicting secondary structures with thermodynamic methods
is still not feasible when the RNA molecules have long nucleotide sequences and
include complex motifs such as pseudoknots. Furthermore, there is no consolidated
environment for access to the several available prediction and analysis tools.
In this thesis we address these problems by extending a virtual laboratory for
studying RNA secondary structures, called RNA Virtual Laboratory (RNAVLab
2.0), with a Web application that allows scientists to easily and effectively access a
set of heterogeneous tools for the study of secondary structures supported by heterogeneous
computational resources. We design a dynamic programming algorithm
for finding the optimal, non-overlapping segmentation of a long RNA sequence into
segments (chunks) given a scoring function based on energy values. We integrate our
algorithm into RNAVLab 2.0 to enable the prediction of the chunks independently
and the generation of a complete secondary structure prediction from the combined
local energy minima. Using RNAVLab 2.0, we measure the prediction accuracy for the
14 longest sequences in Group A in CONTRAfold and show that in 12 of the 14 cases
our virtual environment outperforms other methods based on global energy minima,
while in the remaining two cases it achieves similar accuracy.
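
The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not specify the scoring function or any chunk-length constraints, so the names `segment`, `score`, `min_len`, and `max_len` are assumptions made for illustration, and the sketch further assumes that chunks must cover the whole sequence and that lower scores are better, as with energy values.

```python
from typing import Callable, List, Tuple


def segment(
    n: int,
    score: Callable[[int, int], float],
    min_len: int,
    max_len: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split positions 0..n into contiguous, non-overlapping chunks [i, j).

    Minimizes the sum of score(i, j) over the chunks. `score` is a
    hypothetical stand-in for an energy-based scoring function.
    """
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal total score for prefix 0..j
    back = [-1] * (n + 1)    # back[j]: start of the last chunk ending at j
    best[0] = 0.0

    for j in range(1, n + 1):
        # Try every admissible start i for a chunk that ends at j.
        for i in range(max(0, j - max_len), j - min_len + 1):
            if best[i] == inf:
                continue
            candidate = best[i] + score(i, j)
            if candidate < best[j]:
                best[j] = candidate
                back[j] = i

    if best[n] == inf:
        raise ValueError("no segmentation satisfies the length bounds")

    # Follow the back-pointers from the end to recover the chunks.
    chunks: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = back[j]
        chunks.append((i, j))
        j = i
    chunks.reverse()
    return best[n], chunks


def toy_score(i: int, j: int) -> float:
    """Toy score that prefers chunks of length 3 (not a real energy model)."""
    return float((j - i - 3) ** 2)


total, chunks = segment(9, toy_score, min_len=1, max_len=9)
print(total, chunks)
```

With chunk lengths bounded by `max_len`, the table is filled in O(n · max_len) evaluations of `score`. In a setting like the one described above, each resulting chunk could then be passed to a prediction tool independently.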