Analyzing marker gene diversity using an automated phylogenetic tool: AutoPhy

Date
2017
Publisher
University of Delaware
Abstract
Marker genes can be used to gain biological insight from metagenomic data. Because they are expected to be conserved through evolution, they can be identified within a population. The most prominent example is the 16S rRNA gene, which is widely used by the scientific community as a marker for bacteria. No universal marker gene exists for viral populations, because no single gene is present in all viruses. There are, however, a number of group-specific viral marker genes, some of which are essential for important functions such as replication. Analyzing these marker genes within viral metagenomes can give insight into the biology of a viral assemblage, and tools are needed to automate the analysis of such data and help reveal the evolutionary relationships among viruses. In this project we developed AutoPhy, a tool that analyzes marker gene diversity by automatically constructing a phylogenetic tree from metagenomic data. AutoPhy integrates existing tools and Perl scripts in a shell environment to streamline the processing of metagenomic sequences chosen as marker genes of interest. From the submitted sequences, it produces a tree of the marker gene of interest in Newick format. The pipeline consists of a unique suite of scripts that locate the region of interest using reference sequences and select the sequences best suited for this purpose. As part of the study, the scripts used in the pipeline, including the trim suite, were validated for accuracy using mock data. In addition, the AutoPhy output was compared against a manually created gold-standard DNA polymerase A tree. The tool produced a phylogenetic tree with the expected structure, but there is room for improvement. The output still requires fine-tuning: artifactual deep branches need to be identified and removed, and an iterative step should be added to better align and group some sequences.
With future improvements, the tool will be implemented as part of the VIROME pipeline.
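AutoPhy's final output is a tree in Newick format, which encodes a tree as nested parentheses: sibling subtrees are separated by commas, an optional branch length follows a label after a colon, and the whole tree ends with a semicolon. The Python sketch below is a hypothetical illustration of that format only; it is not taken from AutoPhy (whose scripts are written in Perl and are not shown here), and the `Node` class, the `to_newick` function, and the sequence names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a name, internal nodes carry children."""
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if known
    children: List["Node"] = field(default_factory=list)


def to_newick(node: Node) -> str:
    """Serialize a tree to a Newick string terminated by ';'."""
    return _format(node) + ";"


def _format(node: Node) -> str:
    # Internal nodes wrap their comma-separated children in parentheses.
    text = ""
    if node.children:
        text = "(" + ",".join(_format(child) for child in node.children) + ")"
    # Labels are written unquoted; this sketch assumes names contain no
    # Newick special characters such as spaces, colons, commas, or parentheses.
    text += node.name
    # A branch length, when present, follows the label after a colon.
    if node.length is not None:
        text += f":{node.length:g}"
    return text


if __name__ == "__main__":
    # Invented example: three marker-gene sequences, two of which cluster together.
    tree = Node(children=[
        Node(children=[Node("seqA", 0.1), Node("seqB", 0.2)], length=0.05),
        Node("seqC", 0.3),
    ])
    print(to_newick(tree))  # ((seqA:0.1,seqB:0.2):0.05,seqC:0.3);
```

Full Newick writers also quote labels that contain special characters; the sketch leaves that out to keep the nesting rule easy to see.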
Keywords
Biological sciences, Automatic, Bioinformatics, DNA polymerase, Marine, Phylogenetic trees, Virus