Uncovering the biological features of unknown viruses using shotgun metagenomic sequence data

Date
2017
Publisher
University of Delaware
Abstract
Viral-host interactions are critical regulators of microbial diversity and activity in the environment. Despite their importance, many aspects of environmental viral populations remain poorly understood, owing to the absence of a gene that is universally distributed among all viruses and suitable for establishing evolutionary relationships among viral groups. Shotgun viral metagenomics provides an avenue for broad sampling of viral diversity; however, binning these data is a daunting challenge, given that most viral metagenome (virome) sequences show no significant homology to known viral sequences. Therefore, we created a bioinformatic pipeline (VIROME) capable of providing functional, taxonomic, and environmental context for shotgun virome sequences from natural communities. Using the VIROME analysis pipeline, we discovered specific associations in a deeply sampled shotgun virome.

More viral DNA polymerase A (PolA) sequences were predicted from this single virome than from 85 globally distributed virome libraries (ca. 2,100 and 1,200, respectively), and these sequences clustered with the majority of global PolA peptides (ca. 70%). Long contigs containing full-length polA sequences revealed associations between phylogenetic clades of PolA and replication genes. The phylogeny of PolA provided a framework for inferring viral population biology from virome data. This framework was used to test the utility of binning virome open reading frames (ORFs) into a biologically meaningful organization using assembly graph connections.

Here, we describe strategies to better organize, and build upon, our current knowledge of viral diversity and ecology. Building on the PolA framework and the assembly graph binning strategy, a "field guide" to viral genomics can be created to organize current knowledge of gene-gene and gene-feature associations in known viral genomes. Still, choosing which viral genes and features to explore is critical.
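To make the idea of binning ORFs by assembly graph connections more concrete, the Python sketch below groups contigs into connected components of an assembly graph (using union-find) and assigns each ORF to the bin of its parent contig. It is a minimal illustration under stated assumptions, not the VIROME pipeline or the exact method used in this work: the contig-to-contig links, the ORF-to-contig mapping, and the function names `connected_components` and `bin_orfs` are hypothetical, and a real analysis would likely also need to account for coverage, link support, and spurious edges introduced by repeats.

```python
from collections import defaultdict


def connected_components(contigs, links):
    """Group contigs into connected components using union-find.

    Assumption: `links` is a hypothetical list of (contig_a, contig_b) pairs
    taken from an assembly graph; edge direction and orientation are ignored.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = defaultdict(set)
    for c in contigs:
        components[find(c)].add(c)
    return list(components.values())


def bin_orfs(orf_to_contig, links):
    """Assign each ORF to the bin (connected component) of its parent contig.

    Returns a sorted list of bins, each a sorted list of ORF identifiers.
    Components that contain no ORFs are dropped.
    """
    contigs = set(orf_to_contig.values())
    for a, b in links:
        contigs.update((a, b))

    bins = []
    for component in connected_components(contigs, links):
        orfs = sorted(o for o, c in orf_to_contig.items() if c in component)
        if orfs:
            bins.append(orfs)
    return sorted(bins)


if __name__ == "__main__":
    # Hypothetical toy assembly graph: c1-c2-c3 form one component, c4-c5 another.
    links = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
    orf_to_contig = {"orf1": "c1", "orf2": "c3", "orf3": "c4", "orf4": "c6"}
    print(bin_orfs(orf_to_contig, links))
```

In this toy example, orf1 and orf2 share a bin because their contigs are connected through c2, even though they sit on different contigs; orf4 lies on an unlinked contig and forms a bin of its own.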
By observing the CRISPR system (a prokaryotic acquired immune system against viruses) in natural environments, we demonstrate its potential as a "genetic algorithm" that targets viral genes under more stringent selective pressure. This provides evidence that environmental CRISPR spacers can be used as a genetic "wind vane", pointing investigators toward viral genes that may be interesting to study because they are critical to the viral lifecycle.
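The "wind vane" idea can be sketched as a simple tally: count how many environmental CRISPR spacers match each viral gene, then rank genes by that count. The Python below is a hedged illustration, not the analysis performed in this work. It assumes exact matching of each spacer (or its reverse complement) within a gene sequence, whereas real protospacer searches typically tolerate mismatches and use alignment tools such as BLASTN; the input dictionaries and the function names `count_spacer_hits` and `rank_targeted_genes` are hypothetical.

```python
# Complement table for uppercase DNA; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_spacer_hits(spacers, genes):
    """Count how many spacers match each gene exactly, on either strand.

    Assumption: exact substring matching only (no mismatches, no PAM check).
    `genes` maps a hypothetical gene identifier to its nucleotide sequence.
    """
    hits = {}
    for gene_id, gene_seq in genes.items():
        gene_seq = gene_seq.upper()
        count = 0
        for spacer in spacers:
            s = spacer.upper()
            if s in gene_seq or reverse_complement(s) in gene_seq:
                count += 1
        hits[gene_id] = count
    return hits


def rank_targeted_genes(spacers, genes):
    """Rank genes by spacer hit count (descending), breaking ties by gene id."""
    hits = count_spacer_hits(spacers, genes)
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical toy sequences; not real viral genes.
    genes = {
        "polA": "ATGGCCAAGTTTGGCACCATGA",
        "capsid": "ATGCCCGGGTTTAAA",
        "hyp": "ATGAAAAAAAAATAA",
    }
    spacers = ["GCCAAGTTT", "TCATGGTGC", "CCCGGG"]
    print(rank_targeted_genes(spacers, genes))
```

Here the second spacer matches polA only on the opposite strand, which is why the reverse complement is checked. Genes that accumulate more spacer matches would, under the reasoning above, be candidates for closer study.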