Elucidating aquatic virioplankton diversity and dynamics using high-throughput DNA sequencing
Date
2015
Publisher
University of Delaware
Abstract
Viral infection and lysis are important processes contributing to the
diversification and evolution of microbial communities. In marine ecosystems, which
cover roughly 70% of the Earth’s surface, there are ~10 million viruses per milliliter of
seawater, suggesting an enormous diversity of viruses waiting to be explored. An
important breakthrough in viral ecology was the application of high-throughput DNA
sequencing to entire viral communities. Known as shotgun viral metagenomics, this
approach allows access to the majority of viruses that cannot be maintained in culture.
Viral metagenomics has revealed surprising insight into ancient associations between
viruses and hosts. However, making quantitative inferences from next-generation
sequencing data requires careful evaluation of viral isolation and DNA library
preparation techniques. Many library preparation strategies employ some form of
amplification to obtain sufficient DNA for sequencing. Biases resulting from these
techniques may alter interpretations of the occurrence and abundance of viral
populations from metagenome data. In turn, these biases may lead to
misinterpretations of viral population dynamics.
Recognizing the importance of library construction for metagenomic studies,
the biases of two commonly used technologies for preparing viral DNA for
sequencing were evaluated. The first technology evaluated, Nextera™, uses a
transposase to fragment genomic DNA followed by limited-cycle PCR to amplify
fragmented DNA while simultaneously adding appropriate barcodes and sequencing
primers. A library of a mock community composed of genomic DNA from nine
different viruses was prepared using the Nextera technology. Subsequent DNA
sequencing revealed coverage biases over regions of low or high genomic GC
content, likely due to the limited-cycle PCR step. While the Nextera protocol may
have skewed the distribution of low- and high-GC phages in mixed sample
communities, the technology was sensitive enough to detect rare members in the mock
viral community, and in some cases, complete genomes were successfully
reconstructed.
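The relationship between GC content and coverage described above can be checked with a simple windowed comparison. The following is a minimal sketch, not the pipeline used in the study: the function names, the window size, and the toy genome and depth values are all assumptions made up for illustration.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc_and_coverage(genome, depth, window):
    """Return (gc_fraction, mean_depth) for each non-overlapping full window.

    `genome` is a nucleotide string and `depth` is a per-base read depth
    list of the same length (both hypothetical inputs for this sketch).
    """
    if len(genome) != len(depth):
        raise ValueError("genome and depth must have equal length")
    rows = []
    for start in range(0, len(genome) - window + 1, window):
        end = start + window
        gc = gc_fraction(genome[start:end])
        mean_depth = sum(depth[start:end]) / window
        rows.append((gc, mean_depth))
    return rows


# Toy data (invented): a low-GC, a mid-GC, and a high-GC region,
# with depth dropping at both GC extremes.
genome = "ATATATAT" + "GCATGCAT" + "GCGCGCGC"
depth = [4] * 8 + [10] * 8 + [3] * 8

rows = windowed_gc_and_coverage(genome, depth, window=8)
for gc, mean_depth in rows:
    print(f"GC={gc:.2f}  mean depth={mean_depth:.1f}")
```

With real data, the per-base depth would come from mapping reads back to each reference genome, and a much larger window would be appropriate.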
Obtaining sufficient amounts of DNA for sequencing is a common challenge in
viral metagenomics. Many published techniques have used multiple displacement
amplification (MDA), employing the phi29 polymerase, to amplify genomic DNA to
microgram quantities before proceeding with library preparation and sequencing.
Despite documented biases of this technique, MDA has been commonly used with the
assumption that pooling replicate MDA reactions of a single sample alleviates
amplification bias. To test this assumption, viral metagenome libraries were
constructed from a single mock viral community. The control (unamplified) library
was compared to libraries prepared from a single MDA reaction and from pooled MDA reactions.
Sequence coverage of viral genomes was highly uneven in the MDA treatments
compared to the unamplified control. Strikingly, coverage patterns for the single and
pooled MDA samples were nearly identical, suggesting amplification biases are
reproducible and likely sequence-dependent. Therefore, MDA should be avoided for
any studies that aim to make quantitative inferences from mixed population samples.
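The comparison described above rests on two measurements: how uneven coverage is within each library, and how similar the coverage profiles of the single and pooled MDA libraries are to each other. The sketch below illustrates one way to compute both, using the coefficient of variation for unevenness and the Pearson correlation for profile similarity. These metrics are an assumption chosen for illustration, not necessarily those used in the study, and the coverage values are invented.

```python
import math


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unevenness)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Toy windowed coverage for one genome in three libraries (invented values).
control = [10, 11, 9, 10, 10, 10]      # unamplified
single_mda = [2, 30, 1, 25, 3, 5]      # one MDA reaction
pooled_mda = [3, 28, 2, 24, 4, 5]      # pooled replicate MDA reactions

print("CV control:", round(coefficient_of_variation(control), 3))
print("CV single MDA:", round(coefficient_of_variation(single_mda), 3))
print("CV pooled MDA:", round(coefficient_of_variation(pooled_mda), 3))
print("r(single, pooled):", round(pearson(single_mda, pooled_mda), 3))
```

A high correlation between the single and pooled profiles, together with a high coefficient of variation in both, is the pattern that would indicate reproducible bias that pooling does not average away.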
Shotgun viral metagenomic sequence libraries (viromes) have revealed a
surprising diversity of novel and ancient genes within viral communities. Unlike
cellular life, no single gene is carried by all viral genomes. However, there are
many genes that occur among a wide cross-section of viruses,
including genes involved in nucleotide and protein metabolism. Chaperonins, a
conserved protein-folding system found in all cellular life, were explored in viral
metagenomic data. Contrary to the low frequency of chaperonin-encoding viruses in
sequence databases, a surprising diversity and abundance of viral chaperonins were
discovered within viromes representing a range of marine ecosystems. Viral
chaperonins were shown to be evolutionarily ancient and are likely indispensable for
successful infection. Carrying the small cochaperonin gene, GroES, appeared to be
the most common strategy for aquatic viruses. However, populations of large-genome
viruses carried complete GroES and GroEL operons, with the phylogeny of large-subunit
GroEL genes matching the genomic context of the GroEL gene. Archaeal versions of
chaperonins (i.e., thermosomes) were also discovered in viral metagenomes, revealing
the presence of tailed archaeal viruses in surface seawaters. The phylogenetic
resolution and conservation of GroEL and thermosome genes make them excellent
targets for studying the dynamics of large-genome viruses in aquatic environments.
Virioplankton communities turn over rapidly, in less than a day in many cases.
Thus, it is surprising that so few studies have examined the variability of viral and
bacterial community diversity over diel cycles. To address this shortcoming,
variations in bacterial and viral populations in replicate mesocosms were examined
over a twenty-four-hour period using marker gene sequencing. Distinct changes in
relative population abundance, indicative of Kill-the-Winner predator-prey-type
oscillations, were observed for bacterial and viral populations. Viral OTU
distributions matched predictions based on the Bank model, with few abundant viral
populations and a large fraction of rare seed-bank populations. The rank abundance of
some virioplankton populations changed over the course of a few hours, likely
representing r-selected, fast-growing viruses. In contrast, rare populations emerging
from the bank were not observed. This short-term variability in the abundance of viral
and host populations is not seen over longer temporal scales such as weeks and
months, explaining the common observation of stable viral and microbial communities
within marine ecosystems. The results of this study encourage future investigations
incorporating more sampling time points over multiple diel cycles for identifying
repeating and time-lagged associations between viral and bacterial populations.
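Changes in rank abundance between sampling points, as described above, can be tabulated directly from OTU count tables. The following is a minimal sketch under stated assumptions: the OTU names and read counts are invented, ties are broken alphabetically for determinism, and the function names are hypothetical rather than taken from the study.

```python
def relative_abundance(counts):
    """Convert raw OTU read counts to fractions of the library total."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


def ranks(counts):
    """Rank OTUs from most (rank 1) to least abundant; ties broken by name."""
    ordered = sorted(counts, key=lambda otu: (-counts[otu], otu))
    return {otu: position + 1 for position, otu in enumerate(ordered)}


def rank_shifts(counts_t0, counts_t1):
    """Positive values mean the OTU moved up in rank between t0 and t1."""
    r0, r1 = ranks(counts_t0), ranks(counts_t1)
    return {otu: r0[otu] - r1[otu] for otu in r0 if otu in r1}


# Toy viral OTU counts at two time points a few hours apart (invented).
t0 = {"vOTU_A": 500, "vOTU_B": 300, "vOTU_C": 150, "vOTU_D": 50}
t1 = {"vOTU_A": 200, "vOTU_B": 550, "vOTU_C": 180, "vOTU_D": 50}

shifts = rank_shifts(t0, t1)
for otu in sorted(shifts):
    before = relative_abundance(t0)[otu]
    after = relative_abundance(t1)[otu]
    print(f"{otu}: {before:.2f} -> {after:.2f}, rank shift {shifts[otu]:+d}")
```

Applied across many time points, the same table of rank shifts could be scanned for populations that rise and fall repeatedly, which is the kind of pattern additional diel sampling would help resolve.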