Browsing by Author "Ferrell, Barbra D."
Now showing 1 - 3 of 3
Item
Improvements in viral gene annotation using large language models and soft alignments
(BMC Bioinformatics, 2024-04-25) Harrigan, William L.; Ferrell, Barbra D.; Wommack, K. Eric; Polson, Shawn W.; Schreiber, Zachary D.; Belcaid, Mahdi
Background: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this methodological challenge by annotating protein sequences based on embeddings.
Results: Central to our contribution is the soft alignment algorithm, drawing from traditional protein alignment but leveraging embedding similarity at the amino acid level to bypass the need for conventional scoring matrices. This method surpasses pooled embedding-based models not only in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect.
Conclusion: The embeddings approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.

Item
Soybean Bradyrhizobium spp. Spontaneously Produce Abundant and Diverse Temperate Phages in Culture
(Viruses, 2024-11-07) Richards, Vanessa A.; Ferrell, Barbra D.; Polson, Shawn W.; Wommack, K. Eric; Fuhrmann, Jeffry J.
Soybean bradyrhizobia (Bradyrhizobium spp.) are symbiotic root-nodulating bacteria that fix atmospheric nitrogen for the host plant. The University of Delaware Bradyrhizobium Culture Collection (UDBCC; 353 accessions) was created to study the diversity and ecology of soybean bradyrhizobia. Some UDBCC accessions produce temperate (lysogenic) bacteriophages spontaneously under routine culture conditions without chemical or other apparent inducing agents. Spontaneous phage production may promote horizontal gene transfer and shape bacterial genomes and associated phenotypes. A diverse subset (n = 98) of the UDBCC was examined for spontaneously produced virus-like particles (VLPs) using epifluorescent microscopy, with a majority (69%) producing detectable VLPs (>1 × 10⁷ mL⁻¹) in laboratory culture. Phages from the higher-producing accessions (>2.0 × 10⁸ VLP mL⁻¹; n = 44) were examined using transmission electron microscopy. Diverse morphologies were observed, including various tail types and lengths, capsid sizes and shapes, and the presence of collars or baseplates. In many instances, putative extracellular vesicles of a size similar to virions were also observed. Three of the four species examined (B. japonicum, B. elkanii, and B. diazoefficiens) produced apparently tailless phages. All species except B. ottawaense also produced siphovirus-like phages, while all but B. diazoefficiens additionally produced podovirus-like phages. Myovirus-like phages were restricted to B. japonicum and B. elkanii. At least three strains were polylysogens, producing up to three distinct morphotypes. These observations suggest spontaneously produced phages may play a significant role in the ecology and evolution of soybean bradyrhizobia.

Item
VIROME and metagenomes online: optimizing functionality by leveraging metadata
(University of Delaware, 2015) Ferrell, Barbra D.
Metagenomics has become a dominant tool for profiling the composition of microbial and viral communities, allowing inferences of taxonomic or functional composition through comparison of environmental sequences to reference databases. The power of this approach is limited when environmental proteins show no homology to reference sequences or only show homology to proteins with no known function, which may account for as much as 70% of sequences among viral samples. The Viral Informatics Resource for Metagenomic Exploration (VIROME, http://virome.dbi.udel.edu) was developed to provide functional, taxonomic, and environmental homology evidence for viral metagenomes, and to provide visualization capabilities and useful binning and comparison tools. Environmental context is provided through comparison against the Metagenomes Online (MgOl, http://metagenomesonline.org) database of predicted proteins identified from 258 microbial and viral metagenomes. MgOl libraries are manually curated with environmental metadata, providing a framework for the sequence homology results and increasing the proportion of a metagenome to which meaningful context can be ascribed. This project significantly built upon the utility of VIROME and MgOl by improving the quality and consistency of the associated metadata. Metadata associated with MgOl libraries has been extensively expanded in alignment with standards such as Minimum Information about any (x) Sequence (MIxS) and Environment Ontology (EnvO). An improved VIROME sample submission portal was also designed, which allows users to organize their metagenome's or viral genome's metadata in a MIxS-compliant format. Users have the option to export this metadata in an output format that is compatible with GenBank BioSample submissions. Environmental metadata is further leveraged within each library through new visualizations that enhance the environmental context of a metagenome's sequences, and throughout VIROME through new search and comparison features allowing exploration of metagenomes with similar environmental profiles or protein homology. Through updates to the MgOl database, the VIROME library submission process, and subsequent library exploration, VIROME is able to leverage environmental annotation to provide flexible, user-driven grouping and comparison and facilitate relevant insights into sequence significance and viral community diversity.