Open Access Publications

Open access publications by faculty, staff, postdocs, and graduate students at the Center for Bioinformatics and Computational Biology.

Recent Submissions

Now showing 1 - 20 of 33
  • Item
    Uncovering key predictive channels and clinical variables in the gamma band auditory steady-state response in early-stage psychosis: a longitudinal study
    (Acta Neuropsychiatrica, 2024-12-09) Holton, Kristina M.; Higgins, Amy; Brockmeier, Austin J.; Hall, Mei-Hua
    Objective: Psychotic disorders are characterised by abnormalities in the synchronisation of neuronal responses. A 40 Hz gamma band deficit during auditory steady-state response (ASSR) measured by electroencephalogram (EEG) is a robust observation in psychosis and is associated with symptoms and functional deficits. However, the majority of ASSR studies focus on specific electrode sites, while whole scalp analysis using all channels, and the association with clinical symptoms, are rare. Methods: In this study, we use whole-scalp 40 Hz ASSR EEG measurements – power and phase-locking factor – to establish deficits in early-stage psychosis (ESP) subjects, classify ESP status using an ensemble of machine learning techniques, identify correlates with principal components obtained from clinical/demographic/functioning variables, and correlate functional outcome after a short-term follow-up. Results: We identified significant spatially-distributed group level differences for power and phase locking. The performance of different machine learning techniques and interpretation of the extracted feature importance indicate that phase locking has a more predictive and parsimonious pattern than power. Phase locking is also associated with principal components composed of measures of cognitive processes. Short-term functional outcome is associated with baseline 40 Hz ASSR signals from the FCz and other channels in both phase locking and power. Conclusion: This whole-scalp EEG study provides additional evidence to link deficits in 40 Hz ASSRs with cognition and functioning in ESP, and corroborates with prior studies of phase locking from a subset of EEG channels. Confirming 40 Hz ASSR deficits serves as a candidate phenotype to identify circuit dysfunctions and a biomarker for clinical outcomes in psychosis.
  • Item
    Understanding the liver under heat stress with statistical learning: an integrated metabolomics and transcriptomics computational approach
    (BMC Genomics, 2019-06-17) Hubbard, Allen H.; Zhang, Xiaoke; Jastrebski, Sara; Singh, Abhyudai; Schmidt, Carl J.
    Background: We present results from a computational analysis developed to integrate transcriptome and metabolomic data in order to explore the heat stress response in the liver of the modern broiler chicken. Heat stress is a significant cause of productivity loss in the poultry industry, both in terms of increased livestock morbidity and its negative influence on average feed efficiency. This study focuses on the liver because it is an important regulator of metabolism, controlling many of the physiological processes impacted by prolonged heat stress. Using statistical learning methods, we identify genes and metabolites that may regulate the heat stress response in the liver and adaptations required to acclimate to prolonged heat stress. Results: We describe how disparate systems, such as sugar, lipid, and amino acid metabolism, are coordinated during the heat stress response. Conclusions: Our findings provide more detailed context for genomic studies and generate hypotheses about dietary interventions that can mitigate the negative influence of heat stress on the poultry industry.
  • Item
    Transcriptomic and metabolomic characterization of post-hatch metabolic reprogramming during hepatic development in the chicken
    (BMC Genomics, 2021-05-24) Van Every, Heidi A.; Schmidt, Carl J.
    Background: Artificial selection of modern meat-producing chickens (broilers) for production characteristics has led to dramatic changes in phenotype, yet the impact of this selection on metabolic and molecular mechanisms is poorly understood. The first 3 weeks post-hatch represent a critical period of adjustment, during which the yolk lipid is depleted and the bird transitions to reliance on a carbohydrate-rich diet. As the liver is the major organ involved in macronutrient metabolism and nutrient allocation, a combined transcriptomics and metabolomics approach has been used to evaluate hepatic metabolic reprogramming between Day 4 (D4) and Day 20 (D20) post-hatch. Results: Many transcripts and metabolites involved in metabolic pathways differed in their abundance between D4 and D20, representing different stages of metabolism that are enhanced or diminished. For example, at D20 the first stage of glycolysis that utilizes ATP to store or release glucose is enhanced, while at D4, the ATP-generating phase is enhanced to provide energy for rapid cellular proliferation at this time point. This work has also identified several metabolites, including citrate, phosphoenolpyruvate, and glycerol, that appear to play pivotal roles in this reprogramming. Conclusions: At Day 4, metabolic flexibility allows for efficiency to meet the demands of rapid liver growth under oxygen-limiting conditions. At Day 20, the liver’s metabolism has shifted to process a carbohydrate-rich diet that supports the rapid overall growth of the modern broiler. Characterizing these metabolic changes associated with normal post-hatch hepatic development has generated testable hypotheses about the involvement of specific genes and metabolites, clarified the importance of hypoxia to rapid organ growth, and contributed to our understanding of the molecular changes affected by decades of artificial selection.
  • Item
    Building a multistate model from electronic health records data for modeling long-term diabetes complications
    (Journal of Clinical and Translational Science, 2024-09-23) Li, Riza C.; Ding, Shanshan; Ndura, Kevin; Patel, Vishal; Jurkovitz, Claudine
    Objective: The progression of long-term diabetes complications has led to a decreased quality of life. Our objective was to evaluate the adverse outcomes associated with diabetes based on a patient’s clinical profile by utilizing a multistate modeling approach. Methods: This was a retrospective study of diabetes patients seen in primary care practices from 2013 to 2017. We implemented a five-state model to examine the progression of patients transitioning from one complication to having multiple complications. Our model incorporated high dimensional covariates from multisource data to investigate the possible effects of different types of factors that are associated with the progression of diabetes. Results: The cohort consisted of 10,596 patients diagnosed with diabetes and no previous complications associated with the disease. Most of the patients in our study were female, White, and had type 2 diabetes. During our study period, 5928 did not develop complications, 3323 developed microvascular complications, 1313 developed macrovascular complications, and 1129 developed both micro- and macrovascular complications. From our model, we determined that patients had a 0.1334 [0.1284, 0.1386] rate of developing a microvascular complication compared to a 0.0508 [0.0479, 0.0540] rate of developing a macrovascular complication. The area deprivation index score we incorporated as a proxy for socioeconomic information indicated that patients who reside in more disadvantaged areas have a higher rate of developing a complication compared to those who reside in the least disadvantaged areas. Conclusions: Our work demonstrates how a multistate modeling framework is a comprehensive approach to analyzing the progression of long-term complications associated with diabetes.
  • Item
    Sequestration of gene products by decoys enhances precision in the timing of intracellular events
    (Scientific Reports, 2024-11-08) Biswas, Kuheli; Dey, Supravat; Singh, Abhyudai
    Expressed gene products often interact ubiquitously with binding sites at nucleic acids and macromolecular complexes, known as decoys. The binding of transcription factors (TFs) to decoys can be crucial in controlling the stochastic dynamics of gene expression. Here, we explore the impact of decoys on the timing of intracellular events, as captured by the time taken for the levels of a given TF to reach a critical threshold level, known as the first passage time (FPT). Although nonlinearity introduced by binding makes exact mathematical analysis challenging, employing suitable approximations and reformulating FPT in terms of an alternative variable, we analytically assess the impact of decoys. The stability of the decoy-bound TFs against degradation impacts FPT statistics crucially. Decoys reduce noise in FPT, and stable decoy-bound TFs offer greater timing precision with less expression cost than their unstable counterparts. Interestingly, when both bound and free TFs decay at the same rate, decoy binding does not directly alter FPT noise. We verify these results by performing exact stochastic simulations. These results have important implications for the precise temporal scheduling of events involved in the functioning of biomolecular clocks, development processes, cell-cycle control, and cell-size homeostasis.
  • Item
    A long-term high-fat diet induces differential gene expression changes in spatially distinct adipose tissue of male mice
    (Physiological Genomics, 2024-11-11) Alradi, Malak; Askari, Hassan; Shaw, Mark; Bhavsar, Jaysheel D.; Kingham, Brewster F.; Polson, Shawn W.; Fancher, Ibra S.
    The accumulation of visceral adipose tissue (VAT) is strongly associated with cardiovascular disease and diabetes. In contrast, individuals with increased subcutaneous adipose tissue (SAT) without corresponding increases in VAT are associated with a metabolic healthy obese phenotype. These observations implicate dysfunctional VAT as a driver of disease processes, warranting investigation into obesity-induced alterations of distinct adipose depots. To determine the effects of obesity on adipose gene expression, male mice (n = 4) were fed a high-fat diet to induce obesity or a normal laboratory diet (lean controls) for 12–14 mo. Mesenteric VAT and inguinal SAT were isolated for bulk RNA sequencing. AT from lean controls served as a reference to obesity-induced changes. The long-term high-fat diet induced the expression of 169 and 814 unique genes in SAT and VAT, respectively. SAT from obese mice exhibited 308 differentially expressed genes (164 upregulated and 144 downregulated). VAT from obese mice exhibited 690 differentially expressed genes (262 genes upregulated and 428 downregulated). KEGG pathway and GO analyses revealed that metabolic pathways were upregulated in SAT versus downregulated in VAT while inflammatory signaling was upregulated in VAT. We next determined common genes that were differentially regulated between SAT and VAT in response to obesity and identified four genes that exhibited this profile: elovl6 and kcnj15 were upregulated in SAT/downregulated in VAT while trdn and hspb7 were downregulated in SAT/upregulated in VAT. We propose that these genes in particular should be further pursued to determine their roles in SAT versus VAT with respect to obesity. NEW & NOTEWORTHY A long-term high-fat diet induced the expression of more than 980 unique genes across subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT). The high-fat diet also induced the differential expression of nearly 1,000 AT genes. We identified four genes that were oppositely expressed in SAT versus VAT in response to the high-fat diet and propose that these genes in particular may serve as promising targets aimed at resolving VAT dysfunction in obesity.
  • Item
    A Moments-Based Analytical Approach for Cell Size Homeostasis
    (IEEE Control Systems Letters, 2024-06-07) Nieto, César; Vargas-Garcia, Cesar Augusto; Singh, Abhyudai
    This contribution explores mechanisms that regulate the dynamics of single-cell size, maintaining equilibrium around a target set point. Using the formalism of Stochastic Hybrid Systems (SHS), we consider continuous exponential growth in cell size (as determined by volume/mass/surface area). This continuous-time evolution is interspersed by cell division events that occur randomly as per a given size-dependent rate, and upon division, only one of the two daughter cells is tracked. We show that a size-independent division rate does not provide cell size homeostasis, in the sense that the variance in cell size increases unboundedly over time. Next, we consider a division rate proportional to cell size that yields the adder size control observed in several bacteria – a constant size is added on average between birth and division regardless of the newborn size. For this scenario, we obtain exact formulas for the steady-state moments (mean, variance, and skewness) of cell size. Expanding the SHS model, we explore a biologically relevant scenario where the time between successive division events is further divided into multiple discrete stages with size-dependent stage transitions. Exact moment computations demonstrate that increasing the number of stages reduces cell size variability (noise). We also find formulas considering uneven size partitioning between daughters during division, and where the division rate follows a power law of the cell size leading to deviations from adder size control. This letter provides a method for estimating model parameters from observed cell size distributions and enhances our understanding of mechanisms underlying cell size regulation.
  • Item
    Efficacy of Bacillus subtilis probiotic in preventing necrotic enteritis in broilers: a systematic review and meta-analysis
    (Avian Pathology, 2024-07-03) Ghimire, Shweta; Subedi, Keshab; Zhang, Xinwen; Wu, Changqing
    Probiotics can enhance broiler chicken health by improving intestinal microbiota, potentially replacing antibiotics. They protect against bacterial diseases like necrotic enteritis (NE) in poultry. Understanding their role is crucial for managing bacterial diseases, including NE. This study conducted a meta-analysis to assess the effects of Bacillus subtilis probiotic supplementation on feed conversion ratio (FCR), NE lesion score, and mortality. Additionally, a systematic review analysed gut microbiota changes in broilers challenged with Clostridium perfringens with or without the probiotic supplementation. Effect sizes from the studies were estimated in terms of standardized mean difference (SMD). Random effect models were fitted to estimate the pooled effect size and 95% confidence interval (CI) of the pooled effect size between the control [probiotic-free + C. perfringens] and the treatment [Bacillus subtilis supplemented + C. perfringens] groups. Overall variance was computed by heterogeneity (Q). The meta-analysis showed that Bacillus subtilis probiotic supplementation significantly improved FCR and reduced NE lesion score but had no effect on mortality rates. The estimated overall effects of probiotic supplementation on FCR, NE lesion score and mortality percentage in terms of SMD were −0.91 (CI = −1.34, −0.49; P < 0.001*); −0.67 (CI = −1.11, −0.22; P = 0.006*), and −0.32 (CI = −0.70, 0.06; P = 0.08), respectively. Heterogeneity analysis indicated significant variations across studies for FCR (Q = 69.66; P < 0.001*) and NE lesion score (Q = 42.35; P < 0.001*) while heterogeneity was not significant for mortality (Q = 2.72; P = 0.74). Bacillus subtilis probiotic supplementation enriched specific gut microbiota including Streptococcus, Butyricicoccus, Faecalibacterium, and Ruminococcus. These microbiotas were found to upregulate expression of various genes such as TJ proteins occludin, ZO-1, junctional adhesion 2 (JAM2), interferon gamma, IL12-β and transforming growth factor-β4. Moreover, downregulated mucin-2 expression was involved in restoring the intestinal physical barrier, reducing intestinal inflammation, and recovering the physiological functions of damaged intestines. These findings highlight the potential benefits of probiotic supplementation in poultry management, particularly in combating bacterial diseases and promoting intestinal health.
  • Item
    Morphology, composition, and deterioration of the embryonic rostral sheath of the smalltooth sawfish (Pristis pectinata)
    (Fishery Bulletin, 2024-05-28) Poulakis, Gregg R.; Wyffels, Jennifer T.; Fortman, P. Eric; Wooley, Andrew K.; Heath, Lukas B.; Yakich, Dylan M.; Wilson, Patrick W.
    Elongated rostra evolved in diverse animal groups as adaptations for feeding, defense, sensory perception, and reproduction. Sawfish rostra have tooth-like dermal denticles, referred to as rostral teeth, along their lateral margins. Embryos have a sheath, or covering, for the calcified rostral teeth during gestation, and it persists until after parturition. Little is known about the morphology and composition of the sheath. During 18 years of tagging juvenile smalltooth sawfish (Pristis pectinata), sheaths were documented for 36 neonates with stretch total lengths of 581–812 mm, and samples were collected from 6 specimens for laboratory evaluation. The multilayered, skin-like sheath, which cannot be easily removed manually, has a vascularized inner layer of connective tissue composed primarily of fibrous proteins (e.g., collagen, reticulin, and keratin) surrounded by an outer layer of columnar and spherical epithelial cells overlying a basement membrane. The columnar cells contain condensed chromatin and differentiate into the outermost spherical cells that contain carbohydrates. After birth, the sheath is shed evenly over 4 d, through sloughing and apoptosis, fully exposing the rostral teeth. The sheath is an ephemeral embryonic organ that protects the female and the embryos from injury during gestation and birth.
  • Item
    A short-term, randomized, controlled, feasibility study of the effects of different vegetables on the gut microbiota and microRNA expression in infants
    (Frontiers in Microbiomes, 2024-03-01) Ferro, Lynn E.; Bittinger, Kyle; Trudo, Sabrina P.; Beane, Kaleigh E.; Polson, Shawn W.; Kim, Jae Kyeom; Trabulsi, Jillian C.
    The complementary diet influences the gastrointestinal (gut) microbiota composition and, in turn, host health and, potentially, microRNA (miRNA) expression. This study aimed to assess the feasibility of altering the gut microbial communities with short-term food introduction and to determine the effects of different vegetables on the gut microbiota and miRNA expression in infants. A total of 11 infants were randomized to one of the following intervention arms: control, broccoli, or carrot. The control group maintained the milk diet only, while the other groups consumed either a broccoli puree or a carrot puree on days 1–3 along with their milk diet (human milk or infant formula). Genomic DNA and total RNA were extracted from fecal samples to determine the microbiota composition and miRNA expression. Short-term feeding of both broccoli and carrots resulted in changes in the microbiota and miRNA expression. Compared to the control, a trend toward a decrease in Shannon index was observed in the carrot group on days 2 and 4. The carrot and broccoli groups differed by weighted UniFrac. Streptococcus was increased on day 4 in the carrot group compared to the control. The expression of two miRNAs (i.e., miR-217 and miR-590-5p) trended towards decrease in both the broccoli and carrot groups compared to the control, whereas increases in eight and two different miRNAs were observed in the carrot and broccoli groups, respectively. Vegetable interventions differentially impacted the gut microbiota and miRNA expression, which may be a mechanism by which total vegetable intake and variety are associated with reduced disease risk.
  • Item
    Transcriptional regulation of Sis1 promotes fitness but not feedback in the heat shock response
    (eLife, 2023-05-17) Grade, Rania; Singh, Abhyudai; Ali, Asif; Pincus, David
    The heat shock response (HSR) controls expression of molecular chaperones to maintain protein homeostasis. Previously, we proposed a feedback loop model of the HSR in which heat-denatured proteins sequester the chaperone Hsp70 to activate the HSR, and subsequent induction of Hsp70 deactivates the HSR (Krakowiak et al., 2018; Zheng et al., 2016). However, recent work has implicated newly synthesized proteins (NSPs) – rather than unfolded mature proteins – and the Hsp70 co-chaperone Sis1 in HSR regulation, yet their contributions to HSR dynamics have not been determined. Here, we generate a new mathematical model that incorporates NSPs and Sis1 into the HSR activation mechanism, and we perform genetic decoupling and pulse-labeling experiments to demonstrate that Sis1 induction is dispensable for HSR deactivation. Rather than providing negative feedback to the HSR, transcriptional regulation of Sis1 by Hsf1 promotes fitness by coordinating stress granules and carbon metabolism. These results support an overall model in which NSPs signal the HSR by sequestering Sis1 and Hsp70, while induction of Hsp70 – but not Sis1 – attenuates the response.
  • Item
    Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life
    (Scientific Reports, 2023-02-06) Hallee, Logan; Khomtchouk, Bohdan B.
    In this study, we investigate how an organism’s codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
  • Item
    Transcriptomic Signature of the Simulated Microgravity Response in Caenorhabditis elegans and Comparison to Spaceflight Experiments
    (Cells, 2023-01-10) Çelen, İrem; Jayasinghe, Aroshan; Doh, Jung H.; Sabanayagam, Chandran R.
    Given the growing interest in human exploration of space, it is crucial to identify the effects of space conditions on biological processes. Here, we analyze the transcriptomic response of Caenorhabditis elegans to simulated microgravity and observe the maintained transcriptomic response after returning to ground conditions for four, eight, and twelve days. We show that 75% of the simulated microgravity-induced changes on gene expression persist after returning to ground conditions for four days while most of these changes are reverted after twelve days. Our results from integrative RNA-seq and mass spectrometry analyses suggest that simulated microgravity affects longevity-regulating insulin/IGF-1 and sphingolipid signaling pathways. Finally, we identified 118 genes that are commonly differentially expressed in simulated microgravity- and space-exposed worms. Overall, this work provides insight into the effect of microgravity on biological systems during and after exposure.
  • Item
    DNA Methylation Analysis Reveals Distinct Patterns in Satellite Cell–Derived Myogenic Progenitor Cells of Subjects with Spastic Cerebral Palsy
    (Journal of Personalized Medicine, 2022-11-30) Robinson, Karyn G.; Marsh, Adam G.; Lee, Stephanie K.; Hicks, Jonathan; Romero, Brigette; Batish, Mona; Crowgey, Erin L.; Shrader, M. Wade; Akins, Robert E.
    Spastic type cerebral palsy (CP) is a complex neuromuscular disorder that involves altered skeletal muscle microanatomy and growth, but little is known about the mechanisms contributing to muscle pathophysiology and dysfunction. Traditional genomic approaches have provided limited insight regarding disease onset and severity, but recent epigenomic studies indicate that DNA methylation patterns can be altered in CP. Here, we examined whether a diagnosis of spastic CP is associated with intrinsic DNA methylation differences in myoblasts and myotubes derived from muscle resident stem cell populations (satellite cells; SCs). Twelve subjects were enrolled (6 CP; 6 control) with informed consent/assent. Skeletal muscle biopsies were obtained during orthopedic surgeries, and SCs were isolated and cultured to establish patient–specific myoblast cell lines capable of proliferation and differentiation in culture. DNA methylation analyses indicated significant differences at 525 individual CpG sites in proliferating SC–derived myoblasts (MB) and 1774 CpG sites in differentiating SC–derived myotubes (MT). Of these, 79 CpG sites were common in both culture types. The distribution of differentially methylated 1 Mbp chromosomal segments indicated distinct regional hypo– and hyper–methylation patterns, and significant enrichment of differentially methylated sites on chromosomes 12, 13, 14, 15, 18, and 20. Average methylation load across 2000 bp regions flanking transcriptional start sites was significantly different in 3 genes in MBs, and 10 genes in MTs. SC derived MBs isolated from study participants with spastic CP exhibited fundamental differences in DNA methylation compared to controls at multiple levels of organization that may reveal new targets for studies of mechanisms contributing to muscle dysregulation in spastic CP.
  • Item
    Whole-genome sequencing identifies I-SceI-mediated transgene integration sites in Xenopus tropicalis snai2: eGFP line
    (G3: Genes | Genomes | Genetics, 2022-02-16) Wang, Jian; Lu, Congyu; Wei, Shuo
    Transgenesis with the meganuclease I-SceI is a safe and efficient method, but the underlying mechanisms remain unclear due to the lack of information on transgene localization. Using I-SceI, we previously developed a transgenic Xenopus tropicalis line expressing enhanced green fluorescent protein driven by the neural crest-specific snai2 promoter/enhancer, which is a powerful tool for studying neural crest development and craniofacial morphogenesis. Here we carried out whole-genome shotgun sequencing for the snai2: eGFP embryos to identify the transgene integration sites. With a 19x sequencing coverage, we estimated that 6 copies of the transgene were inserted into the X. tropicalis genome in the hemizygous transgenic embryos. Two transgene integration loci adjacent to each other were identified in a non-coding region on Chromosome 1, possibly as a result of duplication after a single transgene insertion. Interestingly, genomic DNA at the boundaries of the transgene integration loci contains short sequences homologous to the I-SceI recognition site, suggesting that the integration was not random but probably mediated by sequence homology. To our knowledge, our work represents the first genome-wide sequencing study on a transgenic organism generated with I-SceI, which is useful for evaluating the potential genetic effects of I-SceI-mediated transgenesis and further understanding the mechanisms underlying this transgenic method.
  • Item
    Protein Ontology (PRO): enhancing and scaling up the representation of protein entities
    (Oxford University Press, 2016-11-28) Natale, Darren A.; Arighi, Cecilia N.; Blake, Judith A.; Bona, Jonathan; Chen, Chuming; Chen, Sheng-Chih; Christie, Karen R.; Cowart, Julie; D’Eustachio, Peter; Diehl, Alexander D.; Drabkin, Harold J.; Duncan, William D.; Huang, Hongzhan; Ren, Jia; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Wang, Qinghua; Zhang, Jian; El-Sayed, Abdelrahman; Wu, Cathy H.
    The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.
  • Item
    ThermoAlign: a genome-aware primer design tool for tiled amplicon resequencing
    (Nature Publishing Group, 2017-03-16) Francis, Felix; Dumas, Michael D.; Wisser, Randall J.
    Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. This has been facilitated by computationally encoding the thermodynamics of DNA hybridization for automated design of hybridization and priming oligonucleotides. However, the repetitive composition of genomes challenges the identification of target-specific oligonucleotides, which limits genetics and genomics research on many species. Here, a tool called ThermoAlign was developed that ensures the design of target-specific primer pairs for DNA amplification. This is achieved by evaluating the thermodynamics of hybridization for full-length oligonucleotide-template alignments (thermoalignments) across the genome to identify primers predicted to bind specifically to the target site. For amplification-based resequencing of regions that cannot be amplified by a single primer pair, a directed graph analysis method is used to identify minimum amplicon tiling paths. Laboratory validation by standard and long-range polymerase chain reaction and amplicon resequencing with maize, one of the most repetitive genomes sequenced to date (≈85% repeat content), demonstrated the specificity-by-design functionality of ThermoAlign. ThermoAlign is released under an open source license and bundled in a dependency-free container for wide distribution. It is anticipated that this tool will facilitate multiple applications in genetics and genomics and be useful in the workflow of high-throughput targeted resequencing studies.
  • Item
    The UniProtKB guide to the human proteome
    (Oxford University Press, 2016-02-19) Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole; UniProt Consortium; Wu, Cathy Huey-Hwa
    Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB.
  • Item
    miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases
    (Biomed Central Ltd, 2016-04-29) Gupta, Samir; Ross, Karen E.; Tudor, Catalina O.; Wu, Cathy H.; Schmidt, Carl J.; Vijay-Shanker, K.
    Background: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. Methods: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. Results: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD. We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well, with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list. Conclusions: miRiaD, a high-performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
Copyright: Please look at individual material in order to see what the copyright and licensing terms are. Some material may be available for reuse under a Creative Commons license; other material may be the copyright of the individual author(s) or the publisher of the journal. Copyright lines may not be present in Accepted Manuscript versions so please refer to individual journal policies and/or look up the journal policies in open policy finder.