Open Access Publications

Open access publications by faculty, staff, postdocs, and graduate students at the Center for Bioinformatics and Computational Biology.

Recent Submissions

  • Item
    Sequestration of gene products by decoys enhances precision in the timing of intracellular events
    (Scientific Reports, 2024-11-08) Biswas, Kuheli; Dey, Supravat; Singh, Abhyudai
    Expressed gene products often interact ubiquitously with binding sites at nucleic acids and macromolecular complexes, known as decoys. The binding of transcription factors (TFs) to decoys can be crucial in controlling the stochastic dynamics of gene expression. Here, we explore the impact of decoys on the timing of intracellular events, as captured by the time taken for the levels of a given TF to reach a critical threshold level, known as the first passage time (FPT). Although nonlinearity introduced by binding makes exact mathematical analysis challenging, employing suitable approximations and reformulating FPT in terms of an alternative variable, we analytically assess the impact of decoys. The stability of the decoy-bound TFs against degradation impacts FPT statistics crucially. Decoys reduce noise in FPT, and stable decoy-bound TFs offer greater timing precision with less expression cost than their unstable counterparts. Interestingly, when both bound and free TFs decay at the same rate, decoy binding does not directly alter FPT noise. We verify these results by performing exact stochastic simulations. These results have important implications for the precise temporal scheduling of events involved in the functioning of biomolecular clocks, development processes, cell-cycle control, and cell-size homeostasis.
  • Item
    A long-term high-fat diet induces differential gene expression changes in spatially distinct adipose tissue of male mice
    (Physiological Genomics, 2024-11-11) Alradi, Malak; Askari, Hassan; Shaw, Mark; Bhavsar, Jaysheel D.; Kingham, Brewster F.; Polson, Shawn W.; Fancher, Ibra S.
    The accumulation of visceral adipose tissue (VAT) is strongly associated with cardiovascular disease and diabetes. In contrast, individuals with increased subcutaneous adipose tissue (SAT) without corresponding increases in VAT are associated with a metabolic healthy obese phenotype. These observations implicate dysfunctional VAT as a driver of disease processes, warranting investigation into obesity-induced alterations of distinct adipose depots. To determine the effects of obesity on adipose gene expression, male mice (n = 4) were fed a high-fat diet to induce obesity or a normal laboratory diet (lean controls) for 12–14 mo. Mesenteric VAT and inguinal SAT were isolated for bulk RNA sequencing. AT from lean controls served as a reference to obesity-induced changes. The long-term high-fat diet induced the expression of 169 and 814 unique genes in SAT and VAT, respectively. SAT from obese mice exhibited 308 differentially expressed genes (164 upregulated and 144 downregulated). VAT from obese mice exhibited 690 differentially expressed genes (262 genes upregulated and 428 downregulated). KEGG pathway and GO analyses revealed that metabolic pathways were upregulated in SAT versus downregulated in VAT while inflammatory signaling was upregulated in VAT. We next determined common genes that were differentially regulated between SAT and VAT in response to obesity and identified four genes that exhibited this profile: elovl6 and kcnj15 were upregulated in SAT/downregulated in VAT while trdn and hspb7 were downregulated in SAT/upregulated in VAT. We propose that these genes in particular should be further pursued to determine their roles in SAT versus VAT with respect to obesity. NEW & NOTEWORTHY A long-term high-fat diet induced the expression of more than 980 unique genes across subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT). The high-fat diet also induced the differential expression of nearly 1,000 AT genes. 
    We identified four genes that were oppositely expressed in SAT versus VAT in response to the high-fat diet and propose that these genes in particular may serve as promising targets aimed at resolving VAT dysfunction in obesity.
  • Item
    A Moments-Based Analytical Approach for Cell Size Homeostasis
    (IEEE Control Systems Letters, 2024-06-07) Nieto, César; Vargas-Garcia, Cesar Augusto; Singh, Abhyudai
    This contribution explores mechanisms that regulate the dynamics of single-cell size, maintaining equilibrium around a target set point. Using the formalism of Stochastic Hybrid Systems (SHS), we consider continuous exponential growth in cell size (as determined by volume/mass/surface area). This continuous-time evolution is interspersed by cell division events that occur randomly as per a given size-dependent rate, and upon division, only one of the two daughter cells is tracked. We show that a size-independent division rate does not provide cell size homeostasis, in the sense that the variance in cell size increases unboundedly over time. Next, we consider a division rate proportional to cell size that yields the adder size control observed in several bacteria – a constant size is added on average between birth and division regardless of the newborn size. For this scenario, we obtain exact formulas for the steady-state moments (mean, variance, and skewness) of cell size. Expanding the SHS model, we explore a biologically relevant scenario where the time between successive division events is further divided into multiple discrete stages with size-dependent stage transitions. Exact moment computations demonstrate that increasing the number of stages reduces cell size variability (noise). We also find formulas considering uneven size partitioning between daughters during division, and where the division rate follows a power law of the cell size leading to deviations from adder size control. This letter provides a method for estimating model parameters from observed cell size distributions and enhances our understanding of mechanisms underlying cell size regulation.
  • Item
    Efficacy of Bacillus subtilis probiotic in preventing necrotic enteritis in broilers: a systematic review and meta-analysis
    (Avian Pathology, 2024-07-03) Ghimire, Shweta; Subedi, Keshab; Zhang, Xinwen; Wu, Changqing
    Probiotics can enhance broiler chicken health by improving intestinal microbiota, potentially replacing antibiotics. They protect against bacterial diseases like necrotic enteritis (NE) in poultry. Understanding their role is crucial for managing bacterial diseases, including NE. This study conducted a meta-analysis to assess the effects of Bacillus subtilis probiotic supplementation on feed conversion ratio (FCR), NE lesion score, and mortality. Additionally, a systematic review analysed gut microbiota changes in broilers challenged with Clostridium perfringens with or without the probiotic supplementation. Effect sizes from the studies were estimated in terms of standardized mean difference (SMD). Random effect models were fitted to estimate the pooled effect size and 95% confidence interval (CI) of the pooled effect size between the control [probiotic-free + C. perfringens] and the treatment [Bacillus subtilis supplemented + C. perfringens] groups. Overall variance was computed by heterogeneity (Q). The meta-analysis showed that Bacillus subtilis probiotic supplementation significantly improved FCR and reduced NE lesion score but had no effect on mortality rates. The estimated overall effects of probiotic supplementation on FCR, NE lesion score and mortality percentage in terms of SMD were −0.91 (CI = −1.34, −0.49; P < 0.001*); −0.67 (CI = −1.11, −0.22; P = 0.006*), and −0.32 (CI = −0.70, 0.06; P = 0.08), respectively. Heterogeneity analysis indicated significant variations across studies for FCR (Q = 69.66; P < 0.001*) and NE lesion score (Q = 42.35; P < 0.001*) while heterogeneity was not significant for mortality (Q = 2.72; P = 0.74). Bacillus subtilis probiotic supplementation enriched specific gut microbiota including Streptococcus, Butyricicoccus, Faecalibacterium, and Ruminococcus. 
    These microbiotas were found to upregulate expression of various genes such as the tight junction (TJ) proteins occludin, ZO-1, and junctional adhesion molecule 2 (JAM2), as well as interferon gamma, IL12-β, and transforming growth factor-β4. Moreover, downregulated mucin-2 expression was involved in restoring the intestinal physical barrier, reducing intestinal inflammation, and recovering the physiological functions of damaged intestines. These findings highlight the potential benefits of probiotic supplementation in poultry management, particularly in combating bacterial diseases and promoting intestinal health.
  • Item
    Morphology, composition, and deterioration of the embryonic rostral sheath of the smalltooth sawfish (Pristis pectinata)
    (Fishery Bulletin, 2024-05-28) Poulakis, Gregg R.; Wyffels, Jennifer T.; Fortman, P. Eric; Wooley, Andrew K.; Heath, Lukas B.; Yakich, Dylan M.; Wilson, Patrick W.
    Elongated rostra evolved in diverse animal groups as adaptations for feeding, defense, sensory perception, and reproduction. Sawfish rostra have tooth-like dermal denticles, referred to as rostral teeth, along their lateral margins. Embryos have a sheath, or covering, for the calcified rostral teeth during gestation, and it persists until after parturition. Little is known about the morphology and composition of the sheath. During 18 years of tagging juvenile smalltooth sawfish (Pristis pectinata), sheaths were documented for 36 neonates with stretch total lengths of 581–812 mm, and samples were collected from 6 specimens for laboratory evaluation. The multilayered, skin-like sheath, which cannot be easily removed manually, has a vascularized inner layer of connective tissue composed primarily of fibrous proteins (e.g., collagen, reticulin, and keratin) surrounded by an outer layer of columnar and spherical epithelial cells overlying a basement membrane. The columnar cells contain condensed chromatin and differentiate into the outermost spherical cells that contain carbohydrates. After birth, the sheath is shed evenly over 4 d, through sloughing and apoptosis, fully exposing the rostral teeth. The sheath is an ephemeral embryonic organ that protects the female and the embryos from injury during gestation and birth.
  • Item
    A short-term, randomized, controlled, feasibility study of the effects of different vegetables on the gut microbiota and microRNA expression in infants
    (Frontiers in Microbiomes, 2024-03-01) Ferro, Lynn E.; Bittinger, Kyle; Trudo, Sabrina P.; Beane, Kaleigh E.; Polson, Shawn W.; Kim, Jae Kyeom; Trabulsi, Jillian C.
    The complementary diet influences the gastrointestinal (gut) microbiota composition and, in turn, host health and, potentially, microRNA (miRNA) expression. This study aimed to assess the feasibility of altering the gut microbial communities with short-term food introduction and to determine the effects of different vegetables on the gut microbiota and miRNA expression in infants. A total of 11 infants were randomized to one of the following intervention arms: control, broccoli, or carrot. The control group maintained the milk diet only, while the other groups consumed either a broccoli puree or a carrot puree on days 1–3 along with their milk diet (human milk or infant formula). Genomic DNA and total RNA were extracted from fecal samples to determine the microbiota composition and miRNA expression. Short-term feeding of both broccoli and carrots resulted in changes in the microbiota and miRNA expression. Compared to the control, a trend toward a decrease in Shannon index was observed in the carrot group on days 2 and 4. The carrot and broccoli groups differed by weighted UniFrac. Streptococcus was increased on day 4 in the carrot group compared to the control. The expression of two miRNAs (i.e., miR-217 and miR-590-5p) trended towards decrease in both the broccoli and carrot groups compared to the control, whereas increases in eight and two different miRNAs were observed in the carrot and broccoli groups, respectively. Vegetable interventions differentially impacted the gut microbiota and miRNA expression, which may be a mechanism by which total vegetable intake and variety are associated with reduced disease risk.
  • Item
    Transcriptional regulation of Sis1 promotes fitness but not feedback in the heat shock response
    (eLife, 2023-05-17) Grade, Rania; Singh, Abhyudai; Ali, Asif; Pincus, David
    The heat shock response (HSR) controls expression of molecular chaperones to maintain protein homeostasis. Previously, we proposed a feedback loop model of the HSR in which heat-denatured proteins sequester the chaperone Hsp70 to activate the HSR, and subsequent induction of Hsp70 deactivates the HSR (Krakowiak et al., 2018; Zheng et al., 2016). However, recent work has implicated newly synthesized proteins (NSPs) – rather than unfolded mature proteins – and the Hsp70 co-chaperone Sis1 in HSR regulation, yet their contributions to HSR dynamics have not been determined. Here, we generate a new mathematical model that incorporates NSPs and Sis1 into the HSR activation mechanism, and we perform genetic decoupling and pulse-labeling experiments to demonstrate that Sis1 induction is dispensable for HSR deactivation. Rather than providing negative feedback to the HSR, transcriptional regulation of Sis1 by Hsf1 promotes fitness by coordinating stress granules and carbon metabolism. These results support an overall model in which NSPs signal the HSR by sequestering Sis1 and Hsp70, while induction of Hsp70 – but not Sis1 – attenuates the response.
  • Item
    Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life
    (Scientific Reports, 2023-02-06) Hallee, Logan; Khomtchouk, Bohdan B.
    In this study, we investigate how an organism’s codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
  • Item
    Transcriptomic Signature of the Simulated Microgravity Response in Caenorhabditis elegans and Comparison to Spaceflight Experiments
    (Cells, 2023-01-10) Çelen, İrem; Jayasinghe, Aroshan; Doh, Jung H.; Sabanayagam, Chandran R.
    Given the growing interest in human exploration of space, it is crucial to identify the effects of space conditions on biological processes. Here, we analyze the transcriptomic response of Caenorhabditis elegans to simulated microgravity and observe the maintained transcriptomic response after returning to ground conditions for four, eight, and twelve days. We show that 75% of the simulated microgravity-induced changes on gene expression persist after returning to ground conditions for four days while most of these changes are reverted after twelve days. Our results from integrative RNA-seq and mass spectrometry analyses suggest that simulated microgravity affects longevity-regulating insulin/IGF-1 and sphingolipid signaling pathways. Finally, we identified 118 genes that are commonly differentially expressed in simulated microgravity- and space-exposed worms. Overall, this work provides insight into the effect of microgravity on biological systems during and after exposure.
  • Item
    DNA Methylation Analysis Reveals Distinct Patterns in Satellite Cell–Derived Myogenic Progenitor Cells of Subjects with Spastic Cerebral Palsy
    (Journal of Personalized Medicine, 2022-11-30) Robinson, Karyn G.; Marsh, Adam G.; Lee, Stephanie K.; Hicks, Jonathan; Romero, Brigette; Batish, Mona; Crowgey, Erin L.; Shrader, M. Wade; Akins, Robert E.
    Spastic type cerebral palsy (CP) is a complex neuromuscular disorder that involves altered skeletal muscle microanatomy and growth, but little is known about the mechanisms contributing to muscle pathophysiology and dysfunction. Traditional genomic approaches have provided limited insight regarding disease onset and severity, but recent epigenomic studies indicate that DNA methylation patterns can be altered in CP. Here, we examined whether a diagnosis of spastic CP is associated with intrinsic DNA methylation differences in myoblasts and myotubes derived from muscle resident stem cell populations (satellite cells; SCs). Twelve subjects were enrolled (6 CP; 6 control) with informed consent/assent. Skeletal muscle biopsies were obtained during orthopedic surgeries, and SCs were isolated and cultured to establish patient–specific myoblast cell lines capable of proliferation and differentiation in culture. DNA methylation analyses indicated significant differences at 525 individual CpG sites in proliferating SC–derived myoblasts (MB) and 1774 CpG sites in differentiating SC–derived myotubes (MT). Of these, 79 CpG sites were common in both culture types. The distribution of differentially methylated 1 Mbp chromosomal segments indicated distinct regional hypo– and hyper–methylation patterns, and significant enrichment of differentially methylated sites on chromosomes 12, 13, 14, 15, 18, and 20. Average methylation load across 2000 bp regions flanking transcriptional start sites was significantly different in 3 genes in MBs, and 10 genes in MTs. SC derived MBs isolated from study participants with spastic CP exhibited fundamental differences in DNA methylation compared to controls at multiple levels of organization that may reveal new targets for studies of mechanisms contributing to muscle dysregulation in spastic CP.
  • Item
    Whole-genome sequencing identifies I-SceI-mediated transgene integration sites in Xenopus tropicalis snai2: eGFP line
    (G3: Genes | Genomes | Genetics, 2022-02-16) Wang, Jian; Lu, Congyu; Wei, Shuo
    Transgenesis with the meganuclease I-SceI is a safe and efficient method, but the underlying mechanisms remain unclear due to the lack of information on transgene localization. Using I-SceI, we previously developed a transgenic Xenopus tropicalis line expressing enhanced green fluorescent protein driven by the neural crest-specific snai2 promoter/enhancer, which is a powerful tool for studying neural crest development and craniofacial morphogenesis. Here we carried out whole-genome shotgun sequencing for the snai2: eGFP embryos to identify the transgene integration sites. With a 19x sequencing coverage, we estimated that 6 copies of the transgene were inserted into the X. tropicalis genome in the hemizygous transgenic embryos. Two transgene integration loci adjacent to each other were identified in a non-coding region on Chromosome 1, possibly as a result of duplication after a single transgene insertion. Interestingly, genomic DNA at the boundaries of the transgene integration loci contains short sequences homologous to the I-SceI recognition site, suggesting that the integration was not random but probably mediated by sequence homology. To our knowledge, our work represents the first genome-wide sequencing study on a transgenic organism generated with I-SceI, which is useful for evaluating the potential genetic effects of I-SceI-mediated transgenesis and further understanding the mechanisms underlying this transgenic method.
  • Item
    Protein Ontology (PRO): enhancing and scaling up the representation of protein entities
    (Oxford University Press, 2016-11-28) Natale, Darren A.; Arighi, Cecilia N.; Blake, Judith A.; Bona, Jonathan; Chen, Chuming; Chen, Sheng-Chih; Christie, Karen R.; Cowart, Julie; D’Eustachio, Peter; Diehl, Alexander D.; Drabkin, Harold J.; Duncan, William D.; Huang, Hongzhan; Ren, Jia; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Wang, Qinghua; Zhang, Jian; El-Sayed, Abdelrahman; Wu, Cathy H.
    The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.
  • Item
    ThermoAlign: a genome-aware primer design tool for tiled amplicon resequencing
    (Nature Publishing Group, 2017-03-16) Francis, Felix; Dumas, Michael D.; Wisser, Randall J.
    Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. This has been facilitated by computationally encoding the thermodynamics of DNA hybridization for automated design of hybridization and priming oligonucleotides. However, the repetitive composition of genomes challenges the identification of target-specific oligonucleotides, which limits genetics and genomics research on many species. Here, a tool called ThermoAlign was developed that ensures the design of target-specific primer pairs for DNA amplification. This is achieved by evaluating the thermodynamics of hybridization for full-length oligonucleotide-template alignments (thermoalignments) across the genome to identify primers predicted to bind specifically to the target site. For amplification-based resequencing of regions that cannot be amplified by a single primer pair, a directed graph analysis method is used to identify minimum amplicon tiling paths. Laboratory validation by standard and long-range polymerase chain reaction and amplicon resequencing with maize, one of the most repetitive genomes sequenced to date (≈85% repeat content), demonstrated the specificity-by-design functionality of ThermoAlign. ThermoAlign is released under an open source license and bundled in a dependency-free container for wide distribution. It is anticipated that this tool will facilitate multiple applications in genetics and genomics and be useful in the workflow of high-throughput targeted resequencing studies.
  • Item
    The UniProtKB guide to the human proteome
    (Oxford University Press, 2016-02-19) Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole; UniProt Consortium; Wu, Cathy Huey-Hwa
    Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB.
  • Item
    miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases
    (BioMed Central Ltd, 2016-04-29) Gupta, Samir; Ross, Karen E.; Tudor, Catalina O.; Wu, Cathy H.; Schmidt, Carl J.; Vijay-Shanker, K.
    Background: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. Methods: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. Results: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD. We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list. Conclusions: miRiaD, a high performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
  • Item
    Intercellular Variability in Protein Levels from Stochastic Expression and Noisy Cell Cycle Processes
    (Public Library of Science, 2016-08-18) Soltani, Mohammad; Vargas-Garcia, Cesar A.; Antunes, Duarte; Singh, Abhyudai
    Inside individual cells, expression of genes is inherently stochastic and manifests as cell-to-cell variability or noise in protein copy numbers. Since protein half-lives can be comparable to the cell-cycle length, randomness in cell-division times generates additional intercellular variability in protein levels. Moreover, as many mRNA/protein species are expressed at low copy numbers, errors incurred in partitioning of molecules between two daughter cells are significant. We derive analytical formulas for the total noise in protein levels when the cell-cycle duration follows a general class of probability distributions. Using a novel hybrid approach, the total noise is decomposed into components arising from i) stochastic expression; ii) partitioning errors at the time of cell division; and iii) random cell-division events. These formulas reveal that random cell-division times not only generate additional extrinsic noise, but also critically affect the mean protein copy numbers and intrinsic noise components. Counterintuitively, in some parameter regimes, noise in protein levels can decrease as cell-division times become more stochastic. Computations are extended to consider genome duplication, where the transcription rate is increased at a random point in the cell cycle. We systematically investigate how the timing of genome duplication influences different protein noise components. Intriguingly, results show that the noise contribution from stochastic expression is minimized at an optimal genome-duplication time. Our theoretical results motivate new experimental methods for decomposing protein noise levels from synchronized and asynchronized single-cell expression data. Characterizing the contributions of individual noise mechanisms will lead to precise estimates of gene expression parameters and techniques for altering stochasticity to change the phenotype of individual cells.
  • Item
    BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID
    (Oxford University Press, 8/2/16) Kim, Sun; Dogan, Rezarta Islamaj; Chatr-Aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia; Matos, Sergio; Santos, Andre; Campos, David; Oliveira, Jose Luis; Singh, Onkar; Jonnagaddala, Jitendra; Dai, Hong-Jie; Su, Emily Chia-Yu; Chang, Yung-Chun; Su, Yu-Chen; Chu, Chun-Han; Chen, Chien Chin; Hsu, Wen-Lian; Peng, Yifan; Arighi, Cecilia; Wu, Cathy H.; Vijay-Shanker, K.; Aydin, Ferhat; Husunbeyi, Zehra Melce; Ozgur, Arzucan; Shin, Soo-Yong; Kwon, Dongseop; Dolinski, Kara; Tyers, Mike; Wilbur, W. John; Comeau, Donald C.
    BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams worldwide participated and contributed either new methods or improvements of existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining.
  • Item
    BioC-compatible full-text passage detection for protein-protein interactions using extended dependency graph
    (Oxford University Press, 4/12/16) Peng, Yifan; Arighi, Cecilia; Wu, Cathy H.; Vijay-Shanker, K.
    There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern detection of protein-protein interactions (PPI). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in full-text articles. By adopting the BioC format, the output of the system can be seamlessly added to the biocuration pipeline with little effort required for system integration. A distinctive feature of our PPI system is that it uses an extended dependency graph, an intermediate level of representation that attempts to abstract away syntactic variations in text. As a result, we are able to use only a limited set of rules to extract PPI pairs in the sentences, and additional rules to detect additional passages for PPI pairs. For evaluation, we used the 95 articles that were provided for the BioC annotation task. We retrieved the unique PPIs from the BioGRID database for these articles and show that our system achieves a recall of 83.5%. To evaluate the detection of passages with PPIs, we further annotated the Abstract and Results sections of 20 documents from the dataset and show that an f-value of 80.5% was obtained. To evaluate the generalizability of the system, we also conducted experiments on AIMed, a well-known PPI corpus. We achieved an f-value of 76.1% for sentence detection and an f-value of 64.7% for unique PPI detection.
  • Item
    InterPro in 2017––beyond protein family and domain annotations
    (Oxford University Press, 2016-11-28) Finn, Robert D.; Attwood, Teresa K.; Babbitt, Patricia C.; Bateman, Alex; Bork, Peer; Bridge, Alan J.; Chang, Hsin-Yu; Dosztányi, Zsuzsanna; El-Gebali, Sara; Fraser, Matthew; Gough, Julian; Haft, David; Holliday, Gemma L.; Huang, Hongzhan; Huang, Xiaosong; Letunic, Ivica; Lopez, Rodrigo; Lu, Shennan; Marchler-Bauer, Aron; Mi, Huaiyu; Mistry, Jaina; Natale, Darren A.; Necci, Marco; Nuka, Gift; Orengo, Christine A.; Park, Youngmi; Pesseat, Sebastien; Piovesan, Damiano; Potter, Simon C.; Rawlings, Neil D.; Redaschi, Nicole; Richardson, Lorna; Rivoire, Catherine; Sangrador-Vegas, Amaia; Sigrist, Christian; Sillitoe, Ian; Smithers, Ben; Squizzato, Silvano; Sutton, Granger; Thanki, Narmada; Thomas, Paul D.; Tosatto, Silvio C. E.; Wu, Cathy H.; Xenarios, Ioannis; Yeh, Lai-Su; Young, Siew-Yit; Mitchell, Alex L.
    InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
Copyright: Please look at individual material in order to see what the copyright and licensing terms are. Some material may be available for reuse under a Creative Commons license; other material may be the copyright of the individual author(s) or the publisher of the journal. Copyright lines may not be present in Accepted Manuscript versions so please refer to individual journal policies and/or look up the journal policies in Sherpa Romeo.