Browsing by Author "Wu, Cathy H."
Item Association of Area Deprivation With Primary Hypertension Diagnosis Among Youth Medicaid Recipients in Delaware (JAMA Network Open, 2023-03-15) Baker-Smith, Carissa M.; Yang, Wei; McDuffie, Mary J.; Nescott, Erin P.; Wolf, Bethany J.; Wu, Cathy H.; Zhang, Zugui; Akins, Robert E.
Key Points: Question: Is there an association between neighborhood measures of deprivation and hypertension diagnosis in youth? Findings: In this cross-sectional study of 65 452 Delaware Medicaid-insured youths aged 8 to 18 years between 2014 and 2019, residence in neighborhoods with a higher area deprivation index was associated with primary hypertension diagnosis. Meaning: These findings suggest that there is an association between greater neighborhood deprivation and a diagnosis of primary hypertension in youths, which may be an important factor to consider in assessing the presence and prevalence of hypertension in youths.
Abstract: Importance: The association between degree of neighborhood deprivation and primary hypertension diagnosis in youth remains understudied. Objective: To assess the association between neighborhood measures of deprivation and primary hypertension diagnosis in youth. Design, Setting, and Participants: This cross-sectional study included 65 452 Delaware Medicaid-insured youths aged 8 to 18 years between January 1, 2014, and December 31, 2019. Residence was geocoded by national area deprivation index (ADI). Exposures: Higher area deprivation. Main Outcomes and Measures: The main outcome was primary hypertension diagnosis based on International Classification of Diseases, Ninth Revision and Tenth Revision codes. Data were analyzed between September 1, 2021, and December 31, 2022.
Results: A total of 65 452 youths were included in the analysis, including 64 307 (98.3%) without a hypertension diagnosis (30 491 [47%] female and 33 813 [53%] male; mean [SD] age, 12.5 [3.1] years; 12 500 [19%] Hispanic, 25 473 [40%] non-Hispanic Black, 24 565 [38%] non-Hispanic White, and 1769 [3%] other race or ethnicity; 13 029 [20%] with obesity; and 31 548 [49%] with an ADI ≥50) and 1145 (1.7%) with a diagnosis of primary hypertension (mean [SD] age, 13.3 [2.8] years; 464 [41%] female and 681 [59%] male; 271 [24%] Hispanic, 460 [40%] non-Hispanic Black, 396 [35%] non-Hispanic White, and 18 [2%] of other race or ethnicity; 705 [62%] with obesity; and 614 [54%] with an ADI ≥50). The mean (SD) duration of full Medicaid benefit coverage was 61 (16) months for those with a diagnosis of primary hypertension and 46.0 (24.3) months for those without. By multivariable logistic regression, residence within communities with ADI greater than or equal to 50 was associated with 60% greater odds of a hypertension diagnosis (odds ratio [OR], 1.61; 95% CI, 1.04-2.51). Older age (OR per year, 1.16; 95% CI, 1.14-1.18), an obesity diagnosis (OR, 5.16; 95% CI, 4.54-5.85), and longer duration of full Medicaid benefit coverage (OR, 1.03; 95% CI, 1.03-1.04) were associated with greater odds of primary hypertension diagnosis, whereas female sex was associated with lower odds (OR, 0.68; 95% CI, 0.61-0.77). Model fit including a Medicaid-by-ADI interaction term was significant for the interaction and revealed slightly greater odds of hypertension diagnosis for youths with ADI less than 50 (OR, 1.03; 95% CI, 1.03-1.04) vs ADI ≥50 (OR, 1.02; 95% CI, 1.02-1.03). Race and ethnicity were not associated with primary hypertension diagnosis. Conclusions and Relevance: In this cross-sectional study, higher childhood neighborhood ADI, obesity, age, sex, and duration of Medicaid benefit coverage were associated with a primary hypertension diagnosis in youth.
Screening algorithms and national guidelines may consider the importance of ADI when assessing for the presence and prevalence of primary hypertension in youth.

Item Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer (Public Library of Science, 2015-10-28) Çelen, İrem; Ross, Karen E.; Arighi, Cecilia N.; Wu, Cathy H.
Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge “maps” of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors coactivated by beta-catenin and their targets, and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin-dependent kinase CDK5 positively regulates beta-catenin co-activator activity.
Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease.

Item COVID-19 Knowledge Graph from semantic integration of biomedical literature and databases (Bioinformatics, 2021-10-06) Chen, Chuming; Ross, Karen E.; Gavali, Sachin; Cowart, Julie E.; Wu, Cathy H.
The global response to the COVID-19 pandemic has led to a rapid increase of scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis and treatment. We used Semantic Web technology RDF to integrate COVID-19 knowledge mined from literature by iTextMine, PubTator and SemRep with relevant biological databases and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download.

Item A crowdsourcing open platform for literature curation in UniProt (PLOS Biology, 2021-12-06) Wang, Yuqi; Wang, Qinghua; Huang, Hongzhan; Huang, Wei; Chen, Yongxing; McGarvey, Peter B.; Wu, Cathy H.; Arighi, Cecilia N.
The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries.
Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.

Item DARWIN - A Resource for Computational and Data-intensive Research at the University of Delaware and in the Delaware Region (Data Science Institute [DSI], University of Delaware, Newark, DE, 2021) Eigenmann, Rudolf; Bagozzi, Benjamin E.; Jayaraman, Arthi; Totten, William; Wu, Cathy H.

Item InterPro in 2017––beyond protein family and domain annotations (Oxford University Press, 2016-11-28) Finn, Robert D.; Attwood, Teresa K.; Babbitt, Patricia C.; Bateman, Alex; Bork, Peer; Bridge, Alan J.; Chang, Hsin-Yu; Dosztányi, Zsuzsanna; El-Gebali, Sara; Fraser, Matthew; Gough, Julian; Haft, David; Holliday, Gemma L.; Huang, Hongzhan; Huang, Xiaosong; Letunic, Ivica; Lopez, Rodrigo; Lu, Shennan; Marchler-Bauer, Aron; Mi, Huaiyu; Mistry, Jaina; Natale, Darren A.; Necci, Marco; Nuka, Gift; Orengo, Christine A.; Park, Youngmi; Pesseat, Sebastien; Piovesan, Damiano; Potter, Simon C.; Rawlings, Neil D.; Redaschi, Nicole; Richardson, Lorna; Rivoire, Catherine; Sangrador-Vegas, Amaia; Sigrist, Christian; Sillitoe, Ian; Smithers, Ben; Squizzato, Silvano; Sutton, Granger; Thanki, Narmada; Thomas, Paul D.; Tosatto, Silvio C. E.; Wu, Cathy H.; Xenarios, Ioannis; Yeh, Lai-Su; Young, Siew-Yit; Mitchell, Alex L.
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro’s predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

Item miRTex: A Text Mining System for miRNA-Gene Relation Extraction (PLOS (Public Library of Science), 2015-09-25) Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations.
The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.

Item pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature (PLOS (Public Library of Science), 2015-08-10) Ding, Ruoyao; Arighi, Cecilia N.; Lee, Jung-Youn; Wu, Cathy H.; Vijay-Shanker, K.
BACKGROUND: Automatically detecting gene/protein names in the literature and connecting them to database records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in the databases, and is an essential component of many text mining systems and database curation pipelines. METHODS: In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. We have developed new heuristics to improve each of these phases. RESULTS: We evaluated the performance of pGenN on an in-house expertly annotated corpus consisting of 104 plant-relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming state-of-the-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database for direct query from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/).

Item Predicting nsSNPs that disrupt protein-protein interactions using docking (IEEE Computational Intelligence Society ; IEEE Computer Society ; IEEE Control Systems Society ; IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery, 2016-01-22) Goodacre, Norman; Edwards, Nathan; Danielsen, Mark; Uetz, Peter; Wu, Cathy H.
The human genome contains a large number of protein polymorphisms due to individual genome variation. How many of these polymorphisms lead to altered protein-protein interaction is unknown. We have developed a method to address this question. The intersection of the SKEMPI database (of affinity constants among interacting proteins) and CAPRI 4.0 docking benchmark was docked using HADDOCK, leading to a training set of 166 mutant pairs. A random forest classifier that uses the differences in resulting docking scores between the 166 mutant pairs and their wild-types was used to distinguish between variants that have either completely or partially lost binding ability. 50% of non-binders were correctly predicted with a false discovery rate of only 2%. The model was tested on a set of 15 HIV-1 - human, as well as 7 human - human glioblastoma-related, mutant protein pairs: 50% of combined non-binders were correctly predicted with a false discovery rate of 10%.
The model was also used to identify 10 protein-protein interactions between human proteins and their HIV-1 partners that are likely to be abolished by rare non-synonymous single-nucleotide polymorphisms (nsSNPs). These nsSNPs may represent novel and potentially therapeutically valuable targets for anti-viral therapy by disruption of viral binding.

Item Proceedings of the 2020 DARWIN Computing Symposium (Data Science Institute of the University of Delaware, 2020-02-12) Jayaraman, Arthi; Bagozzi, Benjamin E.; Eigenmann, Rudolf; Totten, William; Wu, Cathy H.
The DARWIN Computing Symposium 2020—sponsored by the Data Science Institute of the University of Delaware—was held on February 12th, 2020. It represented the first event in a series of Symposia motivated by a National Science Foundation (NSF) MRI Award, also known as the Delaware Advanced Research Workforce and Innovation Network (DARWIN). As part of an NSF Major Research Instrumentation award (OAC-1919839), DARWIN has the goal of catalyzing "research and education at the University of Delaware (UD) and partners by acquiring a big data and high-performance computing system and making this instrument available to the community." This first DARWIN Computing Symposium introduced the machine—being ordered at the time—to the community, showcased computational and data-enabled research that will take advantage of the instrument, and provided opportunities for forming collaborations among future users at UD and regional partners. The Symposium included presentations detailing research involving UD faculty and members of DARWIN partner institutions that spanned the chemical and material sciences, engineering, the biological sciences, the environmental sciences, business, the social sciences, and education.
Alongside this, an informational session on the DARWIN machine, a panel, and a student poster session provided an equally diverse range of additional viewpoints on computational- and data-intensive research, training, and education across the Delaware region and beyond. In addition to the NSF and Data Science Institute, the 2020 DARWIN Computing Symposium was sponsored by Atipa Technologies, DELL and CompassRed. Dr. Arthi Jayaraman, UD Professor and DARWIN Co-PI, served as chair of the 2020 DARWIN Computing Symposium.

Item Proceedings of the 2021 DARWIN Computing Symposium (Data Science Institute of the University of Delaware, 2021-02-12) Bagozzi, Benjamin E.; Eigenmann, Rudolf; Jayaraman, Arthi; Totten, William; Wu, Cathy H.
The DARWIN Computing Symposium 2021—sponsored by the Data Science Institute of the University of Delaware—was held on February 12, 2021. It represented the second event in a series of Symposia motivated by a National Science Foundation (NSF) MRI Award, also known as the Delaware Advanced Research Workforce and Innovation Network (DARWIN). As part of an NSF Major Research Instrumentation award (OAC-1919839), DARWIN has the goal of catalyzing "research and education at the University of Delaware (UD) and partners by acquiring a big data and high-performance computing system and making this instrument available to the community." This particular Symposium showcased recent computational and data-enabled research across the Delaware region, offered perspectives on broadening participation in computational and data-intensive research, and facilitated opportunities for forming collaborations among future DARWIN users at UD and regional partners. It also provided an overview of the newly operational DARWIN big data and high-performance computing machine via a panel on “early user mode” experiences. The 2021 DARWIN Computing Symposium was supported by the NSF and UD’s Data Science Institute. Dr. Benjamin E.
Bagozzi, UD Associate Professor and DARWIN Co-PI, served as chair of the 2021 Symposium.

Item Proceedings of the 2022 DARWIN Computing Symposium (Data Science Institute of the University of Delaware, 2022-03-24) Hadden-Perilla, Jodi A.; Perilla, Juan R.; Bagozzi, Benjamin E.; Eigenmann, Rudolf; Jayaraman, Arthi; Totten, William; Wu, Cathy H.
The DARWIN Computing Symposium 2022—sponsored by the Data Science Institute of the University of Delaware—was held on March 24, 2022. It represented the third event in a series of Symposia motivated by a National Science Foundation (NSF) MRI Award, also known as the Delaware Advanced Research Workforce and Innovation Network (DARWIN). As part of an NSF Major Research Instrumentation award (OAC-1919839), DARWIN has the goal of catalyzing "research and education at the University of Delaware (UD) and partners by acquiring a big data and high-performance computing system and making this instrument available to the community." This third DARWIN Computing Symposium presented a wide variety of research enabled by the DARWIN machine to the Delaware community. Alongside this, it showcased additional computational and data-enabled research, provided details on accessing DARWIN for University of Delaware (UD) and partner institutions, and facilitated opportunities for forming collaborations among future users at UD and regional partners. In addition to the NSF and the Data Science Institute, the 2022 DARWIN Computing Symposium was sponsored by DELL and Nemours Children's Health. Drs.
Jodi Hadden-Perilla and Juan Perilla, both of the University of Delaware, served as co-chairs of the 2022 DARWIN Computing Symposium.

Item Proceedings of the 2023 DARWIN Computing Symposium (Data Science Institute of the University of Delaware, 2023-02-23) Safronova, Marianna S.; Bagozzi, Benjamin E.; Eigenmann, Rudolf; Jayaraman, Arthi; Totten, William; Wu, Cathy H.
The DARWIN Computing Symposium 2023—sponsored by the Data Science Institute of the University of Delaware—was held on February 23, 2023. It represented the fourth event in a series of Symposia motivated by a National Science Foundation (NSF) MRI Award, also known as the Delaware Advanced Research Workforce and Innovation Network (DARWIN). As part of an NSF Major Research Instrumentation award (OAC-1919839), DARWIN has the goal of catalyzing "research and education at the University of Delaware (UD) and partners by acquiring a big data and high-performance computing system and making this instrument available to the community." This fourth DARWIN Computing Symposium presented a wide variety of research enabled by the DARWIN machine to the Delaware community. It also showcased additional computational and data-enabled research, provided perspectives on broadening participation in computational and data-intensive research, and facilitated opportunities for forming collaborations among future users at UD and regional partners. In addition to the NSF and the Data Science Institute, the 2023 DARWIN Computing Symposium was sponsored by AMD, BioCurie, Chemours, and Tech Impact. Dr.
Marianna Safronova, Professor of Physics at the Department of Physics and Astronomy, University of Delaware, served as chair of the 2023 DARWIN Computing Symposium.

Item Proceedings of the 2023 Delaware Data Science Symposium (Data Science Institute of the University of Delaware, 2023-09-22) Bagozzi, Benjamin E.; Abou Ali, Hanan; Blaustein, Michael; Blinova, Daria; Buler, Jeffrey; Carney, Lynette; Chandrasekaran, Sunita; Davey, Adam; Fleischhacker, Adam; Ostovari, Mina; Peart, Daniel; Smith, Sam; Tawiah, Nii Adjetey; Wu, Cathy H.
The 2023 Delaware Data Science Symposium was held on September 22nd with a primary focus on the role of data science in financial technology (FinTech) and health equity. The Symposium was organized by the University of Delaware’s (UD’s) Data Science Institute (DSI) with support from Tech Impact, Dupont, Kendal Corporation, Intellitec Solutions, UD’s Library, Museums, & Press, the UD Career Center, the UD Graduate College, the UD Master of Science in Data Science Program, UD’s Artificial Intelligence Center of Excellence (AICOE), and the DSI. It represented the fourth Delaware Data Science Symposium hosted at the University of Delaware, and the third such Symposium since the DSI’s inception. Altogether, the Symposium saw over 280 registered attendees from the University of Delaware and partner institutions across the Mid-Atlantic and beyond. The 2023 Delaware Data Science Symposium included multiple keynote speakers, a series of initiative & lightning talks, a poster session, a panel on data science-driven equity from healthcare, FinTech, community, and educational perspectives, and a session on UD’s summer 2023 Data Science (DS) + Artificial Intelligence (AI) Hackathon. Alongside these sessions, the Symposium also facilitated two associated satellite events. The first was a September 21st Data Science and Analytics Open House for UD graduate programs focused on data science and analytics.
The second was a September 25th workshop on the use of MATLAB for low-code AI.

Item Proceedings of the 2024 DARWIN Computing Symposium (Data Science Institute of the University of Delaware, 2024-02-12) Hsu, Tian-Jian; Bagozzi, Benjamin E.; Eigenmann, Rudolf; Jayaraman, Arthi; Totten, William; Wu, Cathy H.; Blaustein, Michael; Blinova, Daria; Carney, Lynette; Huffman, John; Smith, Samantha; Zhang, Jiaye
The DARWIN Computing Symposium 2024—sponsored by the Data Science Institute (DSI) of the University of Delaware—was held on February 12, 2024. It represented the fifth event in a series of Symposia motivated by a National Science Foundation (NSF) MRI Award, also known as the Delaware Advanced Research Workforce and Innovation Network (DARWIN). As part of an NSF Major Research Instrumentation award (OAC-1919839), DARWIN focuses on catalyzing "research and education at the University of Delaware (UD) and partners by acquiring a big data and high-performance computing system and making this instrument available to the community." In an effort to identify and advance future computing needs for artificial intelligence, to reduce the overhead for domain scientists utilizing HPC, and to develop regional partnerships, this fifth DARWIN Computing Symposium more specifically featured a panel and a keynote talk, as well as a series of research talks on DARWIN-enabled research, on computational and data-intensive (CDI) research/training needs, and on AI-focused CDI research more generally. These talks highlighted the use of AI in HPC to advance sciences and predictive capabilities with societal relevance across a wide range of domains. A panel discussion then facilitated interactions between research software engineers and domain scientists with an eye towards advancing scientific progress in different disciplines. In addition, 30 poster presentations by students and postdocs highlighted a number of relevant CDI research projects.
Alongside the NSF and the Data Science Institute, the 2024 DARWIN Computing Symposium was sponsored by Tech Impact, UD’s Delaware Environmental Institute, UD’s Center for Applied Coastal Research, UD Information Technologies, and the University of Delaware Faculty Senate. Dr. Tian-Jian Hsu, University of Delaware Professor of Civil & Environmental Engineering and Director of the Center for Applied Coastal Research, served as chair of the 2024 DARWIN Computing Symposium.

Item Protein Ontology (PRO): enhancing and scaling up the representation of protein entities (Oxford University Press, 2016-11-28) Natale, Darren A.; Arighi, Cecilia N.; Blake, Judith A.; Bona, Jonathan; Chen, Chuming; Chen, Sheng-Chih; Christie, Karen R.; Cowart, Julie; D’Eustachio, Peter; Diehl, Alexander D.; Drabkin, Harold J.; Duncan, William D.; Huang, Hongzhan; Ren, Jia; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Wang, Qinghua; Zhang, Jian; El-Sayed, Abdelrahman; Wu, Cathy H.
The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity.
To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely related terms, including, for example, an interactive multiple sequence alignment. Finally, we describe recent improvements in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.

Item Protein-protein interaction prediction based on multiple kernels and partial network with linear programming (BioMed Central, 2016-08-01) Huang, Lei; Liao, Li; Wu, Cathy H.
BACKGROUND: Prediction of de novo protein-protein interaction is a critical step toward reconstructing PPI networks, which is a central task in systems biology. Recent computational approaches have shifted from making PPI prediction based on individual pairs and single data source to leveraging complementary information from multiple heterogeneous data sources and partial network structure. However, how to quickly learn weights for heterogeneous data sources remains a challenge. In this work, we developed a method to infer de novo PPIs by combining multiple data sources represented in kernel format and obtaining optimal weights based on random walk over the existing partial networks.
RESULTS: Our proposed method utilizes the Barker algorithm and the training data to construct a transition matrix which constrains how a random walk would traverse the partial network. Multiple heterogeneous features for the proteins in the network are then combined into the form of weighted kernel fusion, which provides a new "adjacency matrix" for the whole network that may consist of disconnected components but is required to comply with the transition matrix on the training subnetwork. This requirement is met by adjusting the weights to minimize the element-wise difference between the transition matrix and the weighted kernels. The minimization problem is solved by linear programming. The weighted kernel fusion is then transformed to regularized Laplacian (RL) kernel to infer missing or new edges in the PPI network, which can potentially connect the previously disconnected components. CONCLUSIONS: The results on synthetic data demonstrated the soundness and robustness of the proposed algorithms under various conditions. And the results on real data show that the accuracies of PPI prediction for yeast data and human data measured as AUC are increased by up to 19% and 11% respectively, as compared to a control method without using optimal weights. Moreover, the weights learned by our method Weight Optimization by Linear Programming (WOLP) are very consistent with those learned by sampling, and can provide insights into the relations between PPIs and various feature kernels, thereby improving PPI prediction even for disconnected PPI networks.

Item RNA-Seq Analysis of Abdominal Fat in Genetically Fat and Lean Chickens Highlights a Divergence in Expression of Genes Controlling Adiposity, Hemostasis, and Lipid Metabolism (PLOS (Public Library of Science), 2015-10-07) Resnyk, Christopher W.; Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.; Simon, Jean; Le Bihan-Duval, Elisabeth; Duclos, Michel J.; Cogburn, Larry A.
Genetic selection for enhanced growth rate in meat-type chickens (Gallus domesticus) is usually accompanied by excessive adiposity, which has negative impacts on both feed efficiency and carcass quality. Enhanced visceral fatness and several unique features of avian metabolism (i.e., fasting hyperglycemia and insulin insensitivity) mimic overt symptoms of obesity and related metabolic disorders in humans. Elucidation of the genetic and endocrine factors that contribute to excessive visceral fatness in chickens could also advance our understanding of human metabolic diseases. Here, RNA sequencing was used to examine differential gene expression in abdominal fat of genetically fat and lean chickens, which exhibit a 2.8-fold divergence in visceral fatness at 7 wk. Ingenuity Pathway Analysis revealed that many of 1687 differentially expressed genes are associated with hemostasis, endocrine function and metabolic syndrome in mammals. Among the highest expressed genes in abdominal fat, across both genotypes, were 25 differentially expressed genes associated with de novo synthesis and metabolism of lipids. Over-expression of numerous adipogenic and lipogenic genes in the FL chickens suggests that in situ lipogenesis in chickens could make a more substantial contribution to expansion of visceral fat mass than previously recognized. Distinguishing features of the abdominal fat transcriptome in lean chickens were high abundance of multiple hemostatic and vasoactive factors, transporters, and ectopic expression of several hormones/receptors, which could control local vasomotor tone and proteolytic processing of adipokines, hemostatic factors and novel endocrine factors.
Over-expression of several thrombogenic genes in abdominal fat of lean chickens is quite opposite to the pro-thrombotic state found in obese humans. Clearly, divergent genetic selection for an extreme (2.5–2.8-fold) difference in visceral fatness provokes a number of novel regulatory responses that govern growth and metabolism of visceral fat in this unique avian model of juvenile-onset obesity and glucose-insulin imbalance.