Browsing by Author "Peng, Yifan"
Now showing 1 - 2 of 2
miRTex: A Text Mining System for miRNA-Gene Relation Extraction (PLOS (Public Library of Science), 2015-09-25) Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. When evaluated on a literature corpus of 150 abstracts, the system achieves good precision and recall, with F-scores close to 0.90 for the three types of relations. We conducted full-scale text mining with miRTex on all Medline abstracts and all full-length articles in the PubMed Central Open Access Subset. The results for all Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.

A study of relation extraction for biomedical text (University of Delaware, 2016) Peng, Yifan
A crucial area of Biomedical Natural Language Processing is relation extraction, the study of identifying relations between entities. One main challenge of relation extraction is text variations.
They make it difficult for pattern-based approaches to encode all the patterns needed to achieve high recall, and they limit the generalizability of machine-learning models, especially when training data is scarce. This thesis examines the representation of sentences for relation extraction. In particular, we seek a suitable level of abstraction that can improve the performance of relation extraction systems and, in turn, lead to advances in other text-mining fields. This thesis describes three steps along these lines. First, we propose an automatic approach to sentence simplification. It reduces sentence complexity by detecting various syntactic constructs and generating simplified sentences. Second, we describe a framework to facilitate the development of pattern-based biomedical relation extraction systems. The framework leverages various linguistic theories to semi-automatically generate lexico-syntactic patterns. It also applies sentence simplification and semantic relations to increase pattern coverage. Finally, we propose a structured representation called the Extended Dependency Graph (EDG). It provides an abstract representation that accounts for textual variations by considering not only the syntactic dependencies between words in a sentence but also information beyond syntax. In each of these steps, we conduct experiments to evaluate the efficacy of the ideas. The results (1) show that various text-mining approaches can benefit from sentence simplification, (2) demonstrate that the framework can be used to create state-of-the-art pattern-based systems for extracting different types of relations, and (3) validate the utility of EDG in both pattern-based and machine-learning relation extraction systems.
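As a rough illustration of how sentence simplification can increase pattern coverage, the toy Python sketch below uses a single surface pattern of the form "<miRNA> targets <GENE>". On its own, that pattern misses a sentence with a coordinated object list, but it matches once the sentence is split into one simple sentence per object. Everything here is a hypothetical illustration and not the thesis's method: the regular expressions, the coordination-splitting rule, the function names (`simplify`, `extract`, `extract_with_simplification`), and the example sentence are all assumptions. The thesis describes lexico-syntactic patterns and a dedicated simplification approach based on syntactic constructs, whereas this sketch works only on plain strings.

```python
import re
from typing import List, Tuple

# Hypothetical surface patterns (assumptions for illustration only).
MIRNA = r"(miR-\d+[a-z]?)"
GENE = r"([A-Z][A-Z0-9]+)"

# One relation pattern: "<miRNA> targets|regulates <GENE>", optional final period.
PATTERN = re.compile(rf"^{MIRNA} (?:targets|regulates) {GENE}\.?$")

# Toy simplification rule: detect a coordinated object list after the verb.
COORD = re.compile(rf"^{MIRNA} (targets|regulates) (.+?)\.?$")


def simplify(sentence: str) -> List[str]:
    """Split 'X targets A, B and C.' into one simple sentence per object.

    Sentences that do not fit the toy rule are returned unchanged.
    """
    m = COORD.match(sentence.strip())
    if not m:
        return [sentence.strip()]
    mirna, verb, objects = m.groups()
    parts = re.split(r",\s*|\s+and\s+", objects)
    return [f"{mirna} {verb} {obj.strip()}." for obj in parts if obj.strip()]


def extract(sentence: str) -> List[Tuple[str, str]]:
    """Apply the single pattern directly; return (miRNA, gene) pairs."""
    m = PATTERN.match(sentence.strip())
    return [(m.group(1), m.group(2))] if m else []


def extract_with_simplification(sentence: str) -> List[Tuple[str, str]]:
    """Simplify first, then apply the same single pattern to each piece."""
    pairs: List[Tuple[str, str]] = []
    for simple in simplify(sentence):
        pairs.extend(extract(simple))
    return pairs


if __name__ == "__main__":
    s = "miR-21 targets PTEN and PDCD4."
    print(extract(s))
    print(extract_with_simplification(s))
```

Run on the example sentence, `extract` finds nothing because the pattern does not allow a conjunction after the gene name. `extract_with_simplification` recovers both pairs without adding a second pattern. That is the coverage gain the abstract attributes to simplification, shown here only in miniature.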