Gupta, Samir
2021-11-17
2021-11-17
2021
https://udspace.udel.edu/handle/19716/29411
Biological entities such as genes, proteins, and microRNAs are critical players in various biological processes and diseases. The roles of these entities in biological processes and diseases form a significant part of biomedical knowledge bases. However, a large portion of this information is buried in the scientific literature as unstructured text. This work is motivated by our belief that relation extraction systems that capture the roles of such entities in different processes and diseases from the literature are important and much needed. We hypothesize that connections between biological entities and concepts as stated in text can be captured by extracting a small number of relations, which we call CAIR relations: Connections through Association, Involvement, and Regulation.
We have developed a general relation extraction framework that reduces the effort required to develop individual relation extraction (RE) systems. This framework is based on a structured representation called the Extended Dependency Graph (EDG), which uses syntactic dependencies and information beyond syntax to capture thematic dependencies. Based on this framework, we developed a general CAIR relation extraction system that connects a bio-entity to associated concepts. To demonstrate the wide applicability of CAIR relations and the framework, we have developed several RE systems and text-mining applications, including miRiaD, a tool that extracts the role of microRNAs in diseases, and Phos2X, a tool that extracts the functional impact of protein phosphorylation. As a continuation of miRiaD development, we also developed DEXTER, a tool that extracts differential expression levels of microRNAs in diseases, which covers a different aspect of microRNA-disease associations.
Such differential expression information is stated through comparative sentences that compare expression levels in two different samples. We have therefore developed a general system that identifies comparison sentences and extracts their components (compared aspect, compared entities or scenarios, and scale of the comparison). We also extended DEXTER to extract gene expression information. All the tools we have developed were evaluated against human annotations and show high precision and recall.
BioNLP; microRNA; Natural Language Processing; Relation Extraction; Text mining
Extraction of knowledge for microRNAs and genes: extracting connections through association, involvement, and regulation
Thesis
1285525614
https://doi.org/10.58088/xn3m-v820
2021-08-09
en