Biomedical relation extraction with reduced manual effort

Date
2018
Publisher
University of Delaware
Abstract
Biomedical relation extraction is a critical text-mining task concerned with the automatic extraction of related bio-entities from text. Rule-based and machine learning methods are the two main approaches to relation extraction. While both can be used to develop high-performance relation extraction systems, each requires a considerable amount of manual effort in different phases of system development. This hinders the fast application of either method to extracting new types of relations.

This dissertation focuses on developing techniques to assist the development of biomedical relation extraction systems using rule-based and machine learning methods. For rule-based systems, one main component requiring manual effort is pattern design. Domain experts often need to examine a considerable number of documents and exhaustively collect every pattern that can be used to extract relations. We leverage various kinds of linguistic knowledge to automatically generate a comprehensive set of patterns. Our first approach is instantiated to develop miRTex, a system that extracts three kinds of miRNA-gene relations that regulate a wide range of biological processes and are involved in diseases. Only a small number of triggers and rules are needed to achieve state-of-the-art performance. Our second approach translates ideas from Lexicalized Tree Adjoining Grammar to dependency graphs for pattern generation and adopts the Extended Dependency Graph as an abstract sentence representation. This approach is applied to extract five types of post-translational modifications, a class of relations that plays an important role in cellular functions. Evaluations on the BioNLP 2011 EPI task show that the resulting system achieves state-of-the-art performance.

For machine learning systems, a sizable training corpus is needed to train the extraction model, but annotating such a corpus is time- and labor-intensive. We adopt distant supervision in two ways. First, we develop noise reduction techniques to improve the data quality of the automatically generated large training set, leading to improvement over existing results for distantly supervised biomedical relation extraction. Second, we employ distant supervision in conjunction with human-labeled data and deep neural networks to achieve state-of-the-art performance on some benchmark relation extraction tasks.
Keywords
Applied sciences, BioNLP, Biomedical text mining, Deep learning, Distant supervision, Pattern generation, Relation extraction