Biomedical relation extraction with reduced manual effort

dc.contributor.author: Li, Gang
dc.date.accessioned: 2018-09-17T12:15:03Z
dc.date.available: 2018-09-17T12:15:03Z
dc.date.issued: 2018
dc.date.updated: 2018-07-27T13:03:50Z
dc.description.abstract: Biomedical relation extraction is a critical text-mining task that concerns the automatic extraction of related bio-entities from text. Rule-based and machine learning methods are the two main approaches to relation extraction. While both can be used to develop high-performance relation extraction systems, both require a considerable amount of manual effort in different phases of system development. This hinders the rapid application of either method to new types of relations. This dissertation focuses on developing techniques to assist the development of biomedical relation extraction systems using rule-based and machine learning methods. For rule-based systems, one main component requiring manual effort is pattern design. Domain experts often need to examine a considerable number of documents and exhaustively collect every pattern that can be used to extract relations. We leverage various kinds of linguistic knowledge to automatically generate a comprehensive set of patterns. Our first approach is instantiated to develop miRTex, a system that extracts three kinds of miRNA-gene relations that regulate a wide range of biological processes and are involved in diseases. Only a small number of triggers and rules are needed to achieve state-of-the-art performance. Our second approach is to translate the ideas in Lexicalized Tree Adjoining Grammar to dependency graphs for pattern generation, and to adopt the Extended Dependency Graph as an abstract sentence representation. This approach is applied to extract five types of post-translational modifications, a class of relations that plays an important role in cellular functions. Evaluations on the BioNLP 2011 EPI task show that the resulting system achieves state-of-the-art performance. For machine learning systems, a sizable training corpus is needed to train the extraction model, but annotating such a corpus is time- and labor-intensive. We adopt distant supervision in two ways. First, we develop noise reduction techniques to improve the data quality of the automatically generated large training set, leading to improvement over existing results for distantly supervised biomedical relation extraction. Second, we employ distant supervision in conjunction with human-labeled data and deep neural networks to achieve state-of-the-art performance on some benchmark relation extraction tasks. [en_US]
dc.description.advisor: Shanker, Vijay K.
dc.description.advisor: Wu, Cathy H.
dc.description.degree: Ph.D.
dc.description.department: University of Delaware, Department of Computer and Information Sciences
dc.identifier.doi: https://doi.org/10.58088/4cgk-n623
dc.identifier.unique: 1052612895
dc.identifier.uri: http://udspace.udel.edu/handle/19716/23793
dc.language.rfc3066: en
dc.publisher: University of Delaware [en_US]
dc.relation.uri: https://search.proquest.com/docview/2089996568?accountid=10457
dc.subject: Applied sciences [en_US]
dc.subject: BioNLP [en_US]
dc.subject: Biomedical text mining [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Distant supervision [en_US]
dc.subject: Pattern generation [en_US]
dc.subject: Relation extraction [en_US]
dc.title: Biomedical relation extraction with reduced manual effort [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: Li_udel_0060D_13327.pdf
Size: 2.77 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission