Text mining of mutations and their impact from biomedical literature
Date
2018
Publisher
University of Delaware
Abstract
The increasing amount of research on genetic mutations has triggered rapid growth in the number of published articles describing mutations and their effects on diseases, drug responses, and protein function. With the advent of precision medicine, which aims to identify targeted therapies with maximal efficacy for individual patients, there is a pressing need to gather such mutational information from text into public knowledge bases. However, manual curation slows the growth of such databases. We have applied natural language processing (NLP) techniques to locate and extract mutational information from text in order to assist curators and researchers. In particular, this dissertation addresses the following tasks: mutation detection, mutation-disease association, the impact of mutations on drug responses, and the impact of mutations on protein-protein interactions, all from the research literature.
We have developed a system, MeX, to detect a wide range of mutation mentions in text. Evaluations on several publicly available corpora show that it achieves state-of-the-art performance in mutation detection. The mutation detector also applies a novel algorithm to associate mutations with genes. We have developed a system, DiMeX, which finds associations between mutations and diseases in the abstracts of published articles. DiMeX outperformed the current state of the art when evaluated on multiple corpora. We have developed a system, eGARD, to identify the impact of genomic anomalies on drug responses. Evaluations showed that eGARD achieves high performance, which will significantly reduce manual curation time. Finally, we have developed a text mining system to extract the impact of mutations on protein-protein interactions. This type of information will provide further insight into how mutations affect protein functions and thereby play a role in the development and progression of diseases. This system outperformed the current state-of-the-art approaches for the task. To enable easier access to the data and make it available to computational bioinformatics tools, we have applied DiMeX and eGARD at Medline scale and stored the results in databases.