DiMeX: A Text Mining System for Mutation-Disease Association Extraction
Publisher
Public Library of Science
Abstract
The number of published articles describing associations between mutations and diseases
is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations
into public knowledge bases, but manual curation slows down the growth of such
databases. We have addressed this problem by developing a text-mining system (DiMeX)
to extract mutation-disease associations from publication abstracts. DiMeX consists of a
series of natural language processing modules that preprocess input text and apply syntactic
and semantic patterns to extract mutation-disease associations. DiMeX achieves high
precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different
datasets for mutation-disease associations. DiMeX includes a separate component that
extracts mutation mentions in text and associates them with genes. This component has
also been evaluated on different datasets and shown to achieve state-of-the-art performance.
The results indicate that our system outperforms the existing mutation-disease
association tools, addressing the low precision problems suffered by most approaches.
DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease
associations, as well as other relevant information including patient/cohort size and population
data. The results are stored in a database that can be queried and downloaded at
http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has
the potential to significantly assist researchers and curators in enriching mutation databases.
Description
Publisher's PDF
Citation
Mahmood ASMA, Wu T-J, Mazumder R, Vijay-Shanker K (2016) DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS ONE 11(4): e0152725. doi:10.1371/journal.pone.0152725
