Building a predictive modeling system for sentence classification: a case study using tardive dyskinesia

Bi, Xia

Author(s)	Bi, Xia
Date Accessioned	2012-12-18T12:29:59Z
Date Available	2012-12-18T12:29:59Z
Publication Date	2012
Abstract	Advances in computational and biological methods have greatly accelerated the pace of scientific discovery and produced a tremendous amount of experimental and computational data in the biomedical domain. Given the wealth of information available in both scientific papers and electronic databases, one particular challenge in biomedicine is to detect disease-drug associations and to organize them in a meaningful way that will accelerate pharmacogenetic research. Several text mining tools have been developed for this purpose. They perform adequately in identifying facts and entities using on-the-fly searches of scientific articles from many different databases; however, they cannot analyze the type of relationship that exists between the objects identified. In this thesis, we propose a novel method to analyze drug-disease relationships using a combination of in-house and open-source tools that exploit the Multinomial Naïve Bayes (MNB) modeling technique. The main motivation behind this thesis work is to help researchers quickly identify disease-drug relationships in the biomedical literature, using tardive dyskinesia (TD) as a case study, and to classify those relationships into specific categories to enable a better understanding of various drug effects. We manually developed and annotated a biomedical training corpus for TD via sentence classification. Using the MNB modeling technique, we generated a learning model and built a predictive classifier system using data preprocessing and filtering algorithms. To assess whether the model would generalize to an independent dataset, we applied 10-fold cross-validation and evaluated the model using precision, recall, F-measure, and ROC area. The precision, recall, and F-measure were approximately 88%, and ROC area was over 97%.
One particular challenge in sentence classification is the co-existence of contrasting biological observations that confuse the classification model. To address this ambiguity, we passed the output data to MetaMap to identify and separate distinct biological observations in biomedical text. By further discerning the semantic meaning of biological observations, we classified biomedical sentences into more refined categories, which helped to elucidate various drug effects and represents an initial step toward the sophisticated task of disease-drug relationship extraction.	en_US
Advisor	Wu, Cathy H.
Degree	M.S.
Department	University of Delaware, Department of Bioinformatics and Computational Biology
URL	http://udspace.udel.edu/handle/19716/12034
Publisher	University of Delaware	en_US
Subject (LCSH)	Tardive dyskinesia.
Subject (LCSH)	Bayesian statistical decision theory.
Title	Building a predictive modeling system for sentence classification: a case study using tardive dyskinesia	en_US
Type	Thesis	en_US
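
The abstract describes training a Multinomial Naïve Bayes sentence classifier and evaluating it with k-fold cross-validation using precision, recall, and F-measure. The following is a minimal sketch of that general technique in plain Python, not the thesis's actual pipeline: the toy sentences, the labels "induces" and "treats", the whitespace tokenizer, and all function names are assumptions made up for illustration. It uses 3 folds because the toy corpus has only six sentences (the thesis reports 10-fold), and it omits ROC area.

```python
import math
from collections import Counter, defaultdict

# Invented toy sentences and hypothetical category labels (NOT the thesis corpus).
TOY_SENTENCES = [
    "haloperidol induced tardive dyskinesia in patients",
    "metoclopramide induced severe tardive dyskinesia",
    "long-term antipsychotic use induced dyskinesia",
    "tetrabenazine reduced tardive dyskinesia symptoms",
    "valbenazine reduced dyskinesia severity in patients",
    "clonazepam reduced symptoms of tardive dyskinesia",
]
TOY_LABELS = ["induces"] * 3 + ["treats"] * 3


def tokenize(sentence):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return sentence.lower().split()


def train_mnb(sentences, labels, alpha=1.0):
    """Collect per-class document counts and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for sentence, label in zip(sentences, labels):
        word_counts[label].update(tokenize(sentence))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab, alpha


def predict(model, sentence):
    """Return the class with the highest log prior + smoothed log likelihood."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(sentence) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for tok in tokens:
            # Laplace (add-alpha) smoothing over the training vocabulary.
            score += math.log(
                (word_counts[label][tok] + alpha) / (total + alpha * len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(sentences, labels, k):
    """k-fold cross-validation; item i is held out in fold i % k."""
    predictions = [None] * len(sentences)
    for fold in range(k):
        train_idx = [i for i in range(len(sentences)) if i % k != fold]
        test_idx = [i for i in range(len(sentences)) if i % k == fold]
        model = train_mnb(
            [sentences[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        for i in test_idx:
            predictions[i] = predict(model, sentences[i])
    return predictions


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predictions = cross_validate(TOY_SENTENCES, TOY_LABELS, k=3)
for label in sorted(set(TOY_LABELS)):
    p, r, f = precision_recall_f1(TOY_LABELS, predictions, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

With a corpus this small the scores mean little; the point is only the shape of the workflow: count words per class, score a held-out sentence with smoothed log probabilities, and report per-class metrics across folds.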

Files

Original bundle

Name: XiaBi_Thesis.pdf
Size: 895.39 KB
Format: Adobe Portable Document Format

Collections

Master's Theses (Fall 2009 to Present)