Building a predictive modeling system for sentence classification: a case study using tardive dyskinesia

Author(s): Bi, Xia
Date Accessioned: 2012-12-18T12:29:59Z
Date Available: 2012-12-18T12:29:59Z
Publication Date: 2012
Abstract: Advances in computational and biological methods have greatly accelerated the pace of scientific discovery and produced a tremendous amount of experimental and computational data in the biomedical domain. Given the wealth of information available in both scientific papers and electronic databases, one particular challenge in biomedicine is to detect disease-drug associations and to organize them in a meaningful way that will accelerate pharmacogenetic research. Several text mining tools have been developed for this purpose. They perform adequately in identifying facts and entities using on-the-fly searches of scientific articles from many different databases; however, they cannot analyze the types of relationships that exist between the objects identified. In this thesis, we propose a novel method to analyze drug-disease relationships using a combination of in-house and open-source tools that exploit the Multinomial Naïve Bayes (MNB) modeling technique. The main motivation behind this work is to help researchers quickly identify disease-drug relationships in the biomedical literature, using tardive dyskinesia (TD) as a case study, and to classify those relationships into specific categories to enable a better understanding of various drug effects. We manually developed and annotated a biomedical training corpus for TD via sentence classification. Using the MNB modeling technique, we generated a learning model and built a predictive classifier system using data preprocessing and filtering algorithms. To assess whether the model would generalize to an independent dataset, we applied 10-fold cross-validation and evaluated the model using precision, recall, F-measure, and ROC area. Precision, recall, and F-measure were each approximately 88%, and ROC area was over 97%.
One particular challenge in sentence classification is the co-existence of contrasting biological observations, which confuses the classification model. To address this ambiguity, we passed the output data to MetaMap to identify and separate distinct biological observations in biomedical text. By further discerning the semantic meaning of biological observations, we classified biomedical sentences into more refined categories, which helped to elucidate various drug effects and served as an initial effort toward the sophisticated task of disease-drug relationship extraction.
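The abstract names the technique but not its implementation, so the following is only a minimal pure-Python sketch of Multinomial Naïve Bayes sentence classification evaluated with k-fold cross-validation. Everything in it is an assumption made for illustration: the toy sentences, the placeholder drug names (`drug_a` and so on), the two labels (`causes`, `treats`), whitespace tokenization, add-one smoothing, and the helper names. It is not the thesis corpus or the author's pipeline, and ROC area is left out for brevity.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return sentence.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed add-one (Laplace) smoothing

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)  # log prior
            for token in tokens:
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(sentences, labels, k, positive):
    """k-fold cross-validation; returns precision, recall, F-measure for `positive`."""
    tp = fp = fn = 0
    for fold in range(k):
        train = [i for i in range(len(sentences)) if i % k != fold]
        test = [i for i in range(len(sentences)) if i % k == fold]
        model = MultinomialNB().fit([sentences[i] for i in train],
                                    [labels[i] for i in train])
        for i in test:
            predicted = model.predict(sentences[i])
            if predicted == positive and labels[i] == positive:
                tp += 1
            elif predicted == positive:
                fp += 1
            elif labels[i] == positive:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Invented toy corpus with placeholder drug names; labels alternate so that
# every fold contains both classes.
SENTENCES = [
    "drug_a induced tardive dyskinesia", "drug_f reduced tardive dyskinesia",
    "drug_b caused tardive dyskinesia", "drug_g improved tardive dyskinesia",
    "drug_c induced dyskinesia symptoms", "drug_h reduced dyskinesia symptoms",
    "drug_d caused severe dyskinesia", "drug_i improved dyskinesia severity",
    "drug_e induced severe symptoms", "drug_j reduced symptoms severity",
]
LABELS = ["causes", "treats"] * 5

if __name__ == "__main__":
    model = MultinomialNB().fit(SENTENCES, LABELS)
    print(model.predict("drug_x induced dyskinesia"))
    print(cross_validate(SENTENCES, LABELS, k=5, positive="causes"))
```

The sketch uses five folds only because the toy corpus has ten sentences; the abstract reports 10-fold cross-validation on the annotated TD corpus.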
Advisor: Wu, Cathy H.
Degree: M.S.
Department: University of Delaware, Department of Bioinformatics and Computational Biology
URL: http://udspace.udel.edu/handle/19716/12034
Publisher: University of Delaware
dc.subject.lcsh: Tardive dyskinesia.
dc.subject.lcsh: Bayesian statistical decision theory.
Type: Thesis
Files
Original bundle
Name: XiaBi_Thesis.pdf
Size: 895.39 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission