Information retrieval for reducing manual effort in biomedical and clinical research
University of Delaware
Medical professionals leverage health-related data to answer questions and support decision making. However, many of these medical tasks require intensive manual effort to identify useful information in noisy data, and the rapid growth of data is making these tasks increasingly costly and time-consuming. In this thesis, we develop effective medical information retrieval (IR) systems that reduce search-related manual work for three representative medical tasks: electronic medical record (EMR) based cohort identification, Medical Subject Headings (MeSH) indexing, and Gene Ontology annotation (GOA). For cohort identification, we improve search precision and recall in three ways: 1) we design a multi-level evidence aggregation strategy for effectively merging and scoring the evidence distributed throughout EMRs; 2) we develop a novel statistical IR model that significantly alleviates two issues specific to medical language in medical IR; and 3) we further enhance search performance by effectively incorporating domain knowledge into our system. For MeSH indexing and GOA, we demonstrate how IR can be used to address task-specific needs. In particular, we investigate different query formulation methods and explore various ways in which IR can work together with other techniques such as Natural Language Processing and Machine Learning.