Title: Utilizing image information for biomedical document classification
Author: Li, Pengyuan
Type: Thesis
Date: 2021
Record dates: 2022-03-22; 2022-01-21
Handle: https://udspace.udel.edu/handle/19716/30673
DOI: https://doi.org/10.58088/jeww-df02
Identifier: 1304832406
Language: en
Keywords: Biomedical document analysis; Biocuration process; Image information

Abstract:
Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge and support the research community, several biomedical databases devote significant effort to manual curation of the literature. The first step in the biocuration process is to identify, within a large volume of publications, the articles relevant to the specific area on which the database focuses. This step is slow and labor-intensive. Automatically identifying publications relevant to a specific topic is therefore a fundamental task toward expediting the biocuration process and, in turn, biomedical research.

Current methods for categorizing biomedical documents focus on textual content, typically extracted from the title and the abstract. However, publications often use images and captions to convey pivotal information about research processes, experiments, and results. In this thesis, we explore ways to utilize and integrate image information into biomedical document classification. To do so, we first develop a new and effective system for extracting figures and their captions from biomedical publications. The vast majority of the extracted figures are compound images consisting of multiple panels, where each panel may convey a different type of information. To use the image information from each individual panel, we propose an efficient and effective method for separating these compound images into their constituent panels. Finally, we introduce a new biomedical document classification scheme that uses information derived from images and captions in addition to titles and abstracts.