Utilizing image and caption information for biomedical document classification

Li, Pengyuan; Jiang, Xiangying; Zhang, Gongbo; Trabucco, Juan Trelles; Raciti, Daniela; Smith, Cynthia; Ringwald, Martin; Marai, G. Elisabeta; Arighi, Cecilia; Shatkay, Hagit

Author(s)	Li, Pengyuan
Author(s)	Jiang, Xiangying
Author(s)	Zhang, Gongbo
Author(s)	Trabucco, Juan Trelles
Author(s)	Raciti, Daniela
Author(s)	Smith, Cynthia
Author(s)	Ringwald, Martin
Author(s)	Marai, G. Elisabeta
Author(s)	Arighi, Cecilia
Author(s)	Shatkay, Hagit
Date Accessioned	2022-01-19T20:05:39Z
Date Available	2022-01-19T20:05:39Z
Publication Date	2021-07-12
Description	This article was originally published in Bioinformatics. The version of record is available at: https://doi.org/10.1093/bioinformatics/btab331
Abstract	Motivation: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature, a labor-intensive process. The first step in biocuration is identifying articles relevant to the specific area on which the database focuses. Thus, automatically identifying publications relevant to a specific topic within a large volume of publications is an important task toward expediting the biocuration process and, in turn, biomedical research. Current methods focus on textual content, typically extracted from the title-and-abstract. Notably, images and captions are often used in publications to convey pivotal evidence about processes, experiments and results.
Results: We present a new document classification scheme that uses both image and caption information in addition to titles-and-abstracts. To use the image information, we introduce a new image representation, namely Figure-word, based on the class labels of subfigures. We use word embeddings to represent captions and titles-and-abstracts. To utilize all three types of information, we introduce two information integration methods. The first combines Figure-words and the textual features obtained from captions and titles-and-abstracts into a single larger vector for document representation; the second employs a meta-classification scheme. Our experiments and results demonstrate the usefulness of the newly proposed Figure-words for representing images. Moreover, the results show that Figure-words, captions and titles-and-abstracts provide complementary information for document classification; these three sources of information, when combined, lead to improved overall classification performance.
Availability and implementation: Source code and the list of PMIDs of the publications in our datasets are available upon request.
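The abstract names three feature sources and two ways of integrating them. The Python sketch below illustrates one possible reading of that pipeline; it is not the authors' implementation, which the record says is available only on request. Every concrete choice here is an assumption made for illustration: the Figure-word encoding (a figure's sorted, de-duplicated subfigure class labels joined by `_`), the label names (`gel`, `graph`, `microscopy`), the toy 2-dimensional embeddings averaged per text, and a logistic-regression meta-classifier trained on per-source base-classifier scores. All function names are hypothetical.

```python
"""Hypothetical sketch, not the authors' code: Figure-words plus caption and
title-and-abstract embeddings, integrated by concatenation or meta-classification."""
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def figure_word(subfigure_labels: Sequence[str]) -> str:
    """Turn one figure into a single token built from its subfigure class labels.

    Assumed encoding: sorted, de-duplicated labels joined by '_'.
    """
    return "_".join(sorted(set(subfigure_labels)))


def figure_word_vector(figures: List[Sequence[str]], vocab: List[str]) -> List[float]:
    """Bag-of-Figure-words: count each vocabulary Figure-word in one document."""
    counts = Counter(figure_word(labels) for labels in figures)
    return [float(counts[word]) for word in vocab]


def mean_embedding(text: str, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of in-vocabulary tokens (zero vector if none match)."""
    vecs = [emb[tok] for tok in text.lower().split() if tok in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def concat_features(fig_vec: List[float], cap_vec: List[float], ta_vec: List[float]) -> List[float]:
    """Integration method 1: one larger document vector built by concatenation."""
    return fig_vec + cap_vec + ta_vec


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta(scores: List[List[float]], labels: List[int],
               lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Integration method 2 (assumed form): logistic-regression meta-classifier
    fitted by batch gradient descent on the per-source base-classifier scores."""
    w, b, n = [0.0] * len(scores[0]), 0.0, len(scores)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(scores, labels):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w: List[float], b: float, x: List[float]) -> int:
    """Predict 1 (relevant) if the meta-classifier's probability is at least 0.5."""
    return int(_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


if __name__ == "__main__":
    vocab = ["gel", "gel_graph", "microscopy"]              # hypothetical Figure-word vocabulary
    figures = [["gel", "graph"], ["graph", "gel"], ["microscopy"]]
    fig_vec = figure_word_vector(figures, vocab)           # [0.0, 2.0, 1.0]

    emb = {"mouse": [1.0, 0.0], "expression": [0.0, 1.0]}   # toy 2-d embeddings
    cap_vec = mean_embedding("Mouse gene expression", emb, dim=2)  # [0.5, 0.5]
    ta_vec = mean_embedding("expression atlas", emb, dim=2)        # [0.0, 1.0]
    print(concat_features(fig_vec, cap_vec, ta_vec))

    # Toy base-classifier scores per document: [figure, caption, title-and-abstract].
    scores = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
    labels = [1, 1, 0, 0]
    w, b = train_meta(scores, labels)
    print([predict_meta(w, b, s) for s in scores])
```

The two integration functions mirror the abstract's distinction. Concatenation lets a single downstream classifier see all features at once. Meta-classification instead trains one base classifier per source and learns how much to trust each source's score. Logistic regression is used here only because it fits in a few dependency-free lines.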
Sponsor	This work was partially supported by National Institutes of Health (NIH)/National Library of Medicine (NLM) awards [R56LM011354A and R01LM012527]; NIH/National Institute of Child Health and Human Development (NICHD) award [P41 HD062499 to M.R.].
Citation	Pengyuan Li, Xiangying Jiang, Gongbo Zhang, Juan Trelles Trabucco, Daniela Raciti, Cynthia Smith, Martin Ringwald, G Elisabeta Marai, Cecilia Arighi, Hagit Shatkay, Utilizing image and caption information for biomedical document classification, Bioinformatics, Volume 37, Issue Supplement_1, July 2021, Pages i468–i476, https://doi.org/10.1093/bioinformatics/btab331
ISSN	1460-2059
URL	https://udspace.udel.edu/handle/19716/30045
Language	en_US
Publisher	Bioinformatics
Title	Utilizing image and caption information for biomedical document classification
Type	Article

Files

Original bundle

Name: Utilizing image and caption information for biomedical document classification.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format
Description: Main article

Collections

Open Access Publications