Methodologies for infographics retrieval

Date
2016
Publisher
University of Delaware
Abstract
Information graphics (infographics) are graphical representations of data and knowledge. Such graphics are intended to present complex information effectively and clearly. The number of infographics in popular media, such as newspapers, magazines, and online image-sharing websites, has grown tremendously in recent years. While the proliferation of infographics has attracted much attention in the area of image processing, relatively little attention has been dedicated to the retrieval of such graphics. Existing search engines that specialize in infographics still rely on text located near the graphics in the multimedia document that contains them. However, in many cases there is no text surrounding these graphics, and when there is, the nearby text often does not describe their content. In this thesis, we address the task of infographics retrieval, which is ill-served by existing retrieval systems for text documents and pictorial images. Specifically, our methodology is based on the content of infographics, which combines their distinctive structures, the textual descriptions within the graphics, and the high-level messages conveyed via communicative signals. The overall pipeline of our methodology comprises four parts: a query analysis module, which processes a full-sentence interrogative query to hypothesize the content of ideal graphics for this query; a graphic text expansion module, which expands the sparse text in graphics, using different tactics for different graphic components; two ranking models, a linear mixture model and a tree-based learning-to-rank model; and a faceting paradigm, which recommends facets to users for faster browsing of the search space.
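As a rough illustration of the linear mixture ranking model mentioned above, the following Python sketch combines per-graphic relevance scores for structure, text, and high-level message into a single ranking score. The abstract does not specify the features, weights, or scoring functions, so every name here (`ComponentScores`, `mixture_score`, `rank`) and the default weights are assumptions made purely for illustration, not the thesis's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ComponentScores:
    """Hypothetical relevance scores for one graphic, each assumed to lie in [0, 1]."""
    structure: float  # how well the graphic's structure matches the query
    text: float       # how well the (expanded) graphic text matches the query
    message: float    # how well the high-level message matches the query


def mixture_score(scores, weights=(0.3, 0.4, 0.3)):
    """Linear mixture: a weighted sum of the three component scores.

    The weights are illustrative assumptions; they must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    w_structure, w_text, w_message = weights
    return (w_structure * scores.structure
            + w_text * scores.text
            + w_message * scores.message)


def rank(candidates, weights=(0.3, 0.4, 0.3)):
    """Return graphic ids ordered from highest to lowest mixture score."""
    return sorted(
        candidates,
        key=lambda graphic_id: mixture_score(candidates[graphic_id], weights),
        reverse=True,
    )


# Two made-up candidate graphics for a single query.
candidates = {
    "g1": ComponentScores(structure=0.9, text=0.2, message=0.1),
    "g2": ComponentScores(structure=0.5, text=0.8, message=0.7),
}
ranking = rank(candidates)
```

In a sketch like this, the weights would normally be tuned on held-out relevance judgments; a tree-based learning-to-rank model would instead learn a nonlinear combination of the same kind of features.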