Novelty and diversity in search results
Date
2014
Publisher
University of Delaware
Abstract
Information retrieval (IR) is the process of obtaining relevant information for a given information need. The concept of relevance and its relation to information needs is of central concern to IR researchers. Until recently, much work in IR settled on a topical notion of relevance (a document is relevant if it contains information "about" a specified topic) in which the relevance of a document in a ranking is independent of the relevance of the other documents in that ranking. Such an approach is likely to produce rankings with a high degree of redundancy; the amount of novel information available to users may be minimal as they move down a ranked list. In this work, we focus on the novelty and diversity problem, which models the relevance of a document by taking into account inter-document effects in a ranked list and the diverse information needs behind a given query. Existing approaches to this problem mostly rely on identifying subtopics (disambiguations, facets, or other component parts) of an information need, then estimating a document's relevance independently with respect to each subtopic. Users are assumed to be satisfied by a ranking that covers the space of subtopics and covers each individual subtopic sufficiently. We propose a novel approach that models novelty implicitly while retaining the ability to capture other important factors affecting user satisfaction. We formulate a set of hypotheses based on the existing subtopic approach and test them with actual users using a simple conditional preference design: users express a preference for document A or document B given document C. Following this, we introduce a novel triplet framework for collecting such preference judgments and using them to estimate the total utility of a document while taking inter-document effects into account. Finally, a set of utility-based metrics is proposed and validated to measure the effectiveness of a system for the novelty and diversity task.
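To make the conditional preference design concrete, the following is a minimal sketch of how triplet judgments of the form "A is preferred to B given C" might be turned into a ranking. It is not the estimator defined in the thesis: the win-rate utility, the greedy ranking step, and the names `conditional_win_rates` and `greedy_rank` are illustrative assumptions, and the judgments are toy data.

```python
from collections import defaultdict


def conditional_win_rates(judgments):
    """Map (doc, context) to the fraction of comparisons doc won given context.

    judgments: iterable of (winner, loser, context) triplets, where context is
    the document already seen by the user, or None for an unconditional pair.
    Assumption: a simple win rate stands in for the utility estimate.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for winner, loser, context in judgments:
        wins[(winner, context)] += 1
        totals[(winner, context)] += 1
        totals[(loser, context)] += 1
    return {key: wins[key] / totals[key] for key in totals}


def greedy_rank(docs, rates):
    """Greedily rank docs by average win rate given the docs already ranked."""
    ranking = []
    remaining = sorted(docs)  # sorted so that ties break deterministically
    while remaining:
        # The first pick uses unconditional judgments (context None).
        contexts = ranking if ranking else [None]

        def utility(doc):
            # Missing (doc, context) pairs count as zero evidence of utility.
            return sum(rates.get((doc, c), 0.0) for c in contexts) / len(contexts)

        best = max(remaining, key=utility)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy data (hypothetical): A and B are redundant, C adds novel information.
judgments = [
    ("A", "B", None),
    ("A", "C", None),
    ("B", "C", None),
    ("C", "B", "A"),  # having seen A, users prefer C over B
    ("C", "B", "A"),
]
rates = conditional_win_rates(judgments)
ranking = greedy_rank(["A", "B", "C"], rates)
print(ranking)
```

In this toy data, B beats C when judged in isolation, but C is preferred once A has been seen, so the conditional judgments move C ahead of the redundant document B without any explicit subtopic labels.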