Novelty and diversity in search results

Ravichandran, Praveen
University of Delaware
Information retrieval (IR) is the process of obtaining relevant information for a given information need. The concept of relevance and its relation to information needs is of central concern to IR researchers. Until recently, much work in IR settled for a notion of relevance that is topical (that is, a document is relevant if it contains information "about" a specified topic) and in which the relevance of a document in a ranking is independent of the relevance of the other documents in that ranking. Such an approach, however, tends to produce rankings with a high degree of redundancy: the amount of novel information available to users may be minimal as they move down the ranked list. In this work, we focus on the novelty and diversity problem, which models the relevance of a document by taking into account both inter-document effects in a ranked list and the diverse information needs behind a given query. Existing approaches to this problem mostly rely on identifying subtopics (disambiguations, facets, or other component parts) of an information need, then estimating a document's relevance independently with respect to each subtopic. A user is assumed to be satisfied by a ranking that covers the space of subtopics while also covering each individual subtopic sufficiently. We propose a novel approach that models novelty implicitly while retaining the ability to capture other important factors affecting user satisfaction. We formulate a set of hypotheses based on the existing subtopic approach and test them with actual users using a simple conditional preference design: users express a preference for document A or document B given document C. Following this, we introduce a novel triplet framework for collecting such preference judgments and using them to estimate the total utility of a document while taking inter-document effects into account. Finally, we propose and validate a set of utility-based metrics for measuring the effectiveness of a system on the novelty and diversity task.
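To make the conditional preference design concrete, the following is a minimal sketch, not the triplet framework described in this work. It assumes a hypothetical judgment format `(context, a, b, winner)`, meaning "having already seen `context` (or nothing, if `None`), the user preferred `winner` out of `a` and `b`". It then aggregates those judgments with a simple win rate per context and builds a ranking greedily, each pick conditioned on the previous one. The names `Judgment`, `conditional_win_rates`, and `greedy_rank`, and the win-rate aggregation itself, are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical judgment format (an assumption, not the thesis's data model):
# (context, a, b, winner) = "given that `context` was already seen
# (None = nothing seen yet), the user preferred `winner` out of {a, b}".
Judgment = Tuple[Optional[str], str, str, str]


def conditional_win_rates(judgments: List[Judgment]) -> Dict[Optional[str], Dict[str, float]]:
    """For each context, the fraction of its comparisons each document won."""
    wins: Dict[Optional[str], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    seen: Dict[Optional[str], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for context, a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        seen[context][a] += 1
        seen[context][b] += 1
        wins[context][winner] += 1
    return {
        ctx: {doc: wins[ctx][doc] / n for doc, n in docs.items()}
        for ctx, docs in seen.items()
    }


def greedy_rank(judgments: List[Judgment], docs: List[str]) -> List[str]:
    """Repeatedly pick the document with the highest win rate given the previous pick."""
    rates = conditional_win_rates(judgments)
    ranking: List[str] = []
    remaining = list(docs)
    context: Optional[str] = None  # nothing has been shown yet
    while remaining:
        scores = rates.get(context, {})
        # Unjudged documents score 0; max() keeps the first of tied documents.
        best = max(remaining, key=lambda d: scores.get(d, 0.0))
        ranking.append(best)
        remaining.remove(best)
        context = best
    return ranking
```

For example, with unconditional judgments favoring `d1` over `d2` over `d3`, plus one judgment saying that after reading `d1` the user prefers `d3` to `d2` (say, because `d2` repeats `d1`), `greedy_rank` returns `["d1", "d3", "d2"]` rather than the purely topical order `["d1", "d2", "d3"]`. Conditioning only on the single previous document is a simplification chosen for brevity; a fuller model would condition on everything ranked above.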