A novel similarity-search method for mathematical content in LaTeX markup and its implementation

Date
2015
Publisher
University of Delaware
Abstract
Mathematical content is common in digital documents, but major search engines offer no effective way to search this structured content, because traditional IR methods fail to capture important aspects of mathematical notation. In this paper, we propose a similarity-search method for LaTeX math expressions, aiming to offer a new way to search math content more effectively. Our approach uses an intermediate tree representation to capture the structural information of a math expression and, building on a previous idea, indexes math expressions by their leaf-to-root tree paths. We provide a search method that narrows the candidate set for possible sub-expression isomorphism. We rank search results with a few intuitive similarity metrics from both structural and symbolic points of view. We also built a proof-of-concept prototype search engine to demonstrate these ideas, which allows us to present evaluation results in this paper. Experiments show that the proposed measures improve effectiveness relative to our baseline search method.
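The leaf-to-root path indexing idea can be illustrated with a small sketch. It assumes a hypothetical tuple-based tree encoding (not the thesis's actual intermediate representation), and the names `leaf_root_paths` and `PathIndex` are illustrative only. Each expression is indexed by every leaf-up prefix of its leaf-to-root label paths, so a query sub-expression can hit a larger expression that contains it. Candidates are then scored by the fraction of query paths found. This is a coarse candidate filter with a simple overlap score, not the author's similarity metrics or an isomorphism check.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: a leaf is a symbol string; an internal node is
# a tuple (operator, child, child, ...). E.g. a^2 + b -> ("+", ("^", "a", "2"), "b").
Node = Union[str, tuple]
Path = Tuple[str, ...]


def leaf_root_paths(tree: Node, ancestors: Tuple[str, ...] = ()) -> List[Path]:
    """Return one path per leaf, listing labels from the leaf up to the root."""
    if isinstance(tree, str):
        # ancestors is stored root-first, so reverse it to walk leaf -> root.
        return [(tree,) + tuple(reversed(ancestors))]
    op, *children = tree
    paths: List[Path] = []
    for child in children:
        paths.extend(leaf_root_paths(child, ancestors + (op,)))
    return paths


class PathIndex:
    """Inverted index from leaf-up path prefixes to expression ids (a sketch)."""

    def __init__(self) -> None:
        self.postings: Dict[Path, Counter] = defaultdict(Counter)
        self.num_leaves: Dict[str, int] = {}

    def add(self, expr_id: str, tree: Node) -> None:
        paths = leaf_root_paths(tree)
        self.num_leaves[expr_id] = len(paths)
        for path in paths:
            # Index every prefix (leaf, leaf+parent, ...) so that a query
            # sub-expression can match inside a larger expression.
            for end in range(1, len(path) + 1):
                self.postings[path[:end]][expr_id] += 1

    def search(self, query: Node, k: int = 10) -> List[Tuple[str, float]]:
        q_paths = Counter(leaf_root_paths(query))
        matched: Counter = Counter()
        for path, q_count in q_paths.items():
            # .get() avoids creating empty entries in the defaultdict.
            for expr_id, d_count in self.postings.get(path, Counter()).items():
                matched[expr_id] += min(q_count, d_count)
        total = sum(q_paths.values())
        # Score = fraction of query paths found; ties favour smaller expressions.
        ranked = sorted(
            ((e, m / total) for e, m in matched.items()),
            key=lambda item: (-item[1], self.num_leaves[item[0]], item[0]),
        )
        return ranked[:k]


idx = PathIndex()
idx.add("e1", ("+", ("^", "a", "2"), "b"))  # a^2 + b
idx.add("e2", ("^", "a", "2"))              # a^2
idx.add("e3", ("+", "a", "b"))              # a + b
print(idx.search(("^", "a", "2")))  # [('e2', 1.0), ('e1', 1.0)]
print(idx.search(("+", "a", "b")))  # [('e3', 1.0), ('e1', 0.5)]
```

Because these paths record labels but not child order, a^2 and 2^a yield the same paths; a fuller system would need ordering information, or an exact sub-tree comparison on the retrieved candidates, to tell them apart.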