Analysis of high performance sparse matrix-vector multiplication for small finite fields

Date
2020
Publisher
University of Delaware
Abstract
This thesis explores the intricacies of obtaining high-performance sparse matrix-vector multiplication on modern hardware, with an emphasis on operating over data from small finite fields. We develop and present novel adaptations of classical fast matrix multiplication algorithms over these fields and apply them in a sparse setting, and we compare these novel formats and algorithms against a wide array of standard sparse matrix formats. In particular, we use modern code analysis tools to show how data layouts, vector/panel widths, and choice of compiler all substantially influence the efficiency of the underlying arithmetic of these sparse formats. These analyses are performed in a data-agnostic manner: by analyzing the assembly code produced by compilers, we focus solely on the efficiency of the arithmetic and not on higher-level aspects of performance such as cache access patterns, removing matrix-specific intangibles from the discussion of format. Such matrix-specific factors remain important when evaluating any particular matrix, but they make comparing general formats to one another difficult without exhaustive benchmarking. Our results thus characterize theoretical peak arithmetic performance, discussed from this abstract, analytic perspective, and we observe similar trends in synthetic performance benchmarks. Ultimately, we show that the Method of the Four Russians can be directly adapted to sparse matrix-panel multiplication (a panel being a matrix with relatively few columns) and that a custom variant can achieve high performance in sparse matrix-vector multiplication, with the potential to perform even better on real-world matrices.
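For intuition, the sketch below illustrates the core idea behind the Method of the Four Russians over GF(2), in a dense bit-packed matrix-panel setting. This is an illustrative example only, not the thesis's implementation; the storage layout, block width K, and the name m4rm_gf2 are assumptions made for the sketch. Columns of A are processed in blocks of K bits; for each block, all 2^K XOR combinations of the corresponding K panel rows are precomputed, so each matrix row contributes one table lookup instead of K row-wise updates.

```c
#include <stdint.h>

#define K 8                      /* column-block width; one 2^K-entry table per block */

/* Y = A * P over GF(2).  A: n x n, bit-packed row-major; P, Y: n rows,
 * each a panel of up to 64 bits packed into one uint64_t. */
void m4rm_gf2(const uint64_t *A, const uint64_t *P, uint64_t *Y, int n)
{
    int wpr = (n + 63) / 64;     /* 64-bit words per row of A */
    uint64_t table[1u << K];

    for (int i = 0; i < n; i++)
        Y[i] = 0;

    for (int base = 0; base < n; base += K) {
        int kk = (n - base < K) ? (n - base) : K;

        /* Precompute every XOR combination of the kk panel rows this
         * column block can select.  Built incrementally: each entry
         * with bit b set reuses the entry without it. */
        table[0] = 0;
        for (int b = 0; b < kk; b++)
            for (int m = 0; m < (1 << b); m++)
                table[(1 << b) | m] = P[base + b] ^ table[m];

        /* One lookup replaces kk row updates.  K divides 64, so a
         * block never straddles a word boundary of A's packed rows. */
        for (int i = 0; i < n; i++) {
            uint64_t sel = (A[i * wpr + base / 64] >> (base % 64))
                           & ((1ULL << kk) - 1);
            Y[i] ^= table[sel];
        }
    }
}
```

In a sparse setting, the same table trick would apply, but each row of A would consult only the column blocks in which it actually has nonzero entries (e.g., via a blocked sparse format), which is where a sparse matrix-panel adaptation stands to pay off.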
Keywords
Linear algebra, Matrix-vector multiplication, Vectorization