Compile-time automatic parallelization of subscripted subscripts using recurrence analysis
Date
2024
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
University of Delaware
Abstract
The performance of scientific applications can be greatly improved by taking advantage of the many cores available on today's multi-core architectures. To this end, the independent parts of the application code need to be executed simultaneously or in parallel on the various cores. In order to allow for simultaneous execution of applications, a number of parallelization schemes were introduced. One such scheme is OpenMP, a parallel programming paradigm for shared memory architectures. However, the introduction of correct and optimal OpenMP parallelization in applications is not always a simple task. To help ease this process and improve user productivity, automatic parallelizers were developed that can analyze and automatically parallelize applications. ☐ A number of scientific applications make use of data structures such as arrays, graphs, trees and grids to perform computations on sparse, unstructured input data. The use of such data structures introduces unpredictable memory access patterns which complicates the process of optimizing and parallelizing the application codes. A class of such dynamic, irregular access patterns are subscripted subscript patterns wherein, an array value appears in the subscript of another array in a for-loop, e.g. a[b[i]] with cross-iteration write accesses to the host array (array a). Parallelizing subscripted subscript patterns at compile-time has long been a challenge for automatic parallelizers. The principal contribution of this dissertation is a compile-time algorithm that can analyze the input program and gather information about the subscript array such as a property or range of possible values, which is sufficient to automatically parallelize subscripted subscript loops in a class of irregular applications. Other key contributions of this dissertation are -- a suite of specialized benchmarks comprising subscripted subscript patterns referred to as the COMPASS suite and an extended data dependence test that is capable of performing dependence testing in the presence of subscripted subscripts. ☐ This dissertation first discusses the results of an empirical study of the various subscripted subscript patterns observed in scientific applications and presents the key subscript array properties that enable parallelization of the patterns. The COMPASS suite is a consequence of this empirical study. The dissertation then introduces and describes in detail the array analysis algorithm and the novel concepts that are at the core of the algorithm for analyzing the input program and determining monotonicity, the most pervasive subscript array property. The proposed algorithm is implemented in the Cetus automatic parallelizer. Experimental results show that our algorithm can substantially improve the performance of 83.33% of the evaluated benchmarks, 33.33% more than possible with state-of-the-art automatic parallelization techniques. ☐ The dissertation then presents the Extended Range Test (ERT) that is capable of performing dependence testing in the presence of subscripted subscript expressions. The test uses the information about the subscript array gathered by the array analysis algorithm to prove non-overlap of the accesses made to the host array. The test is highly effective in handling complex subscript expressions comprising index arrays that appear in the COMPASS benchmarks. The combination of the ERT and the array analysis algorithm is responsible for delivering the performance gains for the evaluated benchmarks. ☐
Description
Keywords
Automatic parallelization, Compilers, Parallel computing, Recurrence analysis, Subscripted subscripts