Exploring hierarchical parallelism in directive-based models for efficient GPU execution

Date
2023
Publisher
University of Delaware
Abstract
The advent of general-purpose GPUs in parallel computing has brought several new languages, tools, and programming models. One popular way to program GPUs is to use high-level directives in common languages such as C and Fortran, which provide an easy-to-understand and familiar environment for the programmer. However, unlike dedicated GPU languages, these directive-based models often struggle to express low-level features of the GPU hardware, and they introduce performance overheads in the form of complex runtime libraries. Two such models are OpenMP, which has been a staple of parallel CPU programming for several decades and has supported GPUs since version 4.0, and OpenACC, which was released in 2012 and designed by many developers from the OpenMP community as a standard that supports code offloading from the beginning.

This work introduces two impactful projects within these programming models. First, the MURaM project is developed by an interdisciplinary team of domain scientists and HPC research software engineers who seek to examine the strengths and limitations of these directive-based models, as well as the difficulties application developers face when bringing large code bases to advanced parallel hardware. Second, the LLVM/OpenMP SIMD project aims to fill an important implementation gap within the LLVM compiler so that the parallelism available on GPUs can be fully utilized. OpenMP, like other similar models, allows for three distinct levels of hierarchical thread parallelism, which match the GPU's hardware layout. However, because of the implementation complexity, OpenMP compilers often omit user control of the middle level of parallelism. This greatly limits the achievable performance of codes that use all three explicit layers, and it often requires restructuring these applications as a workaround. In this proposal we outline our design and prototype of this middle level of parallelism through OpenMP's "simd" directive using the open-source compiler LLVM and its OpenMP GPU runtime. The prototype includes both a CPU-centric model for OpenMP conformance and a GPU-centric optimized model for higher performance.
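The three levels of hierarchical parallelism mentioned in the abstract can be illustrated with a minimal sketch in C. It is not taken from the thesis: the function name `scale3d`, the array shape, and the loop body are hypothetical, and only standard OpenMP directives (`target teams distribute`, `parallel for`, `simd`) are used. How each level is mapped onto GPU hardware depends on the compiler and runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: scale every element of an nb x nr x nc array by s.
 * Each loop level carries one of the three OpenMP parallelism levels. */
void scale3d(float *a, int nb, int nr, int nc, float s)
{
    assert(a != NULL && nb > 0 && nr > 0 && nc > 0);

    /* Level 1: offload to the device and distribute outer iterations over teams. */
    #pragma omp target teams distribute map(tofrom: a[0:nb*nr*nc])
    for (int b = 0; b < nb; ++b) {
        /* Level 2: share the iterations of this loop among the threads of a team. */
        #pragma omp parallel for
        for (int r = 0; r < nr; ++r) {
            /* Level 3: mark the innermost loop as safe for SIMD execution. */
            #pragma omp simd
            for (int c = 0; c < nc; ++c) {
                a[((size_t)b * nr + r) * nc + c] *= s;
            }
        }
    }
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the loops run serially with the same result, which makes the sketch easy to check on a CPU before trying an offloading compiler.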
Keywords
Compilers, Directive-based programming, GPU hardware, Programming models, CPU programming, GPU languages