A case for asynchronous many task runtimes: a modeling approach for high performance computing and Big Data analytics

Suetterlein, Joshua Daniel
University of Delaware
High Performance Computing (HPC) facilitates scientific exploration by evaluating increasingly complex models and simulations using large parallel computing systems. With the breakdown of Dennard scaling, however, the landscape of HPC has changed radically over the past decade, giving rise to unprecedented parallelism, shorter pipelines, deep memory hierarchies, and novel power management techniques. The effects of these changes have rippled beyond hardware, requiring the attention of the entire stack. To efficiently utilize the underlying hardware, system software must evolve to become more agile and capable of handling uncertain operational latencies.

Fine-grain, event-driven execution models, as exemplified by Asynchronous Many Task (AMT) runtimes, have been proposed as a flexible and efficient solution capable of utilizing the underlying hardware. Fine-grain parallelism, however, comes at the potentially steep cost of runtime overhead, which can easily negate its performance benefits. Moreover, AMTs lack the tools and methodologies required to adequately predict and evaluate their performance on current and future systems.

In order to vet fine-grain runtimes as an efficient and scalable solution, this thesis constructs an upper-bound performance model and applies it to an exemplar runtime system, Performance Open Community Runtime (P-OCR). Building upon this model, a methodology is provided to determine the appropriate task granularity and predict the scalability of an application for a given system. These tools provide a framework to evaluate fine-grain software for HPC and serve as a stepping stone to explore the applicability of AMTs to emerging technologies.

One such field, recently described as the fourth paradigm of science, is Big Data analytics. Big Data uses large clusters to turn enormous volumes of data into actionable knowledge. While HPC and Big Data seem to approach knowledge discovery from two disparate angles, the growing size of HPC workloads and the technical challenges posed by the underlying hardware each faces place them on a converging path. As a case study of AMTs, this thesis demonstrates a streaming, fine-grain Big Data solution inspired by MapReduce in P-OCR. Furthermore, leveraging the previously mentioned performance model and a newly constructed introspection framework, this thesis presents how runtime adaption is used to scale the granularity of tasks to mitigate runtime overhead.

The contributions of this thesis are the following:
• An extension of the Roofline model, applied to software components, that provides a quasi-analytical performance model for Asynchronous Many Task (AMT) runtimes
• A methodology, leveraging the extended Roofline model, for determining the appropriate granularity of an AMT task and the scalability of HPC applications
• MapReduce-inspired streaming Big Data extensions for fine-grain AMTs
• A novel model-driven introspection and adaption framework capable of minimizing runtime overhead
Applied sciences, Asynchronous many task runtime, Big Data, Fine-grain task, Roofline