Modeling non-determinism of scientific applications
Author(s) | Chapp, Dylan | |
Date Accessioned | 2020-11-11T17:10:30Z | |
Date Available | 2020-11-11T17:10:30Z | |
Publication Date | 2020 | |
SWORD Update | 2020-09-06T16:04:38Z | |
Abstract | As the scientific community prepares to deploy an increasingly complex and diverse set of applications on upcoming exascale platforms, the need for methods to assess reproducibility of simulations and identify the root causes of reproducibility failures increases correspondingly. One of the greatest challenges facing reproducibility efforts at exascale is unavoidable application-level non-determinism at the level of inter-process communication. While often necessary to boost performance, use of non-deterministic communication constructs can hamper reproducibility due to the interaction between communication non-determinism and floating-point non-associativity. In this thesis we address the challenge of non-determinism in scientific applications along three strategic directions. First, we assess the landscape of existing tooling and infrastructure for managing non-determinism via record-and-replay, and in doing so produce evidence suggesting the need for record-and-replay to adapt to communication patterns of non-deterministic applications at exascale. Second, we assess the landscape of techniques for alleviating non-determinism’s detrimental effects on numerical reproducibility, and in so doing provide an experimental framework for efficiently compensating for non-determinism based on characteristics of an application’s floating-point data. Third, we propose and develop a methodology for modeling communication non-determinism. Our methodology models parallel executions as directed graphs and leverages graph kernels to quantify and characterize run-to-run variations in inter-process communication. To validate our methodology, we present empirical studies showing the utility of graph kernel similarity for quantifying the degree of non-determinism present in representative communication patterns. To test the effectiveness of our approach, we present a study on a representative adaptive mesh refinement application demonstrating that our methodology can link runtime manifestations of communication non-determinism to their root causes in source code, and thus alleviate the burden on computational scientists of tracking down potential sources of reproducibility failures in complex code bases. | en_US |
Advisor | Taufer, Michela | |
Degree | Ph.D. | |
Department | University of Delaware, Department of Computer and Information Sciences | |
DOI | https://doi.org/10.58088/gdxv-sg39 | |
Unique Identifier | University of Delaware, Department of Computer and Information Sciences | |
URL | https://udspace.udel.edu/handle/19716/27969 | |
Language | en | |
Publisher | University of Delaware | en_US |
URI | https://login.udel.idm.oclc.org/login?url=https://www.proquest.com/docview/2445588052?accountid=10457 | |
Keywords | Graph kernels | en_US |
Keywords | Graph similarity | en_US |
Keywords | High performance computing | en_US |
Keywords | Non-determinism | en_US |
Title | Modeling non-determinism of scientific applications | en_US |
Type | Thesis | en_US |