Modeling non-determinism of scientific applications

Chapp, Dylan

Author(s)	Chapp, Dylan
Date Accessioned	2020-11-11T17:10:30Z
Date Available	2020-11-11T17:10:30Z
Publication Date	2020
SWORD Update	2020-09-06T16:04:38Z
Abstract	As the scientific community prepares to deploy an increasingly complex and diverse set of applications on upcoming exascale platforms, the need for methods to assess reproducibility of simulations and identify the root causes of reproducibility failures increases correspondingly. One of the greatest challenges facing reproducibility efforts at exascale is unavoidable application-level non-determinism at the level of inter-process communication. While often necessary to boost performance, use of non-deterministic communication constructs can hamper reproducibility due to the interaction between communication non-determinism and floating-point non-associativity. In this thesis we address the challenge of non-determinism in scientific applications along three strategic directions. First, we assess the landscape of existing tooling and infrastructure for managing non-determinism via record-and-replay, and in doing so produce evidence suggesting the need for record-and-replay to adapt to communication patterns of non-deterministic applications at exascale. Second, we assess the landscape of techniques for alleviating non-determinism’s detrimental effects on numerical reproducibility, and in so doing provide an experimental framework for efficiently compensating for non-determinism based on characteristics of an application’s floating-point data. Third, we propose and develop a methodology for modeling communication non-determinism. Our methodology models parallel executions as directed graphs and leverages graph kernels to quantify and characterize run-to-run variations in inter-process communication. To validate our methodology, we present empirical studies showing the utility of graph kernel similarity for quantifying the degree of non-determinism present in representative communication patterns. To test the effectiveness of our approach, we present a study on a representative adaptive mesh refinement application demonstrating that our methodology can link runtime manifestations of communication non-determinism to their root causes in source code, and thus alleviate the burden on computational scientists of tracking down potential sources of reproducibility failures in complex code bases.	en_US
Advisor	Taufer, Michela
Degree	Ph.D.
Department	University of Delaware, Department of Computer and Information Sciences
DOI	https://doi.org/10.58088/gdxv-sg39
Unique Identifier	University of Delaware, Department of Computer and Information Sciences
URL	https://udspace.udel.edu/handle/19716/27969
Language	en
Publisher	University of Delaware	en_US
URI	https://login.udel.idm.oclc.org/login?url=https://www.proquest.com/docview/2445588052?accountid=10457
Keywords	Graph kernels	en_US
Keywords	Graph similarity	en_US
Keywords	High performance computing	en_US
Keywords	Non-determinism	en_US
Title	Modeling non-determinism of scientific applications	en_US
Type	Thesis	en_US

Files

Original bundle

Name: Chapp_udel_0060D_14232.pdf
Size: 13.11 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations (Winter 2014 to Present)