Modeling non-determinism of scientific applications

Author(s): Chapp, Dylan
Date Accessioned: 2020-11-11T17:10:30Z
Date Available: 2020-11-11T17:10:30Z
Publication Date: 2020
SWORD Update: 2020-09-06T16:04:38Z
Abstract: As the scientific community prepares to deploy an increasingly complex and diverse set of applications on upcoming exascale platforms, the need for methods to assess reproducibility of simulations and identify the root causes of reproducibility failures increases correspondingly. One of the greatest challenges facing reproducibility efforts at exascale is unavoidable application-level non-determinism at the level of inter-process communication. While often necessary to boost performance, use of non-deterministic communication constructs can hamper reproducibility due to the interaction between communication non-determinism and floating-point non-associativity.
In this thesis we address the challenge of non-determinism in scientific applications along three strategic directions. First, we assess the landscape of existing tooling and infrastructure for managing non-determinism via record-and-replay, and in doing so produce evidence suggesting the need for record-and-replay to adapt to communication patterns of non-deterministic applications at exascale. Second, we assess the landscape of techniques for alleviating non-determinism's detrimental effects on numerical reproducibility, and in so doing provide an experimental framework for efficiently compensating for non-determinism based on characteristics of an application's floating-point data. Third, we propose and develop a methodology for modeling communication non-determinism. Our methodology models parallel executions as directed graphs and leverages graph kernels to quantify and characterize run-to-run variations in inter-process communication. To validate our methodology, we present empirical studies showing the utility of graph kernel similarity for quantifying the degree of non-determinism present in representative communication patterns.
To test the effectiveness of our approach, we present a study on a representative adaptive mesh refinement application demonstrating that our methodology can link runtime manifestations of communication non-determinism to their root causes in source code, and thus alleviate the burden on computational scientists of tracking down potential sources of reproducibility failures in complex code bases.
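The two ideas at the core of the abstract, that message-arrival order combined with floating-point non-associativity can change results, and that runs can be compared as directed graphs via a graph kernel, can be illustrated with a small, self-contained Python sketch. It is not the thesis's implementation. The gather-style communication pattern, the event-graph construction (nodes labeled by event type and rank, with program-order and send-to-receive edges), the choice of a Weisfeiler-Lehman subtree kernel, and every function name below (`reduce_in_arrival_order`, `event_graph`, `wl_features`, `wl_similarity`) are hypothetical choices made for illustration. In IEEE 754 double precision, 1e16 + 1.0 rounds back to 1e16, so the order in which contributions arrive decides whether the 1.0 survives.

```python
# Sketch only (assumptions, not the thesis's method): a hypothetical gather
# in which rank 0 posts wildcard receives and ranks 1..n each send one
# floating-point value. Message arrival order is the only source of variation.
import math
from collections import Counter


def reduce_in_arrival_order(contributions, arrival_order):
    """Sum per-rank values in the order their messages arrived at rank 0."""
    total = 0.0
    for sender in arrival_order:
        total += contributions[sender]
    return total


def event_graph(arrival_order, n_senders):
    """Build (labels, edges) for one run's directed event graph (hypothetical).

    Nodes are (rank, step) pairs labeled (event type, rank); edges are
    program-order edges within a rank plus send -> matching-recv edges.
    """
    labels, edges = {}, []
    root = [(0, s) for s in range(n_senders + 2)]
    labels[root[0]] = ("init", 0)
    for s in range(1, n_senders + 1):
        labels[root[s]] = ("recv", 0)
    labels[root[-1]] = ("finalize", 0)
    edges += list(zip(root, root[1:]))
    for r in range(1, n_senders + 1):
        chain = [(r, 0), (r, 1), (r, 2)]
        for node, kind in zip(chain, ("init", "send", "finalize")):
            labels[node] = (kind, r)
        edges += list(zip(chain, chain[1:]))
    # The k-th receive on rank 0 is matched by the k-th arriving message.
    for k, sender in enumerate(arrival_order, start=1):
        edges.append(((sender, 1), (0, k)))
    return labels, edges


def wl_features(labels, edges, iterations, table):
    """Weisfeiler-Lehman subtree features: counts of refined node labels.

    `table` maps label signatures to integer ids and is shared across graphs
    so that equal signatures in different graphs receive the same id.
    """
    preds = {v: [] for v in labels}
    succs = {v: [] for v in labels}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    current = {v: table.setdefault(("base", labels[v]), len(table)) for v in labels}
    features = Counter(current.values())
    for _ in range(iterations):
        # Refine each label with the sorted labels of its in- and out-neighbors.
        current = {
            v: table.setdefault(
                (current[v],
                 tuple(sorted(current[u] for u in preds[v])),
                 tuple(sorted(current[w] for w in succs[v]))),
                len(table))
            for v in labels
        }
        features.update(current.values())
    return features


def _dot(a, b):
    """Inner product of two sparse feature-count vectors."""
    return sum(count * b[key] for key, count in a.items())


def wl_similarity(g1, g2, iterations=2):
    """Normalized WL kernel: 1.0 for identical graphs, lower as they diverge."""
    table = {}
    f1 = wl_features(*g1, iterations, table)
    f2 = wl_features(*g2, iterations, table)
    return _dot(f1, f2) / math.sqrt(_dot(f1, f1) * _dot(f2, f2))


values = {1: 1e16, 2: 1.0, 3: -1e16}  # hypothetical per-rank contributions
run_a, run_b = (1, 2, 3), (1, 3, 2)     # two observed arrival orders
print(reduce_in_arrival_order(values, run_a))  # 0.0 (the 1.0 is absorbed)
print(reduce_in_arrival_order(values, run_b))  # 1.0
print(wl_similarity(event_graph(run_a, 3), event_graph(run_a, 3)))  # 1.0
print(wl_similarity(event_graph(run_a, 3), event_graph(run_b, 3)))  # below 1.0
```

Nodes carry the rank as well as the event type on purpose: the senders are otherwise symmetric, so with type-only labels the graphs for the two arrival orders would be isomorphic and any graph kernel would score them as identical. A real study would build event graphs from traced executions rather than synthesizing them as done here.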
Advisor: Taufer, Michela
Degree: Ph.D.
Department: University of Delaware, Department of Computer and Information Sciences
DOI: https://doi.org/10.58088/gdxv-sg39
Unique Identifier: University of Delaware, Department of Computer and Information Sciences
URL: https://udspace.udel.edu/handle/19716/27969
Language: en
Publisher: University of Delaware
URI: https://login.udel.idm.oclc.org/login?url=https://www.proquest.com/docview/2445588052?accountid=10457
Keywords: Graph kernels
Keywords: Graph similarity
Keywords: High performance computing
Keywords: Non-determinism
Title: Modeling non-determinism of scientific applications
Type: Thesis
Files
Original bundle
Name: Chapp_udel_0060D_14232.pdf
Size: 13.11 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission