Browsing by Author "Chapp, Dylan"
Item Modeling non-determinism of scientific applications (University of Delaware, 2020) Chapp, Dylan

As the scientific community prepares to deploy an increasingly complex and diverse set of applications on upcoming exascale platforms, the need for methods to assess reproducibility of simulations and identify the root causes of reproducibility failures increases correspondingly. One of the greatest challenges facing reproducibility efforts at exascale is unavoidable application-level non-determinism at the level of inter-process communication. While often necessary to boost performance, use of non-deterministic communication constructs can hamper reproducibility due to the interaction between communication non-determinism and floating-point non-associativity.

In this thesis we address the challenge of non-determinism in scientific applications along three strategic directions. First, we assess the landscape of existing tooling and infrastructure for managing non-determinism via record-and-replay, and in doing so produce evidence suggesting the need for record-and-replay to adapt to communication patterns of non-deterministic applications at exascale. Second, we assess the landscape of techniques for alleviating non-determinism’s detrimental effects on numerical reproducibility, and in so doing provide an experimental framework for efficiently compensating for non-determinism based on characteristics of an application’s floating-point data. Third, we propose and develop a methodology for modeling communication non-determinism. Our methodology models parallel executions as directed graphs and leverages graph kernels to quantify and characterize run-to-run variations in inter-process communication. To validate our methodology, we present empirical studies showing the utility of graph kernel similarity for quantifying the degree of non-determinism present in representative communication patterns. To test the effectiveness of our approach, we present a study on a representative adaptive mesh refinement application demonstrating that our methodology can link runtime manifestations of communication non-determinism to their root causes in source code, and thus alleviate the burden on computational scientists of tracking down potential sources of reproducibility failures in complex code bases.

Item Study of the impact of non-determinism on numerical reproducibility and debugging at the exascale (University of Delaware, 2017) Chapp, Dylan

Non-determinism in high performance scientific applications has severe detrimental impacts on both numerical reproducibility and accuracy, and on debugging. As scientific simulations are migrated to extreme-scale platforms, the increase in platform concurrency and the attendant increase in non-determinism is likely to exacerbate both of these problems. In this thesis, we address the dual challenges of non-determinism’s impact on numerical reproducibility and on debugging.

To address the numerical challenge, our work investigates the power of mathematical methods to mitigate error propagation at the exascale. We focus on floating-point error accumulation over global summations where enforcing any reduction order is expensive or impossible. We model parallel summations with reduction trees and identify those parameters that can be used to estimate the reduction’s sensitivity to variability in the reduction tree. We assess the impact of these parameters on the ability of different reduction methods to successfully mitigate errors. Our results illustrate the pressing need for intelligent runtime selection of reduction operators that ensure a given degree of reproducible accuracy.

To address the debugging challenge, our work examines the impact of logical clock ticking policies on the Clock-Delta Compression record-and-replay technique. We assess three logical clock ticking policies in terms of the number of out-of-order messages that result during recording executions under these policies. We examine the performance of Clock-Delta Compression when using the three ticking policies in four distinct application scenarios to probe the impact of floating-point workload and communication intensity on recording performance. Our results illustrate the pressing need for fine-grained logical clock ticking policies that reduce the out-of-order message rate of the Clock-Delta Compression record-and-replay technique.
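
Both abstracts rest on the fact that floating-point addition is not associative, so a global sum can change when contributions arrive in a different order. The following is a minimal illustrative sketch of that effect and is not code from either thesis. The `reduce_in_order` helper and the `contributions` values are hypothetical, and Python's `math.fsum` stands in for an order-insensitive reduction method purely as an assumption for illustration; the theses' own reduction methods are not specified here.

```python
import math

def reduce_in_order(values, order):
    """Sum values left to right in the given arrival order (a list of indices)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Hypothetical per-process contributions to a global sum.
contributions = [1e16, 1.0, -1e16, 1.0]

# Two arrival orders that a non-deterministic reduction could produce.
order_a = [0, 1, 2, 3]
order_b = [0, 2, 1, 3]

run_a = reduce_in_order(contributions, order_a)
run_b = reduce_in_order(contributions, order_b)

# math.fsum returns the correctly rounded sum, so it is independent of order.
exact_a = math.fsum(contributions[i] for i in order_a)
exact_b = math.fsum(contributions[i] for i in order_b)

print(run_a, run_b)      # naive results differ between the two orders
print(exact_a, exact_b)  # fsum results agree
```

In this sketch the naive sums differ because the first `1.0` is absorbed when added to `1e16` in one order but survives in the other, while the correctly rounded sum is the same for both orders.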