An experimental exploration of self-aware systems for exascale architectures

Date
2016
Publisher
University of Delaware
Abstract
High-performance systems are evolving to a point where performance is no longer the sole relevant criterion. Current execution and resource management paradigms are no longer sufficient to ensure correctness and performance. Power requirements are presently driving the co-design of HPC systems, which in turn sets the course for a radical change in how to express the need for increasingly scarce resources, as well as how to manage them. As a result, systems will need to become more introspective and self-aware with respect to performance, energy, and resiliency. To this end, this thesis explores the major hardware requirements that are central to enabling introspection and the types of interfaces and information that will be needed for introspective system software, provides an abstract representation of exascale architectures based on current trends, and implements an exascale simulation framework with built-in temperature and power management capabilities. Through this framework, we demonstrate that localized adaptive policies are not sufficient for exascale systems and that coordinated hierarchical adaptive policies are instead needed in order to adapt effectively and mitigate oscillation within systems consisting of thousands of independent cores.