Technology trend-driven reliable embedded system design

Date
2021
Publisher
University of Delaware
Abstract
Embedded systems are widely adopted in modern society to perform a broad range of tasks, including many safety-critical, medical, and banking applications. Their reliability has therefore become a major design concern. However, developing an efficient and cost-effective reliable design for embedded systems is a challenging multi-dimensional optimization task because of their intrinsic resource limitations and energy constraints. The task is further complicated by the constant challenges raised by new technology trends. In particular, modern embedded systems face three prominent technology trends: dramatic technology scaling, emerging non-volatile memory technologies, and emerging complex applications.

This dissertation proposes a number of cost-effective, software-based techniques to address the major reliability challenges raised by these trends. First, dramatic technology scaling elevates the probability of hardware faults, which makes fault recovery overhead critical. The first technique reduces this overhead by selectively re-executing only the instructions that are necessary to recover from a detected fault. The second presents a tool that provides comprehensive fault assessment of embedded systems and applications under elevated fault rates. This dissertation also examines new fault types introduced by emerging memory technologies. Specifically, the third technique tolerates disturbance errors caused by read operations in Spin-transfer torque magnetic random-access memory (STT-RAM), one of the most promising emerging on-chip memory technologies. Last but not least, this dissertation considers the challenges brought by emerging complex applications, specifically embedded machine learning. The last two techniques study how memory faults in either the traditional Dynamic Random Access Memory (DRAM) or the emerging Non-Volatile Memory (NVM) of embedded accelerators affect the accuracy of neural network applications, and they aim to mitigate the resulting accuracy drop.

The reliability enhancement techniques introduced in this dissertation all follow a static-dynamic collaborative design philosophy to meet the desired reliability constraints. These constraints are measured by fault coverage for traditional applications and by accuracy for emerging neural network applications. Specifically, a series of static, compile-time optimizations is applied to the target embedded applications before they are deployed on the device, so that the reliability requirements are met at run time with negligible performance, power, and hardware overhead. Traditional fault-tolerance solutions require non-trivial levels of redundancy. This cost-effective design philosophy is therefore better suited to critical embedded applications such as automotive, medical, and banking systems, which are subject to strict real-time performance constraints and tight budgets for battery lifetime and hardware cost.
Keywords
Embedded systems, Fault, Neural Network Accelerators, Reliability