Soft error propagation in floating-point programs

Date
2010
Publisher
University of Delaware
Abstract
As technology scales, VLSI performance has grown exponentially. As feature sizes shrink, however, maintaining circuit reliability becomes harder because of new challenges such as soft errors (single-event upsets). Recent studies have addressed soft errors with error detection and correction techniques such as error-correcting codes or redundant execution. However, these techniques come at the cost of additional storage or lower performance. We present a different approach. We start by building a quantitative understanding of error propagation in software and propose a systematic evaluation of how bit flips caused by soft errors affect floating-point operations. Furthermore, we introduce a novel model for dealing with soft errors: we assume that soft errors have already occurred in memory and ask how those errors manifest in the results of programs. Under this model, a soft error can be tolerated if the resulting error in the output is smaller than the intrinsic inaccuracy of the floating-point representation or falls within a predefined range. We focus on analyzing error propagation through floating-point arithmetic operations. Our approach is motivated by interval analysis: we model the rounding effect of floating-point numbers, which enables us to simulate and predict soft error propagation for a single floating-point arithmetic operation. In other words, we model and simulate the relation between the bit flip rate, which is determined by soft errors in hardware, and the error in the result of floating-point arithmetic operations. The simulation results enable us to tolerate certain types of soft errors without expensive error detection and correction.
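To make the idea of tolerable bit flips concrete, here is a minimal fault-injection sketch in Python. It is a simplified stand-in and does not reproduce the interval-based rounding model described above. It assumes IEEE 754 binary64 operands, uses the fault-free floating-point result as the reference, and treats an error as tolerable when its relative size stays within a user-chosen bound (for example, a small multiple of the unit roundoff `sys.float_info.epsilon / 2`, or an application-defined range). The function names `flip_bit`, `relative_error`, `tolerable_bits`, and `failure_rate` are hypothetical. The Monte Carlo part also assumes that every bit flips independently with a fixed probability.

```python
import math
import operator
import random
import struct
from typing import Callable, List

BinaryOp = Callable[[float, float], float]


def flip_bit(x: float, bit: int) -> float:
    """Invert one bit of x's IEEE 754 binary64 encoding.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def relative_error(reference: float, faulty: float) -> float:
    """Relative error of a faulty result; NaN/inf results count as infinitely wrong."""
    if math.isnan(faulty) or math.isinf(faulty):
        return math.inf
    if reference == 0.0:
        return abs(faulty)  # fall back to absolute error when the reference is zero
    return abs(faulty - reference) / abs(reference)


def tolerable_bits(a: float, b: float, op: BinaryOp, tolerance: float) -> List[int]:
    """Bits of operand a whose single flip keeps op(a, b) within tolerance.

    The fault-free floating-point result op(a, b) serves as the reference.
    """
    reference = op(a, b)
    return [
        bit
        for bit in range(64)
        if relative_error(reference, op(flip_bit(a, bit), b)) <= tolerance
    ]


def failure_rate(a: float, b: float, op: BinaryOp, tolerance: float,
                 flip_prob: float, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often bit flips push op(a, b) out of tolerance.

    Assumes every bit of both operands flips independently with probability
    flip_prob before the operation executes. Use non-raising ops such as
    add or mul; Python float division raises ZeroDivisionError on 0.0.
    """
    rng = random.Random(seed)
    reference = op(a, b)
    failures = 0
    for _ in range(trials):
        fa, fb = a, b
        for bit in range(64):
            if rng.random() < flip_prob:
                fa = flip_bit(fa, bit)
            if rng.random() < flip_prob:
                fb = flip_bit(fb, bit)
        if relative_error(reference, op(fa, fb)) > tolerance:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    tol = 1e-12  # hypothetical application-defined error bound
    print("tolerable flips in a for 1.0 * 3.0:",
          tolerable_bits(1.0, 3.0, operator.mul, tol))
    for p in (1e-4, 1e-3, 1e-2):
        print(f"flip probability {p:g}: failure rate "
              f"{failure_rate(1.0, 3.0, operator.mul, tol, p):.4f}")
```

For `1.0 * 3.0` with a `1e-12` bound, only the lowest mantissa bits of `a` (bits 0 through 12) pass; any flip in the exponent or sign bits pushes the result far outside the bound. Sweeping `flip_prob` gives a rough empirical curve relating the bit flip rate to the fraction of intolerable results for a single operation.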