Soft Error

Introduction

A soft error, in the context of this article, can be defined as an unintended change in the electrical state of a device or circuit that originates from a source external to the system’s designed inputs and outputs. A “soft” error is one that causes no direct, permanent damage to the system’s components, so the unintended system behavior can be corrected with some form of “reset”.

For real-time systems, such as automated car navigation, biomedical assistive devices, or commercial data centers, a soft error can have dangerous consequences if it is not detected and corrected in real time, even though it does no permanent damage to the electronics. It is therefore important that the rate of these soft errors, commonly expressed as a “Failures In Time” (FIT) rate, be fully characterized for critical systems. An unfortunate consequence of scaling devices to smaller geometries is that, all other things being equal, the soft error rate performance of any given circuit deteriorates greatly as devices shrink. Consequently, the most advanced process nodes are those most at risk for such failures, which is why this topic is of increasing importance.

The “Failures In Time” or FIT rate is usually expressed as the number of failures per one billion (10^9) device-hours of operation. At first this may seem to describe something very unlikely ever to happen, since one billion hours is over 100,000 years. However, a circuit of only ten million devices, each with an average FIT rate of one, has a combined rate of ten million FIT and would therefore suffer a soft error roughly once every 100 hours of operation, or about once every four days. For many applications, this error rate, if uncorrected, would be unacceptable; one error per year would be a more acceptable figure.
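The arithmetic above can be sketched as a few lines of Python. This is an illustrative sketch, not a reliability tool. It assumes that device failures are independent, so per-device FIT rates simply add, and that a year is 365.25 days. The function names are hypothetical. The last function inverts the calculation to estimate the per-device FIT rate a ten-million-device circuit would need to reach the "one error per year" target.

```python
# Illustrative FIT arithmetic (hypothetical helper names).
# Assumes independent device failures, so per-device FIT rates add linearly.

FIT_HOURS = 1e9                  # 1 FIT = one failure per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365.25     # assumption: 365.25-day year (8766 hours)


def circuit_fit(num_devices, fit_per_device):
    """Total FIT of a circuit whose devices fail independently."""
    return num_devices * fit_per_device


def mean_hours_between_errors(total_fit):
    """Average operating hours between soft errors for a given total FIT."""
    return FIT_HOURS / total_fit


def required_fit_per_device(num_devices, target_hours):
    """Per-device FIT needed so the circuit averages one error per target_hours."""
    return FIT_HOURS / (target_hours * num_devices)


if __name__ == "__main__":
    total = circuit_fit(10_000_000, 1.0)
    hours = mean_hours_between_errors(total)
    print(f"{hours:.0f} hours = {hours / 24:.1f} days")   # 100 hours = 4.2 days
    print(f"{FIT_HOURS / HOURS_PER_YEAR:,.0f} years")     # 114,077 years
    budget = required_fit_per_device(10_000_000, HOURS_PER_YEAR)
    print(f"{budget:.4f} FIT per device")                 # 0.0114 FIT per device
```

Under these assumptions, reaching one error per year in a ten-million-device circuit requires an average per-device rate of about 0.011 FIT. That is roughly ninety times lower than the unity figure used in the example.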