ABSTRACT

This chapter will start with a brief review of nanoelectronic challenges, focusing on the reliability challenge. One of the most recent calls-to-arms [1] raises two fundamental questions: “what is the meaning of reliability for systems that use nano or new technologies, and how do we interpret this meaning in practice?” The chapter takes a hands-on approach to implementing computations (logic). The reader will be walked through a series of simple CMOS-based examples, starting from the device (e.g., transistor) level and moving up through the gate, circuit, and block levels, while only pointing towards the system level. Reliable memory design will not be dealt with in this chapter, as viable solutions (based on detection and error-correcting codes, standby spare rows and columns, reconfiguration, etc.) have long been used in industry and are well known (the only question being how much they will be needed). CMOS was chosen for most of the examples because of the broad design base available, but the ideas presented here can be translated to other nanotechnologies (as a few single-electron technology (SET) examples will show). The design approach will constantly be geared towards enhancing reliability as much as possible at all levels. Unexpectedly, the final solution will not be a power-hungry one; on the contrary, it will be quite low power for one that incorporates so much redundancy (at all levels). Possible explanations can be found in several articles on neural computations (and communications), which suggest that unreliable “devices” (i.e., neurons and synapses) can lead to reliable computations (and communications) by cleverly combining redundancy and encoding while simultaneously minimizing energy. The main conclusions of this chapter are that:

Reliable designs should not be evaluated with respect to their redundancy factors (as is commonly done in the literature), but with respect to power, energy, and/or area (as is customary in the VLSI community).

Reliable designs for implementing computations using spatial redundancy are possible, and can even be low power; a minimal illustration is sketched after these conclusions. [Remark: For memory and communications, error detection and correction codes, perhaps in combination with novel data encoding techniques, will most certainly prevail.]

Defects and faults manifest themselves in different ways, one of them being increased currents (and hence increased power and heat). Adaptive local detection at reasonably low levels could therefore be built around current sensors (built-in IDDQ testing), which could automatically trigger reconfiguration at the higher levels whenever a predefined threshold current is exceeded, leading to self-healing systems; a second sketch after these conclusions illustrates the idea.

A lot of effort has to be invested in developing EDA tools for quickly and accurately estimating overall reliability at the system level, and in integrating them with the EDA tools currently used for estimating area, delay, and power. This aspect was also strongly argued for at the IEEE/ACM International Conference on Computer-Aided Design [2], where chip designers spoke of their difficulties in coping with reliability, power, clocking, statistical timing, verification, and analog/mixed-signal design, mentioning that “there are a number of major holes in the IC design flow, and more research and development are urgently needed to fill them” and that “design-for-reliability needs help.”
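
To make the spatial-redundancy conclusion concrete, the following Python sketch (an illustration added here, not taken from the chapter; the gate model, the failure probability, and the fault-free majority voter are all simplifying assumptions) compares a single faulty NAND gate against a triple-modular-redundancy (TMR) version, one classical form of spatial redundancy:

```python
import random

def unreliable_nand(a, b, p_fail=0.05):
    """A NAND gate whose output flips with probability p_fail
    (a toy stand-in for a faulty nano-device)."""
    out = 1 - (a & b)
    return out ^ (random.random() < p_fail)

def tmr_nand(a, b, p_fail=0.05):
    """Spatial redundancy: three gate replicas plus a majority vote
    (the voter itself is assumed fault-free here)."""
    votes = sum(unreliable_nand(a, b, p_fail) for _ in range(3))
    return 1 if votes >= 2 else 0

# Monte Carlo estimate of the error rates for the input (1, 1),
# whose correct NAND output is 0.
trials = 100_000
raw = sum(unreliable_nand(1, 1) != 0 for _ in range(trials)) / trials
tmr = sum(tmr_nand(1, 1) != 0 for _ in range(trials)) / trials
print(f"single gate: {raw:.4f}  TMR: {tmr:.4f}")  # ~0.05 vs. ~0.007
```

With p_fail = 0.05, the majority vote lowers the error rate from about 5% to 3p^2(1 - p) + p^3 ≈ 0.7%, at roughly a 3x area cost, which is precisely why such designs are better judged by power, energy, and area than by the redundancy factor alone.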
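
The current-based self-healing conclusion can be sketched in the same spirit. The block names, current readings, and threshold below are purely hypothetical, and a real design would use analog built-in current sensors and on-chip reconfiguration logic rather than software:

```python
I_THRESHOLD_UA = 250.0  # hypothetical quiescent-current alarm threshold (microamps)

def read_idd(block):
    """Stub for a built-in current sensor readout (hypothetical values)."""
    return {"alu0": 80.0, "alu1": 310.0, "mul0": 95.0}[block]

def reconfigure(block):
    """Stub for the higher-level self-healing action, e.g. remapping the
    block's function onto a stand-by spare."""
    print(f"{block}: IDD above threshold, remapped to spare")

def monitor_and_heal(blocks):
    """Local detection: flag any block whose quiescent current exceeds the
    predefined threshold, then trigger reconfiguration at the higher level."""
    for block in blocks:
        if read_idd(block) > I_THRESHOLD_UA:
            reconfigure(block)

monitor_and_heal(["alu0", "alu1", "mul0"])  # prints an alarm for alu1 only
```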