ABSTRACT

Exponential advances in process technologies and circuit capacities have led to the current success and future promise of high-performance computing enabled by multicore principles. However, these same advances in integrated circuits present several key challenges to system reliability. Both transient and permanent errors are expected to increase, the latter driven by wear-out mechanisms such as electromigration, negative bias temperature instability (NBTI), time-dependent dielectric breakdown (TDDB), and thermal cycling (TC) [14,29]. To be successful, computing systems must continue functioning in spite of these errors, which necessitates new methods for self-healing circuits that can detect and recover from them.