Chapter 6
34 Pages

Fault-tolerance and availability awareness in computational grids

By Xavier Besseron, Mohamed-Slim Bouguerra, Thierry Gautier, Erik Saule

Xavier Besseron INPG, 51 avenue Jean Kuntzmann, 38330 Montbonnot Saint Martin, 38041 Grenoble, Cedex 9 France

Mohamed-Slim Bouguerra INRIA, 51 avenue Jean Kuntzmann, 38330 Montbonnot Saint Martin, 38041 Grenoble, Cedex 9 France

Thierry Gautier INRIA, 51 avenue Jean Kuntzmann, 38330 Montbonnot Saint Martin, 38041 Grenoble, Cedex 9 France

Erik Saule BioMedical Informatics, Ohio State University, 3190 Graves Hall, 333W 10th Avenue, Columbus OH 43210, USA

Denis Trystram INPG, 51 avenue Jean Kuntzmann, 38330 Montbonnot Saint Martin, 38041 Grenoble, Cedex 9 France

The machines we use every day are not perfect; they are often subject to malfunctions. These malfunctions can have different sources, such as processor wear-out, mechanical part failures in a hard drive, defective blocks in memory, [...]

The purpose of this chapter is to investigate efficient solutions for guaranteeing the reliable execution of applications on computational grid platforms. We put a special emphasis on checkpointing techniques.