
A backward/forward recovery approach for the preconditioned conjugate gradient method

  • Additional Information
    • Contributors:
      University of Manchester Manchester; Department of Mathematical and Statistical Sciences; University of Colorado Denver; École normale supérieure - Lyon (ENS Lyon); Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA); Inria Grenoble - Rhône-Alpes; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP); École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS); Laboratoire de l'Informatique du Parallélisme (LIP); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS); Innovative Computing Laboratory Knoxville (ICL); The University of Tennessee Knoxville; ANR-13-MONU-0007,SOLHAR,Solveurs pour architectures hétérogènes utilisant des supports d'exécution(2013); ANR-10-BLAN-0301,RESCUE,Résilience des applications scientifiques sur machines exascales(2010)
    • Publication Information:
      HAL CCSD
      Elsevier
    • Publication Year:
      2016
    • Collection:
      Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
    • Abstract:
      International audience; Several recent papers have introduced a periodic verification mechanism to detect silent errors in iterative solvers. Chen [PPoPP'13, pp. 167–176] has shown how to combine such a verification mechanism (a stability test checking the orthogonality of two vectors and recomputing the residual) with checkpointing: the idea is to verify every d iterations and to checkpoint every c × d iterations. When the verification mechanism detects a silent error, one can roll back to the last checkpoint and re-execute from there. In this paper, we also propose to combine checkpointing and verification, but we use algorithm-based fault tolerance (ABFT) rather than stability tests. ABFT can be used for error detection alone, or for both detection and correction, the latter allowing forward recovery (with neither rollback nor re-execution) when a single error is detected. We introduce an abstract performance model to compute the performance of all schemes, and we instantiate it using the preconditioned conjugate gradient algorithm. Finally, we validate our new approach through a set of simulations.
    • Relation:
      hal-01354682; https://hal.inria.fr/hal-01354682; https://hal.inria.fr/hal-01354682/document; https://hal.inria.fr/hal-01354682/file/jocs_488_ucar.pdf
    • DOI:
      10.1016/j.jocs.2016.04.008
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • Accession Number:
      edsbas.340174E1
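
The backward-recovery scheme summarized in the abstract (verify every d iterations, checkpoint every c × d iterations, roll back when verification fails) can be illustrated with a small sketch. This is not the authors' implementation and it does not cover the ABFT forward-recovery variant: the names `run_with_verification`, `make_faulty_step`, and `verify` are hypothetical, and the "solver" is a toy counter whose invariant stands in for a real stability test.

```python
import copy


def run_with_verification(step, verify, state, n_iters, d, c):
    """Apply `step` n_iters times, verifying every d iterations and
    checkpointing every c*d iterations (only after a passed verification).

    Returns the final state and the number of rollbacks performed.
    Assumes errors are transient; a permanent fault would loop forever.
    Iterations after the last multiple of d are not verified.
    """
    checkpoint = (0, copy.deepcopy(state))  # (iteration index, saved state)
    rollbacks = 0
    i = 0
    while i < n_iters:
        state = step(state)
        i += 1
        if i % d == 0:
            if not verify(state):
                # Silent error detected: restore the last checkpoint
                # and re-execute from there.
                i, saved = checkpoint
                state = copy.deepcopy(saved)
                rollbacks += 1
                continue
            if i % (c * d) == 0:
                checkpoint = (i, copy.deepcopy(state))
    return state, rollbacks


def make_faulty_step(fault_at_call):
    """Toy iteration: state is (k, x) with invariant x == k.
    The x value is silently corrupted once, on call number `fault_at_call`."""
    calls = {"n": 0}

    def step(state):
        calls["n"] += 1
        k, x = state
        x += 1
        if calls["n"] == fault_at_call:
            x += 1000  # injected silent error (happens only once)
        return (k + 1, x)

    return step


def verify(state):
    """Stand-in for a real verification test: checks the toy invariant."""
    k, x = state
    return x == k


if __name__ == "__main__":
    final_state, rollbacks = run_with_verification(
        make_faulty_step(fault_at_call=7), verify, (0, 0), n_iters=12, d=2, c=2
    )
    print(final_state, rollbacks)
```

In this toy run the error injected at iteration 7 is caught by the verification at iteration 8, and execution resumes from the checkpoint taken at iteration 4. A larger c lowers checkpointing cost but lengthens the re-executed span after a rollback, which is the trade-off the abstract's performance model is meant to quantify.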