Database of Case Studies Achieved Using CADP

Formal Analysis of Checkpointing Algorithms.

Organisation: Polytechnic University of Bucharest (ROMANIA)
INRIA Rhône-Alpes (FRANCE)

Method: LOTOS

Tools used: CADP (Construction and Analysis of Distributed Processes)

Domain: Distributed Algorithms.

Period: 2000-2001

Size: about 160,000 states and 17,000,000 transitions

Description: The functioning of a distributed system can be affected by several types of failures: software crashes, disk and hardware errors, and power losses. The first consequence of such a failure is a data loss, which can lead to an inconsistent state of the system. The state of a distributed system depends upon the states of all processes in the system and upon the communications between these processes. Therefore, the failure of a process could propagate to other processes and affect the global behaviour of the system. Given that safety-critical distributed systems must be able to function even in the presence of some failures, ensuring reliability of such systems is an important issue.

There are two main approaches used for recovery of a consistent state of the system after a failure: forward error recovery, when an erroneous state is further used, in an attempt to generate the correct state, using error compensation techniques; and backward error recovery, when some processes are rolled back to a previous error-free state, and then the computation is restarted. The latter approach is more general, because it allows to recover from an arbitrary failure. It is based upon the notion of recovery point, which allows saving and restoring the state of a process. One of the most used techniques for obtaining recovery points is the checkpointing, in which a set of local checkpoints (saved local states) is determined during normal computation, such as upon a failure occurrence, a rolled back computation can be resumed from this set.

Two checkpointing algorithms were formally specified in LOTOS and validated using the CADP toolbox: the Sync-and-Stop (SNS) algorithm and the Chandi-Lamport (CL) algorithm. The configurations considered consist of several processes communicating through an interconnection network that may have different topologies (star, ring, bi-directional ring, and full connectivity). The processes implement a finite distributed computation and can access a stable storage, on which the checkpoints and message logs are stored. The application behaviour and the checkpoint management are modelled using a controller process.

Several variants of each checkpointing algorithm have been studied, by varying the parameters of the system (number of processes, network topology, presence/absence of failures). The Labeled Transition System model for each configuration has been constructed using the CADP tools (for the largest configurations, containing 8 processes, compositional generation techniques were used). Several correctness properties were formulated and encoded in regular alternation-free mu-calculus: deadlock and livelock freedom, successful termination, message delivery, cut consistency, safety and liveness of checkpointing. All these properties have been successfully verified on each model using the EVALUATOR 3.0 model-checker of CADP.

Conclusions: Formal description techniques such as LOTOS provide a useful framework for studying distributed algorithms in general and checkpointing algorithms in particular. Developing and validating gradually complex specifications is possible due to the compositional analysis techniques offered by CADP.

Publications: [Godza-Cristea-Mateescu-01] Gavril Godza, Valentin Cristea, and Radu Mateescu. "Formal Specification of Checkpointing Algorithms". Proceedings of the 13th International Conference on Control Systems and Computer Science CSCS 13 (Bucharest, Romania), pp. 311-317, June 2001.
Available on-line from our FTP site in PDF or PostScript
prof. Valentin Cristea
Polytechnic University of Bucharest
Computer Science Department
313, Splaiul Independentei, Sector 6
77206 Bucharest
Tel: +40 1 410 0325
Fax: +40 1 411 2181

Further remarks: This case study, amongst others, is described on the CADP Web site:

Last modified: Fri Feb 19 09:12:04 2016.

Back to the CADP case studies page