Database of Case Studies Achieved Using CADP

Autonomic Resilience of Distributed IoT Applications in the Fog

Organisation: Orange Labs
CONVECS and POLARIS project-teams, Inria Grenoble - Rhône-Alpes and LIG

Method: LNT

Tools used: CADP (Construction and Analysis of Distributed Processes)

Domain: Internet of Things.

Period: 2019

Size: 2500 lines of LNT
600 lines of MCL

Description: Recent computing trends have been advocating for more distributed paradigms, namely Fog computing, which extends the capacities of the Cloud at the edge of the network, that is close to end devices and end users in the physical world. The Fog is a key enabler of the IoT (Internet of Things) applications, as it resolves some of the needs that the Cloud fails to provide, such as low network latencies, privacy, QoS, and geographical requirements. Recent applications of the Fog include smart homes and cities, agriculture, healthcare, transportation, etc. The Fog, however, is unstable because it is constituted of billions of heterogeneous devices in a dynamic ecosystem. IoT devices may regularly fail because of bulk production and cheap design. Moreover, the Fog-IoT ecosystem is cyber-physical and thus devices are subjected to external physical world conditions that increase the occurrence of failures, resulting in application inconsistencies, which in turn affect the physical world by inducing hazardous and costly situations.

We propose an end-to-end autonomic failure management approach for IoT applications deployed in the Fog. The approach manages IoT applications and is composed of four functional steps: (i) state saving, (ii) monitoring, (iii) failure notification, and (iv) recovery. Each step is a collection of similar roles and is implemented, taking into account the specificities of the ecosystem (e.g., heterogeneity, resource limitations). State saving aims at saving data concerning the state of the managed application. These include runtime parameters and the data in the volatile memory, as well as messages exchanged and functions executed by the application. Monitoring aims at observing and reporting information on the lifecycle of the application. When a failure is detected, failure notifications are propagated to the part of the application which is affected by that failure. The propagation of failure notifications aims at limiting the impact of the failure and providing a partial service. To recover from a failure, the application is reconfigured and the data saved during the state saving step are used to restore a cyber-physical consistent state of the application. Cyber-physical consistency aims at maintaining a consistent behaviour of the application with respect to the physical world, as well as avoiding dangerous and costly circumstances.

The proposed distributed failure management approach was validated formally using model checking techniques. The various components of the protocol (simulator, global managers, local agents associated to software elements and appliances) were modeled in LNT, and 12 key correctness properties (final objective, architectural invariants, functional properties) were specified in MCL. These properties were checked using the EVALUATOR tool on a set of model instances generated automatically, corresponding to applications with different fog nodes, software elements, appliances, and bindings. The verification provided insight about several finer points of the recovery procedure (functioning in degraded mode after a failure and missing replacement appliance, propagation of failure notifications in applications with multiple dependencies, and handling of multiple failures in sequence). Finally, the approach was implemented as a framework called F3ARIoT, which was evaluated on a smart home application. The results showed the feasibility of deploying F3ARIoT on real Fog-IoT applications, as well as its good performances in regards to end user experience.

Conclusions: Failure management protocols for IoT applications in dynamic Fog environments are complex distributed artifacts, which can benefit from rigorous design approaches based on formal methods and verification techniques. The verification of the proposed failure management approach, with a focus on the recovery step, was useful to assess its correctness and clarify several issues, and also helped in its implementation.

Publications: [Ozeer-19] Umar Ozeer. "Autonomic Resilience of Applications in a Largely Distributed Cloud Environment". PhD Thesis, Université Grenoble Alpes, December 2019.
Available on-line at:
or from the CADP Web site in PDF or PostScript

[Ozeer-Salaun-Letondeur-et-al-20] Umar Ozeer, Gwen Salaün, Loïc Letondeur, François-Gaël Ottogalli, and Jean-Marc Vincent. "Verification of a Failure Management Protocol for Stateful IoT Applications". Proceedings of the 25th International Conference on Formal Methods for Industrial Critical Systems FMICS'2020 (Vienna, Austria), Lecture Notes in Computer Science vol. 12327, pp. 272-287. Springer Verlag, September 2020.
Available on-line at:
or from the CADP Web site in PDF or PostScript

Gwen Salaün
655, avenue de l'Europe
CS 90051
38334 Montbonnot

Further remarks: This case-study, amongst others, is described on the CADP Web site:

Last modified: Thu Feb 11 12:22:03 2021.

Back to the CADP case studies page