CONVECS and POLARIS project-teams, Inria Grenoble - Rhône-Alpes and LIG
CADP (Construction and Analysis of Distributed Processes)
Internet of Things.
2500 lines of LNT
600 lines of MCL
Recent computing trends have been advocating for more distributed
paradigms, namely Fog computing, which extends the capacities of the
Cloud at the edge of the network, that is close to end devices and
end users in the physical world. The Fog is a key enabler of the
IoT (Internet of Things) applications, as it resolves some of
the needs that the Cloud fails to provide, such as low network
latencies, privacy, QoS, and geographical requirements. Recent
applications of the Fog include smart homes and cities, agriculture,
healthcare, transportation, etc. The Fog, however, is unstable because
it is constituted of billions of heterogeneous devices in a dynamic
ecosystem. IoT devices may regularly fail because of bulk production
and cheap design. Moreover, the Fog-IoT ecosystem is cyber-physical
and thus devices are subjected to external physical world conditions
that increase the occurrence of failures, resulting in application
inconsistencies, which in turn affect the physical world by inducing
hazardous and costly situations.
We propose an end-to-end autonomic failure management approach for IoT applications deployed in the Fog. The approach manages IoT applications and is composed of four functional steps: (i) state saving, (ii) monitoring, (iii) failure notification, and (iv) recovery. Each step is a collection of similar roles and is implemented, taking into account the specificities of the ecosystem (e.g., heterogeneity, resource limitations). State saving aims at saving data concerning the state of the managed application. These include runtime parameters and the data in the volatile memory, as well as messages exchanged and functions executed by the application. Monitoring aims at observing and reporting information on the lifecycle of the application. When a failure is detected, failure notifications are propagated to the part of the application which is affected by that failure. The propagation of failure notifications aims at limiting the impact of the failure and providing a partial service. To recover from a failure, the application is reconfigured and the data saved during the state saving step are used to restore a cyber-physical consistent state of the application. Cyber-physical consistency aims at maintaining a consistent behaviour of the application with respect to the physical world, as well as avoiding dangerous and costly circumstances.
The proposed distributed failure management approach was validated formally using model checking techniques. The various components of the protocol (simulator, global managers, local agents associated to software elements and appliances) were modeled in LNT, and 12 key correctness properties (final objective, architectural invariants, functional properties) were specified in MCL. These properties were checked using the EVALUATOR tool on a set of model instances generated automatically, corresponding to applications with different fog nodes, software elements, appliances, and bindings. The verification provided insight about several finer points of the recovery procedure (functioning in degraded mode after a failure and missing replacement appliance, propagation of failure notifications in applications with multiple dependencies, and handling of multiple failures in sequence). Finally, the approach was implemented as a framework called F3ARIoT, which was evaluated on a smart home application. The results showed the feasibility of deploying F3ARIoT on real Fog-IoT applications, as well as its good performances in regards to end user experience.
Failure management protocols for IoT applications in dynamic Fog
environments are complex distributed artifacts, which can benefit
from rigorous design approaches based on formal methods and
verification techniques. The verification of the proposed failure
management approach, with a focus on the recovery step, was useful
to assess its correctness and clarify several issues, and also helped
in its implementation.
Autonomic Resilience of Applications in a Largely Distributed
Cloud Environment. PhD Thesis, Université Grenoble Alpes,
Available on-line at: http://www.theses.fr/s190619
Inria - LIG / CONVECS
655, avenue de l'Europe
|Further remarks:||This case-study, amongst others, is described on the CADP Web site: http://cadp.inria.fr/case-studies|