INRIA / DYADE (Bull-INRIA Joint Venture for Advanced Research)
CADP (Construction and Analysis of Distributed Processes)
about 4000 lines of LOTOS (the whole architecture)
about 2000 lines of LOTOS (the cache coherency protocol)
about 3400 lines of LOTOS and 7000 lines of C (the RCC emulator)
POLYKID was an experimental CC-NUMA (Cache Coherent - Non Uniform
Memory Access) multiprocessor architecture developed by the Unix
Servers division of Bull in Pregnana (Italy). A working prototype
of a POLYKID machine running the AIX operating system was built in
1998 and found to perform correctly. The machine consisted of a
scalable collection of modules interconnected by a double ring
network implementing the lower layers of SCI (Scalable Coherent
Interface) standard. Each module consists of four nodes
connected by a data bus and an address bus. The data bus is implemented
as a crossbar (a 4x4 matrix of switches allowing every processor
to communicate directly with every other processor). As the main memory
is distributed among the modules, to achieve cache coherency, each
module in the architecture implements internally a MESI
(Modified, Exclusive, Shared, and Invalid)
cache coherency protocol, which performs a refined form of mutual
readers-writers exclusion algorithm.
The first part of the experiment concerned the formal specification and verification of the POLYKID architecture (and of the cache coherency protocol in particular). The formal descriptions in LOTOS have been produced while the POLYKID architecture was under design, and therefore they had to evolve regularly in order to keep track of the changes brought by the Bull designers. These LOTOS specifications, modelling several configurations of the architecture (two and three modules, two distinct memory addresses, distinct input and output queues, etc.) have been verified using the CADP toolset. This allowed to discover and correct about 20 problems in the protocol and its documentation (uncovered cases, undocumented features, deadlocks, data consistency violations).
The second part of the experiment dealt with the software emulation of hardware components. The POLYKID architecture contained an ASIC (Application-Specific Integrated Circuit) named RCC (Remote Cache Controller), specially designed by Bull to implement the cache coherency protocol. For testing purposes of the POLYKID prototype, since the RCC was not available at that time, the goal was to produce a software program that would emulate the behaviour of an RCC circuit. This program has been developed partly in LOTOS (the logic of the cache coherency protocol, the data types defining the states and transitions of the cache, and the behaviour part) and partly in C (the interface between the RCC emulator and the transponder circuit implementing the protocols for the system bus and the SCI network). Using the EXEC/CAESAR environment of CADP, the LOTOS code was automatically translated in C and interconnected with the simulation environment used by Bull. The RCC emulator obtained in this way has been used to perform various test suites (unit testing of the transponder, system test at the monitor level, then by booting the firmware, and finally the whole operating system).
The last part of the experiment consisted in generating tests to be applied on the real implementation of POLYKID. The challenge was to demonstrate that the TGV tool, originally developed for conformance testing of communication protocols, could also be used to generate tests for hardware architectures. A formal description of the system under test was derived from a LOTOS specification previously developed during the verification phase. Starting from an existing test plan containing 7 groups of test purposes developed by Bull, the groups 3 and 4 were considered, which dealt with the cache coherency protocol implemented in the RCC circuit. For each test purpose in these groups, the corresponding test cases were generated using TGV (a total of 75 tests, most of which had more than 400 states). Moreover, a tester package has been developed, which allows the automatic execution of tests (both in batch mode and in reactive mode) in the simulation environment used by Bull.
This case-study demonstrated the feasibility and benefits of using
formal methods for hardware design at system level. In particular,
it illustrates how the LOTOS language, originally intended to the
formal specification of communication protocols, together with tools
developed for verifying and testing communication protocols, can
enhance significantly the industrial approach commonly used to
system-level design. The experiment focused on the most complex part
of Bull's CC-NUMA POLYKID architecture, namely the cache coherency
protocol. From the industrial side, it was acknowledged that the
proposed approach was suitable for gaining a better understanding of
the system, clarifying issues, detecting design mistakes, improving the
overall quality of the product, and reducing the risk of delays.
H. Garavel, C. Viho, and M. Zendri.
"System Design of a CC-NUMA Multiprocessor Architecture Using
Formal Specification, Model-Checking, Co-Simulation, and Test
Generation". Springer International Journal on Software Tools for
Technology Transfer (STTT), Volume 3(3), pp. 314-331.
Short version available on-line as INRIA Research Report RR-4041, INRIA, November 2000, from: http://cadp.inria.fr/publications/Garavel-Viho-Zendri-00.html
INRIA Rhône-Alpes / VASY
ZIRST - 655, avenue de l'Europe
38334 Saint Ismier Cedex
Tel: +33 (0)4 76 61 52 24
Fax: +33 (0)4 76 61 52 52
|Further remarks:||This case study, amongst others, is described on the CADP Web site: http://cadp.inria.fr/case-studies|