DISTRIBUTOR manual page
Table of Contents

Name

distributor - state space generation using distributed reachability analysis

Synopsis

bcg_open [bcg_opt] spec[.bcg] [cc_opt] distributor [distributor_opt] configuration[.gcf] result[.pbg] [global_opt] [instance_opt]

or:

caesar.open [caesar_opt] spec[.lotos] [cc_opt] distributor [distributor_opt] configuration[.gcf] result[.pbg] [global_opt] [instance_opt]

or:

exp.open [exp_opt] spec[.exp] [cc_opt] distributor [distributor_opt] configuration[.gcf] result[.pbg] [global_opt] [instance_opt]

or:

fsp.open [fsp_opt] spec[.lts] [cc_opt] distributor [distributor_opt] configuration[.gcf] result[.pbg] [global_opt] [instance_opt]

or:

lnt.open [lnt_opt] spec[.lnt] [cc_opt] distributor [distributor_opt] configuration[.gcf] result[.pbg] [global_opt] [instance_opt]

or:

seq.open [seq_opt] spec[.seq] [cc_opt] distributor [distributor_opt] configuration[.gcf] result[.pbg] [global_opt] [instance_opt]

Description

This program performs exhaustive reachability analysis and generates the Labelled Transition System corresponding to the BCG graph spec.bcg, the LOTOS program spec.lotos, the composition expression spec.exp, the FSP program spec.lts, the LOTOS NT program spec.lnt, or the sequence file spec.seq.

Additionally, this program can generate a reduced Labelled Transition System by applying tau-compression or tau-confluence reductions on the fly.

Compared to generator and reductor , which are sequential programs executing on a single machine, distributor implements a distributed algorithm (derived from [GMS01]) that runs on several machines listed in the grid configuration file configuration.gcf (see below for a description of the GCF format). Each machine is used to generate and store a part of the Labelled Transition System. This allows distributor to exploit the computing resources (memory and processors) provided by many machines.

The current version of distributor does not handle LOTOS programs containing dynamic data types (such as lists, trees, etc.) implemented using pointers, i.e., all LOTOS programs such that the condition

   CAESAR_HASH_SIZE_STATE() < CAESAR_SIZE_STATE()
is verified, where the two functions CAESAR_HASH_SIZE_STATE() and CAESAR_SIZE_STATE() are defined in the caesar_graph application programming interface.

All the machines used by distributor must have the same processor and operating system; for instance, mixing little-endian and big-endian architectures is not allowed.

As regards the communications between the machines, distributor does not make strong assumptions and only requires standard TCP sockets to be available, together with at least one standard remote connection protocol (such as "rsh/rcp", "ssh/scp", "krsh"/"kcp", etc.). In particular, distributor does not require the existence of a common file system (e.g., NFS, Samba) shared between the machines.

The machine on which distributor is launched (using the command line described in the SYNOPSIS section above) is called the local machine. All other machines are called remote machines. Depending on the contents of configuration.gcf, distributor will launch distributed processes, which are called instances.

Typically, each instance executes on one remote machine, but there can be also several instances per remote machine as well as some instances executing on the local machine.

The generated Labelled Transition System will be stored as a partitioned BCG graph result.pbg (see below for a description of the PBG format). The resulting PBG file can later be turned into a BCG file using the bcg_merge tool.

Options

The options bcg_opt, if any, are passed to bcg_lib .

The options caesar_opt, if any, are passed to caesar and to caesar.adt .

The options exp_opt, if any, are passed to exp.open .

The options fsp_opt, if any, are passed to fsp.open .

The options lnt_opt, if any, are passed to lnt.open .

The options seq_opt, if any, are passed to seq.open .

The options cc_opt, if any, are passed to the C compiler.

The following options distributor_opt are currently available:

-taucompression
Perform tau-compression on the fly, which eliminates all cycles (i.e., strongly connected components) of "internal" transitions; such transitions are usually noted "tau" in the scientific literature and displayed as the character string "i" by the various BCG tools. This elimination is usually fast (linear in the size of the state space) and preserves branching bisimulation. Not a default option.

-tauconfluence
Perform tau-confluence on the fly, which is a partial order reduction preserving branching simulation. Tau-confluence subsumes tau-compression, thus leading to potentially stronger reductions, but it can be slower. Not a default option.

-monitor
Open a window for monitoring in real-time the distributed generation of the PBG graph, i.e., the progress status of each instance. The organization of the monitor window is described in section DISTRIBUTED MONITOR WINDOW below. Not a default option.

-display display
Connect to the X server display for opening the monitor window. This option is only useful in conjunction with the -monitor option. It overrides the current value, if any, of the $DISPLAY environment variable, which gives, by default, the X server for opening the monitor window. An erroneous value for display may cause distributor to abort. See X(7) for details. Not a default option.

-stat
Display various statistics during graph generation. Not a default option.

The options global_opt and instance_opt are described below, together with the GCF format.

Grid Configuration File (gcf) Format

The grid configuration file configuration.gcf specifies the list of instances to be launched by distributor, together with the various variables used for launching, connecting and parameterizing instances on the remote machines.

Each instance corresponds to a pair (machine, directory), meaning that distributor will launch a distributed process executing on machine and storing its files in directory located on some filesystem of machine. directory is called the "working directory" of the instance.

Instances may or may not be launched on the local machine, depending on the constraints on grid usage. For instance, clusters often distinguish between one frontal node (i.e., the local machine) used to submit jobs to the cluster, and many computing nodes (i.e., the remote machines) that perform distributed computations.

Each instance is given a unique number greater or equal to 1. Numbers are assigned in the same order as instances appear in the grid configuration file. Number 0 is reserved for the process corresponding to the execution of distributor on the local machine.

If the grid configuration file contains duplicated instances (i.e., same machine and same directory), only the last one is considered, the previous ones being entirely discarded.

There should be at least two different instances. Several instances may execute on the same machine, provided that their working directories are different. Also, a working directory may be either local to its machine or shared between several machines (using NFS, Samba, etc.).

The grid configuration file also specifies the value of (predefined) variables used to control the way instances are launched and communicate with each other.

The grid configuration file has the following syntax:


   <gcf>          ::= <global_opt> <instance_opt>*
   <global_opt>   ::= <directive>*
   <instance_opt> ::= <machine> <machine>* <directive>*
   <machine>      ::= <address-or-name> "\n"
   <directive>    ::= <variable> "=" <value> "\n"
   <variable>     ::= "buffer_size" | "cadp" | "connect_timeout"
                   |  "directory" | "files" | "hash" | "memory"
                   |  "port" |  "rcp" | "rsh" | "time" | "user"
   <value>        ::= <character-string>

where:

Any line starting with the "#" character is considered as a comment and ignored. Spaces and tabulations can be inserted before, between, or after terminal symbols.

Variables can be modified by directives as follows. First, some variables have a default value, which will be used unless overriden by some directive. A <directive> occurring in the <global_opt> list assigns its <variable> for all machines. A directive occurring in a <instance_opt> list assigns its <variable> only for the machines mentioned in this <instance_opt>, possibly overriding the value specified for this <variable> in the <global_opt> list.

The meaning of the different variables is the following:

buffer_size:
This variable specifies the size (in bytes) of the communication buffers used between instances (either for incoming data or outgoing data). This variable can only be set in the <global_opt> list, meaning that all communication buffers must have the same size. Maximal value is 2,147,483,648. Default value is 65536.

cadp:
This variable specifies the pathname of the directory in which the CADP toolbox is installed. This pathname must be an absolute one, i.e., it must start with a "/" character. It may contain spaces provided that the entire pathname is enclosed between double quotes.

The pathname may be the same for all machines (e.g., if the CADP toolbox is installed on a common filesystem shared between all machines) or may be different on each machine. In any case, the CADP toolbox should be properly installed on the local and remote machines. Default value is given by the $CADP environment variable.

connect_timeout:
This variable specifies the timeout (in seconds) allowed for establishing a connection to a remote machine. This variable can only be set in the <global_opt> list, meaning that each connection must succeed within the specified timeout. Default value is 20 seconds.

directory:
This variable specifies the pathname of the working directory in which an instance will place all the files needed for its execution or created during its execution (e.g., a copy of the distributor binary, a copy of the GCF file, a generated BCG file fragment, log files, etc.). If the specified directory does not exist, it will be created.

This pathname can be either an absolute one (starting with a "/" character) or a relative one, in which case it is interpreted with respect to the current directory on the local machine and to the user's home directory on remote machines. It may contain spaces provided that the entire pathname is enclosed between double quotes. It should not contain the tilde character "~". Default value is the user's home directory (which is not recommended unless when using a cluster of machines).

files:
This variable specifies a set of files to be copied to the remote machines (in the working directory) before launching the different instances. Setting this variable provides to instances executing on remote machines additional input files needed for their execution. This allows to take into account "hidden" dependences, as these input files may or may not be mentioned on the command line (see the SYNOPSIS section above). For instance, when running distributor using bcg_open , the spec.bcg file must be present on the remote machines. Similarly, when running distributor using exp.open , all the BCG files referenced in spec.exp must also be present on the remote machines.

This variable may specify either a single filename or a set of filenames separated by spaces. Filenames may contain the wildcard characters ("*", "?", "[", "]", ...) according to the Bourne shell conventions. Default value is empty.

hash:
This variable specifies the hash function to be used for partitioning states between machines (see [GMS01] for details). Its value is an integer n between 0 and 7 denoting the corresponding function CAESAR_STATE_n_HASH() of the caesar_hash library of OPEN/CAESAR. This variable can only be set in the <global_opt> list, meaning that all instances use the same hash function. Default value is 3.

memory:
This variable has currently no effect. It is reserved for future use.

port:
This variable specifies the TCP port to be used for communicating with the instance. Usually, it is not necessary to set this variable explicitly unless to comply with firewall restrictions. A TCP port is an integer value in the range 1...65,535. Using a TCP port lower than 1024 is not recommended and may require special privileges (such as being the super-user on Unix systems). By default, or in case of failure when opening the specified port, the port value is chosen automatically.

rcp:
This variable specifies the command used to copy files from the local machine to the working directories on the remote machines. This command must be present in one of the directories given by the $PATH environment variable. Its command line syntax should be the same as for the rcp(1) command, i.e.,
   rcp local_files username@hostname:directory

Default value is rcp. Other possible values are scp, kcp, or any similar variant of rcp(1). Remote machines should be configured in such a way that remote copy can be done automatically without asking for a password or a passphrase; this is usually done by setting an appropriate ".rhosts" or ".ssh/authorized_keys" file and/or launching an SSH agent before running distributor.

There is also a special value nfs, which indicates that all instances have the same working directory on a filesystem shared between all machines (e.g., using NFS), so that no remote copy is necessary. Setting variable rcp to the value nfs can only be done in the <global_opt> list.

rsh:
This variable specifies the command used to lauch instances on remote machines from the local machine. This command must be present in one of the directories given by the $PATH environment variable. Its command line syntax should be the same as for the rsh(1) command, i.e.,
   rsh -l username hostname "shell_command ..."

Default value is rsh. Other possible values are ssh, krsh, or any similar variant of rsh(1). Remote machines should be configured in such a way that remote authentication can be done automatically without asking for a password or a passphrase; this is usually done by setting an appropriate ".rhosts" or ".ssh/authorized_keys" file and/or launching an SSH agent before running distributor.

It is possible to supply options to the command assigned to the rsh variable (such as rsh=ssh -6) but certain options (e.g., ssh -v) should be avoided as they perturbate the normal functioning of distributor.

time:
This variable, if set, contains a Unix command name that will be inserted before the shell commands used to launch instances on remote machines. Possible values of this variable are "/usr/bin/time", "memtime", "valgrind", etc. It is therefore possible to monitor the time, memory, etc. used by remote instances on these machines. Default value is the empty string.

Alternatively, one can obtain the same effect by setting the (undocumented) $CADP_TIME environment variable; this is equivalent to setting in the ".gcf" file the "time" variable for all remote instances.

user:
This variable specifies the user login name (or account name) to be used for authentication when connecting to the remote machine(s). Default value is the name of the user running distributor on the local machine.

To provide for last-minute changes, the contents of the grid configuration file can be extended on the command line using the options global_opt and instance_opt. The grid configuration file can even be empty, in which case configuration.gcf should be replaced with "-" on the command line (in such case instance_opt should not be empty).

global_opt has the same syntax as the non-terminal <global_opt> in the above grammar (except that some characters meaningful to the shell must be escaped or quoted properly). If it is non-empty, it is interpreted exactly as if its contents were inserted in the grid configuration file, at the end of the <global_opt> list and before the first <instance_opt>.

instance_opt has the same syntax as a possibly empty list of non-terminals <instance_opt> (except that some characters meaningful to the shell must be escaped or quoted properly). If it is non-empty, it is interpreted exactly as if its contents were appended at the end of the grid configuration file, after the last <instance_opt>.

Thus, the value of variables can be set in five different ways, listed below by increasing precedence:

Examples of Grid Configuration Files

In its simplest form, a grid configuration file may simply list the remote machines to be used:

   machine1
   machine2
   machine3
   machine4

It is also possible to write a grid configuration file with several instances on the same machine:

   machine1
      directory=/tmp/instance1
   machine1
      directory=/tmp/instance2
   machine1
      directory=/tmp/instance3

This is a more complex example involving all the features described above:

   buffer_size = 32768
   cadp = /usr/local/cadp
   connect_timeout = 10
   directory = /home/vasy/distributor
   files = graph-*.bcg
   hash = 4
   port = 8016
   rcp = scp
   rsh = ssh
   user = inria
   machine1.domain.org
   machine2.domain.org
      user = vasy
   machine3.domain.org
      directory = /users/inria/distributor

Notice that the directive "user = vasy" applies to both machine1 and machine2.

Partitioned Bcg File (pbg) Format

The PBG format implements the theoretical concept of "Partitioned Labelled Transition System" introduced in [GMS01].

A PBG file gathers a collection of BCG files, called "fragments" (one fragment per instance), which are either stored on the local machine in case of a shared filesystem (e.g., NFS, etc.) or stored on remote machines in the working directories used by distributor.

Taken altogether, these fragments form a partition of a Labelled Transition System, the states and transitions of which are distributed among the various fragments as specified in [GMS01]. Note that, taken individually, each fragment is usually meaningless; in particular, it may be a disconnected graph, which is never the case of a state space generated from, e.g., a LOTOS program. These fragments can be recombined using the bcg_merge tool, which also performs various actions, such as state renumbering.

Concretely, a PBG file is a text file that contains references to a GCF file and to BCG fragments. The PBG format is yet undocumented but easily readable by humans. It may evolve in the future.

Distributed Monitor Window

The distributed monitor window started by the -monitor option is divided into five tabs:

It is worth keeping in mind that using the distributed monitor may slow down the state space generation.

Exit Status

Exit status is 0 if everything is alright, 1 otherwise.

Diagnostics

When the source is erroneous, error messages are issued. Additionally, distributor leaves ".log" files in the working directories on each machine; these files may help understanding the reason of a problem.

Authors

Version 1.0 of distributor was developed by Irina Smarandache-Sturm (INRIA/VASY) along the lines of [GMS01].

Version 2.0 of distributor was developed by Adrian Curic (INRIA/VASY) and Gilles Stragier (INRIA/VASY).

Version 3.0 of distributor implements a modified algorithm by Christophe Joubert (INRIA/VASY). It was rewritten from scratch by Nicolas Descoubes (INRIA/VASY). Damien Bergamini (INRIA/VASY) made a few adaptations. Hubert Garavel and Radu Mateescu fixed various issues and wrote the distributor manual page.

Operands

spec.bcg
BCG graph (input)

spec.exp
network of communicating LTSs (input)

spec.lotos
LOTOS specification (input)

spec.lts
FSP specification (input)

spec.lnt
LOTOS NT specification (input)

spec.seq
SEQ file (input)

configuration.gcf
grid configuration file (input)

result.pbg
partitioned BCG graph (output)

result-*.bcg
BCG fragments (outputs)

distributor-*.log
log files (outputs)

Files

The binary code of distributor is available in $CADP/bin.`arch`/distributor.a

The code for the distributed monitor window is available in $CADP/src/monitor/distributor.tcl

See Also

CAESAR Reference Manual, OPEN/CAESAR Reference Manual, bcg , bcg_open , bcg_merge , caesar , caesar.adt , caesar.open , exp.open , fsp.open , lnt.open , generator , seq.open

Additional information is available from the CADP Web page located at http://vasy.inria.fr

Directives for installation are given in file $CADP/INSTALLATION.

Recent changes and improvements to this software are reported and commented in file $CADP/HISTORY.

Bugs

Please report bugs to cadp@inria.fr

Bibliography

[GMS01] Hubert Garavel and Radu Mateescu and Irina Smarandache. Parallel State Space Construction for Model-Checking. In Matthew B. Dwyer, ed, Proceedings of the 8th International SPIN Workshop on Model Checking of Software SPIN'2001 (Toronto, Canada), LNCS 2057, pp. 217-234, May 2001. Revised version available as INRIA Research Report RR-4341, December 2001.


Table of Contents