Version du 12/11/2007

Getting started

Using CRAC is not more complicated than MPI since it is based on message passing. Nevertheless, the biggest mistake that can be done is to code an application in an "MPI state of mind". Indeed, the communication semantic is very different and several mechanisms characteristic of iterative and especially AIAC executions are included in CRAC but not in MPI (message crushing, convergence detection, ...). Thus, it is a good idea to carefully read the AIAC vs. SIAC and CRAC's core sections.


After that, proceed with the section Coding with CRAC and Executing a CRAC application.

Coding with CRAC

This section is divided in three parts: the Tag notion, the API description and Compiling user's tasks. The two firsts should be read carefully before writing the least line of code.

About tags

At the end of each iteration, a computing task needs to send local data to several tasks. These data are very often elements of a vector or a matrix, needed to update local data before going on with the next iteration. The rank of these elements is determined by the problem to solve, the number of tasks and the data distribution over the tasks.

In most cases, a task X always sends the same set of elements to the task Y at the end of each iteration. Obviously, if the rank of these element is the same, their value may change from iteration to iteration.

In AIAC execution, message crushing accelerates the convergence. CRAC can do this transparently, but for this, must be able to determine if two messages coming from the same task contain the same set of elements, that is will be used to update the same local data. This is done by explicitly tagging the messages.


This gives the two first rules of the CRAC coder:

All along the code, the developer must use the same tag for all messages containing the same set of element sent by A and received by B.

If two messages sent by A and received by B do not contain the same set of elements, they must not use the same tag.

Do not follow these rules and your code won't be really AIAC or even may crash.


back to top.
API

Even if CRAC is dedicated to AIAC executions, it also provides methods particularly useful for any distributed computing code. The table below summarizes all the methods a task can call. Just click on the method name to access to a detailed description of it.


Type Prototype
Informations int CRACGetTaskId()
int CRACGetTaskNb()
char** CRACGetArgValue(int taskId)
int CRACGetArgNb(int taskId)
int CRACGetIdInVDM()
int CRACGetIdInNode()
double CRACGetLocalTime()
Communications void CRACSend(int dest, int tag, unsigned char *data, int size)
int CRACRecv(int src, int tag, unsigned char *data, int size, int mode)
bool CRACConvergence(bool set, int mode, int op)
void CRACBroadcast(int src, unsigned char* data, int size)
void CRACBarrier()
Others void CRACFinalize()
void CRACSetEnv(int what, int value)

int CRACGetTaskId()

This method returns the current task identifier. This number ranges from 0 to nbTask-1.


N.B.: the task identifier is determined automatically by CRAC, using the XML file that describes the execution context.


back to table.
int CRACGetTaskNb()

This method returns the number of tasks involved in the current execution.


back to table.
char** CRACGetArgValue(int taskId)

This method returns the argument array for the task taskId


N.B.: the array is given by the user in the XML file that describes the execution context.


back to table.
int CRACGetArgNb(int taskId)

This method returns the number of argument passed to the task taskId


back to table.
int CRACGetIdInVDM()

CAUTION: this method should never be used in "normal" code. It is only provided for debugging purposes


This method returns the daemon identifier that hosts the current task. This number ranges from 0 to nbDaemon-1.


N.B.: the daemon identifier is determined automatically by CRAC, using the XML file that describes the VDM.


back to table.
int CRACGetIdInNode()

CAUTION: this method should never be used in "normal" code. It is only provided for debugging purposes


A node represents the computing entity hosted by the daemon for an execution. It may contains several tasks. This method returns the rank of the current task in the node.


N.B.: the rank is determined automatically by CRAC, using the XML file that describes the execution context.


back to table.
double CRACGetLocalTime()

This method returns an absolute time as a double. In order to measure a time gap, two calls must be done, as shown below.

double gap = CRACGetLocalTime();
...
...
gap = CRACGetLocalTime() - gap;
cout << "elapsed time : " << gap << "s" << endl;
  
back to table.
void CRACSend(int dest, int tag, unsigned char *data, int size)

This method sends a message of size size to task dest, containing bytes given in data and marked with the tag tag. The first byte sent is byte 0 of data.


N.B.: this method never blocks and returns nearly immediately. It simply copies the data in the sending queues used by the Sender thread.


back to table.
int CRACRecv(int src, int tag, unsigned char *data, int size, int mode)

This method tries to receive a message of size size from task src, marked with the tag tag, storing bytes in data. The first byte received is stored at index 0 of data.


mode indicates if the call is blocking or not. Two values are possible :

N.B.: obviously, this call must be chosen blocking for SIAC executions and not blocking for AIAC.


back to table.
bool CRACConvergence(bool set, int mode, int op)

This method manages all operations bounded to local/global convergence. There are two types of operation, depending on the value of op:

In case of op = CRAC_RESET_CONV, mode indicates if the call is blocking or not. Two values are possible :

N.B.: obviously, this call must be chosen blocking for SIAC executions and not blocking for AIAC.


back to table.
void CRACBroadcast(int src, unsigned char *data, int size)

The task src broadcasts to all other tasks a message of size size, containing bytes given in data. The first byte sent is byte 0 of data. All other tasks store the message in data.


N.B. 1: this method use a binary tree over all tasks to achieve the broadcast in a minimal communication steps. Thus, a task may receive the message from a task != src.


N.B. 2: this method is blocking until the broadcasted message is received. Internally, an acknowledgment is sent to the sender immediately after the reception.


N.B. 3: This method (like CRACBarrier()) must be called by all tasks. If it is not the case, some tasks may be blocked because an acknowledgment is missing.


back to table.
void CRACBarrier()

This method allows to insert a synchronization point of all tasks during the execution.


N.B. 1: this method uses a decentralized algorithm that implies only one communication with each neighbors. Thus, it is minimal in communication steps and bandwidth use.


N.B. 2: This method (like CRACBroadcast()) must be called by all tasks. If it is not the case, some tasks may be blocked because an internal acknowledgment is missing.


back to table.
void CRACFinalize()

This method allows to end correctly an execution. In fact, it is a special barrier.


back to table.
void CRACSetEnv(int what, int value)

This method allows to set some internals parameters of CRAC. It should never be used, except with what = CRAC_ITER_CONV. The second parameter is an integer value. It is only used for AIAC executions. It fixes the number of iterations the supermaster waits for when the global convergence has been reached. Passed this number, it warns all tasks. Waiting for few iterations allows to possibly receive a message saying a task has no more a convergence state set to true.


back to table.

Compiling user's task

A user's task is just a class, inheriting from the Task class. Its run() method is the execution entry point. Obviously, it is not mandatory to put all the code in this method.


Compiling a user's task must produce a shared library (.so) that will be loaded dynamically in the daemon context in order to instantiate UserTask objects.


The most simple is to modify the Makefile in the examples/CRAC directory.


back to top.
Executing a CRAC application

Before we can execute any CRAC application, some constraints on ssh and searching path must be checked in order to launch correctly the daemon and the tasks. After that, you can create a user's task (as those of the examples directory), a VDM configuration file and an Execution configuration file.


Then, you can use cracboot and cracrun to create the VDM and launch your tasks.

SSH configuration

Firstly, you must be sure that ssh client and server is installed on each machine that will be used in the VDM. If it is not the case, it won't be possible to spawn the daemons


Secondly, you must configure ssh in such a way it never asks you for a password or passphrase. It is done by generating an asymmetric key (see ssh-keygen) on each machine in the VDM and put the public part in the file ~/.ssh/authorized_keys. This file must contains all the public keys generated and be present on each machine. Sometimes, you have an unique NFS account for a set of machines. Then, you only need to generate a single key for the set since your ~/.ssh directory is shared by all the machines.

Searching path configuration

There are four binaries essential for CRAC : cracd (the daemon), cracboot (the daemon spawner), cracwipe (the daemon wiper), and cracrun (the task spawner).


These binaries must be in the searching path of each machine. Supposing you build CRAC and the binaries are in /home/bob/CRAC-project/bin, then you must add in your shell initialization script (for example ~/.bashrc) :

export PATH=$PATH:/home/bob/CRAC-project/bin
VDM configuration file

This is an XML file that contains all needed informations on the machines used in the VDM :


The examples below are quite simple but the give an overview of the possibilities to define the VDM. If you want a more complex example, see archi_3site.xml which describes the architecture given in the CRAC's core section.


3 machines, single site, no cluster of private IPs.
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE architecture SYSTEM "sitevalide.dtd">
<architecture> <supermaster id="SM" tcpport="21345" /> <site id="site_1"> <master id="SM" /> <machine id="SM"> <address type="public"> <port>14568</port> <ip>mach1.site1.fr</ip> </address> </machine> <machine id="site1slave1"> <address type="public"> <port>14568</port> <ip>mach10.site1.fr</ip> </address> </machine> <machine id="site1slave2"> <address type="public"> <port>14568</port> <ip>mach14.site1.fr</ip> </address> </machine> </site> </architecture>
Important points :
  • the two first lines must never be omitted or changed unless you also change the name of the .dtd file.
  • the file sitevalide.dtd must exist in the same directory that these XML files. It is used to parse and check the correctness of the syntax.
  • there must be only one supermaster defined.
  • there must be at least one site, for which the master identifier is the supermaster identifier
  • there are four machine types, which must be set by the user:
    • public: machine can send/receive packets from outside its site.
    • private: machine can send/receive packets only to/from within its site.
    • noin: like a private machine but that can send packets outside its site.
    • noout: like a private machine but that can receive packets from outside its site.
  • the ip tag may contain a canonical name or an IP.
  • if the machine is private and a canonical name is given, the frontal will do the conversion to an IP. Thus, it is not mandatory that the name of private machine are on a DNS.
5 machines, 2 sites, 1 cluster of 2 machines
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE architecture SYSTEM "sitevalide.dtd">
<architecture> <supermaster id="SM" tcpport="21345" /> <site id="site_1"> <master id="SM" /> <machine id="SM"> <address type="public"> <port>14568</port> <ip>mach1.site1.fr</ip> </address> </machine> <machine id="site1slave1"> <address type="public"> <port>14568</port> <ip>mach10.site1.fr</ip> </address> </machine> </site> <site id="site_2"> <master id="site2master" /> <machine id="site2master"> <address type="public"> <port>16789</port> <ip>mach1.site3.fr</ip> </address> </machine> <machine id="site2slave1"> <address type="private"> <port>21567</port> <ip>priv1.site3.fr</ip> </address> </machine> <machine id="site2slave2"> <address type="private"> <port>21567</port> <ip>192.168.0.11</ip> </address> </machine> <cluster id="cluster_site2"> <frontal id="site2master" /> <list ids="site2slave1 site2slave2" /> </cluster> </site> </architecture>

back to top.
Execution configuration file

Launching a CRAC application consists in creating dynamically instances of user's tasks on chosen machines. For this, CRAC defines computing entities called node, that associate a user's task code to a machine in the VDM and a number of instances. You can also pass arguments to the task. These associations are defined in a XML file that contains:

Execution description of ping example (2 machines, 2 nodes, 2 tasks).
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE architecture SYSTEM "runvalide.dtd">
<running> <node machine="cluster1" number="1"> <class>/home/bob/CRAC-project/examples/CRAC/pingpong.so</class> <args>"1000" "10000" "1000"</args> </node> <node machine="cluster2" number="1"> <class>/home/bob/CRAC-project/examples/CRAC/pingpong.so</class> <args>"1000" "10000" "1000"</args> </node> </running>
Execution description of broadcast example (4 machines, 5 nodes, 8 tasks).
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE architecture SYSTEM "runvalide.dtd">
<running> <node machine="cluster1" number="1"> <class>/home/bob/CRAC-project/examples/CRAC/broad.so</class> <args></args> </node> <node machine="cluster2" number="1"> <class>/home/bob/CRAC-project/examples/CRAC/broad.so</class> <args></args> </node> <node machine="cluster1" number="2"> <class>/home/bob/CRAC-project/examples/CRAC/broad.so</class> <args></args> </node> <node machine="cluster5" number="1"> <class>/home/bob/CRAC-project/examples/CRAC/broad.so</class> <args></args> </node> <node machine="cluster4" number="3"> <class>/home/bob/CRAC-project/examples/CRAC/broad.so</class> <args></args> </node> </running>
Important points :

back to top.
Launching all

Once you have configuration files (supposing archi_mytest.xml and run_mytest.xml), just type :

> cracboot arch_mytest.xml
  ... few lines for initialization ending with "Waiting for running configuration"
> cracrun run_mytest.xml

back to top.