To allow GRAS to send data over the network (or simply to
duplicate it in SG), you have to describe the structure of the data attached
to each message. This mechanism is borrowed from the NWS message-passing
interface.
For each message, you have to declare a structure representing the
data sent as the payload of the message.
Sending (or receiving) simple structures
Let's imagine you want to declare a STORE_STATE
message, which sends some data to the memory server for inclusion in
the database. Here is the structure we want to send:
struct state {
  char id[STATE_NAME_SIZE];
  int rec_size;
  int rec_count;
  double seq_no;
  double time_out;
};
And here is the structure description GRAS needs to be able to send
this over the network:
static const DataDescriptor stateDescriptor[] =
  {SIMPLE_MEMBER(CHAR_TYPE, STATE_NAME_SIZE, offsetof(struct state, id)),
   SIMPLE_MEMBER(INT_TYPE, 1, offsetof(struct state, rec_size)),
   SIMPLE_MEMBER(INT_TYPE, 1, offsetof(struct state, rec_count)),
   SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(struct state, seq_no)),
   SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(struct state, time_out))};
Contrary to what you might think when you first see it, this is pretty
easy. A structure descriptor is a list of descriptions, one for each
field of the structure. For example, for the first field, you say that
the base type is CHAR_TYPE, that there are
STATE_NAME_SIZE elements of this type, and that its
position in the structure is computed by offsetof(struct state,
id). This leads to two remarks:
It's impossible to send dynamically sized strings that way. It's a
known limitation, but I think we can live with it.
Yes, the offsetof(struct state, id)
construction is ANSI C and is portable.
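To make the role of each descriptor entry more concrete, here is a minimal, self-contained sketch. Beware that DemoDescriptor, DemoType and DEMO_MEMBER are hypothetical stand-ins written for this example, not the real DataDescriptor and SIMPLE_MEMBER definitions, and that the value of STATE_NAME_SIZE is assumed. It only shows how a (type, repetitions, offset) triple is enough to locate a field and to count the bytes it describes.

```c
#include <assert.h>
#include <stddef.h> /* offsetof, size_t */

/* Hypothetical stand-ins: NOT the real GRAS/NWS definitions. */
typedef enum { DEMO_CHAR, DEMO_INT, DEMO_DOUBLE } DemoType;

typedef struct {
  DemoType type;        /* base type of the field          */
  size_t   repetitions; /* number of elements of that type */
  size_t   offset;      /* position inside the structure   */
} DemoDescriptor;

#define DEMO_MEMBER(type, repetitions, offset) {(type), (repetitions), (offset)}
#define STATE_NAME_SIZE 128 /* assumed value, for illustration only */

struct state {
  char   id[STATE_NAME_SIZE];
  int    rec_size;
  int    rec_count;
  double seq_no;
  double time_out;
};

static const DemoDescriptor demoStateDescriptor[] = {
  DEMO_MEMBER(DEMO_CHAR,   STATE_NAME_SIZE, offsetof(struct state, id)),
  DEMO_MEMBER(DEMO_INT,    1, offsetof(struct state, rec_size)),
  DEMO_MEMBER(DEMO_INT,    1, offsetof(struct state, rec_count)),
  DEMO_MEMBER(DEMO_DOUBLE, 1, offsetof(struct state, seq_no)),
  DEMO_MEMBER(DEMO_DOUBLE, 1, offsetof(struct state, time_out))
};

/* Size in bytes of one element of a base type. */
static size_t demo_type_size(DemoType t) {
  switch (t) {
  case DEMO_CHAR:   return sizeof(char);
  case DEMO_INT:    return sizeof(int);
  case DEMO_DOUBLE: return sizeof(double);
  }
  return 0;
}

/* Address of the field described by entry d inside the structure at base. */
static const void *demo_field_ptr(const void *base, const DemoDescriptor *d) {
  assert(base != NULL && d != NULL);
  return (const char *)base + d->offset;
}

/* Bytes described by the first count entries (padding is not counted). */
static size_t demo_payload_size(const DemoDescriptor *dd, size_t count) {
  size_t total = 0;
  assert(dd != NULL);
  for (size_t i = 0; i < count; i++)
    total += dd[i].repetitions * demo_type_size(dd[i].type);
  return total;
}
```

Calling demo_field_ptr(&s, &demoStateDescriptor[1]) on a struct state s gives back &s.rec_size, which is all a generic send routine needs in order to walk a structure it knows nothing about.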
Sending (or receiving) complex structures
How do you send non-flat structures, you ask? It's not harder. Let's
imagine you want to send the following structure:
typedef struct {
  unsigned long address;
  unsigned long port;
} CliqueMember;

typedef struct {
  char name[MAX_CLIQUE_NAME_SIZE];
  double whenGenerated;
  double instance;
  char skill[MAX_SKILL_SIZE];
  char options[MAX_OPTIONS_SIZE];
  double period;
  double timeOut;
  CliqueMember members[MAX_MEMBERS];
  unsigned int count;
  unsigned int leader;
} Clique;
As you can see, this structure contains an array of another
user-defined structure. To be able to send a Clique,
you have to describe each structure this way:
static const DataDescriptor cliqueMemberDescriptor[] =
  {SIMPLE_MEMBER(UNSIGNED_LONG_TYPE, 1, offsetof(CliqueMember, address)),
   SIMPLE_MEMBER(UNSIGNED_LONG_TYPE, 1, offsetof(CliqueMember, port))};

static const DataDescriptor cliqueDescriptor[] =
  {SIMPLE_MEMBER(CHAR_TYPE, MAX_CLIQUE_NAME_SIZE, offsetof(Clique, name)),
   SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, whenGenerated)),
   SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, instance)),
   SIMPLE_MEMBER(CHAR_TYPE, MAX_SKILL_SIZE, offsetof(Clique, skill)),
   SIMPLE_MEMBER(CHAR_TYPE, MAX_OPTIONS_SIZE, offsetof(Clique, options)),
   SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, period)),
   SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, timeOut)),
   {STRUCT_TYPE, MAX_MEMBERS, offsetof(Clique, members),
    (DataDescriptor *)&cliqueMemberDescriptor, cliqueMemberDescriptorLength,
    PAD_BYTES(CliqueMember, port, unsigned long, 1)},
   SIMPLE_MEMBER(UNSIGNED_INT_TYPE, 1, offsetof(Clique, count)),
   SIMPLE_MEMBER(UNSIGNED_INT_TYPE, 1, offsetof(Clique, leader))};
So, even if it is less natural, it is possible to send structures
containing structures with these tools.
You can see that it is not only impossible to send dynamically sized
strings, it is also impossible to send dynamically sized arrays. Here,
MAX_MEMBERS is the maximum number of members a clique can
contain. In NWS, this value is defined as 100. I'm not
sure, but I think that all 100 values are sent each time, even if
there are only 3 non-null members. Yes, that's
bad.
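To get a feeling for the cost of a fixed-size description, here is a small sketch that compares the bytes covered by the description of the members array with the bytes actually in use. It works on in-memory sizes only and says nothing about what NWS really puts on the wire; the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMBERS 100 /* the value quoted above for NWS */

typedef struct {
  unsigned long address;
  unsigned long port;
} CliqueMember;

/* Bytes covered by a fixed-size description of the members array:
 * always MAX_MEMBERS entries, whatever the number actually in use. */
static size_t fixed_members_bytes(void) {
  return MAX_MEMBERS * sizeof(CliqueMember);
}

/* Bytes carrying meaningful data when only count members are set. */
static size_t used_members_bytes(unsigned int count) {
  assert(count <= MAX_MEMBERS);
  return (size_t)count * sizeof(CliqueMember);
}

/* Bytes that are described but carry no useful information. */
static size_t wasted_members_bytes(unsigned int count) {
  return fixed_members_bytes() - used_members_bytes(count);
}
```

With 3 members in use, wasted_members_bytes(3) is 97 times the size of a CliqueMember, which is the overhead the remark above complains about.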
The DataDescriptor_t arrays MUST be const arrays. Malloc'ing them and
then casting them when passing them as arguments IS NOT OK. This is because we get
the number of elements in the array with sizeof(dd)/sizeof(dd[0]), which
only works on a real array and not on a pointer.
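Here is a short sketch of why the sizeof trick breaks on malloc'ed memory. DemoDescriptor and DEMO_LENGTH are hypothetical names used for this example only. On a true array, sizeof sees the whole array; on a pointer, it only sees the size of the pointer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor type, for illustration only. */
typedef struct {
  int    type;
  size_t repetitions;
  size_t offset;
} DemoDescriptor;

/* Only valid when a is a true array visible in the current scope. */
#define DEMO_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

static const DemoDescriptor demoStatic[] = {{0, 16, 0}, {1, 1, 16}, {1, 1, 20}};

/* Element count computed on a real array: this is the number of entries. */
static size_t demo_static_length(void) {
  return DEMO_LENGTH(demoStatic);
}

/* The same computation on a pointer to malloc'ed storage: the result is
 * sizeof(pointer) / sizeof(element), unrelated to the 3 entries allocated. */
static size_t demo_pointer_length(void) {
  DemoDescriptor *dyn = malloc(3 * sizeof *dyn);
  size_t pointer_bytes = sizeof dyn;    /* size of the pointer itself */
  size_t element_bytes = sizeof dyn[0]; /* size of one descriptor     */
  assert(dyn != NULL);
  free(dyn);
  return pointer_bytes / element_bytes;
}
```

demo_static_length() gives the real number of entries, while demo_pointer_length() gives a value that has nothing to do with the 3 entries that were allocated.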
Describing the data
DataDescriptor API
ErrLog
Handling sockets
Sockets API
comm_callbacks
comm_cb
Advanced ways to describe data (for experts)
Advanced Data description
Data description callbacks persistent state
config
Implementation of data description
dico
This module provides the usual dynamic array facility.
Dynamic array
dynar
This document introduces the GRAS library (Grid Reality
And Simulation, or, according to my English dictionary,
Generally Recognized As Safe ;).
Overview
The purpose of GRAS is to allow the development of
distributed programs which will work with as few modifications
as possible both on the SimGrid simulator (SG) and in Real Life
(RL).
Here are the problems when you want to do so:
Communication in SG is done by passing tasks, while in
RL you have to deal with sockets (or any wrapper around them).
In RL, each process must provide a main()
function, which is obviously not the case in SG.
Application class target
If you want to run your code both in RL and in SG, you won't be
able to use the full set of features offered by either of those two
worlds. GRAS tries to provide a sufficient set of features to develop
your application, and implements them in both worlds.
GRAS uses the paradigm of event-driven
programming, which is an extension of the message-passing
one. Any process of a typical event-driven application declares
callbacks for incoming events, which can be messages from other
processes, timers or others.
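As an illustration of the event-driven style, here is a minimal sketch of a callback registry. All the names (DemoEvent, DemoRegistry, demo_register, demo_dispatch) are invented for this example and are not the GRAS API; the point is only that a process declares one callback per event type and a single loop dispatches incoming events to them.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TYPES 8 /* hypothetical number of event types */

typedef struct {
  int         type;    /* event type, e.g. a message identifier */
  const void *payload; /* attached data, if any                 */
} DemoEvent;

typedef int (*DemoCallback)(const DemoEvent *ev, void *user_data);

typedef struct {
  DemoCallback cb[DEMO_MAX_TYPES];
  void        *user_data[DEMO_MAX_TYPES];
} DemoRegistry;

static void demo_registry_init(DemoRegistry *r) {
  assert(r != NULL);
  for (int i = 0; i < DEMO_MAX_TYPES; i++) {
    r->cb[i] = NULL;
    r->user_data[i] = NULL;
  }
}

/* Declare the callback for one event type. Returns 0, or -1 on a bad type. */
static int demo_register(DemoRegistry *r, int type, DemoCallback cb, void *ud) {
  assert(r != NULL);
  if (type < 0 || type >= DEMO_MAX_TYPES)
    return -1;
  r->cb[type] = cb;
  r->user_data[type] = ud;
  return 0;
}

/* Hand one incoming event to its callback. Returns the callback's result,
 * or -1 when no callback was declared for this type. */
static int demo_dispatch(const DemoRegistry *r, const DemoEvent *ev) {
  assert(r != NULL && ev != NULL);
  if (ev->type < 0 || ev->type >= DEMO_MAX_TYPES || r->cb[ev->type] == NULL)
    return -1;
  return r->cb[ev->type](ev, r->user_data[ev->type]);
}

/* Example callback: counts the events it receives in *user_data. */
static int demo_count_cb(const DemoEvent *ev, void *user_data) {
  (void)ev;
  ++*(int *)user_data;
  return 0;
}
```

A main loop would then simply wait for the next event (message, timer, ...) and call demo_dispatch() on it.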
All messages have a header, specifying their type, and attached
data, represented as one or several C structures. In order to send
the data over the network in RL, a type-description mechanism is
provided, and the RL version of GRAS implements CDR
functionality. That is to say, the data are sent in the native
format of the sender host, and converted on the destination host only
if needed.
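The "convert on the destination only if needed" idea can be sketched as follows for a single 32-bit integer. This is an illustration under my own assumptions (a one-byte tag giving the sender's byte order, and invented names such as DemoWire and demo_unpack32), not the actual wire format used by GRAS or NWS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { DEMO_LITTLE_ENDIAN = 0, DEMO_BIG_ENDIAN = 1 } DemoByteOrder;

/* Hypothetical wire representation: a byte-order tag plus the raw bytes. */
typedef struct {
  unsigned char order;    /* byte order of the sender   */
  unsigned char bytes[4]; /* value in the sender's format */
} DemoWire;

/* Detect the byte order of the host running this code. */
static DemoByteOrder demo_host_order(void) {
  const uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1 ? DEMO_LITTLE_ENDIAN : DEMO_BIG_ENDIAN;
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t demo_swap32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Sender side: no conversion at all, just tag the native bytes. */
static DemoWire demo_pack32(uint32_t value) {
  DemoWire w;
  w.order = (unsigned char)demo_host_order();
  memcpy(w.bytes, &value, sizeof value);
  return w;
}

/* Receiver side: convert only when the sender's order differs from ours. */
static uint32_t demo_unpack32(const DemoWire *w) {
  uint32_t value;
  assert(w != NULL);
  memcpy(&value, w->bytes, sizeof value);
  if (w->order != (unsigned char)demo_host_order())
    value = demo_swap32(value);
  return value;
}
```

Between two hosts of the same architecture nothing is ever swapped, which is the advantage over a scheme where both sides always convert from/to a network format.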
In order not to reinvent the wheel, GRAS uses existing code
and adapts it so that the pieces work together. The SG version naturally
uses the SimGrid toolkit, while the RL version is based on the
communication library used in NWS (note that this library was somewhat
modified, since the previous version used XDR, i.e. both the sender and
the receiver convert the data from/to a so-called network
format). That's why some basic knowledge of how NWS works is
assumed in this document. But don't worry, you only have to know the
basics of NWS; the internals needed to understand the document
will be presented when needed.
Overview of the GRAS library
Overview
gras
gras_private
Implementation of GRAS suited for real life.
RL
SimGrid was designed to ease the comparison of algorithms and
heuristics. To that end, a lot of complicated notions from the system layer
were voluntarily left out. For example, migrating a process from one host to
another is as easy as: MSG_process_change_host(process, new_host).
Needless to say, performing this operation on a real platform is much
harder. This simplification is a very good thing when you want to rapidly
prototype code, but it makes things somewhat more complicated in GRAS, since
we want a realistic API that also has to be implemented in
reality.
The best example of complexity in GRAS_SG induced by simplicity in
SimGrid is socket handling. There is no "socket" in SG, only
m_channel_t. Contrary to sockets in RL, no special treatment is
needed for a process before writing to or reading from a channel. So, a
given channel can be polled by more than one process. Likewise, you can
send data to a channel that nobody is actually listening to.
The SG implementation of GRAS reports as an error the fact that nobody is
listening to the socket when trying to open a socket, or to send data using
a previously opened socket. That way, the SG version can be used to
debug all synchronization issues. For that, we mainly store the PIDs of
both the sender and the receiver in the socket structure, and then
resolve PID->process at the latest moment. This search is a bit
expensive, but as long as there is no real garbage collection in SG, with
the information "dead process" within the structure, it's the only
solution to make sure that we won't dereference pointers to an old freed
structure when the process on the other side of the socket has finished
since the creation of the socket.
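The late PID->process resolution can be sketched like this. The process table, the DemoSocket structure and the function names are all hypothetical (the real SG structures are different); the sketch only shows why storing a PID and looking it up at the last moment is safe when the peer may have died in the meantime.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PROCS 16 /* hypothetical capacity */

typedef struct {
  int pid;
  int alive; /* 0 once the process has finished */
} DemoProcess;

typedef struct {
  DemoProcess procs[DEMO_MAX_PROCS];
  size_t      count;
} DemoProcessTable;

/* The socket stores PIDs rather than pointers to process structures. */
typedef struct {
  int sender_pid;
  int receiver_pid;
} DemoSocket;

/* Late resolution: look the PID up each time it is needed.
 * Returns NULL when the process is unknown or has already finished. */
static DemoProcess *demo_resolve(DemoProcessTable *t, int pid) {
  assert(t != NULL && t->count <= DEMO_MAX_PROCS);
  for (size_t i = 0; i < t->count; i++) {
    if (t->procs[i].pid == pid)
      return t->procs[i].alive ? &t->procs[i] : NULL;
  }
  return NULL;
}

/* Returns 0 if the peer could be resolved, -1 if nobody is listening. */
static int demo_send(DemoProcessTable *t, const DemoSocket *s) {
  assert(s != NULL);
  return demo_resolve(t, s->receiver_pid) != NULL ? 0 : -1;
}
```

The linear search is the "a bit expensive" part, but a socket created while the peer was alive never ends up holding a dangling pointer.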
As said in the overview, a process can declare that it listens on several
sockets, but all incoming messages are handled by the same loop. So, we
can use only one channel per process, and use a table on each host to
determine to which process a message should be delivered, depending on the
socket number provided by the sender.
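The per-host table could look like the following sketch. DemoHostTable, demo_bind and demo_route are names I made up for the example, and PIDs are assumed to be non-negative so that -1 can mean "nobody is listening".

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BINDINGS 32 /* hypothetical capacity */

typedef struct {
  int socket_number; /* number the sender addresses     */
  int pid;           /* process listening on that number */
} DemoBinding;

typedef struct {
  DemoBinding bindings[DEMO_MAX_BINDINGS];
  size_t      count;
} DemoHostTable;

/* PID of the process listening on socket_number, or -1 if there is none. */
static int demo_route(const DemoHostTable *t, int socket_number) {
  assert(t != NULL && t->count <= DEMO_MAX_BINDINGS);
  for (size_t i = 0; i < t->count; i++) {
    if (t->bindings[i].socket_number == socket_number)
      return t->bindings[i].pid;
  }
  return -1;
}

/* Record that pid listens on socket_number.
 * Returns 0, or -1 if the number is already taken or the table is full. */
static int demo_bind(DemoHostTable *t, int socket_number, int pid) {
  assert(t != NULL && pid >= 0);
  if (t->count >= DEMO_MAX_BINDINGS || demo_route(t, socket_number) != -1)
    return -1;
  t->bindings[t->count].socket_number = socket_number;
  t->bindings[t->count].pid = pid;
  t->count++;
  return 0;
}
```

One process may bind several socket numbers; every message arriving on the host is routed through demo_route() to the single channel of the process that owns the number.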
RL, the implementation suited for real life.
Implementation of GRAS on top of the simulator.
SG
nws_comm
Sockets