In order to allow GRAS to send data over the network (or simply to duplicate it in SG), you have to describe the structure of the data attached to each message. This mechanism is borrowed from the NWS message passing interface. For each message, you have to declare a structure representing the data to send as the payload of the message.

Sending (or receiving) simple structures

Let's imagine you want to declare a STORE_STATE message, which will send some data to the memory server for inclusion in the database. Here is the structure we want to send:

    struct state {
      char id[STATE_NAME_SIZE];
      int rec_size;
      int rec_count;
      double seq_no;
      double time_out;
    };

And here is the structure description GRAS needs to be able to send this over the network:

    static const DataDescriptor stateDescriptor[] =
      {SIMPLE_MEMBER(CHAR_TYPE, STATE_NAME_SIZE, offsetof(struct state, id)),
       SIMPLE_MEMBER(INT_TYPE, 1, offsetof(struct state, rec_size)),
       SIMPLE_MEMBER(INT_TYPE, 1, offsetof(struct state, rec_count)),
       SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(struct state, seq_no)),
       SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(struct state, time_out))};

Contrary to what you might think on first seeing it, it's pretty easy. A structure descriptor is a list of descriptions, one for each field of the structure. For example, for the first field, you say that the base type is CHAR_TYPE, that there are STATE_NAME_SIZE elements of this type, and that its position in the structure is computed by offsetof(struct state, id).

This leads to two remarks:

 - It's impossible to send dynamically sized strings that way. It's a known limitation, but I think we can live with it.
 - Yes, the offsetof(struct state, id) construction is ANSI C and is portable.

Sending (or receiving) complex structures

How do you send non-flat structures, you ask? It's not harder.
Let's imagine you want to send the following structure:

    typedef struct {
      unsigned long address;
      unsigned long port;
    } CliqueMember;

    typedef struct {
      char name[MAX_CLIQUE_NAME_SIZE];
      double whenGenerated;
      double instance;
      char skill[MAX_SKILL_SIZE];
      char options[MAX_OPTIONS_SIZE];
      double period;
      double timeOut;
      CliqueMember members[MAX_MEMBERS];
      unsigned int count;
      unsigned int leader;
    } Clique;

As you can see, this structure contains an array of another user-defined structure. To be able to send a Clique, you have to describe each structure as follows:

    static const DataDescriptor cliqueMemberDescriptor[] =
      {SIMPLE_MEMBER(UNSIGNED_LONG_TYPE, 1, offsetof(CliqueMember, address)),
       SIMPLE_MEMBER(UNSIGNED_LONG_TYPE, 1, offsetof(CliqueMember, port))};

    static const DataDescriptor cliqueDescriptor[] =
      {SIMPLE_MEMBER(CHAR_TYPE, MAX_CLIQUE_NAME_SIZE, offsetof(Clique, name)),
       SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, whenGenerated)),
       SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, instance)),
       SIMPLE_MEMBER(CHAR_TYPE, MAX_SKILL_SIZE, offsetof(Clique, skill)),
       SIMPLE_MEMBER(CHAR_TYPE, MAX_OPTIONS_SIZE, offsetof(Clique, options)),
       SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, period)),
       SIMPLE_MEMBER(DOUBLE_TYPE, 1, offsetof(Clique, timeOut)),
       {STRUCT_TYPE, MAX_MEMBERS, offsetof(Clique, members),
        (DataDescriptor *)&cliqueMemberDescriptor,
        cliqueMemberDescriptorLength,
        PAD_BYTES(CliqueMember, port, unsigned long, 1)},
       SIMPLE_MEMBER(UNSIGNED_INT_TYPE, 1, offsetof(Clique, count)),
       SIMPLE_MEMBER(UNSIGNED_INT_TYPE, 1, offsetof(Clique, leader))};

(cliqueMemberDescriptorLength is not defined in this excerpt; it is presumably the number of entries in cliqueMemberDescriptor.)

So, even if it is less natural, it is possible to send structures containing structures with these tools. You can see that it is impossible to send not only dynamically sized strings but also dynamically sized arrays. Here, MAX_MEMBERS is the maximum number of members a clique can contain. In NWS, this value is defined as 100. I'm not sure, but I think that all 100 values are sent each time, even if there are only 3 non-null members.
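To make the descriptor idea concrete, here is a minimal, self-contained sketch of a nested descriptor table and a function that walks it. It does not use the real NWS or GRAS headers: ToyDescriptor, TOY_SIMPLE, TOY_LENGTH, toy_payload_size and the Member/Group structures are hypothetical stand-ins invented for this illustration, and padding is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NWS/GRAS types; not the real API. */
typedef enum { T_CHAR, T_UINT, T_ULONG, T_DOUBLE, T_STRUCT } ToyType;

typedef struct ToyDescriptor {
  ToyType type;                        /* base type of the field */
  size_t repetitions;                  /* number of elements of that type */
  size_t offset;                       /* position of the field in the struct */
  const struct ToyDescriptor *members; /* sub-descriptor, T_STRUCT only */
  size_t member_count;                 /* entries in members, T_STRUCT only */
} ToyDescriptor;

#define TOY_SIMPLE(type, reps, off) { (type), (reps), (off), NULL, 0 }
/* Only valid on a true array: on a pointer, sizeof gives the pointer size. */
#define TOY_LENGTH(dd) (sizeof(dd) / sizeof((dd)[0]))

#define MAX_MEMBERS 100

typedef struct { unsigned long address; unsigned long port; } Member;
typedef struct {
  char name[16];
  Member members[MAX_MEMBERS];
  unsigned int count;
} Group;

static const ToyDescriptor memberDesc[] = {
  TOY_SIMPLE(T_ULONG, 1, offsetof(Member, address)),
  TOY_SIMPLE(T_ULONG, 1, offsetof(Member, port)),
};

static const ToyDescriptor groupDesc[] = {
  TOY_SIMPLE(T_CHAR, 16, offsetof(Group, name)),
  { T_STRUCT, MAX_MEMBERS, offsetof(Group, members),
    memberDesc, TOY_LENGTH(memberDesc) },
  TOY_SIMPLE(T_UINT, 1, offsetof(Group, count)),
};

size_t toy_type_size(ToyType t) {
  switch (t) {
    case T_CHAR:   return sizeof(char);
    case T_UINT:   return sizeof(unsigned int);
    case T_ULONG:  return sizeof(unsigned long);
    case T_DOUBLE: return sizeof(double);
    default:       return 0;
  }
}

/* Bytes of payload (padding excluded) described by a descriptor array.
   Nested structures are handled by recursing into their sub-descriptor. */
size_t toy_payload_size(const ToyDescriptor *dd, size_t n) {
  size_t total = 0;
  assert(dd != NULL);
  for (size_t i = 0; i < n; i++) {
    size_t one = (dd[i].type == T_STRUCT)
        ? toy_payload_size(dd[i].members, dd[i].member_count)
        : toy_type_size(dd[i].type);
    total += one * dd[i].repetitions;
  }
  return total;
}
```

In this sketch, toy_payload_size(groupDesc, TOY_LENGTH(groupDesc)) always accounts for all MAX_MEMBERS entries, whatever the value of count, because the descriptor only knows the fixed array size.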
Yes, sending unused entries is bad.

The DataDescriptor_t MUST be const. Malloc'ing them and then casting them when passing them as arguments IS NOT OK. This is because we get the number of elements in the array with sizeof(dd)/sizeof(dd[0]), which only works on a true array and not on a pointer to malloc'ed memory.

Dynamic array (dynar)

This module provides the usual dynamic array facility.

This document introduces the GRAS library (Grid Reality And Simulation or, according to my English dictionary, Generally Recognized As Safe ;).

Overview

The purpose of GRAS is to allow the development of distributed programs which will work, with as few modifications as possible, both on the SimGrid simulator (SG) and in Real Life (RL).

Here are the problems you face when you want to do so:

 - Communication in SG is done by passing tasks, while in RL you have to deal with sockets (or any wrapper around them).
 - In RL, each process should provide a main() function, and this is obviously not the case in SG.

Application class target

If you want to run your code both in RL and in SG, you won't be able to use the full set of features offered by either of those two worlds. GRAS tries to provide a sufficient set of features to develop your application, and implements them in both worlds.

GRAS uses the paradigm of event-driven programming, which is an extension of the message-passing one. Any process of a typical event-driven application declares callbacks for incoming events, which can be messages from other processes, timers or others.

All messages have a header specifying their type, and attached data represented as one or several C structures. In order to send the data over the network in RL, a type-description mechanism is provided, and the RL version of GRAS implements CDR functionality.
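The CDR-style exchange, in which the sender transmits in its native format and the receiver converts only when needed, can be pictured with the following minimal sketch. It is not GRAS or NWS code: the one-byte order tag, the 5-byte buffer layout and the names host_order, swap32, pack_u32 and unpack_u32 are assumptions made for this illustration, and only byte order (not type sizes or floating point formats) is handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire tag: which byte order the sender used. */
enum { WIRE_LITTLE = 0, WIRE_BIG = 1 };

/* Detect the byte order of the machine running this code. */
int host_order(void) {
  const uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1 ? WIRE_LITTLE : WIRE_BIG;
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Sender side: copy the value untouched and tag it with the sender's
   byte order. No conversion is done here. */
void pack_u32(unsigned char out[5], uint32_t value, int sender_order) {
  out[0] = (unsigned char)sender_order;
  memcpy(out + 1, &value, sizeof value);
}

/* Receiver side: convert only if the sender's order differs from ours. */
uint32_t unpack_u32(const unsigned char in[5], int receiver_order) {
  uint32_t value;
  memcpy(&value, in + 1, sizeof value);
  return in[0] == receiver_order ? value : swap32(value);
}
```

When both hosts share the same byte order, unpack_u32 returns the bytes as received and no swap happens at all. With an XDR-style scheme, both sides would convert to and from a fixed network format regardless.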
With CDR, the data are sent in the native format of the sender host, and converted on the destination host only if needed.

To avoid reinventing the wheel, GRAS uses existing code and adapts it so that the pieces work together. The SG version naturally uses the SimGrid toolkit, while the RL version is built on the communication library used in NWS (note that this library was somewhat modified, since the original version uses XDR, i.e. both the sender and the receiver convert the data to and from a so-called network format).

That's why some basic knowledge of how NWS works is assumed in this document. But don't worry, you only have to know the basics of NWS; the internals needed to understand the document will be presented when needed.

SimGrid was designed to ease the comparison of algorithms and heuristics. For that reason, many complicated notions from the system layer were voluntarily left out. For example, migrating a process from one host to another is as easy as: MSG_process_change_host(process, new_host). Needless to say, performing this operation on a real platform is much harder.

This simplification is a very good thing when you want to rapidly prototype code, but it makes things somewhat more complicated in GRAS, since we want a realistic API that also has to be implemented in reality.

The best example of complexity in GRAS_SG induced by simplicity in SimGrid is socket handling. There is no "socket" in SG, only m_channel_t. Unlike sockets in RL, no special treatment is needed for a process before writing to or reading from a channel. So, a given channel can be polled by more than one process. Likewise, you can send data to a channel that nobody is actually listening to.
The SG implementation of GRAS reports an error when nobody is listening on the other side, both when trying to open a socket and when sending data over a previously opened socket. That way, the SG version can be used to debug all synchronization issues. For that, we mainly store the PID of both the sender and the receiver in the socket structure, and then resolve PID->process at the last possible moment. This search is a bit expensive, but as long as there is no real garbage collection in SG, with the information "dead process" within the structure, it's the only way to make sure that we won't dereference pointers to an old freed structure when the process on the other side of the socket has finished since the creation of the socket.

As said in the overview, processes can declare that they listen on several sockets, but all incoming messages are handled by the same loop. So, we can use only one channel per process, and use a table on each host to determine to which process a message should be delivered, depending on the socket number provided by the sender.
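The two ideas above, a per-host table keyed by socket number and the late PID->process resolution, can be sketched as follows. This is only an illustration under stated assumptions: ToyHost, ToyProcess, toy_resolve, toy_listen and toy_deliver are hypothetical names and are not part of the GRAS or SimGrid API, and fixed-size arrays replace whatever containers the real implementation uses.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8
#define MAX_PORTS 8

/* Hypothetical process record: a PID and a liveness flag. */
typedef struct { int pid; int alive; } ToyProcess;

/* Hypothetical per-host state: known processes plus the table mapping
   a socket number (port) to the PID of the listening process. */
typedef struct {
  ToyProcess procs[MAX_PROCS];
  size_t proc_count;
  struct { int port; int pid; } listeners[MAX_PORTS];
  size_t listener_count;
} ToyHost;

/* Resolve PID -> process as late as possible. Returns NULL when the
   process is unknown or already dead, so no stale pointer is kept. */
ToyProcess *toy_resolve(ToyHost *h, int pid) {
  for (size_t i = 0; i < h->proc_count; i++)
    if (h->procs[i].pid == pid && h->procs[i].alive)
      return &h->procs[i];
  return NULL;
}

/* Record that process pid listens on the given port. Returns 0 on
   success, -1 when the table is full. */
int toy_listen(ToyHost *h, int port, int pid) {
  if (h->listener_count == MAX_PORTS)
    return -1;
  h->listeners[h->listener_count].port = port;
  h->listeners[h->listener_count].pid = pid;
  h->listener_count++;
  return 0;
}

/* Find who should get a message sent to port. Returns the receiver's
   PID, or -1 when nobody (alive) is listening, which is reported as
   an error instead of silently dropping the message. */
int toy_deliver(ToyHost *h, int port) {
  for (size_t i = 0; i < h->listener_count; i++) {
    if (h->listeners[i].port != port)
      continue;
    ToyProcess *p = toy_resolve(h, h->listeners[i].pid);
    return p ? p->pid : -1;
  }
  return -1;
}
```

Storing the PID rather than a pointer means that a process that died after the socket was created is detected at delivery time, at the price of a linear search on every lookup.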