doc/gtut-introduction.doc

   1 /**
   2 @page GRAS_tut_intro Introduction to the GRAS framework
   3
   4 \htmlinclude .gtut-introduction.doc.toc
   5
   6 \section GRAS_tut_intro_toc What will you find here
   7
   8  - The section \ref GRAS_tut_intro_what explains what the GRAS framework and how it
   9    relates to other existing solutions.
  10  - The section \ref GRAS_tut_intro_model presents somehow formaly the programmation
  11    model used in GRAS.
  12
  13 <hr>
  14
  15 \section GRAS_tut_intro_what What is GRAS
  16
  17 GRAS is a framework to implement and study distributed algorithms. It
  18 provides a simple communication API to allow several processes to
  19 interoperate through the exchange of messages. This is quite classical, and
  20 GRAS differs from the other existing messaging API by several points:
  21
  22   - \ref GRAS_tut_intro_what_2ways
  23   - \ref GRAS_tut_intro_what_dist
  24   - \ref GRAS_tut_intro_what_grid
  25   - \ref GRAS_tut_intro_what_target
  26   - \ref GRAS_tut_intro_what_simple
  27
  28 We now detail each of these points.
  29
  30 \subsection GRAS_tut_intro_what_2ways GRAS allows you to run both in simulation mode and on real platforms
  31
  32 We wrote two implementations of the interface: the first one is built on top
  33 of the SimGrid simulator, allowing you to run your application in a
  34 controled environment, which reveals precious to debug and study algorithms.
  35 Everyone who tried to run even simple tests on more than 100 real machines
  36 will consider a simulator as a nirvana.
  37
  38 The experiments can be reproduced in the exact same conditions (which is
  39 somehow hard in real settings), allowing for example to reproduce a bug as
  40 many times as you want while debugging. You can also test your algorithm
  41 under experimental conditions which you couldn't achieve on a real platform
  42 (like a network topology and/or size you do don't have access to). Under
  43 some conditions, SimGrid simulations are also much faster than real
  44 executions, allowing you to run more experiments in less time.
  45
  46 Once you assessed the quality of your algorithm in the simulator, you can
  47 deploy it on real platforms using the second implementation of the library.
  48 Usually, taking an algorithm out of a simulator implies an almost complete
  49 rewrite. There is no need to modify your program for this in GRAS. You don't
  50 even need to recompile it, but simply to relink your program it against the
  51 right library.
  52
  53 GRAS applications running on real hardware deliver high performance.
  54 The sequential parts of your code are not mediated by GRAS or slowed down
  55 anyhow. The communications use advanced data exchange and conversion
  56 mecanism ensuring that you are likely to get performance at least comparable
  57 to other communication solutions (FIXME: cite the paper once it gets
  58 accepted).
  59
  60 GRAS applications are portable on several operating systems (Linux, MacOS X,
  61 Solaris, IRIX, AIX and soon Windows) and several processor architectures
  62 (x86, amd64, ppc, sparc, etc). Moreover, GRAS processes can interoperate
  63 efficiently even when deployed on differing material. You can for example
  64 have a process deployed on ppc/MacOS X interacting transparently with
  65 another one deployed on alpha/Linux.
  66
  67 The simulation mode of GRAS is called usually SG (for SimGrid) while the in
  68 situ execution mode is called RL (for Real Life).
  69
  70 \subsection GRAS_tut_intro_what_dist GRAS was designed for distributed computing, not parallel computing
  71
  72 In GRAS, you build your algorithm as a set of independent processes
  73 interacting through messages. This is the well known MPMD model (multiple
  74 program, multiple data). It contrasts to the SPMD model (simple program,
  75 multiple data) and communications solutions such as MPI or PVM, where you
  76 build an uniq program with conditionnals here and there specifying what each
  77 processes should do (something like "If I'm process number 0, then send data
  78 to the others, else get the data sent to me").
  79
  80 None of these models are inherently better than the other, and there is a
  81 plenty of algorithms betterly expressed in the SPMD paradigm. If your
  82 program falls into that category, then GRAS may not be the right tool for
  83 you. We think however that most non-sequential algorithms can be expressed
  84 gracefully in a MPMD way where some are really difficult to express in a
  85 SPMD way.
  86
  87 There is no parallelism in GRAS, and it is discouraged to introduce threads
  88 in GRAS (althrough it should be possible in some months). This is an explict
  89 choice since threads are so hard to use (see the section \ref
  90 GRAS_tut_intro_what_simple below). The framework itself do use threads to
  91 achieve good performances, but I don't want to impose this to users (FIXME:
  92 actually, GRAS is not multi-threaded yet internally, but I plan to do so
  93 really soon).
  94
  95 \subsection GRAS_tut_intro_what_grid GRAS was designed for large scale computing
  96
  97 Another difference to the MPI communication libraries is that GRAS was not
  98 designed for static small-sized platforms such as clusters, but to dynamic
  99 larger-scale platforms such as grids. That is why GRAS do include static
 100 membership solutions such as the MPI channels. Support for fault-tolerance
 101 is also provided through the timeouts on communication primitives and
 102 through an exception mecanism.
 103
 104 GRAS also comes with a sister library called AMOK containing several usefull
 105 building block for large scale network aware applications. The most
 106 proheminent one allows to assess the network availabilities through active
 107 testing, just like the classical NWS tool in the grid research community. We
 108 are actively working on a network topology discovery mecanism and a
 109 distributed locking solution. Some other modules are planned, such as
 110 reliable broacasting in open environments.
 111
 112 \subsection GRAS_tut_intro_what_target GRAS targets at applicative overlay rather than end-user application
 113
 114 The application class targeted by GRAS is constituted of so called overlays.
 115 They do not constitute a complete application by themselves, but can be seen
 116 as a "distributed library", a thing offering offering a service to another
 117 application through a set of physically distributed entities. An example of
 118 such overlay could be a monitoring system allowing you to retrieve the
 119 available bandwidth between two remote hosts. It could be used in a
 120 network-aware parallel matrix multiplication library assigning more work to
 121 well interconnected nodes. I wouldn't advice to build a physical or
 122 biological compututation program on top of GRAS, even if it would be
 123 possible in theory.
 124
 125 In other words, GRAS is not a grid middleware in the common understanding of
 126 the world, but rather a tool to constitute the building bricks of such a
 127 middleware. GRAS is thus a sort of "underware" ;)
 128
 129 \subsection GRAS_tut_intro_what_simple GRAS tries to remain simple to use
 130
 131 A lot of effort was put into the framework so that it remains simple to the
 132 users. For example, you can exchange structured data (any kind of C data
 133 structure) just by passing its address, and the framework will create the
 134 exact same structure on the receiver side.
 135
 136 There is no threads like the pthread ones in GRAS, and it is not planned to
 137 introduce this in the future. This is an explicit choice since I consider
 138 multi-threading as too complicated for usual users. There is too much
 139 non-determinism, too many race conditions, and too few language-level
 140 constructs to keep yourself from screwing up. This idea is well expressed
 141 by John Ousterhout in <i>Why Threads Are a Bad Idea (for most purposes)</i>,
 142 published at USENIX'96. See section \ref GRAS_tut_intro_what_dist for
 143 platform performance consideration.
 144
 145 For the user code, I plan to allow the co-existance of several "gras
 146 processes" within the same regular unix process. The communication semantic
 147 will still be message-oriented, even if implemented using the shared memory
 148 for efficiency.
 149
 150 Likewise, there is no interuption mecanism in GRAS which could break the
 151 user code execution flow. When you write a function, you can be absolutely
 152 sure that nothing will happen between each lines of it. This assumption
 153 considerably simplify the code written in GRAS. The main use of of
 154 interruptions in a distributed application is to timeout communications when
 155 they fail. GRAS communication calls allow to setup a timeout value, and
 156 handle it internally (see below).
 157
 158 The only interruption mecanism used is constituted by exceptions, just like
 159 in C++ or Java (but implemented directly in C). They are propagated from the
 160 point where they are raised to a point where they will be trapped, if any,
 161 or abort the execution when not trapped. You can still be certain that
 162 nothing will happen between two lines of your code, but the second line may
 163 never be executed if the first one raises an exception ;)
 164
 165 This exception mecanism was introduced because without it, user code has to
 166 be loaded by tons of non-functional code to check whether an operation was
 167 properly performed or whether you have to pass the error condition to your
 168 caller.
 169
 170 <hr>
 171
 172 \section GRAS_tut_intro_model The model provided by GRAS
 173
 174 From a more formal point of view, GRAS overlays (=applications) can be seen
 175 as a set of state machines mainly interacting with messages. Because of the
 176 distributed setting of overlays, the internal state of each process cannot
 177 be accessed or modified directly by other processes. Even when it would be
 178 possible pratically (like in SG), it is forbidden by the model. This makes
 179 it difficult to gain a complete knowledge on the global system state. This
 180 global system state can still be defined by agregating the states of each
 181 processes, but this remains theoretical and impratical because of the
 182 probable combinatorial explosion.
 183
 184  - \ref GRAS_tut_intro_model_events
 185  - \ref GRAS_tut_intro_model_commmodel
 186  - \ref GRAS_tut_intro_model_timing_policy
 187  - \ref GRAS_tut_intro_model_exception
 188  - \ref GRAS_tut_intro_model_rpc
 189
 190 \subsection GRAS_tut_intro_model_events Event types
 191
 192 Two main types of events may change the internal state of a given process:
 193
 194  - <b>Incomming messages</b>. Messages are somehow strongly typed: a message
 195    type is described by its name (a string), and the C datatype of its
 196    payload. Any message of the same type will convey the same datatype, but
 197    of course the actual content of the payload may change from message to
 198    message of the same type.\n
 199    \n
 200    Processes may attach <b>callback functions</b> to the arrival of messages
 201    of a given type. They describe the action to achieve to handle the
 202    messages during the transition associated to this event.\n
 203    \n
 204    Incoming messages are not handled as soon as they arrive, but only when
 205    the process declares to be ready to accept incomming events (using \ref
 206    gras_msg_handle or related functions). It ensures that the treatment of a
 207    given message won't run in parallel to any other callback, so that
 208    process globals (its state) can be accessed and modified without
 209    locking.\n
 210    \n
 211    Messages received when the process is not ready to consume them are
 212    queued, and will be processed in order in the subsequent calls to \ref
 213    gras_msg_handle.\n
 214    \n
 215    Processes can also wait explicitely for incoming messages matching some
 216    given criterions (using \ref gras_msg_wait). Any messages received before the
 217    one matching the criterions will be added to the incomming messages'
 218    queue for further use. This may breaks the message delivery order.
 219    Moreover, there is no restriction on when this can be done. So, a
 220    callback to a given message can consume messages of other types. There is
 221    also no restriction on the criterion: you can specify a function in charge
 222    of examinating the messages either incoming or already in the queue and
 223    decide based on their meta-data (sender and message type) or their actual
 224    content whether they match your criterions.\n
 225    \n
 226    It is even possible to program processes so that they only explicitely
 227    wait for messages without using \ref gras_msg_handle to accept messages
 228    and start the callbacks associated to them. GRAS thus supports both the
 229    pure event-based programming model and the more classical message passing
 230    model.\n
 231
 232  - <b>Internal timers</b>. There is two types of timers: delayed actions and
 233    repetitive actions. The former happen only once when the delay expires
 234    while the second happen regularly each time that a period expires.\n
 235    \n
 236    Like incoming messages, timer treatments are not prehemptive. Ie, the
 237    function attached to a given timer will not start as soon as the period
 238    expires, but only when the process declares to be ready to accept
 239    incoming events. This also done in the \ref gras_msg_handle function, and
 240    expired timers are prioritaire with regard to incoming messages.
 241
 242 Messages are sent using the \ref gras_msg_send function. You should specify
 243 the receiver, the message type and the actual payload. This operation can
 244 happen at any time of your program. Message sending is not considered as a
 245 process state change, but rather as a reaction to an incoming event. It
 246 changes the state of another process, though. Trying to send messages to
 247 yourself will deadlock (althrough it may change in the future).
 248
 249 \subsection GRAS_tut_intro_model_commmodel Communication model
 250
 251 Send operations are <b>as synchronous as possible pratically</b>. They
 252 block the process until the message actually gets delivered to the receiving
 253 process (an acknoledgment is awaited). We thus have an <b>1-port model in
 254 emission</b>. This limitation allows the framework to signal error condition
 255 to the user code in the section which asked for the transmission, without
 256 having to rely on an interuption mecanism to signal errors asynchronously.
 257 This communication model is not completely synchronous in that sense that
 258 the receiver cannot be sure that the acknoledgment has been delivered
 259 (this is the classical byzantin generals problem). Pratically, the
 260 acknoledgment is so small that there is a good probability that the message
 261 where delivered. If you need more guaranty, you will need to implement
 262 better solutions in the user space.
 263
 264 Receive operations can be done in parallel, thanks to a specific thread
 265 within the framework. Moreover, the messages not matching the criterion in
 266 explicite receive are queued. The model is thus <b>N-port in reception</b>.
 267
 268 Previous paragraph describes the model we are targeting, but the current
 269 state of the implementation is a bit different: an acknoledgment is awaited
 270 in send operation only in SG (this is a bug of RL), and there is no specific
 271 thread for handling incoming communications yet. This shouldn't last long
 272 until we solve this.
 273
 274 \subsection GRAS_tut_intro_model_timing_policy Timing policy
 275
 276 All communication primitives allow 3 timout policies: one can only poll for
 277 incomming events (using timeout=0), wait endlessly for the communication to
 278 be performed (using timeout<0) or specify a maximal delay to wait for the
 279 communication to proceed (using timeout>0, being a number of seconds).
 280
 281 Again, this describes the targeted model. The current implementation does
 282 not allow to specify a delay for the outgoing communication. In SG, the
 283 delay is then hardcoded to 60 seconds while outgoing communication wait for
 284 ever to proceed in RL.
 285
 286 Another timing policy we plan to implement in the future is "adaptative
 287 timeouts", where the timeout is computed automatically by the framework
 288 according to performance of previous communications. This was demonstrated
 289 for example in the NWS tool.
 290
 291 \subsection GRAS_tut_intro_model_exception Error handling through exceptions
 292
 293 As explained in section \ref GRAS_tut_intro_what_simple, any function may
 294 raise exceptions breaking their execution. No support is provided by the
 295 framework to ensure that the internal state remains consistent when
 296 exceptions are raised. Changing this would imply that we are able to
 297 checkpoint the internal state to provide a transaction service, which seems
 298 quite difficult to achieve efficiently.
 299
 300 \subsection GRAS_tut_intro_model_rpc RPC messaging
 301
 302 In addition to the one-way messages described above, GRAS supports RPC
 303 communication. Using this, a client process asks for the execution of a
 304 callback on a server process. RPC types are close to regular message types:
 305 they are described by a type (a string), a payload type for the request, but
 306 in addition, they also have a payload type for the answer from the server to
 307 the client.
 308
 309 RPC can be either synchronous (the function blocks until an answer is
 310 received) or asynchronous (you send the request and wait later for the
 311 anwer). They accept the same timing policies than regular messages.
 312
 313 If the callback raises an exception on the server side, this exception will
 314 be trapped by the framework on the server side, sent back to the client
 315 side, and revived on the client side. So, if the client calls a RPC which
 316 raises an error, it will have to deal with the exception itself. No
 317 provision is given concerning the state consistency on the server side when
 318 an exception arise. The <tt>host</tt> fields of the exception structure
 319 indicates the name of the host on which it was raised.
 320
 321 The callback performing the treatment associated to a RPC can perform any
 322 kind of communication itself, including RPC. In the case where A calls a RPC
 323 on B, leading to B calling a RPC on C (ie, A->B->C), if an exception is
 324 raised on C, it will be forwarded back to A. The <tt>host</tt> field will
 325 indicate C.
 326
 327 <hr>
 328
 329 \section GRAS_tut_intro_next What's next?
 330
 331 Now that you know what GRAS is and the communication model used, it is time
 332 to move to the \ref GRAS_tut_tour section. There, you will build
 333 incrementally a full-featured GRAS application demonstrating most of the
 334 aspects of the framework.
 335
 336 */