src/replay/README

This directory constitutes an attempt to code a new trace replayer for
MPI actions, aiming at maximal performance. Modifying it is not for the
faint of heart: the code could be compared to a mixture of the assembly
and BASIC programming philosophies (only worse), and is reserved for
SimGrid experts.

Shiny side: a glance at the interface
=====================================

It uses a new SIMIX context factory: state_machine. Each user process
is a state machine. There is no system machinery such as pthreads or
ucontexts to save its stack; as a result, it has no stack at all. Each
user process only has a user-provided structure describing its state,
and only computes its next state from that. Your main() can be as
simple as:

  #include "replay.h"

  int main(int argc, char *argv[]) {
    /* init_fun, run_fun and fini_fun are your own functions (see below) */
    SG_replay_init(&argc, argv);
    SG_replay_set_functions(init_fun, run_fun, fini_fun);
    SG_replay("platform.xml", "deployment.xml");
    return 0;
  }

 * init_fun: user function in charge of creating the structure for
             each process in the simulation.
 * run_fun: user function called each time a process must run. It
            takes as first argument the structure describing the
            current process.
 * fini_fun: user function in charge of freeing the memory allocated to
             the structure describing a process.
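
To make the roles of these three callbacks concrete, here is a self-contained toy that does not use SimGrid at all. The signatures of init_fun(), run_fun() and fini_fun() below, the proc_state_t structure and the drive() loop are all hypothetical (check replay.h for the real prototypes). drive() only mimics the calling pattern described above, not the actual scheduling done by SG_replay(). The res argument stands for the result of the previous syscall; this toy ignores it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-process state: with state_machine, everything a process
 * must remember between two runs lives here, since it has no stack. */
typedef struct {
  int rank;      /* which simulated process this is */
  int next_step; /* where to resume on the next run */
  int last_step; /* the process is done once next_step reaches this */
} proc_state_t;

/* Hypothetical init_fun: allocate and initialize one process state. */
static void *init_fun(int rank, int nsteps) {
  proc_state_t *p = malloc(sizeof *p);
  assert(p != NULL);
  p->rank = rank;
  p->next_step = 0;
  p->last_step = nsteps;
  return p;
}

/* Hypothetical run_fun: compute the next state from the current one.
 * res would carry the previous syscall's result; unused in this toy.
 * Returns 1 while the process has work left, 0 once it is finished. */
static int run_fun(void *data, void *res) {
  proc_state_t *p = data;
  (void)res;
  p->next_step++;
  return p->next_step < p->last_step;
}

/* Hypothetical fini_fun: release what init_fun allocated. */
static void fini_fun(void *data) {
  free(data);
}

/* Toy round-robin driver standing in for SG_replay(); it only shows the
 * calling pattern. Returns the number of run_fun() calls performed. */
static int drive(int nprocs, int nsteps) {
  void *procs[8];
  int alive = nprocs, calls = 0;
  assert(nprocs >= 0 && nprocs <= 8);
  for (int i = 0; i < nprocs; i++)
    procs[i] = init_fun(i, nsteps);
  while (alive > 0) {
    for (int i = 0; i < nprocs; i++) {
      if (procs[i] == NULL)
        continue; /* already finished */
      calls++;
      if (!run_fun(procs[i], NULL)) {
        fini_fun(procs[i]); /* finished: free its structure right away */
        procs[i] = NULL;
        alive--;
      }
    }
  }
  return calls;
}
```

For instance, drive(3, 4) runs three processes of four steps each and performs 12 calls to run_fun(). Each structure is freed by fini_fun() as soon as its run_fun() reports that the process is done.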

This way of organizing the code saves a *huge amount* of memory
(regular contexts have 128 KiB stacks per user process, and threads are
even more expensive) and greatly speeds things up (there is nothing at
all to request from the system, and everything can be done in user
space).
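
For an order of magnitude, assume 100,000 simulated processes and a per-process state structure of 64 bytes. Both figures are illustrative assumptions, not measurements. With the 128 KiB context stack mentioned above, the arithmetic is:

```latex
\begin{aligned}
10^5 \times 128\,\text{KiB} &= 12\,800\,000\,\text{KiB} = 12\,500\,\text{MiB} \approx 12.2\,\text{GiB} \\
10^5 \times 64\,\text{B} &= 6\,400\,000\,\text{B} \approx 6.1\,\text{MiB} \\
128\,\text{KiB} / 64\,\text{B} &= 131\,072 / 64 = 2048
\end{aligned}
```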

A simple-to-use and efficient trace parser is also provided:

  /* constructor/destructor */
  replay_trace_reader_t replay_trace_reader_new(const char *filename);
  void replay_trace_reader_free(replay_trace_reader_t *reader);
  /* get a new event. Don't free the content; strdup what you want to
     keep after the next call to replay_trace_reader_get() */
  const char *const *replay_trace_reader_get(replay_trace_reader_t r);
  /* return a "file:pos" description of the last thing we read */
  const char *replay_trace_reader_position(replay_trace_reader_t r);

Check replay_trace_reader.c for the source code, and replay_MPI.c for an
example of use.
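
The ownership rule stated in the comment above (don't free what the reader returns; copy what you want to keep) is the usual contract of a reader that recycles one internal buffer. The self-contained sketch below illustrates that contract only; it is not the code of replay_trace_reader.c. mock_reader_t, mock_reader_get(), copy_string() and first_tokens() are hypothetical names. The sketch also assumes things that are not stated above: one event per line, an event returned as a NULL-terminated array of whitespace-separated tokens, and a NULL return at the end of the trace. copy_string() plays the role of strdup(), which is POSIX rather than ISO C before C23.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256
#define MAX_TOKENS 16

/* Hypothetical stand-in for a trace reader over an in-memory trace. Like
 * the documented reader, it returns tokens pointing into a buffer that it
 * owns and overwrites on the next call. */
typedef struct {
  const char *cursor;                 /* next unread character */
  char line[MAX_LINE];                /* reused by every call */
  const char *tokens[MAX_TOKENS + 1]; /* NULL-terminated, points into line */
} mock_reader_t;

/* Returns the tokens of the next non-empty line, or NULL at the end. */
static const char *const *mock_reader_get(mock_reader_t *r) {
  while (*r->cursor != '\0') {
    size_t len = strcspn(r->cursor, "\n");
    size_t kept = len < MAX_LINE ? len : MAX_LINE - 1; /* truncate */
    memcpy(r->line, r->cursor, kept);
    r->line[kept] = '\0';
    r->cursor += len;
    if (*r->cursor == '\n')
      r->cursor++;
    size_t n = 0;
    for (char *tok = strtok(r->line, " \t"); tok != NULL && n < MAX_TOKENS;
         tok = strtok(NULL, " \t"))
      r->tokens[n++] = tok;
    r->tokens[n] = NULL;
    if (n > 0)
      return r->tokens;
  }
  return NULL;
}

/* Portable replacement for strdup(). */
static char *copy_string(const char *s) {
  size_t size = strlen(s) + 1;
  char *copy = malloc(size);
  assert(copy != NULL);
  return memcpy(copy, s, size);
}

/* Keeps the first token of each event. The copies survive later calls
 * to mock_reader_get(); the returned pointers themselves would not. */
static size_t first_tokens(const char *trace, char **out, size_t max) {
  mock_reader_t r = {.cursor = trace};
  const char *const *evt;
  size_t n = 0;
  while (n < max && (evt = mock_reader_get(&r)) != NULL)
    out[n++] = copy_string(evt[0]);
  return n;
}
```

With the trace "p0 send p1 1e6\np1 recv p0\n\np0 compute 1e9\n" (an assumed format), first_tokens() returns 3 and yields the copies "p0", "p1" and "p0", skipping the blank line. Keeping evt[0] itself instead of a copy would be a bug, since the next call overwrites the buffer it points into.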


Dark side: restrictions on user code
====================================

The incredible performance of this approach comes at a price: using
SimGrid this way is a *real* pain in the ass. You cannot use MSG, GRAS,
SMPI or anything else (none of these interfaces was coded with the
*extreme* requirements of state_machine in mind); you can only rely on
SIMIX. From SIMIX, you can only use requests (i.e., the SIMIX_req_*
functions). Moreover, each request, even a non-blocking one such as
SIMIX_req_comm_isend(), interrupts your execution flow.

Let's take an example. If your code contains:

   smx_action_t act = SIMIX_req_comm_isend(......);
   SIMIX_req_comm_wait(act);
   SIMIX_req_comm_destroy(act);

The execution flow is brutally interrupted somewhere within
SIMIX_req_comm_isend(): the variable act is never set, and the code
written after the first line never runs.

Indeed, each SIMIX syscall interrupts the calling process. But with
state_machine there is only one system stack, and the whole state of
the process lives in the structure describing it. So when we need to
remove a process from the system to pause it, we do it the hard way:
maestro's stack is restored to the state in which maestro left it,
whatever the user process put on it.

In short, each time SIMIX wants to interrupt a process, state_machine
does a longjmp(3) to the point just before the call to the user code.
As a result, each time you do a syscall, your stack is destroyed: it is
restored to the state maestro left it in before calling your code.

This means that you cannot do anything after a syscall, and that the
stack is not a safe place to store your data.
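
The following toy model reproduces the naive example above with plain setjmp()/longjmp() from <setjmp.h> and no SimGrid at all. The "syscall" jumps back to the point where maestro called the user code, so the assignment to act and every line after it are skipped. fake_syscall(), naive_user_code(), lines_after_syscall and maestro_run_once() are hypothetical names, and the real state_machine factory is only approximated here.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy model, not SimGrid code: maestro calls setjmp() just before running
 * the user code, and every "syscall" longjmp()s back to that point. */
static jmp_buf maestro_point;

/* Hypothetical syscall: instead of returning, it throws away whatever the
 * user code put on the stack by jumping back to maestro. */
static int fake_syscall(void) {
  longjmp(maestro_point, 1);
  return 42; /* never reached */
}

/* Counts how many times the code after the syscall ran. It is static,
 * not a local, because locals of the user code do not survive. */
static int lines_after_syscall = 0;

/* Naive user code, written as if fake_syscall() returned normally. */
static void naive_user_code(void) {
  int act = fake_syscall(); /* interrupted in here: act is never set */
  lines_after_syscall++;    /* never executed */
  (void)act;
}

/* Maestro side: runs the user code once. Returns 1 if the user code was
 * interrupted by a syscall, 0 if it returned normally. */
static int maestro_run_once(void (*user_code)(void)) {
  if (setjmp(maestro_point) != 0)
    return 1; /* the longjmp lands here; user_code's frame is gone */
  user_code();
  return 0;
}
```

Calling maestro_run_once(naive_user_code) returns 1 and leaves lines_after_syscall at 0: the increment written after the syscall never runs, just like the code after SIMIX_req_comm_isend() in the example above.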

So you have to write your code as a state machine, with a big ugly
switch. The previous code must be written as something like:

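/* Sketch: see replay.h for the exact prototype expected by
   SG_replay_set_functions(); process_t stands for your own structure. */
void run_fun(process_t *globals, void *res) {

  switch (globals->state) {
  case l1: /* initial value, so we take this branch the first time */
    globals->state = l2;
    SIMIX_req_comm_isend(....); /* syscall => hard interrupt of our code */
  case l2: /* we take this branch the second time we are scheduled */
    globals->comm = res; /* res: result of the previous syscall (isend) */
    globals->state = l3;
    SIMIX_req_comm_wait(globals->comm); /* syscall => interrupt */
  case l3:
    globals->state = where_you_want_to_go_today;
    SIMIX_req_comm_destroy(globals->comm); /* syscall => interrupt */
  }
  /* No break statements are needed: each syscall above interrupts
     run_fun(), so control never falls through to the next case. */
}
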
As you can see, the result of the /previous/ syscall is passed as the
second argument of run_fun().
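
Here is a self-contained toy, built on setjmp()/longjmp() with no SimGrid involved, showing that this switch-based style survives the loss of the stack. It mirrors run_fun() above. fake_syscall() stands in for the SIMIX_req_* calls: it records a made-up result and jumps back to maestro(), which passes that result as res on the next run. All names (fake_syscall, maestro, process_t, the done state) are hypothetical, and ints replace smx_action_t to keep it standalone.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy model, not SimGrid code: a "syscall" records a result for the next
 * run and longjmp()s back to maestro, which discards the user stack. */
static jmp_buf maestro_point;
static int pending_result; /* result of the last syscall */

static void fake_syscall(int result) {
  pending_result = result;
  longjmp(maestro_point, 1);
}

enum state { l1, l2, l3, done };

/* Hypothetical process structure: everything that must survive a syscall
 * lives here, never on the stack. */
typedef struct {
  enum state state;
  int comm;        /* stands in for an smx_action_t */
  int wait_result; /* what the l3 branch received as res */
} process_t;

/* Same shape as run_fun() above; res is the previous syscall's result. */
static void run_fun(process_t *globals, int res) {
  switch (globals->state) {
  case l1: /* first run */
    globals->state = l2;
    fake_syscall(7); /* plays comm_isend, "returning" 7 */
  case l2: /* no break needed above: fake_syscall() never returns */
    globals->comm = res;
    globals->state = l3;
    fake_syscall(globals->comm + 1); /* plays comm_wait */
  case l3:
    globals->wait_result = res;
    globals->state = done;
    fake_syscall(0); /* plays comm_destroy */
  case done:
    break; /* returning normally means the process is finished */
  }
}

/* Maestro: reruns the process until run_fun() returns normally, passing
 * each syscall's result to the next run. Returns the number of runs. */
static int maestro(process_t *p) {
  int runs = 0;
  int res = 0; /* nothing to report before the first run */
  for (;;) {
    runs++;
    if (setjmp(maestro_point) == 0) {
      run_fun(p, res);
      return runs; /* no syscall this time: the process is done */
    }
    res = pending_result; /* interrupted: feed the result to the next run */
  }
}
```

Running maestro() on a process whose state starts at l1 takes four runs: three interrupted ones (one per syscall) and a final one that reaches the done branch and returns normally. The value stored in comm during the l2 run is still there afterwards only because it lives in the structure, not on the stack.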


Isn't all this beautifully awful?? A few gotos in your code are just
what you need to go 20 years back to the good old days of GW-BASIC...