+++ /dev/null
-This file was the README of a directory containing an attempt to
-code a new trace replayer for MPI actions, aiming at maximal
-performance. Modifying it is not for the faint of heart, since it
-could be compared to a mixture of the assembly and BASIC programming
-philosophies (only worse), reserved to SimGrid experts.
-
-It was so difficult to use that it has been removed from the SVN.
-This file is thus kept mainly for history.
-
-Shiny side: glance at interface
-===============================
-
-It uses a new simix context factory: state_machine. Each user process
-is a state machine. There is no system mystery such as pthread or
-ucontexts to save its stack. As a result, there is no stack. Each
-user process only has a user-provided structure describing its state,
-and only computes its next state based on that. Your main() can be as
-simple as:
-
- #include "replay.h"
-
- int main(int argc, char **argv) {
- SG_replay_init(&argc,argv);
- SG_replay_set_functions(init_fun, run_fun, fini_fun);
- SG_replay("platform.xml","deployment.xml");
- return 0;
- }
-
- * init_fun: user function in charge of creating the structure for
- each process in the simulation.
- * run_fun: user function called each time that a process must run. It
- takes as first argument the structure describing the
- current process.
- * fini_fun: user function in charge of freeing the memory allocated to
- the structure describing a process.
-
-This way of organizing the code saves a *huge amount* of memory
-(regular contexts have 128kb stacks per user process, threads are
-even more expensive) and greatly speeds things up (there is absolutely
-nothing to ask of the system, and everything can be done in user
-space).
-
-A simple-to-use and efficient trace parser is also provided:
- /* constructor/destructor */
- replay_trace_reader_t replay_trace_reader_new(const char*filename);
- void replay_trace_reader_free(replay_trace_reader_t *reader);
- /* get a new event. Don't free the content, strdup what you want to
- keep after next call to reader_get() */
- const char **replay_trace_reader_get(replay_trace_reader_t r);
- /* return a "file:pos" description of the last thing we read. */
- const char *replay_trace_reader_position(replay_trace_reader_t r);
-Check replay_trace_reader.c for the source code, and replay_MPI.c for
-an example of use.
-
-
-
-Dark side: restrictions on user code
-====================================
-
-The incredible performance of this approach comes at a price: using
-SimGrid this way is a *real* pain in the ass. You cannot use MSG, SMPI
-or anything else (because none of these interfaces were coded
-with the *extreme* requirements of the state_machine in mind), and you
-can only rely on SIMIX. From SIMIX, you can only use simcalls (i.e., the
-simcall_* functions). Moreover, you must know that each blocking
-simcall will result in an interruption of your execution flow.
-
-Let's take an example. Suppose your code contains:
- smx_synchro_t act = simcall_comm_isend(......);
- simcall_comm_wait(act);
- simcall_comm_destroy(act);
-
-The execution flow is brutally interrupted somewhere within
-simcall_comm_isend(): the variable act is never set, and any code
-written after the first line is never executed.
-
-Indeed each SIMIX simcall results in an interruption of the calling
-process, but in state_machine there is only one system stack and the
-whole state describing the process is in the structure describing it.
-So, when we need to remove one process from the system, to pause it,
-we do it the hard way: the stack [of maestro] is restored to the state
-in which maestro put it, regardless of what the user process put on it.
-
-In short, each time simix wants to interrupt a process, state_machine
-does a longjmp(3) to the point just before calling the user code. As a
-result, each time you do a simcall, your stack is destroyed and
-restored to the state where maestro put it before calling your code.
-
-This means that you cannot do anything after a simcall, and that the
-stack is not a safe storage area for your data.
-
-So, you have to write your code as a state machine, with a big ugly
-switch. The previous code must be rewritten as something like:
-
-run_fun(globals, res) {
-
-  switch (globals->state) {
-  case l1: /* default value, so that we take that branch the first time */
-    globals->state = l2;
-    simcall_comm_isend(....); /* simcall => hard interrupt of our code */
-    /* no break needed: the simcall above never returns here */
-  case l2: /* we'll take that branch the second time we're scheduled */
-    globals->comm = res; /* result of the isend issued in l1 */
-    globals->state = l3;
-    simcall_comm_wait(globals->comm); /* simcall => interrupt */
-  case l3:
-    globals->state = where_you_want_to_go_today;
-    simcall_comm_destroy(globals->comm);
-  }
-}
-
-As you can see, the result of the /previous/ simcall is passed as the
-second argument to run_fun().
-
-
-Isn't all this beautifully awful? A few gotos in your code are just
-what you need to go 20 years back to the good old days of GW-BASIC...