Those two things are working, but we want to make everything as clean as
possible before releasing SimGrid v.3.
-So what about those nice DAGs we used to have in SimGrid v.1.? They're not
-anymore in SimGrid v.3. Let me recall you the way SimGrid 3 is organized:
+So what about those nice DAGs we used to have in SimGrid v.1.? They are
+no longer available in SimGrid v.3, at least not in their original form.
+Let me remind you how SimGrid 3 is organized:
\verbatim
________________
----------------
\endverbatim
-XBT is our tool box and now, you should have an idea of what the other ones
-are. As you can see, the primitive SG is not here anymore. However it could
-still be brought back if people really need it. Here is how it would fit.
+XBT is our toolbox, and by now you should have an idea of what the other
+ones are. As you can see, the primitive SG is not here anymore. However,
+we have written a brand new and cleaner API for this purpose:
+\ref SD_API. It is built directly on top of SURF and provides an API
+rather close to the old SG:
\verbatim
______________________
| User code |
|____________________|
-| | MSG | GRAS | SG |
+| | MSG | GRAS | SD |
| -------------------|
| | SURF |
| -------------------|
----------------------
\endverbatim
-Re-implementing SG on top of SURF is really straightforward (it only
-requires a little bit of time that I really don't have right now)
-since the only thing that lacks to SURF is the DAG part. But adding it
-to SURF would slow it down and therefore slow MSG and GRAS which is
-not a good thing. However it is really not on the top of our TODO
-list because we have to work on GRAS, and its MPI counterpart, and a
-parallel task model, and ... Anyway, we finally have migrated our CVS
-to gforge so people that are interested by helping on this part will
-have the possibility to do it.
+The nice thing is that, as it is written on top of SURF, it seamlessly
+supports DAGs of parallel tasks as well as complex communication
+patterns. Some old codes using SG are currently being rewritten with
+\ref SD_API to check that all the needed functions are provided.
\subsection faq_SG_DAG How to implement a distributed dynamic scheduler of DAGs.
Distributed is somehow "contagious". If you start making distributed
-decisions, there is no way to handle DAGs directly anymore (unless I am
-missing something). You have to encode your DAGs in term of communicating
-process to make the whole scheduling process distributed. Believe me, it is
-worth the effort since you'll then be able to try your algorithms in a very
-wide variety of conditions. Here is an example of how you could do that.
-Assume T1 has to be done before T2.
+decisions, there is no way to handle DAGs directly anymore (unless I
+am missing something). You have to encode your DAGs in terms of
+communicating processes to make the whole scheduling process
+distributed. Here is an example of how you could do that. Assume T1
+has to be done before T2.
\verbatim
int your_agent(int argc, char *argv[]) {
\endverbatim
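To make the idea concrete outside of the simulator, here is a minimal sketch of the same pattern in plain POSIX C. It does not use the MSG API at all: the agents are ordinary processes and the "T1 is done" message travels through a pipe. The names agent_t1, agent_t2 and run_t1_then_t2, as well as the dummy computations, are made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical agent for T1: does its work, then notifies the agent of T2
 * by sending the result through the pipe. Returns a process exit status. */
static int agent_t1(int write_fd) {
  int result = 21; /* stand-in for the real work of T1 */
  ssize_t n = write(write_fd, &result, sizeof result);
  return n == (ssize_t)sizeof result ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Hypothetical agent for T2: cannot start before the message of T1 arrives,
 * which is exactly the dependency "T1 before T2". Returns -1 on error. */
static int agent_t2(int read_fd) {
  int input = 0;
  assert(read_fd >= 0);
  ssize_t n = read(read_fd, &input, sizeof input); /* blocks until T1 is done */
  if (n != (ssize_t)sizeof input)
    return -1;
  return 2 * input; /* stand-in for the real work of T2 */
}

/* Runs both agents as separate processes; returns the result of T2,
 * or -1 if anything went wrong. */
int run_t1_then_t2(void) {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;

  pid_t child = fork();
  if (child < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (child == 0) { /* child process: agent of T1 */
    close(fds[0]);
    _exit(agent_t1(fds[1]));
  }

  /* parent process: agent of T2 */
  close(fds[1]);
  int result = agent_t2(fds[0]);
  close(fds[0]);

  int status = 0;
  if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status) ||
      WEXITSTATUS(status) != EXIT_SUCCESS)
    return -1;
  return result;
}
```

The edge of the DAG never appears as a data structure here: it only exists as the blocking read in the agent of T2. That is the price of going distributed, and the same reasoning applies when the agents are simulated processes exchanging tasks.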
If you decide that the distributed part is not that much important and that
-DAG is really the level of abstraction you want to work with (but it
-prevents you from having "realistic" platform modeling), then you should
-keep using the 2.18.5 versions until somebody has ported SG on top of SURF.
-Note however that SURF will be slower than the old SG to handle traces with
-a lots of variations (there is no trace integration anymore).
-
-\subsection faq_SG_future Will SG come back in the maintained branch one day?
-
-Sure. In fact, we already have thought about a new and cleaner API:
-\verbatim
-void* SG_link_get_data(SG_link_t link);
-void SG_link_set_data(SG_link_t link, void *data);
-const char* SG_link_get_name(SG_link_t link);
-double SG_link_get_capacity(SG_link_t link);
-double SG_link_get_current_bandwidth(SG_link_t link);
-double SG_link_get_current_latency(SG_link_t link);
-
-SG_workstation_t SG_workstation_get_by_name(const char *name);
-SG_workstation_t* SG_workstation_get_list(void);
-int SG_workstation_get_number(void);
-void SG_workstation_set_data(SG_workstation_t workstation, void *data);
-void * SG_workstation_get_data(SG_workstation_t workstation);
-const char* SG_workstation_get_name(SG_workstation_t workstation);
-SG_link_t* SG_workstation_route_get_list(SG_workstation_t src, SG_workstation_t dst);
-int SG_workstation_route_get_size(SG_workstation_t src, SG_workstation_t dst);
-double SG_workstation_get_power(SG_workstation_t workstation);
-double SG_workstation_get_available_power(SG_workstation_t workstation);
-
-SG_task_t SG_task_create(const char *name, void *data, double amount);
-int SG_task_schedule(SG_task_t task, int workstation_nb,
- SG_workstation_t **workstation_list, double *computation_amount,
- double *communication_amount, double rate);
-
-void* SG_task_get_data(SG_task_t task);
-void SG_task_set_data(SG_task_t task, void *data);
-const char* SG_task_get_name(SG_task_t task);
-double SG_task_get_amount(SG_task_t task);
-double SG_task_get_remaining_amount(SG_task_t task);
-void SG_task_dependency_add(const char *name, void *data, SG_task_t src, SG_task_t dst);
-void SG_task_dependency_remove(SG_task_t src, SG_task_t dst);
-e_SG_task_state_t SG_task_state_get(SG_task_t task); /* e_SG_task_state_t can be either SG_SCHEDULED, SG_RUNNING, SG_DONE, or SG_FAILED */
-void SG_task_watch(SG_task_t task, e_SG_task_state_t state); /* SG_simulate will stop as soon as the state of this task is the one given in argument.
- Watch-point is then automatically removed */
-void SG_task_unwatch(SG_task_t task, e_SG_task_state_t state);
-
-void SG_task_unschedule(SG_task_t task); /* change state and rerun.. */
-
-SG_task_t *SG_simulate(double how_long); /* returns a NULL-terminated array of SG_task_t whose state has changed */
-\endverbatim
-
-We're just looking for somebody to implement it... :)
+DAG is really the level of abstraction you want to work with, then you should
+give \ref SD_API a try.
\section faq_dynamic Dynamic resources and platform building
An example of this trick is distributed in the file examples/msg/msg_test_surfxml_bypassed.c
-\section faq_troubleshooting Troubleshooting
-
-\subsection faq_compil_trouble ./configure fails!
-
-We now only one reason for the configure to fail:
-
- - <b>You are using a borken build environment</b>\n
- If symptom is that configure complains about gcc not being able to build
- executables, you are probably missing the libc6-dev package. Damn Ubuntu.
-
-If you experience other kind of issue, please get in touch with us. We are
-always interested in improving our portability to new systems.
-
-\subsection faq_distcheck_fails Dude! "make check" fails on my machine!
-
-Don't assume we never run this target, because we do. Really. Promise!
-
-There is several reasons which may cause the make check to fail on your
-machine:
-
- - <b>You are using a borken compiler</b>.\n
- The symptom may be that the "make check" fails within testsuite/gras
- directory.\n
- For example, we failed to use gcc 4.0 with optimization flags. The
- workaround is either to install a gcc-3.4 compiler and change the /usr/bin/gcc
- link to let it point on /usr/bin/gcc-3.4 or use the
- --disable-compiler-optimizations of the configure script.\n
- This bug is really puzzeling: the first testcase of gras fails when
- SimGrid is compiled with any optimization flag (-O1 and above). More
- astonishing, it also fails when compiled with
- <tt>-O1 -fno-defer-pop -fno-guess-branch-probability -fno-cprop-registers -fno-loop-optimize -fno-if-conversion -fno-if-conversion2 -fno-merge-constants -fno-tree-ccp -fno-tree-dce -fno-tree-dominator-opts -fno-tree-dse -fno-tree-ter -fno-tree-lrs -fno-tree-sra -fno-tree-copyrename -no-ftree-fre -fno-tree-ch -fno-delayed-branch</tt>\n
- That long list of options comes down to enabling -O1, and then disabling
- all the optimizations that -O1 is supposed to enable, according to the
- gcc documentation. So, it should give the same results than -O0... The
- reason seems to be this little sentence in the gcc documentation: <i>Not
- all optimizations are controlled directly by a flag. Only optimizations
- that have a flag are listed.</i> Under such circumstances, there is not
- much we can do.\n
- <b>=> Avoid gcc-4.0 when compiling SimGrid!</b>
-
- - <b>You are using a borken libc (probably concerning the contextes)</b>.\n
- The symptom is that the "make check" fails within the examples/msg directory.\n
- By default, SimGrid uses something called ucontexts. This is part of the
- libc, but it's quite undertested. For example, some (old) versions of the
- glibc on alpha do not implement these functions, but provide the stubs
- (which return ENOSYS: not implemented). It fools our detection mecanism
- and leads to segfaults.\n
- On some x86_64, the pointer to function is stored into a integer, but int
- are 32bits only on this arch while pointers are 64bits. Our detection
- mecanism also fails to detect the problem, which leads to segfaults.\n
- In both cases, there is not much we can do to fix the bug. We are working
- on a workaround for x86_64 machines, but in the meanwhile, you can
- compile with --with-context=pthread to avoid ucontext completely. You'll
- be a bit more limitated in the number of simulated processes you can start
- concurently, but 5000 processes is still enough for most purposes, isn't
- it?\n
- This limitation is the reason why we insist on using this piece of ...
- software even if it's so troublesome.\n
- <b>=> use --with-pthread on AMD64 architecture</b>
-
- - <b>There is a bug in SimGrid we aren't aware of</b>.\n
- If none of the above apply, please drop us a mail on the mailing list so
- that we can check it out.
+\section faq_limits Pushing the limits
\subsection faq_context_1000 I want thousands of simulated processes
low, you'll get a segfault. The token ring example, which is quite simple,
runs with 40kb stacks.
+\section faq_troubleshooting Troubleshooting
+
+\subsection faq_compil_trouble ./configure fails!
+
+We know of only one reason for configure to fail:
+
+ - <b>You are using a borken build environment</b>\n
+ If the symptom is that configure complains about gcc not being able to build
+ executables, you are probably missing the libc6-dev package. Damn Ubuntu.
+
+If you experience any other kind of issue, please get in touch with us. We are
+always interested in improving our portability to new systems.
+
+\subsection faq_distcheck_fails Dude! "make check" fails on my machine!
+
+Don't assume we never run this target, because we do. Really. Promise!
+
+There are several reasons which may cause make check to fail on your
+machine:
+
+ - <b>You are using a borken libc (probably concerning the contexts)</b>.\n
+ The symptom is that the "make check" fails within the examples/msg directory.\n
+ By default, SimGrid uses something called ucontexts. This is part of the
+ libc, but it's quite undertested. For example, some (old) versions of the
+ glibc on alpha do not implement these functions, but provide the stubs
+ (which return ENOSYS: not implemented). It fools our detection mechanism
+ and leads to segfaults.\n
+ On some x86_64, the pointer to function is stored in an integer, but ints
+ are only 32 bits on this arch while pointers are 64 bits. Our detection
+ mechanism also fails to detect the problem, which leads to segfaults.\n
+ In both cases, there is not much we can do to fix the bug. We are working
+ on a workaround for x86_64 machines, but in the meantime, you can
+ compile with --with-context=pthread to avoid ucontext completely. You'll
+ be a bit more limited in the number of simulated processes you can start
+ concurrently, but 5000 processes is still enough for most purposes, isn't
+ it?\n
+ This limitation is the reason why we insist on using this piece of ...
+ software even if it's so troublesome.\n
+ <b>=> use --with-pthread on AMD64 architectures that do not have an
+ ultra-recent libc.</b>
+
+ - <b>There is a bug in SimGrid we aren't aware of</b>.\n
+ If none of the above apply, please drop us a mail on the mailing list so
+ that we can check it out.
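
The x86_64 issue mentioned above can be reproduced in a few lines of plain C. The sketch below first squeezes a pointer through a single int, which loses the upper half on machines where pointers are 64 bits, and then shows one common way around it: carrying the pointer as two int-sized halves. Whether this is the workaround we will end up with is not decided here; the function names are made up for the illustration, and the code assumes that int is 32 bits wide and pointers are at most 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: push the pointer through a single int and back.
 * Returns 1 if the pointer survived the round trip, 0 otherwise. */
int survives_single_int(void *p) {
  int squeezed = (int)(intptr_t)p; /* upper bits are lost if int is too small */
  return (void *)(intptr_t)squeezed == p;
}

/* Split a pointer into two 32-bit halves, each stored in an int. */
void split_pointer(void *p, int *hi, int *lo) {
  uint64_t bits = (uint64_t)(uintptr_t)p;
  *hi = (int)(uint32_t)(bits >> 32);
  *lo = (int)(uint32_t)(bits & 0xffffffffu);
}

/* Rebuild the pointer from the two halves produced by split_pointer(). */
void *join_pointer(int hi, int lo) {
  uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
  return (void *)(uintptr_t)bits;
}
```

On a machine where pointers and ints have the same size, both versions work, which is why the problem only shows up on some architectures.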
+
\subsection faq_longjmp longjmp madness
This is when valgrind starts complaining about longjmp things, just like:
before the client get a chance to read them (use gras_os_sleep() to delay
the server), or the server died awfully before the client got the data.
+\subsection faq_valgrind Valgrind spits tons of errors!
+
+It may happen that valgrind, the memory debugger beloved by any decent C
+programmer, spits out tons of warnings like the following:
+\verbatim ==8414== Conditional jump or move depends on uninitialised value(s)
+==8414== at 0x400882D: (within /lib/ld-2.3.6.so)
+==8414== by 0x414EDE9: (within /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
+==8414== by 0x414F937: _dl_open (in /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x4150F4C: (within /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
+==8414== by 0x415102D: __libc_dlopen_mode (in /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x412D6B9: backtrace (in /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x8076446: xbt_dictelm_get_ext (dict_elm.c:714)
+==8414== by 0x80764C1: xbt_dictelm_get (dict_elm.c:732)
+==8414== by 0x8079010: xbt_cfg_register (config.c:208)
+==8414== by 0x806821B: MSG_config (msg_config.c:42)
+\endverbatim
+
+This problem is somewhere in the libc when using the backtraces and there is
+very little we can do ourselves to fix it. Instead, here is how to tell
+valgrind to ignore the error. Add the following to your ~/.valgrind.supp (or
+create this file if needed). Make sure to change the obj line to match
+your own system (change 2.3.6 to the actual version you are using,
+which you can retrieve with a simple "ls /lib/ld*.so").
+
+\verbatim {
+ name: Backtrace madness
+ Memcheck:Cond
+ obj:/lib/ld-2.3.6.so
+ fun:dl_open_worker
+ fun:_dl_open
+ fun:do_dlopen
+ fun:dlerror_run
+ fun:__libc_dlopen_mode
+}\endverbatim
+
+Then, you have to tell valgrind to use this suppression file by passing
+the <tt>--suppressions=$HOME/.valgrind.supp</tt> option on the command line.
+You can also add the following to your ~/.bashrc so that it gets passed
+automatically. Actually, it passes a few more options to valgrind, and these
+happen to be my personal settings. Check the valgrind documentation for
+more information.
+
+\verbatim export VALGRIND_OPTS="--leak-check=yes --leak-resolution=high --num-callers=40 --tool=memcheck --suppressions=$HOME/.valgrind.supp" \endverbatim
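
If the call chain reported on your machine differs from the one shown above, one option is to let valgrind write the suppression entry for you. The sketch below relies on valgrind's <tt>--gen-suppressions=all</tt> option; <tt>./my_simulator</tt> is only a placeholder for your own binary, not something shipped with SimGrid.

```shell
# Print a ready-to-paste suppression block after each reported error.
# ./my_simulator is a hypothetical binary name; replace it with your own.
valgrind --tool=memcheck --gen-suppressions=all ./my_simulator
```

You can then copy the generated block into ~/.valgrind.supp and give it a name of your choice.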
\subsection faq_deadlock There is a deadlock !!!