-\subsubsection faq_SG_DAG How to implement a distributed dynamic scheduler of DAGs.
-
-Distributed is somehow "contagious". If you start making distributed
-decisions, there is no way to handle DAGs directly anymore (unless I
-am missing something). You have to encode your DAGs in term of
-communicating process to make the whole scheduling process
-distributed. Here is an example of how you could do that. Assume T1
-has to be done before T2.
-
-\code
- int your_agent(int argc, char *argv[] {
- ...
- T1 = MSG_task_create(...);
- T2 = MSG_task_create(...);
- ...
- while(1) {
- ...
- if(cond) MSG_task_execute(T1);
- ...
- if((MSG_task_get_remaining_computation(T1)=0.0) && (you_re_in_a_good_mood))
- MSG_task_execute(T2)
- else {
- /* do something else */
- }
- }
- }
-\endcode
-
-If you decide that the distributed part is not that much important and that
-DAG is really the level of abstraction you want to work with, then you should
-give a try to \ref SD_API.
-
-\subsection faq_MIA_generic Generic features
-
-\subsubsection faq_MIA_batch_scheduler Is there a native support for batch schedulers in SimGrid?
-
-No, there is no native support for batch schedulers and none is
-planned because this is a very specific need (and doing it in a
-generic way is thus very hard). However some people have implemented
-their own batch schedulers. Vincent Garonne wrote one during his PhD
-and put his code in the contrib directory of our SVN so that other can
-keep working on it. You may find inspiring ideas in it.
-
-\subsubsection faq_MIA_checkpointing I need a checkpointing thing
-
-Actually, it depends on whether you want to checkpoint the simulation, or to
-simulate checkpoints.
-
-The first one could help if your simulation is a long standing process you
-want to keep running even on hardware issues. It could also help to
-<i>rewind</i> the simulation by jumping sometimes on an old checkpoint to
-cancel recent calculations.\n
-Unfortunately, such thing will probably never exist in SG. One would have to
-duplicate all data structures because doing a rewind at the simulator level
-is very very hard (not talking about the malloc free operations that might
-have been done in between). Instead, you may be interested in the Libckpt
-library (http://www.cs.utk.edu/~plank/plank/www/libckpt.html). This is the
-checkpointing solution used in the condor project, for example. It makes it
-easy to create checkpoints (at the OS level, creating something like core
-files), and rerunning them on need.
-
-If you want to simulate checkpoints instead, it means that you want the
-state of an executing task (in particular, the progress made towards
-completion) to be saved somewhere. So if a host (and the task executing on
-it) fails (cf. #MSG_HOST_FAILURE), then the task can be restarted
-from the last checkpoint.\n
-
-Actually, such a thing does not exist in SimGrid either, but it's just
-because we don't think it is fundamental and it may be done in the user code
-at relatively low cost. You could for example use a watcher that
-periodically get the remaining amount of things to do (using
-MSG_task_get_remaining_computation()), or fragment the task in smaller
-subtasks.
-
-\subsection faq_platform Platform building and Dynamic resources
-
-\subsubsection faq_platform_example Where can I find SimGrid platform files?
-
-There are several little examples in the archive, in the examples/msg
-directory. From time to time, we are asked for other files, but we
-don't have much at hand right now.
-
-You should refer to the Platform Description Archive
-(http://pda.gforge.inria.fr) project to see the other platform file we
-have available, as well as the Simulacrum simulator, meant to generate
-SimGrid platforms using all classical generation algorithms.
-
-\subsubsection faq_platform_alnem How can I automatically map an existing platform?
-
-We are working on a project called ALNeM (Application-Level Network
-Mapper) which goal is to automatically discover the topology of an
-existing network. Its output will be a platform description file
-following the SimGrid syntax, so everybody will get the ability to map
-their own lab network (and contribute them to the catalog project).
-This tool is not ready yet, but it move quite fast forward. Just stay
-tuned.
-
-\subsubsection faq_platform_synthetic Generating synthetic but realistic platforms
-
-The third possibility to get a platform file (after manual or
-automatic mapping of real platforms) is to generate synthetic
-platforms. Getting a realistic result is not a trivial task, and
-moreover, nobody is really able to define what "realistic" means when
-speaking of topology files. You can find some more thoughts on this
-topic in these
-<a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>.
-
-If you are looking for an actual tool, there we have a little tool to
-annotate Tiers-generated topologies. This perl-script is in
-<tt>tools/platform_generation/</tt> directory of the SVN. Dinda et Al.
-released a very comparable tool, and called it GridG.
-
-
-The specified computing power will be available to up to 6 sequential
-tasks without sharing. If more tasks are placed on this host, the
-resource will be shared accordingly. For example, if you schedule 12
-tasks on the host, each will get half of the computing power. Please
-note that although sound, this model were never scientifically
-assessed. Please keep this fact in mind when using it.
-
-
-\subsubsection faq_platform_random Using random variable for the resource power or availability
-
-The best way to model the resouce power using a random variable is to
-use an availability trace that is directed by a probability
-distribution. This can be done using the function
-tmgr_trace_generator_value() below. The date and value generators is
-created with one of tmgr_event_generator_new_uniform(),
-tmgr_event_generator_new_exponential() or
-tmgr_event_generator_new_weibull() (if you need other generators,
-adding them to src/surf/trace_mgr.c should be quite trivial and your
-patch will be welcomed). Once your trace is created, you have to
-connect it to the resource with the function
-sg_platf_new_trace_connect().
-
-That the process is very similar if you want to model the
-resource availability with a random variable (deciding whether it's
-on/off instead of deciding its speed) using the function
-tmgr_trace_generator_state() or tmgr_trace_generator_avail_unavail()
-instead of tmgr_trace_generator_value().
-
-Unfortunately, all this is currently lacking a proper documentation,
-and there is even no proper example of use. You'll thus have to check
-the header file include/simgrid/platf.h and experiment a bit by
-yourself. The following code should be a good starting point, and
-contributing a little clean example would be a good way to help the
-SimGrid project.
-
-@code
-tmgr_trace_generator_value("mytrace",tmgr_event_generator_new_exponential(.5)
- tmgr_event_generator_new_uniform(100000,9999999));
-
-sg_platf_trace_connect_cbarg_t myconnect = SG_PLATF_TRACE_CONNECT_INITIALIZER;
-myconnect.kind = SURF_TRACE_CONNECT_KIND_BANDWIDTH;
-myconnect.trace = "mytrace";
-myconnect.element = "mylink";
-
-sg_platf_trace_connect(myconnect);
-@endcode
-
-\section faq_troubleshooting Troubleshooting
-
-\subsection faq_trouble_changelog The feature X stopped to work after my last update
-
-I guess that you want to read the ChangeLog file, that always contains
-all the information that could be important to the users during the
-upgrade. Actually, you may want to read it (alongside with the NEWS
-file that highlights the most important changes) even before you
-upgrade your copy of SimGrid, too.
-
-We do our best to maintain the backward compatibility, but we
-sometimes have to fix the things that are too broken. If we happen to
-kill a feature that you were using, we are sorry. We think that you
-should update to the new way of doing things, but if you can't afford
-it, that's ok. Just stick to the last version that were working for
-you, and have a pleasant day.
-
-\subsection faq_trouble_lib_compil SimGrid compilation and installation problems
-
-\subsubsection faq_trouble_lib_config cmake fails!
-
-We know only one reason for the configure to fail:
-
- - <b>You are using a broken build environment</b>\n
- Try updating your cmake version. If symptom is that the configury
- magic complains about gcc not being able to build executables, you
- are probably missing the libc6-dev package. Damn Ubuntu.
-
-If you experience other kind of issue, please get in touch with us. We are
-always interested in improving our portability to new systems.
-
-\subsubsection faq_trouble_distcheck Dude! "ctest" fails on my machine!
-
-Don't assume we never run this target, because we do. Check
-http://cdash.inria.fr/CDash/index.php?project=Simgrid (click on
-previous if there is no result for today: results are produced only by
-11am, French time) and
-https://buildd.debian.org/status/logs.php?pkg=simgrid if you don't believe us.
-
-If it's failing on your machine in a way not experienced by the
-autobuilders above, please drop us a mail on the mailing list so that
-we can check it out. Make sure to read \ref faq_bugrepport before you
-do so.
-
-\subsection faq_trouble_compil User code compilation problems
-
-\subsubsection faq_trouble_err_logcat "gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"
-
-This is because you are using the log mechanism, but you didn't created
-any default category in this file. You should refer to \ref XBT_log
-for all the details, but you simply forgot to call one of
-XBT_LOG_NEW_DEFAULT_CATEGORY() or XBT_LOG_NEW_DEFAULT_SUBCATEGORY().
-
-\subsubsection faq_trouble_pthreadstatic "gcc: undefined reference to pthread_key_create"
-
-This indicates that one of the library SimGrid depends on (libpthread
-here) was missing on the linking command line. Dependencies of
-libsimgrid are expressed directly in the dynamic library, so it's
-quite impossible that you see this message when doing dynamic linking.
-
-If you compile your code statically (and if you use a pthread version
-of SimGrid), you must absolutely
-specify <tt>-lpthread</tt> on the linker command line. As usual, this should
-come after <tt>-lsimgrid</tt> on this command line.
-
-\subsection faq_trouble_errors Runtime error messages
-
-\subsubsection faq_trouble_errors_big_fat_warning I'm told that my XML files are too old.
-
-The format of the XML platform description files is sometimes
-improved. For example, we decided to change the units used in SimGrid
-from MBytes, MFlops and seconds to Bytes, Flops and seconds to ease
-people exchanging small messages. We also reworked the route
-descriptions to allow more compact descriptions.
-
-That is why the XML files are versionned using the 'version' attribute
-of the root tag. Currently, it should read:
-@verbatim
- <platform version="4">
-@endverbatim
-
-If your files are too old, you can use the simgrid_update_xml.pl
-script which can be found in the tools directory of the archive.
-
-\subsection faq_trouble_debug Debugging SMPI applications
-
-In order to debug SMPI programs, you can use the following options:
-
-- <b>-wrapper 'gdb --args'</b>: this option is used to use a wrapper
- in order to call the SMPI process. Good candidates for this options
- are "gdb --args", "valgrind", "rr record", "strace", etc;
-
-- <b>-foreground</b>: this options gives the debugger access to the terminal
- which is needed in order to use an interactive debugger.
-
-Both options are needed in order to run the SMPI process under GDB.
-
-\subsection faq_trouble_valgrind Valgrind-related and other debugger issues
-
-If you don't, you really should use valgrind to debug your code, it's
-almost magic.
-
-\subsubsection faq_trouble_vg_libc Valgrind spits tons of errors about backtraces!
-
-It may happen that valgrind, the memory debugger beloved by any decent C
-programmer, spits tons of warnings like the following :
-\verbatim ==8414== Conditional jump or move depends on uninitialised value(s)
-==8414== at 0x400882D: (within /lib/ld-2.3.6.so)
-==8414== by 0x414EDE9: (within /lib/tls/i686/cmov/libc-2.3.6.so)
-==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
-==8414== by 0x414F937: _dl_open (in /lib/tls/i686/cmov/libc-2.3.6.so)
-==8414== by 0x4150F4C: (within /lib/tls/i686/cmov/libc-2.3.6.so)
-==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
-==8414== by 0x415102D: __libc_dlopen_mode (in /lib/tls/i686/cmov/libc-2.3.6.so)
-==8414== by 0x412D6B9: backtrace (in /lib/tls/i686/cmov/libc-2.3.6.so)
-==8414== by 0x8076446: xbt_dictelm_get_ext (dict_elm.c:714)
-==8414== by 0x80764C1: xbt_dictelm_get (dict_elm.c:732)
-==8414== by 0x8079010: xbt_cfg_register (config.c:208)
-==8414== by 0x806821B: MSG_config (msg_config.c:42)
-\endverbatim
-
-This problem is somewhere in the libc when using the backtraces and there is
-very few things we can do ourselves to fix it. Instead, here is how to tell
-valgrind to ignore the error. Add the following to your ~/.valgrind.supp (or
-create this file on need). Make sure to change the obj line according to
-your personnal mileage (change 2.3.6 to the actual version you are using,
-which you can retrieve with a simple "ls /lib/ld*.so").
-
-\verbatim {
- name: Backtrace madness
- Memcheck:Cond
- obj:/lib/ld-2.3.6.so
- fun:dl_open_worker
- fun:_dl_open
- fun:do_dlopen
- fun:dlerror_run
- fun:__libc_dlopen_mode
-}\endverbatim
-
-Then, you have to specify valgrind to use this suppression file by passing
-the <tt>--suppressions=$HOME/.valgrind.supp</tt> option on the command line.
-You can also add the following to your ~/.bashrc so that it gets passed
-automatically. Actually, it passes a bit more options to valgrind, and this
-happen to be my personnal settings. Check the valgrind documentation for
-more information.
-
-\verbatim export VALGRIND_OPTS="--leak-check=yes --leak-resolution=high --num-callers=40 --tool=memcheck --suppressions=$HOME/.valgrind.supp" \endverbatim
-
-\subsubsection faq_trouble_backtraces Truncated backtraces
-
-When debugging SimGrid, it's easier to pass the
---disable-compiler-optimization flag to the configure if valgrind or
-gdb get fooled by the optimization done by the compiler. But you
-should remove these flag when everything works before going in
-production (before launching your 1252135 experiments), or everything
-will run only one half of the true SimGrid potential.
-
-\subsection faq_deadlock There is a deadlock in my code!!!
-
-Unfortunately, we cannot debug every code written in SimGrid. We
-furthermore believe that the framework provides ways enough
-information to debug such information yourself. If the textual output
-is not enough, Make sure to check the \ref faq_visualization FAQ entry to see
-how to get a graphical one.
-
-Now, if you come up with a really simple example that deadlocks and
-you're absolutely convinced that it should not, you can ask on the
-list. Just be aware that you'll be severely punished if the mistake is
-on your side... We have plenty of FAQ entries to redact and new
-features to implement for the impenitents! ;)
-
-\subsection faq_surf_network_latency I get weird timings when I play with the latencies.
-
-OK, first of all, remember that units should be Bytes, Flops and
-Seconds. If you don't use such units, some SimGrid constants (e.g. the
-SG_TCP_CTE_GAMMA constant used in most network models) won't have the
-right unit and you'll end up with weird results.
-
-Here is what happens with a single transfer of size L on a link
-(bw,lat) when nothing else happens.
-
-\verbatim
-0-----lat--------------------------------------------------t
-|-----|**** real_bw =min(bw,SG_TCP_CTE_GAMMA/(2*lat)) *****|
-\endverbatim
-
-In more complex situations, this min is the solution of a complex
-max-min linear system. Have a look
-<a href="http://lists.gforge.inria.fr/pipermail/simgrid-devel/2006-April/thread.html">here</a>
-and read the two threads "Bug in SURF?" and "Surf bug not
-fixed?". You'll have a few other examples of such computations. You
-can also read "A Network Model for Simulation of Grid Application" by
-Henri Casanova and Loris Marchal to have all the details. The fact
-that the real_bw is smaller than bw is easy to understand. The fact
-that real_bw is smaller than SG_TCP_CTE_GAMMA/(2*lat) is due to the
-window-based congestion mechanism of TCP. With TCP, you can't exploit
-your huge network capacity if you don't have a good round-trip-time
-because of the acks...
-
-Anyway, what you get is t=lat + L/min(bw,SG_TCP_CTE_GAMMA/(2*lat)).
-
- * if I you set (bw,lat)=(100 000 000, 0.00001), you get t = 1.00001 (you fully
-use your link)
- * if I you set (bw,lat)=(100 000 000, 0.0001), you get t = 1.0001 (you're on the
-limit)
- * if I you set (bw,lat)=(100 000 000, 0.001), you get t = 10.001 (ouch!)
-
-This bound on the effective bandwidth of a flow is not the only thing
-that may make your result be unexpected. For example, two flows
-competing on a saturated link receive an amount of bandwidth inversely
-proportional to their round trip time.
-
-\subsection faq_bugrepport So I've found a bug in SimGrid. How to report it?
-
-We do our best to make sure to hammer away any bugs of SimGrid, but this is
-still an academic project so please be patient if/when you find bugs in it.
-If you do, the best solution is to drop an email either on the simgrid-user
-or the simgrid-devel mailing list and explain us about the issue. You can
-also decide to open a formal bug report using the
-<a href="https://gforge.inria.fr/tracker/?atid=165&group_id=12&func=browse">relevant
-interface</a>. You need to login on the server to get the ability to submit
-bugs.
-
-We will do our best to solve any problem repported, but you need to help us
-finding the issue. Just telling "it segfault" isn't enough. Telling "It
-segfaults when running the attached simulator" doesn't really help either.
-You may find the following article interesting to see how to repport
-informative bug repports:
-http://www.chiark.greenend.org.uk/~sgtatham/bugs.html (it is not SimGrid
-specific at all, but it's full of good advices).
-
-\author Da SimGrid team <simgrid-devel@lists.gforge.inria.fr>
-