/*! \page faq Frequently Asked Questions
-\author Arnaud Legrand <arnaud.legrand#imag.fr>
-
\section faq_installation Installing the SimGrid library
Many people have been asking me questions on how to use SimGrid. Quite
\section faq_simgrid I'm new to SimGrid. I have some questions. Where should I start ?
-You are at the right place... Having a look to these <a
-href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>
+You are at the right place... Having a look to these
+<a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>
may give you some insights on what SimGrid can help you to do and what
are its limitations. Then you definitely should read the \ref
-MSG_examples. There is also a mailing list: <simgrid-user#lists.gforge.inria.fr>.
+MSG_examples. There is also a mailing list: <simgrid-user@lists.gforge.inria.fr>.
\subsection faq_generic Building a generic simulator
Please read carefully the \ref MSG_examples. You'll find in \ref
-msg_test.c a very simple consisting of a master (that owns a bunch of
+MSG_ex_master_slave a very simple consisting of a master (that owns a bunch of
tasks and distributes them) , some slaves (that process tasks whenever
they receive one) and some forwarder agents (that simply pass the
tasks they receive to some slaves).
\subsection faq_platform Building a realistic platform
I can speak more than an hour on this subject and I still do not have
-the right answer, just some ideas. You can read the following <a
-href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>.
+the right answer, just some ideas. You can read the following
+<a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>.
It may give you some hints. You can also have a look at the
<tt>tools/platform_generation/</tt> directory. There is a perl-script
I use to annotate a Tiers generated platform (may not be up-to-date
\endhtmlonly
</center>
-\subsection faq_context I have tons of process and it is limiting my simulation.
+\subsection faq_context_1000 I want thousands of simulated processes
+
+SimGrid can use either pthreads library or the UNIX98 contextes. On most
+systems, the number of pthreads is limited and then your simulation may be
+limited for a stupid reason. This is especially true with the current linux
+pthreads, and I cannot get more than 2000 simulated processes with pthreads
+on my box. The UNIX98 contextes allow me to raise the limit to 25,000
+simulated processes on my laptop.
+
+The <tt>--with-context</tt> option of the <tt>./configure</tt> script allows
+you to choose between UNIX98 contextes (<tt>--with-context=ucontext</tt>)
+and the pthread version ( (<tt>--with-context=pthread</tt>). The default
+value is ucontext when the script detect a working UNIX98 context
+implementation. On Windows boxes, the provided value is discarded and an
+adapted version is picked up.
+
+We experienced some issues with contextes on some rare systems (solaris 8
+and lower comes to mind). The main problem is that the configure script
+detect the contextes as being functional when it's not true. If you happen
+to use such a system, switch manually to the pthread version, and provide us
+with a good patch for the configure script so that it is done automatically ;)
+
+\subsection faq_context_10000 I want hundred thousands of simulated processes
+
+As explained above, SimGrid can use UNIX98 contextes to represent and handle
+the simulated processes. Thanks to this, the main limitation to the number
+of simulated processes becomes the available memory.
+
+Here are some tricks I had to use in order to run a token ring between
+25,000 processes on my laptop (1Gb memory, 1.5Gb swap).
+
+ - First of all, make sure your code runs for a few hundreds processes
+ before trying to push the limit. Make sure it's valgrind-clean, ie that
+ valgrind does not report neither memory error nor memory leaks. Indeed,
+ numerous simulated processes result in *fat* simulation hindering debugging.
-MSG can use either pthreads or the GNU context library. On most
-systems, the number of pthreads is limited (especially with the
-current linux pthreads) and then your simulation may be limited for a
-stupid reason. If you enable the context option
-(<tt>--enable-context</tt> in the <tt>./configure</tt> phase), you
-will not use the pthread anymore and the context switching will be
-done manually, which enables us to have as much agents as your memory
-can hold and should be much faster... So think about it if your
-simulation is getting really big.
+ - It was really boring to write 25,000 entries in the deployment file, so I wrote
+ a little script <tt>examples/gras/tokenS/make_deployment.pl</tt>, which you may
+ want to adapt to your case.
+
+ - The deployment file became quite big, so I had to do what is in the FAQ
+ entry \ref faq_flexml_limit
+
+ - Each UNIX98 context has its own stack entry. As debugging this is quite
+ hairly, the default value is a bit overestimated so that user don't get
+ into trouble about this. You want to tune this size to increse the number
+ of processes. This is the <tt>STACK_SIZE</tt> define in
+ <tt>src/xbt/context_private.h</tt>, which is 128kb by default.
+ Reduce this as much as you can, but be warned that if this value is too
+ low, you'll get a segfault. The token ring example, which is quite simple,
+ runs with 40kb stacks.
-Nevertheless, be aware that this code does not work on some system. It
-is not very clean. As usual, as soon as I will have a little bit more
-time, I will recode it in a cleaner way.
\section faq_SG Where has SG disappeared ?!?
to gforge so people that are interested by helping on this part will
have the possibility to do it.
-\subsection But I wanted to implement a distributed dynamic scheduler of DAGs... How can I do that it SG is not available anymore in the next versions ?
+\subsection faq_SG_DAG But I wanted to implement a distributed dynamic scheduler of DAGs... How can I do that it SG is not available anymore in the next versions ?
Distributed is somehow "contagious". If you start making distributed
decisions, there is no way to handle DAGs directly anymore (unless I am
Note however that SURF will be slower than the old SG to handle traces with
a lots of variations (there is no trace integration anymore).
-*/
\ No newline at end of file
+
+\subsection faq_SURF_dynami How can I have variable resource availability?
+
+A nice feature of SimGrid is that it enables you to seamlessly have
+resources whose availability change over time. When you build a
+platform, you generally declare CPUs like that:
+
+\verbatim
+ <cpu name="Cpu A" power="100.00"/>
+\endverbatim
+
+If you want the availability of "CPU A" to change over time, the only
+thing you have to do is change this definition like that:
+
+\verbatim
+ <cpu name="Cpu A" power="100.00" availability_file="trace_A.txt" state_file="trace_A_failure.txt"/>
+\endverbatim
+
+For CPUs, availability files are expressed in fraction of available
+power. Let's have a look at what "trace_A.txt" may look like:
+
+\verbatim
+PERIODICITY 1.0
+0.0 1.0
+11.0 0.5
+20.0 0.9
+\endverbatim
+
+At time 0, our CPU will deliver 100 Mflop/s. At time 11.0, it will
+deliver only 50 Mflop/s until time 20.0 where it will will start
+delivering 90 Mflop/s. Last at time 21.0 (20.0 plus the periodicity
+1.0), we'll be back to the beginning and it will deliver 100Mflop/s.
+
+Now let's look at the state file:
+\verbatim
+PERIODICITY 10.0
+1.0 -1.0
+2.0 1.0
+\endverbatim
+
+A negative value means "off" while a positive one means "on". At time
+1.0, the CPU is on. At time 1.0, it is turned off and at time 2.0, it
+is turned on again until time 12 (2.0 plus the periodicity 10.0). It
+will be turned on again at time 13.0 until time 23.0, and so on.
+
+Now, let's look how the same kind of thing can be done for network
+links. A usual declaration looks like:
+
+\verbatim
+ <network_link name="LinkA" bandwidth="10.0" latency="0.2"/>
+\endverbatim
+
+You have at your disposal the following options: bandwidth_file,
+latency_file and state_file. The only difference with CPUs is that
+bandwidth_file and latency_file do not express fraction of available
+power but are expressed directly in Mb/s and seconds.
+
+\section faq_flexml_bypassing How could I have some C functions do what the platform and deployment files do?
+
+So you want to bypass the XML files parser, uh? Maybe doin some parameter
+sweep experiments on your simulations or so? This is possible, but it's not
+really easy. Here is how it goes.
+
+For this, you have to first remember that the XML parsing in SimGrid is done
+using a tool called FleXML. Given a DTD, this gives a flex-based parser. If
+you want to bypass the parser, you need to provide some code mimicking what
+it does and replacing it in its interactions with the SURF code. So, let's
+have a look at these interactions.
+
+FleXML parser are close to classical SAX parsers. It means that a
+well-formed SimGrid platform XML file might result in the following
+"events":
+
+ - start "platform_description"
+ - start "cpu" with attributes name="host1" power="1.0"
+ - end "cpu"
+ - start "cpu" with attributes name="host2" power="2.0"
+ - end "cpu"
+ - start "network_link" with ...
+ - end "network_link"
+ - start "route" with ...
+ - end "route"
+ - start "route" with ...
+ - end "route"
+ - end "platform_description"
+
+The communication from the parser to the SURF code uses two means:
+Attributes get copied into some global variables, and a surf-provided
+function gets called by the parser for each event. For example, the event
+ - start "cpu" with attributes name="host1" power="1.0"
+
+let the parser do the equivalent of:
+\verbatim
+ strcpy("host1",A_cpu_name);
+ A_cpu_power = 1.0;
+ (*STag_cpu_fun)();
+\endverbatim
+
+In SURF, we attach callbacks to the different events by initializing the
+pointer functions to some the right surf functions. Example in
+workstation_KCCFLN05.c (surf_parse_open() ends up calling surf_parse()):
+\verbatim
+ // Building the routes
+ surf_parse_reset_parser();
+ STag_route_fun=parse_route_set_endpoints;
+ ETag_route_element_fun=parse_route_elem;
+ ETag_route_fun=parse_route_set_route;
+ surf_parse_open(file);
+ xbt_assert1((!surf_parse()),"Parse error in %s",file);
+ surf_parse_close();
+\endverbatim
+
+So, to bypass the FleXML parser, you need to write your own version of the
+surf_parse function, which should do the following:
+ - Call the corresponding STag_<tag>_fun function to simulate tag start
+ - Fill the A_<tag>_<attribute> variables with the wanted values
+ - Call the corresponding ETag_<tag>_fun function to simulate tag end
+ - (do the same for the next set of values, and loop)
+
+Then, tell SimGrid that you want to use your own "parser" instead of the stock one:
+\verbatim
+ surf_parse = surf_parse_bypass;
+ MSG_create_environment(NULL);
+\endverbatim
+
+An example of this trick is distributed in the file examples/msg/msg_test_surfxml_bypassed.c
+
+\section faq_flexml_limit I get the message "surf_parse_lex: Assertion `next<limit' failed."
+
+This is because your platform file is too big for the parser.
+
+Actually, the message comes directly from FleXML, the technology on top of
+which the parser is built. FleXML has the bad idea of fetching the whole
+document in memory before parsing it. And moreover, the memory buffer size
+must be determinded at compilation time.
+
+We use a value which seems big enough for our need withour bloating the
+simulators footprints. But of course your mileage may vary. In this case,
+just edit src/surf/surfxml.l modify the definition of
+FLEXML_BUFFERSTACKSIZE. E.g.
+
+\verbatim
+#define FLEXML_BUFFERSTACKSIZE 1000000000
+\endverbatim
+
+Then recompile and everything should be fine, provided that your version of
+Flex is recent enough (>= 2.5.31). If not the compilation process should
+warn you.
+
+A while ago, we worked on FleXML to reduce a bit its memory consumtion, but
+these issues remain. There is two things we should do:
+
+ - use a dynamic buffer instead of a static one so that the only limit
+ becomes your memory, not a stupid constant fixed at compilation time
+ (maybe not so difficult).
+ - change the parser so that it does not need to get the whole file in
+ memory before parsing
+ (seems quite difficult, but I'm a complete newbe wrt flex stuff).
+
+These are changes to FleXML itself, not SimGrid. But since we kinda hijacked
+the development of FleXML, I can grant you that any patches would be really
+welcome and quickly integrated.
+
+\section faq_host_load Where is the get_host_load function hidden in MSG?
+
+There is no such thing because its semantic wouldn't be really clear. Of
+course, it is something about the amount of host throughput, but there is as
+many definition of "host load" as people asking for this function.
+
+It may be instantaneous value or an average one. Moreover it may be only the
+power of the computer, or may take the background load into account, or may
+even take the currently running tasks into account. In some SURF models,
+communications have an influence on computational power. Should it be taken
+into account too?
+
+So, we decided not to include such a function into MSG and let people do it
+thereselves so that they get the value matching exactly what they mean. One
+possibility is to run active measurement as in next code snippet. It is very
+close from what you would have to do out of the simulator, and thus gives
+you information that you could also get in real settings to not hinder the
+realism of your simulation.
+
+\verbatim
+double get_host_load() {
+ m_task_t task = MSG_task_create("test", 0.001, 0, NULL);
+ double date = MSG_get_clock();
+
+ MSG_task_execute(task);
+ date = MSG_get_clock() - date;
+ MSG_task_destroy(task);
+ return (0.001/date);
+}
+\endverbatim
+
+Of course, it may not match your personal definition of "host load". In this
+case, please detail what you mean on the mailing list, and we will extend
+this FAQ section to fit your taste if possible.
+
+
+\author Arnaud Legrand (arnaud.legran::imag.fr)
+\author Martin Quinson (martin.quinson::loria.fr)
+
+
+*/
+