+\subsubsection faq_SURF_dynamic Building a dynamic platform, with resource availability
+
+A nice feature of SimGrid is that it enables you to seamlessly have
+resources whose availability change over time. When you build a
+platform, you generally declare CPUs like that:
+
+\verbatim
+ <cpu name="Cpu A" power="100.00"/>
+\endverbatim
+
+If you want the availability of "CPU A" to change over time, the only
+thing you have to do is change this definition like that:
+
+\verbatim
+ <cpu name="Cpu A" power="100.00" availability_file="trace_A.txt" state_file="trace_A_failure.txt"/>
+\endverbatim
+
+For CPUs, availability files are expressed in fraction of available
+power. Let's have a look at what "trace_A.txt" may look like:
+
+\verbatim
+PERIODICITY 1.0
+0.0 1.0
+11.0 0.5
+20.0 0.9
+\endverbatim
+
+At time 0, our CPU will deliver 100 flop/s. At time 11.0, it will
+deliver only 50 flop/s until time 20.0 where it will will start
+delivering 90 flop/s. Last at time 21.0 (20.0 plus the periodicity
+1.0), we'll be back to the beginning and it will deliver 100 flop/s.
+
+Now let's look at the state file:
+\verbatim
+PERIODICITY 10.0
+1.0 -1.0
+2.0 1.0
+\endverbatim
+
+A negative value means "off" while a positive one means "on". At time
+1.0, the CPU is on. At time 1.0, it is turned off and at time 2.0, it
+is turned on again until time 12 (2.0 plus the periodicity 10.0). It
+will be turned on again at time 13.0 until time 23.0, and so on.
+
+Now, let's look how the same kind of thing can be done for network
+links. A usual declaration looks like:
+
+\verbatim
+ <network_link name="LinkA" bandwidth="10.0" latency="0.2"/>
+\endverbatim
+
+You have at your disposal the following options: bandwidth_file,
+latency_file and state_file. The only difference with CPUs is that
+bandwidth_file and latency_file do not express fraction of available
+power but are expressed directly in bytes per seconds and seconds.
+
+\subsubsection faq_platform_multipath Is it possible to have several paths between two given hosts?
+
+The answer is no, unfortunately. Let's consider the following platform
+file:
+
+\verbatim
+<route src="A" dst="B">
+ <route_element name="1"/>
+</route>
+<route src="B" dst="C">
+ <route_element name="2"/>
+</route>
+<route src="A" dst="C">
+ <route_element name="3"/>
+</route>
+\endverbatim
+
+Althrough it is perfectly valid, it does not mean that data traveling
+from A to C can either go directly (using link 3) or through B (using
+links 1 and 2). It simply means that the routing on the graph is not
+trivial, and that data do not following the shortest path in number of
+hops on this graph. Another way to say it is that there is no implicit
+in these routing descriptions. The system will only use the routes you
+declare (such as <route src="A" dst="C"><route_element
+name="3"/></route>), without trying to build new routes by aggregating
+the provided ones.
+
+You are also free to declare platform where the routing is not
+symetric. For example, add the following to the previous file:
+
+\verbatim
+<route src="C" dst="A">
+ <route_element name="2"/>
+ <route_element name="1"/>
+</route>
+\endverbatim
+
+This makes sure that data from C to A go through B where data from A
+to C go directly. Do not worry about realism of such settings since
+we've seen ways more weird situation in real settings.
+
+\subsubsection faq_flexml_bypassing Bypassing the XML parser with your own C functions
+
+So you want to bypass the XML files parser, uh? Maybe doin some parameter
+sweep experiments on your simulations or so? This is possible, but it's not
+really easy. Here is how it goes.
+
+For this, you have to first remember that the XML parsing in SimGrid is done
+using a tool called FleXML. Given a DTD, this gives a flex-based parser. If
+you want to bypass the parser, you need to provide some code mimicking what
+it does and replacing it in its interactions with the SURF code. So, let's
+have a look at these interactions.
+
+FleXML parser are close to classical SAX parsers. It means that a
+well-formed SimGrid platform XML file might result in the following
+"events":
+
+ - start "platform_description"
+ - start "cpu" with attributes name="host1" power="1.0"
+ - end "cpu"
+ - start "cpu" with attributes name="host2" power="2.0"
+ - end "cpu"
+ - start "network_link" with ...
+ - end "network_link"
+ - start "route" with ...
+ - end "route"
+ - start "route" with ...
+ - end "route"
+ - end "platform_description"
+
+The communication from the parser to the SURF code uses two means:
+Attributes get copied into some global variables, and a surf-provided
+function gets called by the parser for each event. For example, the event
+ - start "cpu" with attributes name="host1" power="1.0"
+
+let the parser do the equivalent of:
+\verbatim
+ strcpy("host1",A_cpu_name);
+ A_cpu_power = 1.0;
+ (*STag_cpu_fun)();
+\endverbatim
+
+In SURF, we attach callbacks to the different events by initializing the
+pointer functions to some the right surf functions. Example in
+workstation_KCCFLN05.c (surf_parse_open() ends up calling surf_parse()):
+\verbatim
+ // Building the routes
+ surf_parse_reset_parser();
+ STag_route_fun=parse_route_set_endpoints;
+ ETag_route_element_fun=parse_route_elem;
+ ETag_route_fun=parse_route_set_route;
+ surf_parse_open(file);
+ xbt_assert1((!surf_parse()),"Parse error in %s",file);
+ surf_parse_close();
+\endverbatim
+
+So, to bypass the FleXML parser, you need to write your own version of the
+surf_parse function, which should do the following:
+ - Call the corresponding STag_<tag>_fun function to simulate tag start
+ - Fill the A_<tag>_<attribute> variables with the wanted values
+ - Call the corresponding ETag_<tag>_fun function to simulate tag end
+ - (do the same for the next set of values, and loop)
+
+Then, tell SimGrid that you want to use your own "parser" instead of the stock one:
+\verbatim
+ surf_parse = surf_parse_bypass;
+ MSG_create_environment(NULL);
+\endverbatim
+
+An example of this trick is distributed in the file examples/msg/msg_test_surfxml_bypassed.c
+
+\section faq_troubleshooting Troubleshooting
+
+\subsection faq_trouble_lib_compil SimGrid compilation and installation problems
+
+\subsubsection faq_trouble_lib_config ./configure fails!
+
+We now only one reason for the configure to fail:
+
+ - <b>You are using a borken build environment</b>\n
+ If symptom is that configure complains about gcc not being able to build
+ executables, you are probably missing the libc6-dev package. Damn Ubuntu.
+
+If you experience other kind of issue, please get in touch with us. We are
+always interested in improving our portability to new systems.
+
+\subsubsection faq_trouble_distcheck Dude! "make check" fails on my machine!
+
+Don't assume we never run this target, because we do. Check
+http://bob.loria.fr:8010 if you don't believe us.
+
+There is several reasons which may cause the make check to fail on your
+machine:
+
+ - <b>You are using a borken libc (probably concerning the contextes)</b>.\n
+ The symptom is that the "make check" fails within the examples/msg directory.\n
+ By default, SimGrid uses something called ucontexts. This is part of the
+ libc, but it's quite undertested. For example, some (old) versions of the
+ glibc on alpha do not implement these functions, but provide the stubs
+ (which return ENOSYS: not implemented). It fools our detection mecanism
+ and leads to segfaults. There is not much we can do to fix the bug.
+ A workaround is to compile with --with-context=pthread to avoid
+ ucontext completely. You'll be a bit more limitated in the number
+ of simulated processes you can start concurently, but 5000
+ processes is still enough for most purposes, isn't it?\n
+ This limitation is the reason why we insist on using this piece of ...
+ software even if it's so troublesome.\n
+ <b>=> use --with-pthread on AMD64 architecture that do not have an
+ ultra-recent libc.</b>
+
+ - <b>There is a bug in SimGrid we aren't aware of</b>.\n
+ If none of the above apply, please drop us a mail on the mailing list so
+ that we can check it out. Make sure to read \ref faq_bugrepport
+ before you do so.
+
+\subsection faq_trouble_compil User code compilation problems
+
+\subsubsection faq_trouble_err_logcat "gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"
+
+This is because you are using the log mecanism, but you didn't created
+any default category in this file. You should refer to \ref XBT_log
+for all the details, but you simply forgot to call one of
+XBT_LOG_NEW_DEFAULT_CATEGORY() or XBT_LOG_NEW_DEFAULT_SUBCATEGORY().
+
+\subsubsection faq_trouble_pthreadstatic "gcc: undefinded reference to pthread_key_create"
+
+This indicates that one of the library SimGrid depends on (libpthread
+here) was missing on the linking command line. Dependencies of
+libsimgrid are expressed directly in the dynamic library, so it's
+quite impossible that you see this message when doing dynamic linking.
+
+If you compile your code statically (and if you use a pthread version
+of SimGrid -- see \ref faq_more_processes), you must absolutely
+specify <tt>-lpthread</tt> on the linker command line. As usual, this should
+come after <tt>-lsimgrid</tt> on this command line.
+
+\subsection faq_trouble_errors Runtime error messages
+
+\subsubsection faq_flexml_limit "surf_parse_lex: Assertion `next limit' failed."
+
+This is because your platform file is too big for the parser.
+
+Actually, the message comes directly from FleXML, the technology on top of
+which the parser is built. FleXML has the bad idea of fetching the whole
+document in memory before parsing it. And moreover, the memory buffer size
+must be determinded at compilation time.
+
+We use a value which seems big enough for our need without bloating the
+simulators footprints. But of course your mileage may vary. In this case,
+just edit src/surf/surfxml.l modify the definition of
+FLEXML_BUFFERSTACKSIZE. E.g.
+
+\verbatim
+#define FLEXML_BUFFERSTACKSIZE 1000000000
+\endverbatim
+
+Then recompile and everything should be fine, provided that your version of
+Flex is recent enough (>= 2.5.31). If not the compilation process should
+warn you.
+
+A while ago, we worked on FleXML to reduce a bit its memory consumtion, but
+these issues remain. There is two things we should do:
+
+ - use a dynamic buffer instead of a static one so that the only limit
+ becomes your memory, not a stupid constant fixed at compilation time
+ (maybe not so difficult).
+ - change the parser so that it does not need to get the whole file in
+ memory before parsing
+ (seems quite difficult, but I'm a complete newbe wrt flex stuff).
+
+These are changes to FleXML itself, not SimGrid. But since we kinda hijacked
+the development of FleXML, I can grant you that any patches would be really
+welcome and quickly integrated.
+
+<b>Update:</b> A new version of FleXML (1.7) was released. Most of the work
+was done by William Dowling, who use it in his own work. The good point is
+that it now use a dynamic buffer, and that the memory usage was greatly
+improved. The downside is that William also changed some things internally,
+and it breaks the hack we devised to bypass the parser, as explained in
+\ref faq_flexml_bypassing. Indeed, this is not a classical usage of the
+parser, and Will didn't imagine that we may have used (and even documented)
+such a crude usage of FleXML. So, we now have to repare the bypassing
+functionnality to use the lastest FleXML version and fix the memory usage in
+SimGrid.
+
+\subsubsection faq_trouble_gras_transport GRAS spits networking error messages
+
+Gras, on real platforms, naturally use regular sockets to communicate. They
+are deeply hiden in the gras abstraction, but when things go wrong, you may
+get some weird error messages. Here are some example, with the probable
+reason:
+
+ - <b>Transport endpoint is not connected</b>: several processes try to open
+ a server socket on the same port number of the same machine. This is
+ naturally bad and each process should pick its own port number for this.\n
+ Maybe, you just have some processes remaining from a previous experiment
+ on your machine.\n
+ Killing them may help, but again if you kill -KILL them, you'll have to
+ wait for a while: they didn't close there sockets properly and the system
+ needs a while to notice that this port is free again.
+
+ - <b>Socket closed by remote side</b>: if the remote process is not
+ supposed to close the socket at this point, it may be dead.
+
+ - <b>Connection reset by peer</b>: I found this on internet about this
+ error. I think it's what's happening here, too:\n
+ <i>This basically means that a network error occurred while the client was
+ receiving data from the server. But what is really happening is that the
+ server actually accepts the connection, processes the request, and sends
+ a reply to the client. However, when the server closes the socket, the
+ client believes that the connection has been terminated abnormally
+ because the socket implementation sends a TCP reset segment telling the
+ client to throw away the data and report an error.\n
+ Sometimes, this problem is caused by not properly closing the
+ input/output streams and the socket connection. Make sure you close the
+ input/output streams and socket connection properly. If everything is
+ closed properly, however, and the problem persists, you can work around
+ it by adding a one-second sleep before closing the streams and the
+ socket. This technique, however, is not reliable and may not work on all
+ systems.</i>\n
+ Since GRAS sockets are closed properly (repeat after me: there is no bug
+ in GRAS), it is either that you are closing your sockets on server side
+ before the client get a chance to read them (use gras_os_sleep() to delay
+ the server), or the server died awfully before the client got the data.
+
+\subsubsection faq_trouble_errors_big_fat_warning I'm told that my XML files are too old.
+
+We have decided to change the units in SimGrid. Now we use Bytes, Flops and
+seconds instead of MBytes, MFlops and seconds... Units should be updated
+accordingly and the version of platform_description should be set to a
+valuer greater than 1:
+\verbatim
+ <platform_description version="1">
+\endverbatim
+You should try to use the surfxml_update.pl script that can be found
+<a href="http://gforge.inria.fr/plugins/scmcvs/cvsweb.php/contrib/platform_generation/?cvsroot=cvsroot%2Fsimgrid">here</a>.
+
+\subsection faq_trouble_valgrind Valgrind-related issues
+
+If you don't, you really should use valgrind to debug your code, it's
+almost magic.
+
+\subsubsection faq_trouble_vg_longjmp longjmp madness in valgrind
+
+This is when valgrind starts complaining about longjmp things, just like:
+
+\verbatim ==21434== Conditional jump or move depends on uninitialised value(s)
+==21434== at 0x420DBE5: longjmp (longjmp.c:33)
+==21434==
+==21434== Use of uninitialised value of size 4
+==21434== at 0x420DC3A: __longjmp (__longjmp.S:48)
+\endverbatim
+
+or even when it reports scary things like:
+
+\verbatim ==24023== Warning: client switching stacks? SP change: 0xBE3FF618 --> 0xBE7FF710
+x86->IR: unhandled instruction bytes: 0xF4 0xC7 0x83 0xD0
+==24023== to suppress, use: --max-stackframe=4194552 or greater
+==24023== Your program just tried to execute an instruction that Valgrind
+==24023== did not recognise. There are two possible reasons for this.
+==24023== 1. Your program has a bug and erroneously jumped to a non-code
+==24023== location. If you are running Memcheck and you just saw a
+==24023== warning about a bad jump, it's probably your program's fault.
+==24023== 2. The instruction is legitimate but Valgrind doesn't handle it,
+==24023== i.e. it's Valgrind's fault. If you think this is the case or
+==24023== you are not sure, please let us know.
+==24023== Either way, Valgrind will now raise a SIGILL signal which will
+==24023== probably kill your program.
+==24023==
+==24023== Process terminating with default action of signal 4 (SIGILL)
+==24023== Illegal opcode at address 0x420D234
+==24023== at 0x420D234: abort (abort.c:124)
+\endverbatim
+
+This is the sign that you didn't used the exception mecanism well. Most
+probably, you have a <tt>return;</tt> somewhere within a <tt>TRY{}</tt>
+block. This is <b>evil</b>, and you must not do this. Did you read the section
+about \ref XBT_ex??
+
+\subsubsection faq_trouble_vg_libc Valgrind spits tons of errors about backtraces!
+
+It may happen that valgrind, the memory debugger beloved by any decent C
+programmer, spits tons of warnings like the following :
+\verbatim ==8414== Conditional jump or move depends on uninitialised value(s)
+==8414== at 0x400882D: (within /lib/ld-2.3.6.so)
+==8414== by 0x414EDE9: (within /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
+==8414== by 0x414F937: _dl_open (in /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x4150F4C: (within /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
+==8414== by 0x415102D: __libc_dlopen_mode (in /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x412D6B9: backtrace (in /lib/tls/i686/cmov/libc-2.3.6.so)
+==8414== by 0x8076446: xbt_dictelm_get_ext (dict_elm.c:714)
+==8414== by 0x80764C1: xbt_dictelm_get (dict_elm.c:732)
+==8414== by 0x8079010: xbt_cfg_register (config.c:208)
+==8414== by 0x806821B: MSG_config (msg_config.c:42)
+\endverbatim
+
+This problem is somewhere in the libc when using the backtraces and there is
+very few things we can do ourselves to fix it. Instead, here is how to tell
+valgrind to ignore the error. Add the following to your ~/.valgrind.supp (or
+create this file on need). Make sure to change the obj line according to
+your personnal mileage (change 2.3.6 to the actual version you are using,
+which you can retrieve with a simple "ls /lib/ld*.so").
+
+\verbatim {
+ name: Backtrace madness
+ Memcheck:Cond
+ obj:/lib/ld-2.3.6.so
+ fun:dl_open_worker
+ fun:_dl_open
+ fun:do_dlopen
+ fun:dlerror_run
+ fun:__libc_dlopen_mode
+}\endverbatim
+
+Then, you have to specify valgrind to use this suppression file by passing
+the <tt>--suppressions=$HOME/.valgrind.supp</tt> option on the command line.
+You can also add the following to your ~/.bashrc so that it gets passed
+automatically. Actually, it passes a bit more options to valgrind, and this
+happen to be my personnal settings. Check the valgrind documentation for
+more information.
+
+\verbatim export VALGRIND_OPTS="--leak-check=yes --leak-resolution=high --num-callers=40 --tool=memcheck --suppressions=$HOME/.valgrind.supp" \endverbatim
+
+\subsection faq_deadlock There is a deadlock in my code!!!
+
+Unfortunately, we cannot debug every code written in SimGrid. We
+furthermore believe that the framework provides ways enough
+information to debug such informations yourself. If the textual output
+is not enough, Make sure to check the \ref faq_visualization FAQ entry to see
+how to get a graphical one.
+
+Now, if you come up with a really simple example that deadlocks and
+you're absolutely convinced that it should not, you can ask on the
+list. Just be aware that you'll be severely punished if the mistake is
+on your side... We have plenty of FAQ entries to redact and new
+features to implement for the impenitents! ;)
+
+\subsection faq_surf_network_latency I get weird timings when I play with the latencies.
+
+OK, first of all, remember that units should be Bytes, Flops and
+Seconds. If you don't use such units, some SimGrid constants (e.g. the
+SG_TCP_CTE_GAMMA constant used in most network models) won't have the
+right unit and you'll end up with weird results.
+
+Here is what happens with a single transfer of size L on a link
+(bw,lat) when nothing else happens.
+
+\verbatim
+0-----lat--------------------------------------------------t
+|-----|**** real_bw =min(bw,SG_TCP_CTE_GAMMA/(2*lat)) *****|
+\endverbatim
+
+In more complex situations, this min is the solution of a complex
+max-min linear system. Have a look
+<a href="http://lists.gforge.inria.fr/pipermail/simgrid-devel/2006-April/thread.html">here</a>
+and read the two threads "Bug in SURF?" and "Surf bug not
+fixed?". You'll have a few other examples of such computations. You
+can also read "A Network Model for Simulation of Grid Application" by
+Henri Casanova and Loris Marchal to have all the details. The fact
+that the real_bw is smaller than bw is easy to understand. The fact
+that real_bw is smaller than SG_TCP_CTE_GAMMA/(2*lat) is due to the
+window-based congestion mechanism of TCP. With TCP, you can't exploit
+your huge network capacity if you don't have a good round-trip-time
+because of the acks...
+
+Anyway, what you get is t=lat + L/min(bw,SG_TCP_CTE_GAMMA/(2*lat)).
+
+ * if I you set (bw,lat)=(100 000 000, 0.00001), you get t = 1.00001 (you fully
+use your link)
+ * if I you set (bw,lat)=(100 000 000, 0.0001), you get t = 1.0001 (you're on the
+limit)
+ * if I you set (bw,lat)=(100 000 000, 0.001), you get t = 10.001 (ouch!)
+
+This bound on the effective bandwidth of a flow is not the only thing
+that may make your result be unexpected. For example, two flows
+competing on a saturated link receive an amount of bandwidth inversely
+proportional to their round trip time.
+
+\subsection faq_bugrepport So I've found a bug in SimGrid. How to report it?
+
+We do our best to make sure to hammer away any bugs of SimGrid, but this is
+still an academic project so please be patient if/when you find bugs in it.
+If you do, the best solution is to drop an email either on the simgrid-user
+or the simgrid-devel mailing list and explain us about the issue. You can
+also decide to open a formal bug report using the
+<a href="https://gforge.inria.fr/tracker/?atid=165&group_id=12&func=browse">relevant
+interface</a>. You need to login on the server to get the ability to submit
+bugs.
+
+We will do our best to solve any problem repported, but you need to help us
+finding the issue. Just telling "it segfault" isn't enough. Telling "It
+segfaults when running the attached simulator" doesn't really help either.
+You may find the following article interesting to see how to repport
+informative bug repports:
+http://www.chiark.greenend.org.uk/~sgtatham/bugs.html (it is not SimGrid
+specific at all, but it's full of good advices).
+
+\author Arnaud Legrand (arnaud.legrand::imag.fr)
+\author Martin Quinson (martin.quinson::loria.fr)
+
+
+*/