+@code{.unparsed}
+ while (!flag) {
+   MPI_Test(&request, &flag, &status);
+   ...
+ }
+@endcode
+
+@note
+ Internally, in order to speed up execution, we use a counter to keep track
+ of how many times we have already checked whether the handle is valid.
+ The time that MPI_Test() makes the process sleep is counter*SLEEP_TIME,
+ that is, it increases linearly with the number of previously failed tests.
+ This behavior can be disabled by setting smpi/grow-injected-times to no,
+ which also disables it for MPI_Iprobe.
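+
+As an illustration of this note, the injected duration could be sketched as
+follows (this is not the actual SimGrid code; the names are made up, and
+sleep_time stands for the SLEEP_TIME mentioned above):
+
+@code{.C}
+ /* Duration injected by the n-th unsuccessful MPI_Test() on a handle:
+  * checked counts the previous unsuccessful tests on that handle. */
+ double injected_duration(int checked, double sleep_time)
+ {
+   return checked * sleep_time; /* grows linearly with the failed tests */
+ }
+@endcode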
+
+
+@subsection options_model_smpi_shared_malloc smpi/shared-malloc: Factorize malloc()s
+
+@b Default: global
+
+If your simulation consumes too much memory, you may want to modify
+your code so that the working areas are shared by all MPI ranks. For
+example, in a block-cyclic matrix multiplication, you would only
+allocate one set of blocks, and every process would share them.
+Naturally, this leads to very wrong results, but it saves a lot of
+memory, so it is still desirable for some studies. For more on the
+motivation for this feature, please refer to the
+<a href="https://simgrid.github.io/SMPI_CourseWare/topic_understanding_performance/matrixmultiplication/">relevant
+section</a> of the SMPI CourseWare (see Activity #2.2 of the pointed
+assignment). In practice, change the calls to malloc() and free() into
+SMPI_SHARED_MALLOC() and SMPI_SHARED_FREE(), as sketched below.
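+
+For instance (a minimal sketch; the buffer name and size are made up for
+illustration, and smpi/smpi.h is the SMPI header providing these macros):
+
+@code{.C}
+ #include <smpi/smpi.h>
+
+ /* Instead of:  buf = malloc(count * sizeof(double)); ... free(buf); */
+ double *buf = SMPI_SHARED_MALLOC(count * sizeof(double));
+ /* ... use buf as a scratch area whose actual content does not matter ... */
+ SMPI_SHARED_FREE(buf);
+@endcode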
+
+SMPI provides two algorithms for this feature. The first one, called
+@c local, allocates one block per call to SMPI_SHARED_MALLOC() in your
+code (each call location gets its own block), and this block is shared
+among all MPI ranks. This is implemented with the shm_* functions,
+which create a new POSIX shared memory object (kept in RAM, in /dev/shm)
+for each shared block. A conceptual sketch is given below.
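+
+The underlying mechanism can be sketched as follows (illustration only, not
+the actual SimGrid code; names are made up and error handling is omitted):
+
+@code{.C}
+ #include <fcntl.h>
+ #include <sys/mman.h>
+ #include <unistd.h>
+
+ /* One POSIX shared-memory object per call site; every mapping of it sees
+  * the same bytes, so the block is effectively shared among the ranks. */
+ static void *local_shared_block(const char *name, size_t size)
+ {
+   int fd = shm_open(name, O_CREAT | O_RDWR, 0600); /* lives in /dev/shm */
+   ftruncate(fd, (off_t)size);
+   void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+   close(fd);
+   return mem;
+ }
+@endcode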
+
+With the @c global algorithm, each call to SMPI_SHARED_MALLOC()
+returns a new address, but it only points to a shadow block: its memory
+area is mapped onto a 1 MiB file on disk. If the returned block is of size
+N MiB, then the same file is mapped N times to cover the whole block.
+In the end, no matter how many times you call SMPI_SHARED_MALLOC(), this
+only consumes 1 MiB of memory, as sketched below.
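+
+The repeated mapping can be sketched as follows (illustration only, not the
+actual SimGrid code; names are made up, error handling is omitted, and the
+requested size is assumed to be a multiple of 1 MiB):
+
+@code{.C}
+ #include <sys/mman.h>
+
+ #define CHUNK (1024 * 1024)  /* size of the single backing file: 1 MiB */
+
+ /* Map the same backing file over every chunk of the requested range, so
+  * the whole block aliases the same 1 MiB of physical memory. */
+ static void *global_shared_block(size_t size, int backing_fd)
+ {
+   /* Reserve a contiguous address range for the whole block */
+   char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+   /* Overlay the same file onto each 1 MiB chunk of that range */
+   for (size_t off = 0; off < size; off += CHUNK)
+     mmap(base + off, CHUNK, PROT_READ | PROT_WRITE,
+          MAP_SHARED | MAP_FIXED, backing_fd, 0);
+   return base;
+ }
+@endcode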
+
+You can disable this behavior and revert to regular malloc() calls (for
+example for debugging purposes) by using @c "no" as the value.
+
+If you want to keep private some parts of the buffer, for instance if these
+parts are used by the application logic and should not be corrupted, you
+can use SMPI_PARTIAL_SHARED_MALLOC(size, offsets, offsets_count).
+
+As an example,
+
+@code{.C}
+ mem = SMPI_PARTIAL_SHARED_MALLOC(500, {27, 42, 100, 200}, 2);
+@endcode
+
+will allocate 500 bytes to mem, such that mem[27..41] and mem[100..199]
+are shared while the other areas remain private.
+
+Then, it can be deallocated by calling SMPI_SHARED_FREE(mem).
+
+When smpi/shared-malloc:global is used, the memory consumption problem
+is solved, but it may put too much load on the kernel's page table.
+In this case, you should use huge pages so that a single page-table entry
+covers several MB of malloc'ed data instead of only 4 kB.
+To activate this, you must mount a hugetlbfs on your system and allocate
+at least one huge page:
+
+@code{.sh}
+ mkdir /home/huge
+ sudo mount none /home/huge -t hugetlbfs -o rw,mode=0777
+ sudo sh -c 'echo 1 > /proc/sys/vm/nr_hugepages' # echo more if you need more
+@endcode
+
+Then, you can pass the option --cfg=smpi/shared-malloc-hugepage:/home/huge
+to smpirun to actually activate the huge page support in shared mallocs.
+
+@subsection options_model_smpi_wtime smpi/wtime: Inject constant times for calls to MPI_Wtime, gettimeofday and clock_gettime
+
+@b Default value: 10 ns
+
+This option controls the amount of (simulated) time spent in calls to
+MPI_Wtime(), gettimeofday() and clock_gettime(). If you set this value
+to 0, the simulated clock is not advanced in these calls, which leads
+to issues if your application contains a loop such as the following:
+
+@code{.unparsed}
+ while(MPI_Wtime() < some_time_bound) {
+ /* some tests, with no communication nor computation */
+ }
+@endcode
+
+When the option smpi/wtime is set to 0, the time advances only on
+communications and computations, so the previous code results in an
+infinite loop: the current [simulated] time will never reach
+@c some_time_bound. This infinite loop is avoided when that option is
+set to a small amount, as it is by default since SimGrid v3.21.
+
+Note that if your application does not contain any loop that depends on
+the current time only, then setting this option to a non-zero value
+will slow down your simulations by a tiny bit: the simulation loop has
+to be broken and reset each time your code asks for the current time.
+If the simulation speed really matters to you, you can avoid this
+extra delay by setting smpi/wtime to 0.
+
+@section options_generic Configuring other aspects of SimGrid
+
+@subsection options_generic_clean_atexit Cleanup before termination
+
+The C/C++ standard library provides a function called
+[atexit](http://www.cplusplus.com/reference/cstdlib/atexit/).
+atexit() registers callbacks, which are called just before the program terminates.
+
+By setting the configuration option clean-atexit to 1 (true), such a callback
+is registered; it cleans up some variables and finalizes the tracing.
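+
+As a reminder of the underlying mechanism, here is a minimal standalone
+example of atexit() itself (unrelated to SimGrid's internal callback):
+
+@code{.C}
+ #include <stdio.h>
+ #include <stdlib.h>
+
+ static void cleanup(void)
+ {
+   puts("cleaning up just before the program terminates");
+ }
+
+ int main(void)
+ {
+   atexit(cleanup); /* registered callback runs automatically at exit */
+   return 0;
+ }
+@endcode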
+
+TODO: Add when this should be used.
+
+@subsection options_generic_path Profile files' search path