+\subsection options_smpi_temps smpi/keep-temps: not cleaning up after simulation
+
+\b Default: 0 (false)
+
+Under some conditions, SMPI generates a lot of temporary files. They
+usually get cleaned up, but you may use this option to keep them
+instead. This is for example useful when debugging or profiling
+executions that use the dlopen privatization scheme, as missing binary
+files tend to fool the debuggers.
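+
+For instance, assuming the usual boolean syntax of SimGrid configuration items, the
+temporary files could be kept with a command-line flag along these lines:
+
+\verbatim
+--cfg=smpi/keep-temps:yes
+\endverbatim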
+
+\subsection options_model_smpi_lat_factor smpi/lat-factor: Latency factors
+
+The motivation and syntax for this option are identical to those of
+smpi/bw-factor; see \ref options_model_smpi_bw_factor for details.
+
+There is an important difference, though: while smpi/bw-factor \a reduces the
+actual bandwidth (i.e., values between 0 and 1 are valid), latency factors
+\a increase the latency, i.e., values larger than or equal to 1 are valid here.
+
+This is the default value:
+
+\verbatim
+65472:11.6436;15424:3.48845;9376:2.59299;5776:2.18796;3484:1.88101;1426:1.61075;732:1.9503;257:1.95341;0:2.01467
+\endverbatim
+
+\note
+ The SimGrid team has developed a script to help you determine these
+ values. You can find more information and the downloads here:
+ 1. http://simgrid.gforge.inria.fr/contrib/smpi-calibration-doc.html
+ 2. http://simgrid.gforge.inria.fr/contrib/smpi-saturation-doc.html
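+
+As with smpi/bw-factor, the factors are passed as a single configuration string. For
+example, explicitly re-passing the default value shown above would look like this
+(the quoting mirrors the other examples in this document):
+
+\verbatim
+--cfg=smpi/lat-factor:"65472:11.6436;15424:3.48845;9376:2.59299;5776:2.18796;3484:1.88101;1426:1.61075;732:1.9503;257:1.95341;0:2.01467"
+\endverbatim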
+
+\subsection options_smpi_papi_events smpi/papi-events: Trace hardware counters with PAPI
+
+\warning
+ This option is experimental and subject to change.
+ This feature currently requires superuser privileges, as registers are queried.
+ Only use this feature with code you trust! Call smpirun for instance via
+ smpirun -wrapper "sudo " <your-parameters>
+ or run sudo sh -c "echo 0 > /proc/sys/kernel/perf_event_paranoid"
+ beforehand; in the latter case, sudo will not be required.
+
+\note
+ This option is only available when SimGrid was compiled with PAPI support.
+
+This option takes the names of PAPI counters and adds their respective values
+to the trace files. (See Section \ref tracing_tracing_options.)
+
+It is planned to make this feature available on a per-process (or per-thread?) basis.
+The first draft, however, just implements a "global" (i.e., for all processes) set
+of counters, the "default" set.
+
+\verbatim
+--cfg=smpi/papi-events:"default:PAPI_L3_LDM:PAPI_L2_LDM"
+\endverbatim
+
+\subsection options_smpi_privatization smpi/privatization: Automatic privatization of global variables
+
+MPI executables are usually meant to be executed in separate processes, but SMPI
+executes all of them in a single process. Global variables from the executable will thus
+be placed in the same memory zone and shared between the simulated processes, causing intricate bugs.
+Several options are possible to avoid this, as described in the main
+<a href="https://hal.inria.fr/hal-01415484">SMPI publication</a>.
+SimGrid provides two ways of automatically privatizing the globals,
+and this option allows you to choose between them.
+
+ - <b>no</b> (default): Do not automatically privatize variables.
+ - <b>mmap</b> or <b>yes</b>: Runtime automatic switching of the data segments.\n
+   SMPI stores a copy of each global data segment for each process,
+   and at each context switch replaces the actual data with the copy
+   of the right process. No data is actually copied at switch time, as this
+   mechanism uses mmap for efficiency. As such, it is for now limited to
+   systems supporting this functionality (all Linux and most BSD systems).\n
+   Another limitation is that SMPI only accounts for global variables
+   defined in the executable. If the processes use external global
+   variables from dynamic libraries, they won't be switched
+   correctly. The easiest way to solve this is to statically link
+   against the library holding these globals (but you should never
+   statically link against the simgrid library itself).
+ - <b>dlopen</b>: Link multiple times against the binary.\n
+   SMPI loads several copies of the same binary in memory, which
+   naturally duplicates the global variables. Since the dynamic linker
+   refuses to link the same file several times, the binary is copied
+   into a temporary file before being dl-loaded (it is erased right
+   after loading).\n
+   Note that this feature is somewhat experimental at the time of writing
+   (v3.16) but seems to work.\n
+   This approach greatly speeds up the context switching, down to
+   about 40 CPU cycles with our raw contexts, instead of requiring
+   several syscalls as with the \c mmap approach. Another advantage is
+   that it permits running the SMPI contexts in parallel, which is
+   obviously not possible with the \c mmap approach.\n
+   Further work may be possible to alleviate the memory and disk
+   overconsumption. It seems that we could
+   <a href="https://lwn.net/Articles/415889/">punch holes</a>
+   in the files before dl-loading them to remove the code and
+   constants, and mmap these areas onto a single copy. This requires
+   understanding the ELF layout of the file, but would
+   reduce the disk and memory usage to the bare minimum. In
+   addition, it would reduce the pressure on the CPU caches (in
+   particular on the instruction cache).
+
+\warning
+ This configuration option cannot be set in your platform file. You can only
+ pass it as an argument to smpirun.
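+
+For example, a minimal smpirun invocation selecting the \c mmap scheme could look as
+follows (the platform file, hostfile, and binary are placeholders):
+
+\verbatim
+smpirun --cfg=smpi/privatization:mmap -np 4 -platform platform.xml -hostfile hostfile ./my_mpi_app
+\endverbatim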
+
+\subsection options_model_smpi_detached Simulating MPI detached send
+
+This threshold specifies the size in bytes under which the send will return
+immediately. This is different from the threshold detailed in \ref options_model_network_asyncsend
+because the message is not effectively sent when the send is posted. SMPI still waits for the
+corresponding receive to be posted before performing the communication operation. This threshold can be set
+by changing the \b smpi/send-is-detached-thresh item. The default value is 65536.
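+
+For example, to lower this threshold to 4096 bytes (the value is only an illustration):
+
+\verbatim
+--cfg=smpi/send-is-detached-thresh:4096
+\endverbatim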
+
+\subsection options_model_smpi_collectives Simulating MPI collective algorithms
+
+SMPI implements more than 100 different algorithms for MPI collective communication, to accurately
+simulate the behavior of most of the existing MPI libraries. The \b smpi/coll-selector item can be used
+to select the decision logic of either the OpenMPI or the MPICH library (values: ompi or mpich; by default,
+SMPI uses naive versions of the collective operations). Each collective operation can also be manually selected
+with \b smpi/collective_name:algo_name. The available algorithms are listed in \ref SMPI_use_colls .
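+
+For example, to use the MPICH decision logic globally, or to force a specific algorithm
+for a single collective (the algorithm name below is only an illustration; see the list
+referenced above for the actual choices):
+
+\verbatim
+--cfg=smpi/coll-selector:mpich
+--cfg=smpi/alltoall:ring
+\endverbatim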
+
+\subsection options_model_smpi_iprobe smpi/iprobe: Inject constant times for calls to MPI_Iprobe
+
+\b Default value: 0.0001
+
+The behavior and motivation for this configuration option are identical to those of \a smpi/test; see
+Section \ref options_model_smpi_test for details.
+
+\subsection options_model_smpi_iprobe_cpu_usage smpi/iprobe-cpu-usage: Reduce speed for iprobe calls
+
+\b Default value: 1 (no change from default behavior)
+
+MPI_Iprobe calls can be heavily used in applications. To account correctly for the energy
+that cores spend while probing, it is necessary to reduce the load that these calls cause inside
+SimGrid.
+
+For instance, we measured a maximum power consumption of 220 W for a particular application, but
+only 180 W while this application was probing. Hence, the correct factor to pass
+to this option would be 180/220 (roughly 0.82).
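+
+With the measurements above, that factor would be passed as:
+
+\verbatim
+--cfg=smpi/iprobe-cpu-usage:0.82
+\endverbatim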
+
+\subsection options_model_smpi_init smpi/init: Inject constant times for calls to MPI_Init
+
+\b Default value: 0
+
+The behavior of this configuration option is identical to that of \a smpi/test; see
+Section \ref options_model_smpi_test for details.
+
+\subsection options_model_smpi_ois smpi/ois: Inject constant times for asynchronous send operations
+
+This configuration option works exactly as \a smpi/os, see Section \ref options_model_smpi_os.
+Of course, \a smpi/ois is used to account for MPI_Isend instead of MPI_Send.
+
+\subsection options_model_smpi_os smpi/os: Inject constant times for send operations
+
+In several network models such as LogP, send (MPI_Send, MPI_Isend) and receive (MPI_Recv)
+operations incur costs (i.e., they consume CPU time). SMPI can factor these costs in as well, but the
+user has to configure SMPI accordingly, as these values may vary by machine.
+This can be done by using \a smpi/os for MPI_Send operations; for MPI_Isend and
+MPI_Recv, use \a smpi/ois and \a smpi/or, respectively. These work exactly like
+\a smpi/os.
+
+\a smpi/os can consist of multiple sections; each section takes three values, for example:
+
+\verbatim
+ 1:3:2;10:5:1
+\endverbatim
+
+Here, the sections are divided by ";" (that is, this example contains two sections).
+Furthermore, each section consists of three values.
+
+1. The first value denotes the minimum size for this section to take effect;
+   read it as "if the message size is greater than this value (and no other section has a larger
+   first value that is also smaller than the message size), use this section".
+   In the first section above, this value is "1".
+
+2. The second value is the startup time; this is a constant value that will always
+ be charged, no matter what the size of the message. In the first section above,
+ this value is "3".
+
+3. The third value is the \a per-byte cost. That is, it is charged for every
+ byte of the message (incurring cost messageSize*cost_per_byte)
+ and hence accounts also for larger messages. In the first
+ section of the example above, this value is "2".
+
+Now, SMPI always checks which section it should use for a given message; that is,
+if a message of size 11 is sent with the configuration of the example above, only
+the second section will be used, not the first, as the second section's first value (10)
+is the largest one that is still smaller than the message size. Hence, a message of size 11 incurs the
+following cost inside MPI_Send:
+
+\verbatim
+ 5+11*1
+\endverbatim
+
+where 5 is the startup cost and 1 is the cost per byte, for a total cost of 16.
+
+\note
+ The order of sections can be arbitrary; they will be ordered internally.
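+
+On the command line, the example configuration above would be passed as follows (the
+quoting keeps the ";" from being interpreted by the shell):
+
+\verbatim
+--cfg=smpi/os:"1:3:2;10:5:1"
+\endverbatim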
+
+\subsection options_model_smpi_or smpi/or: Inject constant times for receive operations
+
+This configuration option works exactly as \a smpi/os, see Section \ref options_model_smpi_os.
+Of course, \a smpi/or is used to account for MPI_Recv instead of MPI_Send.
+
+\subsection options_model_smpi_test smpi/test: Inject constant times for calls to MPI_Test
+
+\b Default value: 0.0001
+
+By setting this option, you can control the amount of time a process sleeps
+when MPI_Test() is called; this is important because SimGrid normally only
+advances the time while communication is happening, and thus
+MPI_Test would not advance the time, resulting in a deadlock if it is used as a
+break condition.
+
+Here is an example:
+
+\code{.unparsed}
+    while (!flag) {
+      MPI_Test(&request, &flag, &status);
+      ...
+    }
+\endcode
+
+\note
+ Internally, in order to speed up execution, we use a counter to keep track
+ of how often we have already checked whether the handle is now valid or not. Hence, we
+ actually use counter*SLEEP_TIME, that is, the time MPI_Test() causes the process
+ to sleep increases linearly with the number of previously failed tests. This
+ behavior can be disabled by setting smpi/grow-injected-times to no. This will
+ also disable this behavior for MPI_Iprobe.
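+
+For example, to make every MPI_Test() call inject the default tenth of a millisecond explicitly:
+
+\verbatim
+--cfg=smpi/test:0.0001
+\endverbatim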
+
+
+\subsection options_model_smpi_shared_malloc smpi/shared-malloc: Factorize malloc()s
+
+\b Default: global
+
+If your simulation consumes too much memory, you may want to modify
+your code so that the working areas are shared by all MPI ranks. For
+example, in a block-cyclic matrix multiplication, you would only
+allocate one set of blocks, and every process would share them.
+Naturally, this leads to very wrong results, but it saves a
+lot of memory, so it is still desirable for some studies. For more on
+the motivation for this feature, please refer to the
+<a href="https://simgrid.github.io/SMPI_CourseWare/topic_understanding_performance/matrixmultiplication/">relevant
+section</a> of the SMPI CourseWare (see Activity #2.2 of the pointed
+assignment). In practice, change the calls to malloc() and free() into
+SMPI_SHARED_MALLOC() and SMPI_SHARED_FREE().
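+
+Here is a minimal sketch of that substitution; the variable and size names are made up
+for the illustration, and the computed values are of course meaningless since all ranks
+share the same memory:
+
+\code{.unparsed}
+  /* was: double *block = malloc(block_size * sizeof(double)); */
+  double *block = SMPI_SHARED_MALLOC(block_size * sizeof(double));
+  /* ... use block as a working area ... */
+  SMPI_SHARED_FREE(block);   /* was: free(block); */
+\endcode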
+
+SMPI provides two algorithms for this feature. The first one, called
+\c local, allocates one block per call to SMPI_SHARED_MALLOC() in your
+code (each call location gets its own block), and this block is shared
+amongst all MPI ranks. This is implemented with the shm_* functions,
+which create a new POSIX shared memory object (kept in RAM, in /dev/shm)
+for each shared block.
+
+With the \c global algorithm, each call to SMPI_SHARED_MALLOC()
+returns a new address, but it only points to a shadow block: its memory
+area is mapped onto a 1 MiB file on disk. If the returned block is of size
+N MiB, then the same file is mapped N times to cover the whole block.
+In the end, no matter how many calls to SMPI_SHARED_MALLOC() you make, this will
+only consume 1 MiB in memory.
+
+You can disable this behavior and revert to regular mallocs (for
+example for debugging purposes) by using \c "no" as the value.
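+
+For example, to switch to the \c local algorithm, or to disable the sharing altogether:
+
+\verbatim
+--cfg=smpi/shared-malloc:local
+--cfg=smpi/shared-malloc:no
+\endverbatim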
+
+\subsection options_model_smpi_wtime smpi/wtime: Inject constant times for calls to MPI_Wtime
+
+\b Default value: 0
+
+By setting this option, you can control the amount of time a process sleeps
+when MPI_Wtime() is called; this is important because SimGrid normally only
+advances the time while communication is happening, and thus
+MPI_Wtime would not advance the time, resulting in a deadlock if it is used as a
+break condition.
+
+Here is an example:
+
+\code{.unparsed}
+ while(MPI_Wtime() < some_time_bound) {
+ ...
+ }
+\endcode
+
+If the time is never advanced, this loop will clearly never end as MPI_Wtime()
+always returns the same value. Hence, pass a (small) value to the smpi/wtime
+option to force a call to MPI_Wtime to advance the time as well.
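+
+For instance, a small injected duration (the value below is only an illustration) is
+enough to make such loops terminate:
+
+\verbatim
+--cfg=smpi/wtime:0.000001
+\endverbatim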
+
+