Simulation time: 1e3 seconds.
\endverbatim
+\subsection options_smpi_temps smpi/keep-temps: not cleaning up after simulation
+
+\b Default: 0 (false)
+
+Under some conditions, SMPI generates a lot of temporary files. They
+are usually cleaned up, but you may use this option to keep these
+files. This is useful for example when debugging or profiling
+executions that use the dlopen privatization scheme, as missing binary
+files tend to confuse debuggers.
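+
+As a minimal sketch, the temporary files could be kept for a debugging
+session as follows. The process count and the file names (hostfile.txt,
+platform.xml, my_mpi_app) are placeholders chosen for illustration.
+
+\code{.sh}
+ # Keep the temporary copies of the binary so that the debugger can find them
+ smpirun -np 4 -hostfile hostfile.txt -platform platform.xml \
+         --cfg=smpi/privatization:dlopen --cfg=smpi/keep-temps:1 ./my_mpi_app
+\endcode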
+
\subsection options_model_smpi_lat_factor smpi/lat-factor: Latency factors
The motivation and syntax for this option is identical to the motivation/syntax
--cfg=smpi/papi-events:"default:PAPI_L3_LDM:PAPI_L2_LDM"
\endverbatim
-\subsection options_smpi_global smpi/privatize-global-variables: Automatic privatization of global variables
+\subsection options_smpi_privatization smpi/privatization: Automatic privatization of global variables
-MPI executables are meant to be executed in separated processes, but SMPI is
+MPI executables are usually meant to be executed in separate processes, but SMPI is
executed in only one process. Global variables from executables will be placed
-in the same memory zone and shared between processes, causing hard to find bugs.
-To avoid this, several options are possible :
- - Manual edition of the code, for example to add __thread keyword before data
- declaration, which allows the resulting code to work with SMPI, but only
- if the thread factory (see \ref options_virt_factory) is used, as global
- variables are then placed in the TLS (thread local storage) segment.
- - Source-to-source transformation, to add a level of indirection
- to the global variables. SMPI does this for F77 codes compiled with smpiff,
- and used to provide coccinelle scripts for C codes, which are not functional anymore.
- - Compilation pass, to have the compiler automatically put the data in
- an adapted zone.
- - Runtime automatic switching of the data segments. SMPI stores a copy of
- each global data segment for each process, and at each context switch replaces
- the actual data with its copy from the right process. This mechanism uses mmap,
- and is for now limited to systems supporting this functionnality (all Linux
- and some BSD should be compatible).
- Another limitation is that SMPI only accounts for global variables defined in
- the executable. If the processes use external global variables from dynamic
- libraries, they won't be switched correctly. To avoid this, using static
- linking is advised (but not with the simgrid library, to avoid replicating
- its own global variables).
-
- To use this runtime automatic switching, the variable \b smpi/privatize-global-variables
- should be set to yes
+in the same memory zone and shared between processes, causing intricate bugs.
+Several options are possible to avoid this, as described in the main
+<a href="https://hal.inria.fr/hal-01415484">SMPI publication</a>.
+SimGrid provides two ways of automatically privatizing the globals,
+and this option lets you choose between them.
+
+ - <b>no</b> (default): Do not automatically privatize variables.
+ - <b>mmap</b> or <b>yes</b>: Runtime automatic switching of the data segments.\n
+ SMPI stores a copy of each global data segment for each process,
+ and at each context switch replaces the actual data with its copy
+ from the right process. No copy actually occurs as this mechanism
+ uses mmap for efficiency. As such, it is for now limited to
+ systems supporting this functionality (all Linux and most BSD).\n
+ Another limitation is that SMPI only accounts for global variables
+ defined in the executable. If the processes use external global
+ variables from dynamic libraries, they won't be switched
+ correctly. The easiest way to solve this is to statically link
+ against the library with these globals (but you should never
+ statically link against the simgrid library itself).
+ - <b>dlopen</b>: Link multiple times against the binary.\n
+ SMPI loads several copies of the same binary in memory, resulting in
+ the natural duplication of global variables. Since the dynamic linker
+ refuses to link the same file several times, the binary is copied
+ in a temporary file before being dl-loaded (it is erased right
+ after loading).\n
+ Note that this feature is somewhat experimental at the time of writing
+ (v3.16) but seems to work.\n
+ This approach greatly speeds up context switching, down to
+ about 40 CPU cycles with our raw contexts, instead of requiring
+ several syscalls with the \c mmap approach. Another advantage is
+ that it allows running the SMPI contexts in parallel, which is
+ obviously not possible with the \c mmap approach.\n
+ Further work may be possible to reduce the extra memory and disk
+ consumption. It seems that we could
+ <a href="https://lwn.net/Articles/415889/">punch holes</a>
+ in the files before dl-loading them to remove the code and
+ constants, and mmap these areas onto a unique copy. This requires
+ understanding the ELF layout of the file, but would
+ reduce the disk and memory usage to the bare minimum. In
+ addition, this would reduce the pressure on the CPU caches (in
+ particular on the instruction cache).
\warning
This configuration option cannot be set in your platform file. You can only
pass it as an argument to smpirun.
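+
+For instance, either mechanism could be selected on the command line as
+sketched below. The process count and the file names (hostfile.txt,
+platform.xml, my_mpi_app) are placeholders chosen for illustration.
+
+\code{.sh}
+ # Switch the data segments at each context switch ("yes" is a synonym)
+ smpirun -np 4 -hostfile hostfile.txt -platform platform.xml \
+         --cfg=smpi/privatization:mmap ./my_mpi_app
+ # Load one private copy of the binary per simulated process
+ smpirun -np 4 -hostfile hostfile.txt -platform platform.xml \
+         --cfg=smpi/privatization:dlopen ./my_mpi_app
+\endcode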
-
\subsection options_model_smpi_detached Simulating MPI detached send
This threshold specifies the size in bytes under which the send will return
area is mapped on a 1MiB file on disk. If the returned bloc is of size
N MiB, then the same file is mapped N times to cover the whole bloc.
At the end, no matter how many SMPI_SHARED_MALLOC you do, this will
-only consume 1 MiB in memory.
+only consume 1 MiB in memory.
You can disable this behavior and come back to regular mallocs (for
example for debugging purposes) using \c "no" as a value.
+If you want to keep private some parts of the buffer, for instance if these
+parts are used by the application logic and should not be corrupted, you
+can use SMPI_PARTIAL_SHARED_MALLOC(size, offsets, offsets_count).
+
+As an example,
+
+\code{.C}
+ size_t offsets[] = {27,42 , 100,200}; /* two (start, end) pairs, end excluded */
+ mem = SMPI_PARTIAL_SHARED_MALLOC(500, offsets, 2);
+\endcode
+
+will allocate 500 bytes to mem, such that mem[27..41] and mem[100..199]
+are shared and the other areas remain private.
+
+Then, it can be deallocated by calling SMPI_SHARED_FREE(mem).
+
+When smpi/shared-malloc:global is used, the memory consumption problem
+is solved, but it may induce too much load on the kernel's page table.
+In this case, you should use huge pages so that we create only one
+entry per MB of malloced data instead of one entry per 4 kB.
+To activate this, you must mount a hugetlbfs on your system and allocate
+at least one huge page:
+
+\code{.sh}
+ mkdir /home/huge
+ sudo mount none /home/huge -t hugetlbfs -o rw,mode=0777
+ sudo sh -c 'echo 1 > /proc/sys/vm/nr_hugepages' # echo more if you need more
+\endcode
+
+Then, you can pass the option --cfg=smpi/shared-malloc-hugepage:/home/huge
+to smpirun to actually activate the huge page support in shared mallocs.
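+
+Putting it together, a run could look like the sketch below. The process
+count and the file names (hostfile.txt, platform.xml, my_mpi_app) are
+placeholders, and reading /proc/meminfo is only a suggested sanity check
+that the huge pages were actually reserved.
+
+\code{.sh}
+ grep Huge /proc/meminfo   # HugePages_Total should be at least 1
+ smpirun -np 4 -hostfile hostfile.txt -platform platform.xml \
+         --cfg=smpi/shared-malloc:global \
+         --cfg=smpi/shared-malloc-hugepage:/home/huge ./my_mpi_app
+\endcode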
+
\subsection options_model_smpi_wtime smpi/wtime: Inject constant times for calls to MPI_Wtime
\b Default value: 0
\section options_index Index of all existing configuration options
\note
- Almost all options are defined in <i>src/simgrid/sg_config.c</i>. You may
- want to check this file, too, but this index should be somewhat complete
- for the moment (May 2015).
-
-\note
- \b Please \b note: You can also pass the command-line option "--help" and
- "--help-cfg" to an executable that uses simgrid.
+ The full list can be retrieved by passing "--help" and
+ "--help-cfg" to an executable that uses SimGrid.
- \c clean-atexit: \ref options_generic_clean_atexit
- \c smpi/iprobe: \ref options_model_smpi_iprobe
- \c smpi/iprobe-cpu-usage: \ref options_model_smpi_iprobe_cpu_usage
- \c smpi/init: \ref options_model_smpi_init
+- \c smpi/keep-temps: \ref options_smpi_temps
- \c smpi/lat-factor: \ref options_model_smpi_lat_factor
- \c smpi/ois: \ref options_model_smpi_ois
- \c smpi/or: \ref options_model_smpi_or
- \c smpi/os: \ref options_model_smpi_os
- \c smpi/papi-events: \ref options_smpi_papi_events
-- \c smpi/privatize-global-variables: \ref options_smpi_global
+- \c smpi/privatization: \ref options_smpi_privatization
- \c smpi/send-is-detached-thresh: \ref options_model_smpi_detached
- \c smpi/shared-malloc: \ref options_model_smpi_shared_malloc
+- \c smpi/shared-malloc-hugepage: \ref options_model_smpi_shared_malloc
- \c smpi/simulate-computation: \ref options_smpi_bench
- \c smpi/test: \ref options_model_smpi_test
- \c smpi/wtime: \ref options_model_smpi_wtime