implementing it, and we'd be glad to integrate your contribution to the
main project afterward.
-@subsection SMPI_what_globals Global variables in SMPI
+@subsection SMPI_what_globals Privatization of global variables
Concerning the globals, the problem comes from the fact that usually,
MPI processes run as real UNIX processes while they are all folded
our implementation was not robust enough to be used in production, so
it was removed at some point. Currently, SMPI comes with two
privatization mechanisms that you can @ref options_smpi_privatization
-"select at runtime". At the time of writing (v3.18), the mmap approach
-is considered to be very robust (but a bit slow) while the dlopen
-approach is considered to be much faster. dlopen is used by default.
+"select at runtime". At the time of writing (v3.18), the dlopen
+approach is considered to be very fast (it's used by default) while
+the mmap approach is considered to be rather slow but very robust.
With the <b>mmap approach</b>, SMPI duplicates and dynamically switches
the \c .data and \c .bss segments of the ELF process when switching
executable. It makes perfect sense in the general case, but we need
to circumvent this rule of thumb in our case. To that end, the
binary is copied in a temporary file before being re-linked against.
+`dlmopen()` cannot be used because it only allows 256 contexts, and
+because it would also duplicate SimGrid itself.
This approach greatly speeds up the context switching, down to about
40 CPU cycles with our raw contexts, instead of requesting several
before dl-loading them to remove the code and constants, and mmap
these areas onto a unique copy. If done correctly, this would reduce
the disk and memory usage to the bare minimum, and would also reduce
-the pressure on the CPU instruction cache.\n
+the pressure on the CPU instruction cache. See
+<a href="https://github.com/simgrid/simgrid/issues/137">the relevant
+bug</a> on GitHub for implementation leads.\n
Also, currently, only the binary is copied and dlopen-ed for each MPI
rank. We could probably extend this to external dependencies, but for