X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/f7608d26fe1501b8b9effc4a53344da32887364e..8dbe2058fb2e5717d3077dbaf890d3919405697e:/doc/doxygen/module-smpi.doc diff --git a/doc/doxygen/module-smpi.doc b/doc/doxygen/module-smpi.doc index b3c26f777b..06d67b6153 100644 --- a/doc/doxygen/module-smpi.doc +++ b/doc/doxygen/module-smpi.doc @@ -452,7 +452,7 @@ touch with us: we can guide you through the SimGrid code to help you implement it, and we'd be glad to integrate your contribution to the main project afterward. -@subsection SMPI_what_globals Global variables in SMPI +@subsection SMPI_what_globals Privatization of global variables Concerning the globals, the problem comes from the fact that usually, MPI processes run as real UNIX processes while they are all folded @@ -472,21 +472,21 @@ privatized the globals through static analysis of the source code. But our implementation was not robust enough to be used in production, so it was removed at some point. Currently, SMPI comes with two privatization mechanisms that you can @ref options_smpi_privatization -"select at runtime". At the time of writing (v3.18), the mmap approach -is considered to be very robust (but a bit slow) while the dlopen -approach is considered to be fast and experimental. +"select at runtime". At the time of writing (v3.18), the dlopen +approach is considered to be very fast (it's used by default) while +the mmap approach is considered to be rather slow but very robust. With the mmap approach, SMPI duplicates and dynamically switches the \c .data and \c .bss segments of the ELF process when switching the MPI ranks. This allows each rank to have its own copy of the global variables. No copy actually occurs as this mechanism uses \c mmap for efficiency. This mechanism is considered to be very robust on -all systems supporting \c mmap (Linux and most BSDs), so smpirun -activates it by default. 
Its performance is questionable since each -context switch between MPI ranks induces several syscalls to change -the \c mmap that redirects the \c .data and \c .bss segments to the -copies of the new rank. The code will also be copied several times in -memory, inducing a slight increase of memory occupation. +all systems supporting \c mmap (Linux and most BSDs). Its performance +is questionable since each context switch between MPI ranks induces +several syscalls to change the \c mmap that redirects the \c .data and +\c .bss segments to the copies of the new rank. The code will also be +copied several times in memory, inducing a slight increase of memory +occupation. Another limitation is that SMPI only accounts for global variables defined in the executable. If the processes use external global @@ -508,12 +508,15 @@ the exact same file several times, be it a library or a relocatable executable. It makes perfect sense in the general case, but we need to circumvent this rule of thumb in our case. To that end, the binary is copied to a temporary file before being re-linked against. +`dlmopen()` cannot be used as it only allows 256 contexts, and as it +would also duplicate SimGrid itself. This approach greatly speeds up the context switching, down to about 40 CPU cycles with our raw contexts, instead of requesting several syscalls with the \c mmap approach. Another advantage is that it allows the SMPI contexts to run in parallel, which is obviously not -possible with the \c mmap approach. +possible with the \c mmap approach. It was tricky to implement, but we +are not aware of any flaws, so smpirun activates it by default. In the future, it may be possible to further reduce the memory and disk consumption. It seems that we could punch holes in the files before dl-loading them to remove the code and constants, and mmap these areas onto a single copy. 
If done correctly, this would reduce the disk and memory usage to the bare minimum, and would also reduce -the pressure on the CPU instruction cache.\n +the pressure on the CPU instruction cache. See +the relevant +bug on GitHub for implementation leads.\n Also, currently, only the binary is copied and dlopen-ed for each MPI rank. We could probably extend this to external dependencies, but for
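To make the mmap privatization mechanism described in this diff more concrete, here is a minimal C sketch of the remapping trick. It assumes a POSIX system with `mmap` (Linux or a BSD) and is not SMPI's actual code. A single page backed by `tmpfile()` stands in for the `.data` and `.bss` segments, and the names `priv_init`, `priv_switch_to_rank`, `NRANKS` and `segment` are hypothetical. Each simulated rank owns a backing file. Switching ranks remaps the same virtual address onto that rank's file with `MAP_FIXED`, so code that dereferences the fixed address sees the current rank's value without any data being copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define NRANKS 4 /* hypothetical number of simulated MPI ranks */

/* Hypothetical stand-in for the .data/.bss segments: one page whose
 * virtual address never changes once priv_init() has mapped it. */
static char *segment = NULL;
static size_t segment_size = 0;
static FILE *backing[NRANKS]; /* one private copy of the "segment" per rank */

/* Create a zero-filled backing file per rank, then map rank 0's copy. */
int priv_init(void) {
  segment_size = (size_t)sysconf(_SC_PAGESIZE);
  for (int r = 0; r < NRANKS; r++) {
    backing[r] = tmpfile(); /* opened read/write, removed at exit */
    if (backing[r] == NULL)
      return -1;
    if (ftruncate(fileno(backing[r]), (off_t)segment_size) != 0)
      return -1;
  }
  void *p = mmap(NULL, segment_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                 fileno(backing[0]), 0);
  if (p == MAP_FAILED)
    return -1;
  segment = p;
  return 0;
}

/* "Context switch": remap the same address onto another rank's copy.
 * MAP_FIXED replaces the previous mapping in place; no data is copied,
 * but every switch costs an mmap() syscall. */
int priv_switch_to_rank(int rank) {
  assert(segment != NULL && rank >= 0 && rank < NRANKS);
  void *p = mmap(segment, segment_size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fileno(backing[rank]), 0);
  return p == (void *)segment ? 0 : -1;
}
```

A caller would run `priv_init()` once, then call `priv_switch_to_rank(r)` before resuming rank `r`. Writing through `(int *)segment` while one rank is active leaves the other ranks' copies untouched. The per-switch syscall is the kind of cost the text attributes to the mmap approach (SMPI remaps two segments, hence several syscalls). The dlopen approach avoids it because each rank's copy of the globals already lives at its own address.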