diff --git a/doc/doxygen/module-smpi.doc b/doc/doxygen/module-smpi.doc
index 399961baf6..ede94c177d 100644
--- a/doc/doxygen/module-smpi.doc
+++ b/doc/doxygen/module-smpi.doc
@@ -4,25 +4,35 @@
 @tableofcontents
 
-[TOC]
-
-This programming environment enables the study of MPI application by
-emulating them on top of the SimGrid simulator. This is particularly
-interesting to study existing MPI applications within the comfort of
-the simulator. The motivation for this work is detailed in the
-reference article (available at http://hal.inria.fr/inria-00527150).
-
-
-Our goal is to enable the study of **unmodified MPI applications**,
-and even if some constructs and features are still missing, we
-consider SMPI to be stable and usable in production. For **further
-scalability**, you may modify your code to speed up your studies or
-save memory space. Improved **simulation accuracy** requires some
-specific care from you.
+SMPI enables the study of MPI applications by emulating them on top of
+the SimGrid simulator. This is particularly interesting to study
+existing MPI applications within the comfort of the simulator. The
+SMPI reference article is available at
+https://hal.inria.fr/hal-01415484. You should also read the SMPI
+introductory slides.
+
+Our goal is to enable the study of **unmodified MPI applications**.
+Some constructs and features are still missing, but we can probably
+add them on demand. If you already used MPI before, SMPI should sound
+very familiar to you: use smpicc instead of mpicc, and smpirun instead
+of mpirun. The main difference is that smpirun takes a virtual
+platform as an extra parameter (see @ref platform).
+
+If you are new to MPI, you should first take our online [SMPI
+CourseWare](https://simgrid.github.io/SMPI_CourseWare/). It consists
+of several projects that progressively introduce the MPI concepts. It
+relies on SimGrid and SMPI to run the experiments, but the learning
+objectives are centered on MPI itself.
+
+For **further scalability**, you may modify your code to speed up your
+studies or save memory space. Maximal **simulation accuracy**
+requires some specific care from you.
 
  - @ref SMPI_use
    - @ref SMPI_use_compile
    - @ref SMPI_use_exec
+   - @ref SMPI_use_debug
    - @ref SMPI_use_colls
      - @ref SMPI_use_colls_algos
      - @ref SMPI_use_colls_tracing
@@ -33,41 +43,22 @@ specific care from you.
    - @ref SMPI_adapting_size
    - @ref SMPI_adapting_speed
  - @ref SMPI_accuracy
+ - @ref SMPI_troubleshooting
+   - @ref SMPI_trouble_configure_refuses_smpicc
+   - @ref SMPI_trouble_configure_dont_find_smpicc
+   - @ref SMPI_trouble_useconds_t
 
 @section SMPI_use Using SMPI
 
-If you're absolutely new to MPI, you should first take our online
-[SMPI CourseWare](https://simgrid.github.io/SMPI_CourseWare/), and/or
-take a MPI course in your favorite university. If you already know
-MPI, SMPI should sound very familiar to you: Use smpicc instead of
-mpicc, and smpirun instead of mpirun, and you're almost set. Once you
-get a virtual platform description (see @ref platform), you're good to
-go.
-
 @subsection SMPI_use_compile Compiling your code
 
-For that, simply use smpicc as a compiler just
-like you use mpicc with other MPI implementations. This script
-still calls your default compiler (gcc, clang, ...) and adds the right
-compilation flags along the way.
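+For instance (just an illustration, with hypothetical file and
+program names), a classical MPI hello-world needs nothing
+SMPI-specific in its source code:
+
+@verbatim
+#include <stdio.h>
+#include <mpi.h>
+
+int main(int argc, char *argv[])
+{
+  int rank, size;
+  MPI_Init(&argc, &argv);               /* intercepted by SMPI at runtime */
+  MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank of this simulated process */
+  MPI_Comm_size(MPI_COMM_WORLD, &size); /* number of simulated processes  */
+  printf("Hello from rank %d out of %d\n", rank, size);
+  MPI_Finalize();
+  return 0;
+}
+@endverbatim
+
+It compiles as usual, with smpicc standing in for mpicc:
+
+@verbatim
+smpicc -O2 -o hello hello.c
+@endverbatim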
-
-Alas, some building infrastructures cannot cope with that and your
-./configure may fail, reporting that the compiler is not
-functional. If this happens, define the SMPI_PRETEND_CC
-environment variable before running the configuration. Do not define
-it when using SMPI!
-
-@verbatim
-SMPI_PRETEND_CC=1 ./configure # here come the configure parameters
-make
-@endverbatim
-
-\warning
-  Again, make sure that SMPI_PRETEND_CC is not set when you actually
-  compile your application. It is just a work-around for some configure-scripts
-  and replaces some internals by "return 0;". Your simulation will not
-  work with this variable set!
+If your application is in C, then simply use smpicc as a
+compiler just like you use mpicc with other MPI implementations. This
+script still calls your default compiler (gcc, clang, ...) and adds
+the right compilation flags along the way. If your application is in
+C++, Fortran 77 or Fortran 90, use smpicxx, smpiff or
+smpif90, respectively.
 
 @subsection SMPI_use_exec Executing your code on the simulator
 
@@ -97,6 +88,21 @@ by running
 @verbatim
 smpirun -help
 @endverbatim
+
+@subsection SMPI_use_debug Debugging your code on top of SMPI
+
+If you want to explore the automatic platform and deployment files
+that are generated by @c smpirun, add @c -keep-temps to the command
+line.
+
+You can also run your simulation within valgrind or gdb using the
+following commands. Once in GDB, each MPI rank will be represented as
+a regular thread, and you can explore the state of each of them as
+usual.
+@verbatim
+smpirun -wrapper valgrind ...other args...
+smpirun -wrapper "gdb --args" --cfg=contexts/factory:thread ...other args...
+@endverbatim
 
 @subsection SMPI_use_colls Simulating collective operations
 
 MPI collective operations are crucial to the performance of MPI
@@ -118,7 +124,7 @@ selector algorithms, that were collected directly in the source
 code of the targeted MPI implementations.
 
 You can switch the automatic selector through the
-\c smpi/coll_selector configuration item. Possible values:
+\c smpi/coll-selector configuration item (see the complete smpirun
+example below). Possible values:
 
  - ompi: default selection logic of OpenMPI (version 1.7)
  - mpich: default selection logic of MPICH (version 3.0.4)
@@ -156,11 +162,11 @@ Most of these are best described in
+ - bruck: described by Bruck et al. in this paper
  - 2dmesh: organizes the nodes as a two dimensional mesh, and performs
    allgather along the dimensions
  - 3dmesh: adds a third dimension to the previous algorithm
- - rdb: recursive doubling : extends the mesh to a nth dimension, each one
+ - rdb: recursive doubling: extends the mesh to an n-th dimension, each one
    containing two nodes
  - pair: pairwise exchange, only works for power-of-2 numbers of processes;
    size-1 steps, each process sends and receives from the same process at
    each step
@@ -349,7 +355,7 @@ Described by Chen et.al. in
-FIXME marker. If you really need a missing feature, please
-get in touch with us: we can guide you though the SimGrid code to help
-you implementing it, and we'd glad to integrate it in the main project
-afterward if you contribute them back.
+FIXME marker. If you really miss a feature, please get in
+touch with us: we can guide you through the SimGrid code to help you
+implement it, and we'd be glad to integrate your contribution into the
+main project afterward.
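+To recap @ref SMPI_use_exec and @ref SMPI_use_colls, here is what a
+complete invocation may look like, with the collective selector
+changed from the command line (the platform file, hostfile and binary
+names are mere examples):
+
+@verbatim
+smpirun -np 16 -platform cluster.xml -hostfile hostfile \
+        --cfg=smpi/coll-selector:mpich ./hello
+@endverbatim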
-@subsection SMPI_what_globals Global variables
+@subsection SMPI_what_globals Privatization of global variables
 
 Concerning the globals, the problem comes from the fact that usually,
 MPI processes run as real UNIX processes while they are all folded
 into threads of a unique system process in SMPI. Global variables are
 usually private to each MPI process while they become shared between
-the processes in SMPI. This point is rather problematic, and currently
-forces to modify your application to privatize the global variables.
-
-We tried several techniques to work this around. We used to have a
-script that privatized automatically the globals through static
-analysis of the source code, but it was not robust enough to be used
-in production. This issue, as well as several potential solutions, is
+the processes in SMPI. The problem and some potential solutions are
 discussed in this article: "Automatic Handling of Global Variables for
-Multi-threaded MPI Programs",
-available at http://charm.cs.illinois.edu/newPapers/11-23/paper.pdf
-(note that this article does not deal with SMPI but with a competing
-solution called AMPI that suffers of the same issue).
-
-SimGrid can duplicate and dynamically switch the .data and .bss
-segments of the ELF process when switching the MPI ranks, allowing
-each ranks to have its own copy of the global variables. This feature
-is expected to work correctly on Linux and BSD, so smpirun activates
-it by default. As no copy is involved, performance should not be
-altered (but memory occupation will be higher).
-
-If you want to turn it off, pass \c -no-privatize to smpirun. This may
-be necessary if your application uses dynamic libraries as the global
-variables of these libraries will not be privatized. You can fix this
-by linking statically with these libraries (but NOT with libsimgrid,
-as we need SimGrid's own global variables).
+Multi-threaded MPI Programs", available at
+http://charm.cs.illinois.edu/newPapers/11-23/paper.pdf (note that this
+article does not deal with SMPI but with a competing solution called
+AMPI that suffers from the same issue). This point used to be
+problematic in SimGrid, but it should now be handled automatically
+on Linux.
+
+Older versions of SimGrid came with a script that automatically
+privatized the globals through static analysis of the source code, but
+our implementation was not robust enough to be used in production, so
+it was removed at some point. Currently, SMPI comes with two
+privatization mechanisms that you can @ref options_smpi_privatization
+"select at runtime". At the time of writing (v3.18), the dlopen
+approach is considered to be very fast (it is used by default) while
+the mmap approach is considered to be rather slow but very robust.
+
+With the mmap approach, SMPI duplicates and dynamically switches
+the \c .data and \c .bss segments of the ELF process when switching
+the MPI ranks. This allows each rank to have its own copy of the
+global variables. No data is actually copied on a switch, as this
+mechanism relies on \c mmap for efficiency. It is considered to be
+very robust on all systems supporting \c mmap (Linux and most BSDs),
+but its performance is questionable since each context switch between
+MPI ranks induces several syscalls to change the \c mmap that
+redirects the \c .data and \c .bss segments to the copies of the new
+rank. The code will also be copied several times in memory, inducing a
+slight increase of the memory footprint.
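+For instance, to force the mmap mechanism instead of the default
+dlopen one, you could pass the corresponding configuration item on the
+command line, as sketched below (see @ref options_smpi_privatization
+for the exact option name and values; the platform and binary names
+are again mere examples):
+
+@verbatim
+smpirun -np 16 -platform cluster.xml -hostfile hostfile \
+        --cfg=smpi/privatization:mmap ./hello
+@endverbatim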
+
+Another limitation is that SMPI only accounts for global variables
+defined in the executable. If the processes use external global
+variables from dynamic libraries, they won't be switched
+correctly. The easiest way to solve this is to statically link against
+the library with these globals. This way, each MPI rank will get its
+own copy of these libraries. Of course you should never statically
+link against the SimGrid library itself.
+
+With the dlopen approach, SMPI loads several copies of the same
+executable in memory as if it were a library, so that the global
+variables get naturally duplicated. It first requires the executable
+to be compiled as a relocatable binary, which is less common for
+programs than for libraries. But most distributions are now compiled
+this way for security reasons, as it allows the address space layout
+to be randomized. It should thus be safe to compile most (any?)
+program this way. The second trick is that the dynamic linker refuses
+to link the exact same file several times, be it a library or a
+relocatable executable. This makes perfect sense in the general case,
+but we need to circumvent this rule in our case. To that end, the
+binary is copied in a temporary file before being re-linked against.
+`dlmopen()` cannot be used instead, as it only allows 256 linking
+contexts and as it would also duplicate SimGrid itself.
+
+This approach greatly speeds up the context switching, down to about
+40 CPU cycles with our raw contexts, instead of requesting several
+syscalls as with the \c mmap approach. Another advantage is that it
+permits running the SMPI contexts in parallel, which is obviously not
+possible with the \c mmap approach. It was tricky to implement, but we
+are not aware of any flaws, so smpirun activates it by default.
+
+In the future, it may be possible to further reduce the memory and
+disk consumption. It seems that we could punch holes in the files
+before dl-loading them to remove the code and constants, and mmap
+these areas onto a unique copy. If done correctly, this would reduce
+the disk and memory usage to the bare minimum, and would also reduce
+the pressure on the CPU instruction cache. See the relevant bug on
+github for implementation leads.
+
+Also, currently, only the binary is copied and dlopen-ed for each MPI
+rank. We could probably extend this to external dependencies, but for
+now, any external dependencies must be statically linked into your
+application. As usual, SimGrid itself shall never be statically linked
+into your app. You don't want to give a copy of SimGrid to each MPI
+rank: that's way too much for them to deal with.
 
 @section SMPI_adapting Adapting your MPI code for further scalability
 
 As detailed in the reference article (available at
-http://hal.inria.fr/inria-00527150), you may want to adapt your code
+http://hal.inria.fr/hal-01415484), you may want to adapt your code
 to improve the simulation performance. But these tricks may seriously
 hinder the result quality (or even prevent the app from running) if used
 wrongly. We assume that if you want to simulate an HPC application,
@@ -584,6 +640,60 @@ Finally, you may want to check
 [this article](https://hal.inria.fr/hal-00907887) on the classical
 pitfalls in modeling distributed systems.
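+As a teaser for the techniques discussed in @ref SMPI_adapting_size,
+here is a sketch of the memory folding trick. It assumes that your
+version of SMPI provides the SMPI_SHARED_MALLOC and SMPI_SHARED_FREE
+macros, and that the actual content of the folded buffer does not
+change the behavior of your application:
+
+@verbatim
+size_t count = 2000000;
+/* Fold a large buffer: all ranks transparently share one single
+ * allocation, which can dramatically reduce the memory footprint of
+ * the simulation. Only valid if the values stored in this buffer do
+ * not influence the control flow of the application. */
+double *buf = SMPI_SHARED_MALLOC(count * sizeof(double));
+/* ... compute and communicate using buf ... */
+SMPI_SHARED_FREE(buf);
+@endverbatim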
+@section SMPI_troubleshooting Troubleshooting with SMPI
+
+@subsection SMPI_trouble_configure_refuses_smpicc ./configure refuses to use smpicc
+
+If your ./configure reports that the compiler is not
+functional or that you are cross-compiling, try to define the
+SMPI_PRETEND_CC environment variable before running the
+configuration.
+
+@verbatim
+SMPI_PRETEND_CC=1 ./configure # here come the configure parameters
+make
+@endverbatim
+
+Indeed, the programs compiled with smpicc cannot be executed
+without smpirun (they are shared libraries and do weird
+things on startup), while configure wants to test them directly.
+With SMPI_PRETEND_CC set, smpicc does not compile as shared,
+and the SMPI initialization stops and returns 0 before doing anything
+that would fail without smpirun.
+
+\warning
+
+  Make sure that SMPI_PRETEND_CC is only set when calling ./configure,
+  not during the actual execution, or any program compiled with smpicc
+  will stop before starting.
+
+@subsection SMPI_trouble_configure_dont_find_smpicc ./configure does not pick smpicc as a compiler
+
+In addition to the previous answers, some projects also need to be
+explicitly told what compiler to use, as follows:
+
+@verbatim
+SMPI_PRETEND_CC=1 ./configure CC=smpicc # here come the other configure parameters
+make
+@endverbatim
+
+Maybe your configure script is using another variable, such as cc or
+similar. Just check the logs.
+
+@subsection SMPI_trouble_useconds_t error: unknown type name 'useconds_t'
+
+Try to add -D_GNU_SOURCE to your compilation line to get rid
+of that error.
+
+The reason is that SMPI provides its own version of usleep(3)
+to override it and to block in the simulation world, not in the real
+one. It needs the useconds_t type for that, which is declared
+only if you define _GNU_SOURCE before including
+unistd.h. If your project includes that header file before
+SMPI, then you need to ensure that you pass the right configuration
+defines as advised above.
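+For instance (a minimal sketch, with a hypothetical source layout):
+
+@verbatim
+/* Either pass -D_GNU_SOURCE on the smpicc command line, or define it
+ * before the very first #include so that unistd.h declares useconds_t: */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <mpi.h>
+@endverbatim
+
 */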