X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/0e9c0448c6566825b170b98ecff716b098bda10e..5524a24a54c030ab3d1abc9167eea7deae2a7119:/doc/doxygen/module-smpi.doc

diff --git a/doc/doxygen/module-smpi.doc b/doc/doxygen/module-smpi.doc
index b51f5dfe04..0474d172c1 100644
--- a/doc/doxygen/module-smpi.doc
+++ b/doc/doxygen/module-smpi.doc
@@ -48,7 +48,7 @@ feature, please get in touch with us: we can guide you though the SimGrid code
 to help you implementing it, and we'd glad to integrate it in the main project
 afterward if you contribute them back.

-\subsection SMPI_what_globals Issues with the globals
+\subsection SMPI_what_globals Global variables

 Concerning the globals, the problem comes from the fact that usually,
 MPI processes run as real UNIX processes while they are all folded
@@ -67,13 +67,20 @@ available at http://charm.cs.illinois.edu/newPapers/11-23/paper.pdf
 (note that this article does not deal with SMPI but with a concurrent
 solution called AMPI that suffers of the same issue).

-Currently, we have no solution to offer you, because all proposed solutions will
-modify the performance of your application (in the computational
-sections). Sacrificing realism for usability is not very satisfying, so we did
-not implement them yet. You will thus have to modify your application if it uses
-global variables. We are working on another solution, leveraging distributed
-simulation to keep each MPI process within a separate system process, but this
-is far from being ready at the moment.
+A method using dynamic switching of the .data and .bss segments of an
+ELF executable has been introduced in SimGrid 3.11. By setting the
+smpi/privatize_global_variables option to yes, SMPI will duplicate
+the segments containing the global variables and, when needed, will map
+the right one in memory. This requires an ELF executable and mmap support
+on the system (Linux and recent BSDs should be compatible). %As no copy is
+involved, performance should not be altered (but memory occupation will be higher).
+
+This solution works really well for a large number of MPI
+applications. Its main limitation is that if the application loads dynamic
+libraries, their global variables won't be privatized. This can be avoided
+by linking statically with these libraries (but NOT with libsimgrid, as we
+need SimGrid's own global variables).
+

 \section SMPI_compiling Compiling your code

@@ -171,14 +178,14 @@ to allow the user to tune the library and use the better collective if the
 default one is not good enough.

 SMPI tries to apply the same logic, regrouping algorithms from OpenMPI, MPICH
-libraries, and from StarMPI (STAR-MPI).
-This collection of more than a hundred algorithms allows a simple and effective
+libraries, StarMPI (STAR-MPI), and MVAPICH2.
+This collection of more than 115 algorithms allows a simple and effective
 comparison of their behavior and performance, making SMPI a tool of choice for the
 development of such algorithms.

 \subsection Tracing_internals Tracing of internal communications

-For each collective, default tracing only outputs only global data.
+For each collective, default tracing only outputs global data.
 Internal communication operations are not traced to avoid outputting too much
 data to the trace. To debug and compare algorithm, this can be changed with the
 item \b tracing/smpi/internals , which has 0 for default value.
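To make the two items above concrete, here is a minimal sketch of an smpirun invocation combining global-variable privatization with internal tracing. The platform file, hostfile and program names are placeholders, and tracing itself must also be enabled (here with smpirun's -trace flag) for the tracing/smpi/internals item to have a visible effect:

\verbatim
smpirun -trace -np 4 -platform my_platform.xml -hostfile my_hostfile \
    --cfg=smpi/privatize_global_variables:yes \
    --cfg=tracing/smpi/internals:1 \
    ./my_mpi_program
\endverbatim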
@@ -195,8 +202,17 @@ the first one with a ring algorithm, the second with a pairwise one :

 The default selection logic implemented by default in OpenMPI (version 1.7)
 and MPICH (version 3.0.4) has been replicated and can be used by setting the
-\b smpi/coll_selector item to either ompi or mpich. The code and details for each
-selector can be found in the src/smpi/colls/smpi_(openmpi/mpich)_selector.c file.
+\b smpi/coll_selector item to either ompi or mpich. A selector based on the selection logic of MVAPICH2 (version 1.9), tuned on the Stampede cluster, has also been implemented, as well as a preliminary version of an Intel MPI selector (version 4.1.3, also tuned for the Stampede cluster). Due to the closed-source nature of Intel MPI, some of the algorithms described in its documentation are not available and are replaced by MVAPICH2 ones.
+
+Values for the \b smpi/coll_selector option are:
+ - ompi
+ - mpich
+ - mvapich2
+ - impi
+ - default
+
+The code and details for each
+selector can be found in the src/smpi/colls/smpi_(openmpi/mpich/mvapich2/impi)_selector.c file.

 As this is still in development, we do not insure that all algorithms are correctly
 replicated and that they will behave exactly as the real ones. If you notice a difference,
 please contact SimGrid developers mailing list
@@ -222,6 +238,8 @@ Most of these are best described in Rabenseifner's reduce algorithm
 \subsubsection MPI_Allreduce

 - default : naive one, by default
 - ompi : use openmpi selector for the allreduce operations
 - mpich : use mpich selector for the allreduce operations
+ - mvapich2 : use mvapich2 selector for the allreduce operations
+ - impi : use intel mpi selector for the allreduce operations
 - automatic (experimental) : use an automatic self-benchmarking algorithm
 - lr : logical ring reduce-scatter then logical ring allgather
 - rab1 : variations of the Rabenseifner algorithm : reduce_scatter then allgather
@@ -328,24 +366,29 @@ one in most cases)
 - rab_rsag : variation of the Rabenseifner algorithm : recursive doubling
 reduce_scatter then recursive doubling allgather
 - rdb : recursive doubling
- - smp_binomial : binomial tree with smp : 8 cores/SMP, binomial intra
+ - smp_binomial : binomial tree with smp : binomial intra
 SMP reduce, inter reduce, inter broadcast then intra broadcast
 - smp_binomial_pipeline : same with segment size = 4096 bytes
- - smp_rdb : 8 cores/SMP, intra : binomial allreduce, inter : Recursive
+ - smp_rdb : intra : binomial allreduce, inter : Recursive
 doubling allreduce, intra : binomial broadcast
- - smp_rsag : 8 cores/SMP, intra : binomial allreduce, inter : reduce-scatter,
+ - smp_rsag : intra : binomial allreduce, inter : reduce-scatter,
 inter:allgather, intra : binomial broadcast
- - smp_rsag_lr : 8 cores/SMP, intra : binomial allreduce, inter : logical ring
+ - smp_rsag_lr : intra : binomial allreduce, inter : logical ring
 reduce-scatter, logical ring inter:allgather, intra : binomial broadcast
- - smp_rsag_rab : 8 cores/SMP, intra : binomial allreduce, inter : rab
+ - smp_rsag_rab : intra : binomial allreduce, inter : rab
 reduce-scatter, rab inter:allgather, intra : binomial broadcast
 - redbcast : reduce then broadcast, using default or tuned algorithms if specified
 - ompi_ring_segmented : ring algorithm used by OpenMPI
+ - mvapich2_rs : rdb for small messages, reduce-scatter then allgather otherwise
+ - mvapich2_two_level : SMP-aware algorithm, with mpich as intra algorithm and rdb as inter (use the mvapich2 selector to get tuned values instead)
+ - rab : default Rabenseifner implementation

 \subsubsection MPI_Reduce_scatter

 - default : naive one, by default
 - ompi : use openmpi selector for the reduce_scatter operations
 - mpich : use mpich selector for the reduce_scatter operations
+ - mvapich2 : use mvapich2 selector for the reduce_scatter operations
+ - impi : use intel mpi selector for the reduce_scatter operations
 - automatic (experimental) : use an automatic self-benchmarking algorithm
 - ompi_basic_recursivehalving : recursive halving version from OpenMPI
 - ompi_ring : ring version from OpenMPI
@@ -359,6 +402,8 @@ reduce-scatter, rab inter:allgather, intra : binomial broadcast
 - default : naive one, by default
 - ompi : use openmpi selector for the allgather operations
 - mpich : use mpich selector for the allgather operations
+ - mvapich2 : use mvapich2 selector for the allgather operations
+ - impi : use intel mpi selector for the allgather operations
 - automatic (experimental) : use an automatic self-benchmarking algorithm
 - 2dmesh : see alltoall
 - 3dmesh : see alltoall
@@ -383,12 +428,15 @@ using simple algorithm (hardcoded, default processes/SMP: 8)
 i + 2, ..., i -> (i + p -1) % P
 - ompi_neighborexchange : Neighbor Exchange algorithm for allgather.
 Described by Chen et.al. in Performance Evaluation of Allgather Algorithms on Terascale Linux Cluster with Fast Ethernet
+ - mvapich2_smp : SMP-aware algorithm, performing an intra-node gather, an inter-node allgather with one process per node, and an intra-node bcast

 \subsubsection MPI_Allgatherv

 - default : naive one, by default
 - ompi : use openmpi selector for the allgatherv operations
 - mpich : use mpich selector for the allgatherv operations
+ - mvapich2 : use mvapich2 selector for the allgatherv operations
+ - impi : use intel mpi selector for the allgatherv operations
 - automatic (experimental) : use an automatic self-benchmarking algorithm
 - GB : Gatherv - Broadcast (uses tuned version if specified, but only for
 Bcast, gatherv is not tuned)
@@ -404,6 +452,8 @@ one from STAR-MPI
 - default : naive one, by default
 - ompi : use openmpi selector for the bcast operations
 - mpich : use mpich selector for the bcast operations
+ - mvapich2 : use mvapich2 selector for the bcast operations
+ - impi : use intel mpi selector for the bcast operations
 - automatic (experimental) : use an automatic self-benchmarking algorithm
 - arrival_pattern_aware : root exchanges with the first process to arrive
 - arrival_pattern_aware_wait : same with slight variation
@@ -421,7 +471,9 @@ one from STAR-MPI
 - SMP_linear : linear algorithm with 8 cores/SMP
 - ompi_split_bintree : binary tree algorithm from OpenMPI, with message split in 8192 bytes pieces
 - ompi_pipeline : pipeline algorithm from OpenMPI, with message split in 128KB pieces
-
+ - mvapich2_inter_node : default MVAPICH2 inter-node algorithm
+ - mvapich2_intra_node : default MVAPICH2 intra-node algorithm
+ - mvapich2_knomial_intra_node : k-nomial intra-node default MVAPICH2 algorithm; the default factor is 4.

 \subsection auto Automatic evaluation
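To illustrate the per-collective selection that the lists above document, here is a sketch of a run that keeps the MVAPICH2 selector for most collectives while forcing the rab algorithm for MPI_Allreduce. Only smpi/coll_selector and the algorithm names are taken from this page; the smpi/allreduce item name and the file names are assumptions made for the example:

\verbatim
smpirun -np 16 -platform my_platform.xml -hostfile my_hostfile \
    --cfg=smpi/coll_selector:mvapich2 \
    --cfg=smpi/allreduce:rab \
    ./my_mpi_program
\endverbatim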