From 34242a04ecfdec872e424ef8a0adc6fc4506750d Mon Sep 17 00:00:00 2001
From: Martin Quinson
Date: Mon, 25 Mar 2013 21:31:56 +0100
Subject: [PATCH] rework the SMPI documentation quite a bit

---
 doc/doxygen/module-smpi.doc | 161 +++++++++++++++++++++++++-----------
 1 file changed, 114 insertions(+), 47 deletions(-)

diff --git a/doc/doxygen/module-smpi.doc b/doc/doxygen/module-smpi.doc
index 71e68a0b6d..77abc9bbd1 100644
--- a/doc/doxygen/module-smpi.doc
+++ b/doc/doxygen/module-smpi.doc
@@ -1,64 +1,81 @@
 /** @addtogroup SMPI_API
 
-This programming environment permits to study existing MPI application
-by emulating them on top of the SimGrid simulator. In other words, it
-will constitute an emulation solution for parallel codes. You don't
-even have to modify your code for that, although that may help, as
-detailed below.
+This programming environment enables the study of MPI applications by
+emulating them on top of the SimGrid simulator. This is particularly
+interesting when studying existing MPI applications within the comfort
+of the simulator. The motivation for this work is detailed in the
+reference article (available at http://hal.inria.fr/inria-00527150).
+
+Our goal is to enable the study of *unmodified* MPI applications,
+although we are not quite there yet (see @ref SMPI_what). In
+addition, you can modify your code to speed up your studies or
+otherwise increase their scalability (see @ref SMPI_adapting).
 
 \section SMPI_who Who should use SMPI (and who shouldn't)
 
-You should use this programming environment of the SimGrid suite if
-you want to study existing MPI applications. If you want to study
-some heuristics for a given problem (and if your goal is to produce
-theorems and publications, not code), have a look at the \ref MSG_API
-environment, or the \ref SD_API one if you need to use DAGs. If none
-of those programming environments fits your needs, you may consider
-implementing your own directly on top of \ref SURF_API (but you
-probably want to contact us before).
+SMPI is now considered stable and you can use it in production. You
+will probably want to read the scientific publications that detail the
+models used and their limits, but this should not be absolutely
+necessary.
+
+If you already fluently write and use MPI applications, SMPI should
+sound very familiar to you. Use smpicc instead of mpicc, and smpirun
+instead of mpirun. More information is given below on this page.
+
+Of course, if you don't know what MPI is, the documentation of SMPI
+will seem a bit terse to you. You should pick up a good MPI tutorial
+on the Internet (or a course in your university) and come back to SMPI
+once you know a bit more about MPI. Alternatively, you may want to
+turn to the other SimGrid interfaces, such as the \ref MSG_API
+environment or the \ref SD_API one.
 
 \section SMPI_what What can run within SMPI?
 
 You can run unmodified MPI applications (both C and Fortran) within
-SMPI, provided you only use MPI calls that we implemented in MPI. Our
-coverage of the interface is not bad, but will probably never be
-complete. One sided communications and I/O primitives are not targeted
-for now. The full list of not yet implemented functions is available
-in file include/smpi/smpi.h of the archive, between two lines
+SMPI, provided that (1) you only use MPI calls that we implemented in
+SMPI and (2) you don't use any globals in your application.
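+
+For instance, a plain MPI program such as the following should run
+unmodified under SMPI. This is only a minimal sketch of our own, not
+one of the SimGrid examples:
+
+\verbatim
+#include <stdio.h>
+#include <mpi.h>
+
+int main(int argc, char *argv[])
+{
+  int rank, size;
+  MPI_Init(&argc, &argv);                 /* plain MPI calls, covered by SMPI */
+  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+  MPI_Comm_size(MPI_COMM_WORLD, &size);
+  printf("Hello from rank %d out of %d\n", rank, size);
+  MPI_Finalize();                         /* note: no global variables anywhere */
+  return 0;
+}
+\endverbatim
+
+Compile it with smpicc and launch it with smpirun, as detailed in
+@ref SMPI_compiling below.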
+
+\subsection SMPI_what_coverage MPI coverage of SMPI
+
+Our coverage of the interface is very decent, but still incomplete.
+Given the size of the MPI standard, it may well be that we never
+implement absolutely all existing primitives. One-sided communications
+and I/O primitives are not targeted for now. Our current state is
+nonetheless solid: we pass most of the MPICH coverage tests.
+
+The full list of not yet implemented functions is documented in the
+file include/smpi/smpi.h of the archive, between two lines
 containing the FIXME marker. If you really need a missing feature,
 please get in touch with us: we can guide you through the SimGrid code
 to help you implement it, and we'd be glad to integrate it in the
 main project afterward if you contribute it back.
 
-\section SMPI_adapting Adapting your MPI code to the use of SMPI
-
-As detailed in the reference article (available at
-http://hal.inria.fr/inria-00527150), you may want to adapt your code
-to improve the simulation performance. But these tricks may seriously
-hinder the result qualtity (or even prevent the app to run) if used
-wrongly. We assume that if you want to simulate an HPC application,
-you know what you are doing. Don't prove us wrong!
-
-If you get short on memory (the whole app is executed on a single node
-when simulated), you should have a look at the SMPI_SHARED_MALLOC and
-SMPI_SHARED_FREE macros. It allows to share memory areas between
-processes. For example, matrix multiplication code may want to store
-the blocks on the same area. Of course, the resulting computations
-will useless, but you can still study the application behavior this
-way. Of course, if your code is data-dependent, this won't work.
-
-If your application is too slow, try using SMPI_SAMPLE_LOCAL,
-SMPI_SAMPLE_GLOBAL and friends to indicate which computation loops can
-be sampled. Some of the loop iterations will be executed to measure
-their duration, and this duration will be used for the subsequent
-iterations. These samples are done per processor with
-SMPI_SAMPLE_LOCAL, and shared between all processors with
-SMPI_SAMPLE_GLOBAL. Of course, none of this will work if the execution
-time of your loop iteration are not stable.
-
-Yes, that's right, these macros are not documented yet, but we'll fix
-it as soon as time permits. Sorry about that -- patch welcomed!
-Meanwhile, grep for them on the examples for more information.
+\subsection SMPI_what_globals Issues with the globals
+
+Concerning the globals, the problem comes from the fact that MPI
+processes usually run as real UNIX processes, while they are all folded
+into threads of a unique system process in SMPI. Global variables that
+are private to each MPI process in reality thus become shared between
+the processes in SMPI. This point is rather problematic, and currently
+forces you to modify your application to privatize the global variables.
+
+We tried several techniques to work around this. We used to have a
+script that automatically privatized the globals through static
+analysis of the source code, but it was not robust enough to be used
+in production. This issue, as well as several potential solutions, is
+discussed in this article: "Automatic Handling of Global Variables for
+Multi-threaded MPI Programs",
+available at http://charm.cs.illinois.edu/newPapers/11-23/paper.pdf
+(note that this article does not deal with SMPI but with a concurrent
+solution called AMPI that suffers from the same issue).
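+
+As an illustration, the manual privatization we are alluding to
+amounts to moving each global into a per-process structure that is
+allocated at startup and passed around explicitly. This is only a
+sketch with hypothetical names of our own, not an SMPI API:
+
+\verbatim
+#include <stdlib.h>
+#include <mpi.h>
+
+/* int iterations = 0;  -- problematic under SMPI: this global would be
+   shared by all MPI processes, as they run as threads of one process */
+
+/* hand-made privatization: the former global moves into a structure
+   that each MPI process allocates for itself */
+typedef struct {
+  int iterations;
+} my_state_t;
+
+static void do_step(my_state_t *state)
+{
+  state->iterations++;   /* now private to each simulated process */
+}
+
+int main(int argc, char *argv[])
+{
+  MPI_Init(&argc, &argv);
+  my_state_t *state = calloc(1, sizeof(*state));
+  do_step(state);
+  free(state);
+  MPI_Finalize();
+  return 0;
+}
+\endverbatim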
+
+Currently, we have no automatic solution to offer you, because all the
+solutions proposed so far would modify the performance of your
+application (in the computational sections). Sacrificing realism for
+usability is not very satisfying, so we did not implement them yet. You
+will thus have to modify your application if it uses global variables.
+We are working on another solution, leveraging distributed simulation
+to keep each MPI process within a separate system process, but this is
+far from being ready at the moment.
 
 \section SMPI_compiling Compiling your code
 
@@ -93,4 +110,54 @@ by running
 smpirun -help
 \endverbatim
 
+\section SMPI_adapting Adapting your MPI code to the use of SMPI
+
+As detailed in the reference article (available at
+http://hal.inria.fr/inria-00527150), you may want to adapt your code
+to improve the simulation performance. But these tricks may seriously
+hinder the result quality (or even prevent the app from running) if
+used wrongly. We assume that if you want to simulate an HPC
+application, you know what you are doing. Don't prove us wrong!
+
+\subsection SMPI_adapting_size Reducing your memory footprint
+
+If you get short on memory (the whole app is executed on a single node
+when simulated), you should have a look at the SMPI_SHARED_MALLOC and
+SMPI_SHARED_FREE macros. They allow you to share memory areas between
+processes: the same malloc line in each process will point to the
+exact same memory area. So if you have a malloc of 2M and you have 16
+processes, this macro will change your memory consumption from 2M*16
+to 2M only. Only one block for all processes.
+
+If your program is ok with a block containing garbage values because
+all processes read and write to the same place without any kind of
+coordination, then this macro can dramatically shrink your memory
+consumption. For example, that will be very beneficial to a matrix
+multiplication code, as all blocks will be stored in the same area. Of
+course, the resulting computations will be useless, but you can still
+study the application behavior this way.
+
+Naturally, this won't work if your code is data-dependent. For
+example, a Jacobi iterative computation depends on the result computed
+by the code to detect convergence conditions, so turning the data into
+garbage by sharing the same memory area between processes would not be
+very wise. You cannot use the SMPI_SHARED_MALLOC macro in this case,
+sorry.
+
+This feature is demonstrated by the example file
+examples/smpi/NAS/DT-folding/dt.c
+
+\subsection SMPI_adapting_speed Toward faster simulations
+
+If your application is too slow, try using SMPI_SAMPLE_LOCAL,
+SMPI_SAMPLE_GLOBAL and friends to indicate which computation loops can
+be sampled. Some of the loop iterations will be executed to measure
+their duration, and this duration will be used for the subsequent
+iterations. These samples are done per processor with
+SMPI_SAMPLE_LOCAL, and shared between all processors with
+SMPI_SAMPLE_GLOBAL. Of course, none of this will work if the execution
+time of your loop iterations is not stable.
+
+This feature is demonstrated by the example file
+examples/smpi/NAS/EP-sampling/ep.c
+
+
 */
\ No newline at end of file
-- 
2.20.1