X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/2ad1267f6cbefcc4424910b060cc87fb39c7add6..5937b88aaa18de687b2de15a97ee3ae0dc480f64:/docs/source/tuto_smpi.rst

diff --git a/docs/source/tuto_smpi.rst b/docs/source/tuto_smpi.rst
index 3af0229128..434a16152d 100644
--- a/docs/source/tuto_smpi.rst
+++ b/docs/source/tuto_smpi.rst
@@ -110,8 +110,8 @@ concurrently at full speed. For instance:
    :lines: 1-3,18-
 
 One specifies a name prefix and suffix for each host, and then gives an
-integer range. In the example the cluster contains 262145 hosts (!),
-named ``host-0.simgrid.org`` to ``host-262144.simgrid.org``. All hosts
+integer range. In the example the cluster contains 65535 hosts (!),
+named ``node-0.simgrid.org`` to ``node-65534.simgrid.org``. All hosts
 have the same power (1 Gflop/sec) and are connected to the switch via
 links with the same bandwidth (125 MBytes/sec) and latency (50
 microseconds).
@@ -137,9 +137,9 @@ The only differences with the crossbar cluster above are the
 ``bb_bw`` and ``bb_lat`` attributes that specify the backbone
 characteristics (here, a 500 microseconds latency and a 2.25 GByte/sec
 bandwidth). This link is used for every communication within the
-cluster. The route from ``node-0.acme.org`` to ``node-1.acme.org``
-counts 3 links: the private link of ``node-0.acme.org``, the backbone
-and the private link of ``node-1.acme.org``.
+cluster. The route from ``node-0.simgrid.org`` to ``node-1.simgrid.org``
+counts 3 links: the private link of ``node-0.simgrid.org``, the backbone
+and the private link of ``node-1.simgrid.org``.
 
 .. todo::
 
@@ -246,33 +246,35 @@ image. Once you `installed Docker itself
 .. code-block:: shell
 
    docker pull simgrid/tuto-smpi
-   docker run -it --rm --name simgrid --volume ~/smpi-tutorial:/src/tutorial simgrid/tuto-smpi bash
+   docker run -it --rm --name simgrid --volume ~/smpi-tutorial:/source/tutorial simgrid/tuto-smpi bash
 
 This will start a new container with all you need to take this
 tutorial, and create a ``smpi-tutorial`` directory in your home on
-your host machine that will be visible as ``/src/tutorial`` within the
+your host machine that will be visible as ``/source/tutorial`` within the
 container. You can then edit the files you want with your favorite
 editor in ``~/smpi-tutorial``, and compile them within the
 container to enjoy the provided dependencies.
 
 .. warning::
 
-   Any change to the container out of ``/src/tutorial`` will be lost
+   Any change to the container out of ``/source/tutorial`` will be lost
    when you log out of the container, so don't edit the other files!
 
 All needed dependencies are already installed in this container
-(SimGrid, a C/C++ compiler, a Fortran compiler, make, pajeng and
-R). Vite being only optional in this tutorial, it is not installed to
-reduce the image size.
+(SimGrid, the C/C++/Fortran compilers, make, pajeng and R). Vite being
+only optional in this tutorial, it is not installed to reduce the
+image size.
 
-The code template is available under ``/src/simgrid-template-smpi`` in
-the image. You should copy it to your working directory when you first
-log in:
+The container also includes the example platform files from the
+previous section as well as the source code of the NAS Parallel
+Benchmarks. These files are available under
+``/source/simgrid-template-smpi`` in the image. You should copy them to
+your working directory when you first log in:
 
 .. code-block:: shell
 
-   cp -r /src/simgrid-template-smpi/* /src/tutorial
-   cd /src/tutorial
+   cp -r /source/simgrid-template-smpi/* /source/tutorial
+   cd /source/tutorial
 
 Using your Computer Natively
 ............................
@@ -288,23 +290,205 @@ Debian and Ubuntu for example, you can get them as follows:
 
    sudo apt install simgrid pajeng make gcc g++ gfortran vite
 
-An initial version of the source code is provided on framagit. This
-template compiles with cmake. If SimGrid is correctly installed, you
-should be able to clone the `repository
-`_ and recompile
-everything as follows:
+To take this tutorial, you will also need the platform files from the
+previous section as well as the source code of the NAS Parallel
+Benchmarks. Just clone `this repository
+`_ to get them all:
 
 .. code-block:: shell
 
    git clone git@framagit.org:simgrid/simgrid-template-smpi.git
    cd simgrid-template-smpi/
-   cmake .
-   make
 
 If you struggle with the compilation, then you should double check
 your :ref:`SimGrid installation `. If needed, please refer to
 the :ref:`Troubleshooting your Project Setup
 ` section.
 
+Lab 0: Hello World
+------------------
+
+It is time to simulate your first MPI program. Use the simplistic
+example `roundtrip.c
+`_
+that comes with the template.
+
+.. literalinclude:: /tuto_smpi/roundtrip.c
+   :language: c
+
+Compiling and Executing
+.......................
+
+Compiling the program is straightforward (double check your
+:ref:`SimGrid installation ` if you get an error message):
+
+
+.. code-block:: shell
+
+   $ smpicc -O3 roundtrip.c -o roundtrip
+
+
+Once compiled, you can simulate the execution of this program on 16
+nodes from the ``cluster_crossbar.xml`` platform as follows:
+
+.. code-block:: shell
+
+   $ smpirun -np 16 -platform cluster_crossbar.xml -hostfile cluster_hostfile ./roundtrip
+
+- The ``-np 16`` option, just like in regular MPI, specifies the
+  number of MPI processes to use.
+- The ``-hostfile cluster_hostfile`` option, just like in regular
+  MPI, specifies the host file. If you omit this option, ``smpirun``
+  will deploy the application on the first machines of your platform.
+- The ``-platform cluster_crossbar.xml`` option, **which doesn't exist
+  in regular MPI**, specifies the platform configuration to be
+  simulated.
+- At the end of the line, one finds the executable name and
+  command-line arguments (if any -- roundtrip does not expect any arguments).
+
+Feel free to tweak the content of the XML platform file and the
+program to see the effect on the simulated execution time. It may be
+easier to compare the executions with the extra option
+``--cfg=smpi/display_timing:yes``. Note that the simulation accounts
+for realistic network protocol effects and MPI implementation
+effects. As a result, you may see "unexpected behavior" like in the
+real world (e.g., sending a message 1 byte larger may lead to
+significantly higher execution time).
+
+Lab 1: Visualizing LU
+---------------------
+
+We will now simulate a larger application: the LU benchmark of the NAS
+suite. The version provided in the code template was modified to
+compile with SMPI instead of the regular MPI. Compare the difference
+between the original ``config/make.def.template`` and the
+``config/make.def`` that was adapted to SMPI. We use ``smpiff`` and
+``smpicc`` as compilers, and don't pass any additional library.
+
+Now compile and execute the LU benchmark, class S (i.e., for `small
+data size
+`_) with
+4 nodes.
+
+.. code-block:: shell
+
+   $ make lu NPROCS=4 CLASS=S
+   (compilation logs)
+   $ smpirun -np 4 -platform ../cluster_backbone.xml bin/lu.S.4
+   (execution logs)
+
+To get a better understanding of what is going on, activate the
+visualization tracing, and convert the produced trace for later
+use:
+
+.. code-block:: shell
+
+   smpirun -np 4 -platform ../cluster_backbone.xml -trace --cfg=tracing/filename:lu.S.4.trace bin/lu.S.4
+   pj_dump --ignore-incomplete-links lu.S.4.trace | grep State > lu.S.4.state.csv
+
+You can then produce a Gantt Chart with the following R chunk. You can
+either copy/paste it in an R session, or `turn it into a Rscript executable
+`_ to
+run it again and again.
+
+.. code-block:: R
+
+   library(ggplot2)
+
+   # Read the data
+   df_state = read.csv("lu.S.4.state.csv", header=F, strip.white=T)
+   names(df_state) = c("Type", "Rank", "Container", "Start", "End", "Duration", "Level", "State");
+   df_state = df_state[!(names(df_state) %in% c("Type","Container","Level"))]
+   df_state$Rank = as.numeric(gsub("rank-","",df_state$Rank))
+
+   # Draw the Gantt Chart
+   gc = ggplot(data=df_state) + geom_rect(aes(xmin=Start, xmax=End, ymin=Rank, ymax=Rank+1,fill=State))
+
+   # Produce the output
+   plot(gc)
+   dev.off()
+
+This produces a file called ``Rplots.pdf`` with the following
+content. You can find more visualization examples `online
+`_.
+
+.. image:: /tuto_smpi/img/lu.S.4.png
+   :align: center
+
+Lab 2: Tracing and Replay of LU
+-------------------------------
+
+Now compile and execute the LU benchmark, class A, with 32 nodes.
+
+.. code-block:: shell
+
+   $ make lu NPROCS=32 CLASS=A
+
+This takes several minutes to simulate, because all code from all
+processes has to be really executed, and everything is serialized.
+
+SMPI provides several methods to speed things up. One of them is to
+capture a time independent trace of the running application, and
+replay it on a different platform with the same number of nodes. The
+replay is much faster than live simulation, as the computations are
+skipped (the application must be network-dependent for this to work).
+
+You can even generate the trace during a live simulation, as follows:
+
+.. code-block:: shell
+
+   $ smpirun -trace-ti --cfg=tracing/filename:LU.A.32 -np 32 -platform ../cluster_backbone.xml bin/lu.A.32
+
+The produced trace is composed of a file ``LU.A.32`` and a folder
+``LU.A.32_files``. To replay this with SMPI, you need to first compile
+the provided ``smpi_replay.cpp`` file, that comes from
+`simgrid/examples/smpi/replay
+`_.
+
+.. code-block:: shell
+
+   $ smpicxx ../replay.cpp -O3 -o ../smpi_replay
+
+Afterward, you can replay your trace in SMPI as follows:
+
+   $ smpirun -np 32 -platform ../cluster_torus.xml -ext smpi_replay ../smpi_replay LU.A.32
+
+All the outputs are gone, as the application is not really simulated
+here. Its trace is simply replayed. But if you visualize the live
+simulation and the replay, you will see that the behavior is
+unchanged. The simulation does not run much faster on this very
+example, but this becomes very interesting when your application
+is computationally hungry.
+
+.. todo:: smpi_replay should be installed by SimGrid, and smpirun interface could be simplified here.
+
+Lab 3: Execution Sampling on EP
+-------------------------------
+
+The second method to speed up simulations is to sample the computation
+parts in the code. This means that the person doing the simulation
+needs to know the application and identify parts that are compute
+intensive and take time, while being regular enough not to ruin
+simulation accuracy. Furthermore, there should not be any MPI calls
+inside such parts of the code.
+
+Use the EP benchmark, class B, 16 processes.
+
+.. todo:: write this section, and the following ones.
+
+Further Readings
+----------------
+
+You may also be interested in the `SMPI reference article
+`_ or these `introductory slides
+`_. The `SMPI
+reference documentation `_ covers much more content than
+this short tutorial.
+
+Finally, we regularly use SimGrid in our teachings on MPI. This way,
+our students can experiment with platforms that they do not have access
+to, and the associated visualisation tools help them to understand
+their work. The whole material is available online, in a separate
+project: the `SMPI CourseWare `_.
 
 .. LocalWords:  SimGrid