X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/2cfe340b9729c7ee55cf813f6a36f15dc918696b..3af3a09062f663d730af98444b314581212a78b4:/docs/source/tuto_smpi.rst

diff --git a/docs/source/tuto_smpi.rst b/docs/source/tuto_smpi.rst
index 05847208f4..f886d6a060 100644
--- a/docs/source/tuto_smpi.rst
+++ b/docs/source/tuto_smpi.rst
@@ -110,8 +110,8 @@ concurrently at full speed. For instance:
    :lines: 1-3,18-
 
 One specifies a name prefix and suffix for each host, and then give an
-integer range. In the example the cluster contains 262145 hosts (!),
-named ``host-0.simgrid.org`` to ``host-262144.simgrid.org``. All hosts
+integer range. In the example the cluster contains 65535 hosts (!),
+named ``node-0.simgrid.org`` to ``node-65534.simgrid.org``. All hosts
 have the same power (1 Gflop/sec) and are connected to the switch via
 links with same bandwidth (125 MBytes/sec) and latency (50
 microseconds).
@@ -137,9 +137,9 @@ The only differences with the crossbar cluster above are the ``bb_bw``
 and ``bb_lat`` attributes that specify the backbone characteristics
 (here, a 500 microseconds latency and a 2.25 GByte/sec
 bandwidth). This link is used for every communication within the
-cluster. The route from ``node-0.acme.org`` to ``node-1.acme.org``
-counts 3 links: the private link of ``node-0.acme.org``, the backbone
-and the private link of ``node-1.acme.org``.
+cluster. The route from ``node-0.simgrid.org`` to ``node-1.simgrid.org``
+counts 3 links: the private link of ``node-0.simgrid.org``, the backbone
+and the private link of ``node-1.simgrid.org``.
 	   
 .. todo::
 
@@ -290,23 +290,172 @@ Debian and Ubuntu for example, you can get them as follows:
 
    sudo apt install simgrid pajeng make gcc g++ gfortran vite
 
-An initial version of the source code is provided on framagit. This
-template compiles with cmake. If SimGrid is correctly installed, you
-should be able to clone the `repository
-<https://framagit.org/simgrid/simgrid-template-smpi>`_ and recompile
-everything as follows:
+To take this tutorial, you will also need the platform files from the
+previous section as well as the source code of the NAS Parallel
+Benchmarks. Just  clone `this repository
+<https://framagit.org/simgrid/simgrid-template-smpi>`_  to get them all:
 
 .. code-block:: shell
 
    git clone git@framagit.org:simgrid/simgrid-template-smpi.git
    cd simgrid-template-smpi/
-   cmake .
-   make
 
 If you struggle with the compilation, then you should double check
 your :ref:`SimGrid installation <install>`.  On need, please refer to
 the :ref:`Troubleshooting your Project Setup
 <install_yours_troubleshooting>` section.
 
+Lab 0: Hello World
+------------------
+
+It is time to simulate your first MPI program. Use the simplistic
+example `roundtrip.c
+<https://framagit.org/simgrid/simgrid-template-smpi/raw/master/roundtrip.c?inline=false>`_
+that comes with the template.
+
+.. literalinclude:: /tuto_smpi/roundtrip.c
+   :language: c
+
+Compiling and Executing
+.......................
+	      
+Compiling the program is straightforward (double check your
+:ref:`SimGrid installation <install>` if you get an error message):
+
+
+.. code-block:: shell
+		
+  $ smpicc -O3 roundtrip.c -o roundtrip
+
+
+Once compiled, you can simulate the execution of this program on 16
+nodes from the ``cluster_crossbar.xml`` platform as follows:
+
+.. code-block:: shell
+
+   $ smpirun -np 16 -platform cluster_crossbar.xml -hostfile cluster_hostfile ./roundtrip
+
+- The ``-np 16`` option, just like in regular MPI, specifies the
+  number of MPI processes to use. 
+- The ``-hostfile cluster_hostfile`` option, just like in regular
+  MPI, specifies the host file. If you omit this option, ``smpirun``
+  will deploy the application on the first machines of your platform.
+- The ``-platform cluster_crossbar.xml`` option, **which doesn't exist
+  in regular MPI**, specifies the platform configuration to be
+  simulated. 
+- At the end of the line, one finds the executable name and
+  command-line arguments (if any -- roundtrip does not expect any arguments).
+
+Feel free to tweak the content of the XML platform file and the
+prorgam to see the effect on the simulated execution time. Note that
+the simulation accounts for realistic network protocol effects and MPI
+implementation effects. As a result, you may see "unexpected behavior"
+like in the real world (e.g., sending a message 1 byte larger may lead
+to significant higher execution time).
+
+Lab 1: Visualizing LU
+---------------------
+
+We will now simulate a larger application: the LU benchmark of the NAS
+suite. The version provided in the code template was modified to
+compile with SMPI instead of the regular MPI. Compare the difference
+between the original ``config/make.def.template`` and the
+``config/make.def`` that was adapted to SMPI. We use ``smpiff`` and
+``smpicc`` as compilers, and don't pass any additional library.
+
+Now compile and execute the LU benchmark, class A (i.e., for small
+data size) with 4 nodes.
+
+.. code-block:: shell
+
+   $ make lu NPROCS=4 CLASS=A
+   (compilation logs)
+   $ smpirun -np 4 -platform ../cluster_backbone.xml bin/lu.A.4
+   (execution logs)
+
+To get a better understanding of what is going on, activate the
+vizualization tracing, and convert the produced trace for later
+use:
+
+.. code-block:: shell
+
+   smpirun -np 4 -platform ../cluster_backbone.xml -trace --cfg=tracing/filename:lu.A.4.trace bin/lu.A.4
+   pj_dump --ignore-incomplete-links lu.A.4.trace | grep State > lu.A.4.state.csv
+
+You can then produce a Gantt Chart with the following R chunk. You can
+either copy/paste it in a R session, or `turn it into a Rscript executable
+<https://swcarpentry.github.io/r-novice-inflammation/05-cmdline/>`_ to
+run it again and again.
+
+.. code-block:: R
+
+   library(ggplot2)
+
+   # Read the data
+   df_state = read.csv("lu.A.4.state.csv", header=F, strip.white=T)
+   names(df_state) = c("Type", "Rank", "Container", "Start", "End", "Duration", "Level", "State");
+   df_state = df_state[!(names(df_state) %in% c("Type","Container","Level"))]
+   df_state$Rank = as.numeric(gsub("rank-","",df_state$Rank))
+
+   # Draw the Gantt Chart
+   gc = ggplot(data=df_state) + geom_rect(aes(xmin=Start, xmax=End, ymin=Rank, ymax=Rank+1,fill=State))
+
+   # Produce the output
+   plot(gc)
+   dev.off()
+
+This produces a file called ``Rplots.pdf`` with the following
+content. You can find more examples of visualization in the `SimGrid
+documentation <http://simgrid.gforge.inria.fr/contrib/R_visualization.html>`_.
+
+.. image:: /tuto_smpi/img/lu.A.4.png
+   :align: center
+
+Lab 2: Tracing and Replay of LU
+-------------------------------
+
+Now compile and execute the LU benchmark, class A, with 32 nodes.
+
+.. code-block:: shell
+
+   $ make lu NPROCS=32 CLASS=A
+
+This takes several minutes to to simulate, because all code from all
+processes has to be really executed, and everything is serialized.
+
+SMPI provides several methods to speed things up. One of them is to
+capture a time independent trace of the running application, and
+replay it on a different platform with the same amount of nodes. The
+replay is much faster than live simulation, as the computations are
+skipped (the application must be network-dependent for this to work).
+
+You can even generate the trace during as live simulation, as follows:
+
+.. code-block:: shell
+
+   $ smpirun -trace-ti --cfg=tracing/filename:LU.A.32 -np 32 -platform ../cluster_backbone.xml bin/lu.A.32 
+
+The produced trace is composed of a file ``LU.A.32`` and a folder
+``LU.A.32_files``. To replay this with SMPI, you need to first compile
+the provided ``smpi_replay.cpp`` file, that comes from
+`simgrid/examples/smpi/replay
+<https://framagit.org/simgrid/simgrid/tree/master/examples/smpi/replay>`_.
+
+.. code-block:: shell
+
+   $ smpicxx ../replay.cpp -O3 -o ../smpi_replay
+
+Afterward, you can replay your trace in SMPI as follows:
+
+   $ smpirun -np 32 -platform ../cluster_torus.xml -ext smpi_replay ../smpi_replay LU.A.32
+
+All the outputs are gone, as the application is not really simulated
+here. Its trace is simply replayed. But if you visualize the live
+simulation and the replay, you will see that the behavior is
+unchanged. The simulation does not run much faster on this very
+example, but this becomes very interesting when your application
+is computationally hungry.
+
+.. todo:: smpi_replay should be installed by SimGrid, and smpirun interface could be simplified here.
 
 ..  LocalWords:  SimGrid