docs: add a link to the cited paper

diff --git a/docs/source/tuto_smpi.rst b/docs/source/tuto_smpi.rst

index 7bc1e13..e0a693c 100644
--- a/docs/source/tuto_smpi.rst
+++ b/docs/source/tuto_smpi.rst
@@ -3,6 +3,11 @@
  Simulating MPI Applications
  ===========================
  
+.. warning:: This document is still at an early stage. You can try
+   this tutorial, but do not be surprised if things fall short. It
+   will be completed for the next release, v3.22, planned for the end
+   of 2018.
+
  Discover SMPI
  -------------
  
@@ -49,7 +54,7 @@ To start using SMPI, you just need to compile your application with
  ``mpiff``, or with ``smpicxx`` instead of ``mpicxx``. Then, the only
  difference between the classical ``mpirun`` and the new ``smpirun`` is
  that it requires a new parameter ``-platform`` with a file describing
-the virtual platform on which your application shall run.
+the simulated platform on which your application shall run.
  
  Internally, all ranks of your application are executed as threads of a
  single unix process. That's not a problem if your application has
@@ -68,7 +73,7 @@ Describing Your Platform
  ------------------------
  
  As an SMPI user, you are expected to provide a description of your
-virtual platform, that is mostly a set of simulated hosts and network
+simulated platform, which is mostly a set of simulated hosts and network
  links with some performance characteristics. SimGrid provides plenty
  of :ref:`documentation <platform>` and examples (in the
  `examples/platforms <https://framagit.org/simgrid/simgrid/tree/master/examples/platforms>`_
@@ -209,15 +214,28 @@ Dragonfly Cluster
  This topology was introduced to further reduce the amount of links
  while maintaining a high bandwidth for local communications. To model
  this in SimGrid, pass a ``topology="DRAGONFLY"`` attribute to your
-cluster.
+cluster. It is based on the implementation of the topology used on
+Cray XC systems, described in the paper
+`Cray Cascade: A scalable HPC system based on a Dragonfly network <https://dl.acm.org/citation.cfm?id=2389136>`_.
+
+The system description follows the format ``topo_parameters=#groups;#chassis;#routers;#nodes``.
+For example, ``3,4 ; 3,2 ; 3,1 ; 2`` reads as follows:
+
+- ``3,4``: There are 3 groups with 4 links between each (blue level).
+  In our implementation, links to the nth group are attached to the
+  nth router of the group.
+- ``3,2``: In each group, there are 3 chassis with 2 links between each nth router
+  of each group (black level).
+- ``3,1``: In each chassis, 3 routers are connected together with a single link
+  (green level).
+- ``2``: Each router has two nodes attached (single link).
+
+.. image:: ../../examples/platforms/cluster_dragonfly.svg
+   :align: center
  
  .. literalinclude:: ../../examples/platforms/cluster_dragonfly.xml
     :language: xml
  
-.. todo::
-
-   Add the image, and the documuentation of the topo_parameters.
-
  Final Word
  ..........
  
@@ -347,11 +365,13 @@ nodes from the ``cluster_crossbar.xml`` platform as follows:
    command-line arguments (if any -- roundtrip does not expect any arguments).
  
  Feel free to tweak the content of the XML platform file and the
-prorgam to see the effect on the simulated execution time. Note that
-the simulation accounts for realistic network protocol effects and MPI
-implementation effects. As a result, you may see "unexpected behavior"
-like in the real world (e.g., sending a message 1 byte larger may lead
-to significant higher execution time).
+program to see the effect on the simulated execution time. It may be
+easier to compare the executions with the extra option
+``--cfg=smpi/display_timing:yes``.  Note that the simulation accounts
+for realistic network protocol effects and MPI implementation
+effects. As a result, you may see "unexpected behavior" like in the
+real world (e.g., sending a message 1 byte larger may lead to
+significantly higher execution time).
  
  Lab 1: Visualizing LU
  ---------------------
@@ -363,14 +383,16 @@ between the original ``config/make.def.template`` and the
  ``config/make.def`` that was adapted to SMPI. We use ``smpiff`` and
  ``smpicc`` as compilers, and don't pass any additional library.
  
-Now compile and execute the LU benchmark, class A (i.e., for small
-data size) with 4 nodes.
+Now compile and execute the LU benchmark, class S (i.e., for `small
+data size
+<https://www.nas.nasa.gov/publications/npb_problem_sizes.html>`_) with
+4 nodes.
  
  .. code-block:: shell
  
-   $ make lu NPROCS=4 CLASS=A
+   $ make lu NPROCS=4 CLASS=S
     (compilation logs)
-   $ smpirun -np 4 -platform ../cluster_backbone.xml bin/lu.A.4
+   $ smpirun -np 4 -platform ../cluster_backbone.xml bin/lu.S.4
     (execution logs)
  
  To get a better understanding of what is going on, activate the
@@ -379,8 +401,8 @@ use:
  
  .. code-block:: shell
  
-   smpirun -np 4 -platform ../cluster_backbone.xml -trace --cfg=tracing/filename:lu.A.4.trace bin/lu.A.4
-   pj_dump --ignore-incomplete-links lu.A.4.trace | grep State > lu.A.4.state.csv
+   smpirun -np 4 -platform ../cluster_backbone.xml -trace --cfg=tracing/filename:lu.S.4.trace bin/lu.S.4
+   pj_dump --ignore-incomplete-links lu.S.4.trace | grep State > lu.S.4.state.csv
  
  You can then produce a Gantt Chart with the following R chunk. You can
  either copy/paste it into an R session, or `turn it into an Rscript executable
@@ -392,7 +414,7 @@ run it again and again.
     library(ggplot2)
  
     # Read the data
-   df_state = read.csv("lu.A.4.state.csv", header=F, strip.white=T)
+   df_state = read.csv("lu.S.4.state.csv", header=F, strip.white=T)
     names(df_state) = c("Type", "Rank", "Container", "Start", "End", "Duration", "Level", "State");
     df_state = df_state[!(names(df_state) %in% c("Type","Container","Level"))]
     df_state$Rank = as.numeric(gsub("rank-","",df_state$Rank))
@@ -405,10 +427,10 @@ run it again and again.
     dev.off()
  
  This produces a file called ``Rplots.pdf`` with the following
-content. You can find more examples of visualization in the `SimGrid
-documentation <http://simgrid.gforge.inria.fr/contrib/R_visualization.html>`_.
+content. You can find more visualization examples `online
+<http://simgrid.gforge.inria.fr/contrib/R_visualization.html>`_.
  
-.. image:: /tuto_smpi/img/lu.A.4.png
+.. image:: /tuto_smpi/img/lu.S.4.png
     :align: center
  
  Lab 2: Tracing and Replay of LU
@@ -461,12 +483,30 @@ is computationally hungry.
  Lab 3: Execution Sampling on EP
  -------------------------------
  
-The second method to speed up simulations is to sample the computation parts in the code.
-This means that the person doing the simulation needs to know the application and identify
-parts that are compute intensive and take time, while being regular enough not to ruin
-simulation accuracy. Furthermore there should not be any MPI calls inside such parts of the
-code.
+The second method to speed up simulations is to sample the computation
+parts in the code.  This means that the person doing the simulation
+needs to know the application and identify parts that are compute
+intensive and take time, while being regular enough not to ruin
+simulation accuracy. Furthermore, there should not be any MPI calls
+inside such parts of the code.
  
  Use the EP benchmark, class B, 16 processes.
  
+.. todo:: write this section, and the following ones.
+
+Further Readings
+----------------
+
+You may also be interested in the `SMPI reference article
+<https://hal.inria.fr/hal-01415484>`_ or these `introductory slides
+<http://simgrid.org/tutorials/simgrid-smpi-101.pdf>`_. The `SMPI
+reference documentation <SMPI_doc>`_ covers much more content than
+this short tutorial.
+
+Finally, we regularly use SimGrid in our teaching of MPI. This way,
+our students can experiment with platforms that they do not have access
+to, and the associated visualization tools help them understand
+their work.  All the material is available online, in a separate
+project: the `SMPI CourseWare <https://simgrid.github.io/SMPI_CourseWare/>`_.
+
  ..  LocalWords:  SimGrid
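
The ``topo_parameters`` format documented in the Dragonfly hunk above can be illustrated with a minimal Python sketch. It only follows the textual description (``#groups;#chassis;#routers;#nodes``, with a link count after the comma for the first three levels) and is not SimGrid's own parser. The function name ``parse_dragonfly`` and the dictionary keys are hypothetical, and the totals are simply the products implied by that description.

```python
def parse_dragonfly(topo_parameters):
    """Split a Dragonfly topo_parameters string into its four levels.

    Hypothetical helper for illustration only; not part of SimGrid.
    Expected shape: "groups,links ; chassis,links ; routers,links ; nodes".
    """
    levels = [part.strip() for part in topo_parameters.split(";")]
    if len(levels) != 4:
        raise ValueError("expected 4 ';'-separated levels")

    # The first three levels are "count,links" pairs.
    n_groups, group_links = (int(v) for v in levels[0].split(","))
    n_chassis, chassis_links = (int(v) for v in levels[1].split(","))
    n_routers, router_links = (int(v) for v in levels[2].split(","))
    # The last level is the number of nodes attached to each router.
    nodes_per_router = int(levels[3])

    # Totals assumed from the description: every group holds n_chassis
    # chassis, every chassis holds n_routers routers.
    total_routers = n_groups * n_chassis * n_routers
    return {
        "groups": n_groups,
        "links_between_groups": group_links,
        "chassis_per_group": n_chassis,
        "links_between_chassis": chassis_links,
        "routers_per_chassis": n_routers,
        "links_between_routers": router_links,
        "nodes_per_router": nodes_per_router,
        "total_routers": total_routers,
        "total_nodes": total_routers * nodes_per_router,
    }


if __name__ == "__main__":
    # The example string used in the documentation hunk.
    print(parse_dragonfly("3,4 ; 3,2 ; 3,1 ; 2"))
```

Under these assumptions, the documented example ``3,4 ; 3,2 ; 3,1 ; 2`` describes 27 routers and 54 nodes.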