X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/88ed36bc8542cd80c4181a934da093c0bd2347e8..5d3cc4fa4b1428899a036b2a7c2b9c038673ed4d:/docs/source/Configuring_SimGrid.rst

diff --git a/docs/source/Configuring_SimGrid.rst b/docs/source/Configuring_SimGrid.rst
index 11a0fc9962..bdb37cee8e 100644
--- a/docs/source/Configuring_SimGrid.rst
+++ b/docs/source/Configuring_SimGrid.rst
@@ -141,13 +141,17 @@ Existing Configuration Items
 - **surf/precision:** :ref:`cfg=surf/precision`
 
 - **For collective operations of SMPI,** please refer to Section :ref:`cfg=smpi/coll-selector`
+- **smpi/auto-shared-malloc-thresh:** :ref:`cfg=smpi/auto-shared-malloc-thresh`
 - **smpi/async-small-thresh:** :ref:`cfg=smpi/async-small-thresh`
 - **smpi/buffering:** :ref:`cfg=smpi/buffering`
 - **smpi/bw-factor:** :ref:`cfg=smpi/bw-factor`
 - **smpi/coll-selector:** :ref:`cfg=smpi/coll-selector`
 - **smpi/comp-adjustment-file:** :ref:`cfg=smpi/comp-adjustment-file`
 - **smpi/cpu-threshold:** :ref:`cfg=smpi/cpu-threshold`
+- **smpi/display-allocs:** :ref:`cfg=smpi/display-allocs`
 - **smpi/display-timing:** :ref:`cfg=smpi/display-timing`
+- **smpi/errors-are-fatal:** :ref:`cfg=smpi/errors-are-fatal`
+- **smpi/finalization-barrier:** :ref:`cfg=smpi/finalization-barrier`
 - **smpi/grow-injected-times:** :ref:`cfg=smpi/grow-injected-times`
 - **smpi/host-speed:** :ref:`cfg=smpi/host-speed`
 - **smpi/IB-penalty-factors:** :ref:`cfg=smpi/IB-penalty-factors`
@@ -160,6 +164,7 @@ Existing Configuration Items
 - **smpi/or:** :ref:`cfg=smpi/or`
 - **smpi/os:** :ref:`cfg=smpi/os`
 - **smpi/papi-events:** :ref:`cfg=smpi/papi-events`
+- **smpi/pedantic:** :ref:`cfg=smpi/pedantic`
 - **smpi/privatization:** :ref:`cfg=smpi/privatization`
 - **smpi/privatize-libs:** :ref:`cfg=smpi/privatize-libs`
 - **smpi/send-is-detached-thresh:** :ref:`cfg=smpi/send-is-detached-thresh`
@@ -345,7 +350,6 @@ and you should use the last one, which is the maximal size.
    cat /proc/sys/net/ipv4/tcp_rmem # gives the sender window
    cat /proc/sys/net/ipv4/tcp_wmem # gives the receiver window
 
-.. _cfg=smpi/IB-penalty-factors:
 .. _cfg=network/bandwidth-factor:
 .. _cfg=network/latency-factor:
 .. _cfg=network/weight-S:
@@ -368,15 +372,63 @@ exchange. By default SMPI uses factors computed on the Stampede
 Supercomputer at TACC, with optimal deployment of processes on
 nodes. Again, only hardcore experts should bother about this fact.
 
-InfiniBand network behavior can be modeled through 3 parameters
-``smpi/IB-penalty-factors:"βe;βs;γs"``, as explained in `this PhD
-thesis
-`_.
 
 .. todo:: This section should be rewritten, and actually explain the
           options network/bandwidth-factor, network/latency-factor,
           network/weight-S.
 
+.. _cfg=smpi/IB-penalty-factors:
+
+InfiniBand model
+^^^^^^^^^^^^^^^^
+
+InfiniBand network behavior can be modeled through 3 parameters
+``smpi/IB-penalty-factors:"βe;βs;γs"``, as explained in `this PhD
+thesis
+`_ (in French)
+or more concisely in `this paper `_,
+even though that paper only describes models for Myrinet and Ethernet.
+You can see some results for InfiniBand in its Fig. 2, for example. This model
+may be outdated for modern InfiniBand anyway, so a new
+validation would be good.
+
+The three parameters are defined as follows:
+
+- βs: penalty factor for outgoing messages, computed by running a simple send to
+  two nodes and checking slowdown compared to a single send to one node,
+  dividing by 2
+- βe: penalty factor for ingoing messages, same computation method but with one
+  node receiving several messages
+- γr: slowdown factor when communication buffer memory is saturated. It needs a
+  more complicated pattern to run in order to be computed (5.3 in the thesis,
+  page 107), and the resulting formula is γr = time(c)/(3×βe×time(ref)), where
+  time(ref) is the time of a single comm with no contention.
+
+Once these values are computed, a penalty is assessed for each message (this is
+the part implemented in the simulator) as shown on page 106 of the thesis. Here is
+a simple translation of this text. First, some notations:
+
+- ∆e (e): the incoming degree of node e, that is to say the number of communications having node e as destination.
+- ∆s (s): the outgoing degree of node s, that is to say the number of communications sent by node s.
+- Φ (e): the number of communications destined for node e but coming from a different node.
+- Ω (s, e): the number of messages coming from node s to node e. If node e only receives communications from different nodes then Φ (e) = ∆e (e). On the other hand if, for example, there are three messages coming from node s and going to node e then Φ (e) ≠ ∆e (e) and Ω (s, e) = 3.
+
+To determine the penalty for a communication, two values need to be calculated. First, the penalty caused by the conflict in transmission, noted ps.
+
+
+- if ∆s (i) = 1 then ps = 1.
+- if ∆s (i) ≥ 2 and ∆e (i) ≥ 3 then ps = ∆s (i) × βs × γr
+- else, ps = ∆s (i) × βs
+
+
+Then, the penalty caused by the conflict in reception (noted pe) should be computed as follows:
+
+- if ∆e (i) = 1 then pe = 1
+- else, pe = Φ (e) × βe × Ω (s, e)
+
+Finally, the penalty associated with the communication is:
+p = max (ps ∈ s, pe)
+
 .. _cfg=network/crosstraffic:
 
 Simulating Cross-Traffic
@@ -644,11 +696,11 @@ graphviz dot tool to generate a corresponding graphical representation.
 Exploration Depth Limit
 .......................
 
-The ``model-checker/max-depth`` can set the maximum depth of the
+The ``model-check/max-depth`` can set the maximum depth of the
 exploration graph of the model checker. If this limit is reached, a
 logging message is sent and the results might not be exact.
 
-By default, there is no depth limit.
+By default, the exploration is limited to the depth of 1000.
 
 .. _cfg=model-check/timeout:
 
@@ -733,7 +785,8 @@ the form ``X/a;Y/b``, the X and Y are the selected pids while the a and
 b are the return values of their simcalls. In the previous example,
 ``1/3;1/4``, you can see from the full output that the actor 1 is
 doing MC_RANDOM simcalls, so the 3 and 4 simply denote the values
-that these simcall return.
+that these simcalls return on the execution branch leading to the
+violation.
 
 Configuring the User Code Virtualization
 ----------------------------------------
@@ -1092,6 +1145,23 @@ code, making it difficult to report the simulated time when the
 simulation ends. If you enable the ``smpi/display-timing`` item,
 ``smpirun`` will display this information when the simulation
 ends.
+SMPI will also display information about the amount of real time spent
+in application code and in SMPI internals, to provide hints about the
+need to use sampling to reduce simulation time.
+
+.. _cfg=smpi/display-allocs:
+
+Reporting memory allocations
+............................
+
+**Option** ``smpi/display-allocs`` **Default:** 0 (false)
+
+SMPI intercepts malloc and calloc calls performed inside the running
+application, if it wasn't compiled with SMPI_NO_OVERRIDE_MALLOC.
+With this option, SMPI will show at the end of execution the amount of
+memory allocated through these calls, and locate the most expensive one.
+This helps to find targets for manual memory sharing, or the threshold
+to use for the smpi/auto-shared-malloc-thresh option (see :ref:`cfg=smpi/auto-shared-malloc-thresh`).
 
 .. _cfg=smpi/keep-temps:
 
@@ -1241,6 +1311,42 @@ Each collective operation can be manually selected with a
 .. TODO:: All available collective algorithms will be made available
           via the ``smpirun --help-coll`` command.
 
+Add a barrier in MPI_Finalize
+.............................
+
+.. _cfg=smpi/finalization-barrier:
+
+**Option** ``smpi/finalization-barrier`` **default:** off
+
+By default, SMPI processes are destroyed as soon as their code ends,
+so after a successful MPI_Finalize call returns. In some rare cases, some data
+might have been attached to MPI objects still active in the remaining processes,
+and can be destroyed eagerly by the finished process.
+If your code shows issues at finalization, such as a segmentation fault, enabling
+this option will add an explicit MPI_Barrier(MPI_COMM_WORLD) call inside the
+MPI_Finalize, so that all processes will terminate at almost the same point.
+It might affect the total timing by the cost of a barrier.
+
+.. _cfg=smpi/errors-are-fatal:
+
+**Option** ``smpi/errors-are-fatal`` **default:** on
+
+By default, SMPI processes will crash if an MPI error code is returned. MPI allows
+you to explicitly set the MPI_ERRORS_RETURN errhandler to avoid this behaviour. Turning
+this flag off selects that errhandler by default (for all concerned types and errhandlers).
+This can ease debugging by going after the first reported error.
+
+.. _cfg=smpi/pedantic:
+
+**Option** ``smpi/pedantic`` **default:** on
+
+By default, SMPI will report all errors it finds in MPI codes. Some of these errors
+may not be considered as errors by all developers. This flag can be turned off to
+avoid reporting some usually harmless mistakes.
+Concerned errors list (will be expanded in the future):
+ - Calling MPI_Win_fence only once in a program, hence just opening an epoch without
+   ever closing it.
+
 .. _cfg=smpi/iprobe:
 
 Inject constant times for MPI_Iprobe
@@ -1438,6 +1544,24 @@ Then, you can pass the option
 ``--cfg=smpi/shared-malloc-hugepage:/home/huge`` to smpirun to
 actually activate the huge page support in shared mallocs.
 
+.. _cfg=smpi/auto-shared-malloc-thresh:
+
+Automatically share allocations
+...............................
+
+**Option** ``smpi/auto-shared-malloc-thresh:`` **Default:** 0 (false)
+  This value in bytes represents the size above which all allocations
+  will be "shared" by default (as if they were performed through
+  SMPI_SHARED_MALLOC macros). Default = 0 = disabled feature.
+  The value must be carefully chosen to only select data buffers which
+  will not modify the execution path or cause a crash if their content is wrong.
+  Option :ref:`cfg=smpi/display-allocs` can be used to locate the largest
+  allocation detected in a run, and provide a good starting threshold.
+  Note: malloc, calloc and free are overridden by smpicc/cxx by default.
+  This can cause trouble if the code already overrides them. If this
+  is the case, defining SMPI_NO_OVERRIDE_MALLOC in the compilation flags can
+  help, but will make this feature unusable.
+
 .. _cfg=smpi/wtime:
 
 Inject constant times for MPI_Wtime, gettimeofday and clock_gettime