docs/source/Modeling_howtos.rst

   1 .. raw:: html
   2
   3    <object id="TOC" data="graphical-toc.svg" type="image/svg+xml"></object>
   4    <script>
   5    window.onload=function() { // Wait for the SVG to be loaded before changing it
   6      var elem=document.querySelector("#TOC").contentDocument.getElementById("PlatformBox")
   7      elem.style="opacity:0.93999999;fill:#ff0000;fill-opacity:0.1;stroke:#000000;stroke-width:0.35277778;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1";
   8    }
   9    </script>
  10    <br/>
  11    <br/>
  12
  13 .. _howto:
  14
  15 Modeling Hints
  16 ##############
  17
  18 There is no perfect model, only models that are adapted to the
  19 specific study that you want to conduct. SimGrid provides several
  20 advanced mechanisms that you can adapt to model the situation that you
  21 are interested in, but it is often hard to see where to start.
  22 This page collects several hints and tricks on modeling situations.
  23 Even if you are looking for a very advanced, specific use case, these
  24 examples may help you design the solution you need.
  25
  26 .. _howto_science:
  27
  28 Doing Science with SimGrid
  29 **************************
  30
  31 Many users rely on SimGrid as a scientific instrument for their
  32 research. This tool was indeed invented to that end, and we strive
  33 to streamline this kind of usage. But SimGrid is no magical tool, and
  34 it is your responsibility to check that the tool actually provides
  35 sensible results. Fortunately, there is a vast literature on how to avoid
  36 modeling and simulation pitfalls. We review some specific works here.
  37
  38 In `An Integrated Approach to Evaluating Simulation Credibility
  39 <http://www.dtic.mil/dtic/tr/fulltext/u2/a405051.pdf>`_, the authors
  40 provide a methodology enabling the users to increase their confidence
  41 in the simulation tools they use. First of all, you must know what you
  42 expect to discover, so that you can check whether the tool covers your
  43 needs. Then, as they say, "a fool with a tool is still a fool", so you
  44 need to think about your methodology before you submit your articles.
  45 `Towards a Credibility Assessment of Models and Simulations
  46 <https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20080015742.pdf>`_
  47 gives a formal methodology to assess the credibility of your
  48 simulation results.
  49
  50 `Seven Pitfalls in Modeling and Simulation Research
  51 <https://dl.acm.org/citation.cfm?id=2430188>`_ is even more
  52 specific. Here are the listed pitfalls: (1) Don't know whether it's
  53 modeling or simulation, (2) No separation of concerns, (3) No clear
  54 scientific question, (4) Implementing everything from scratch, (5)
  55 Unsupported claims, (6) Toy duck approach, and (7) The tunnel view. As
  56 you can see, this article is a must-read. It's a pity that it's not
  57 freely available, though.
  58
  59 .. _howto_churn:
  60
  61 Modeling churn (e.g., in P2P)
  62 *****************************
  63
  64 One of the biggest challenges in P2P settings is to cope with
  65 churn, meaning that resources keep appearing and disappearing. In
  66 SimGrid, you can always change the state of each host manually, e.g.
  67 with :cpp:func:`simgrid::s4u::Host::turn_on`. To reduce the burden when
  68 the churn is high, you can also attach a **state profile** to the host
  69 directly.
  70
  71 This can be done through the XML file, using the ``state_file``
  72 attribute of :ref:`pf_tag_host`, :ref:`pf_tag_cluster` or
  73 :ref:`pf_tag_link`. Every line of such a file (except the last one)
  74 describes a timed event of the form "date value". Example:
  75
  76 .. code-block:: text
  77
  78    1 0
  79    2 1
  80    LOOPAFTER 8
  81
  82 This file uses a cryptic yet simple formalism:
  83
  84   * At time t = 1, the host is turned off (a zero value means OFF).
  85   * At time t = 2, the host is turned back on (any other value than zero means ON).
  86   * At time t = 10, the profile is reset (as we are 8 seconds after the last event). Then the host will be turned off again at time t = 11.
  87
  88 If your profile does not contain any LOOPAFTER line, it is executed only once and never repeats.
  89
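     The semantics above can be sketched in a few lines of plain Python. This
     is only an illustration of the file format as described on this page, not
     SimGrid's actual parser: ``parse_profile`` and ``expand_profile`` are
     hypothetical helper names, and the sketch assumes that events are listed
     by increasing date.

     .. code-block:: python

        def parse_profile(text):
            """Split a state profile into its timed events and its loop delay."""
            events, loop_after = [], None
            for line in text.splitlines():
                fields = line.split()
                if not fields:
                    continue  # skip empty lines
                if fields[0] == "LOOPAFTER":
                    loop_after = float(fields[1])
                else:
                    events.append((float(fields[0]), float(fields[1])))
            return events, loop_after


        def expand_profile(events, loop_after, horizon):
            """List the (date, is_on) state changes occurring up to `horizon`."""
            timeline = []
            if not events:
                return timeline
            # The profile restarts `loop_after` seconds after its last event.
            period = None if loop_after is None else events[-1][0] + loop_after
            if period is not None and period <= 0:
                raise ValueError("a looping profile needs a positive period")
            offset = 0.0
            while True:
                for date, value in events:
                    when = offset + date
                    if when > horizon:
                        return timeline
                    # A zero value means OFF, any other value means ON.
                    timeline.append((when, value != 0))
                if period is None:
                    return timeline  # no LOOPAFTER line: play the events once
                offset += period

     Applied to the example profile with a horizon of 12 seconds, this sketch
     lists four state changes: OFF at t = 1, ON at t = 2, OFF at t = 11 and
     ON at t = 12.
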
  90 Another possibility is to use the
  91 :cpp:func:`simgrid::s4u::Host::set_state_profile()` or
  92 :cpp:func:`simgrid::s4u::Link::set_state_profile()` functions. These
  93 functions take a profile, which can be a fixed profile exhaustively
  94 listing the events, or any other kind of profile if you wish.
  95
  96 For further reading, you could turn to :ref:`this example <s4u_ex_comm_failure>`
  97 on how to react to communication failures, or :ref:`this one <s4u_ex_platform_state_profile>`
  98 on how to attach a state profile to hosts and react to execution failures.
  99
 100 .. _howto_multicore:
 101
 102 Modeling multicore machines
 103 ***************************
 104
 105 Default model
 106 =============
 107
 108 Multicore machines are very complex, and there are many ways to model
 109 them. The default models of SimGrid are coarse-grained and capture some
 110 elements of this reality. Here is how to declare a simple multicore host:
 111
 112 .. code-block:: xml
 113
 114    <host id="mymachine" speed="8Gf" core="4"/>
 115
 116 It declares a 4-core host called "mymachine", each core computing 8
 117 Gflop per second. If you put one activity of 8 Gflop on this host, it
 118 will be computed in 1 second (by default, activities are
 119 single-threaded and cannot leverage the computing power of more than
 120 one core). If you run two such activities simultaneously, they will still be
 121 computed in one second, and so on up to 4 activities. If you start 5 activities,
 122 they will share the total computing power, and each activity will be
 123 computed in 5/4 = 1.25 seconds. This is a very simple model, but that is
 124 all that you get by default from SimGrid.
 125
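     The arithmetic of this default model can be written down as a small
     Python sketch. It assumes identical activities that all start at the same
     time and an equal sharing of the host among them; ``completion_time`` is
     a hypothetical helper, not a SimGrid function.

     .. code-block:: python

        def completion_time(n_activities, flops_per_activity, core_speed, core_count):
            """Duration in seconds of n identical activities started together."""
            # A single-threaded activity never runs faster than one core. Beyond
            # core_count activities, the total speed of the host is shared equally.
            speed_per_activity = min(core_speed, core_speed * core_count / n_activities)
            return flops_per_activity / speed_per_activity

     With the host declared above (4 cores at 8 Gflop per second) and
     activities of 8 Gflop each, the sketch gives 1 second for up to 4
     activities and 1.25 seconds for 5 activities.
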
 126 Pinning tasks to cores
 127 ======================
 128
 129 The default model does not account for task pinning, where you
 130 manually select on which core each of the existing activities should
 131 execute. The best solution to model this is probably to model your
 132 4-core processor as 4 distinct hosts, and to assign the activities to
 133 cores by migrating them to the declared hosts. In some sense, this
 134 takes the whole Network-On-Chip idea really seriously.
 135
 136 Some extra complications may arise here. If you have more activities than
 137 cores, you'll have to `schedule your activities
 138 <https://en.wikipedia.org/wiki/Scheduling_%28computing%29#Operating_system_process_scheduler_implementations>`_
 139 yourself on the cores (so you'd better avoid this complexity). Since
 140 you cannot have more than one network model in a given SimGrid
 141 simulation, you will end up with a TCP connection between your cores. A
 142 possible workaround is to never start any simulated communication
 143 between the cores and have the same routes from each core to the
 144 rest of the external network.
 145
 146 Modeling a multicore CPU as a set of SimGrid hosts may seem strange
 147 and unconvincing, but some users achieved very realistic simulations
 148 of multicore and GPU machines this way.
 149
 150 Modeling machine boot and shutdown periods
 151 ******************************************
 152
 153 When a physical host boots up, a lot of things happen. Booting takes
 154 time, during which the machine is not usable but dissipates energy, and
 155 programs actually die and restart during a reboot. Since there are many
 156 ways to model this, SimGrid does not make any modeling choice for you
 157 beyond the most obvious ones.
 158
 159 Any actor running on a host that is shut down will be killed and all
 160 its activities will be automatically canceled. If the killed actor was
 161 marked as auto-restartable (with :cpp:func:`simgrid::s4u::Actor::set_auto_restart`),
 162 it will start anew with the same parameters when the host boots back up.
 163
 164 By default, shutdowns and boots are instantaneous. If you want to
 165 add an extra delay, you have to do that yourself, for example from a
 166 `controller` actor that runs on another host. The best way to do so is
 167 to declare a fictional pstate where the CPU delivers 0 flop per
 168 second (so every activity on that host will be frozen when the host is
 169 in this pstate). When you want to switch the host off, your controller
 170 switches the host to that specific pstate (with
 171 :cpp:func:`simgrid::s4u::Host::set_pstate`), waits for the amount of
 172 time that you deem necessary for your host to shut down, and turns
 173 the host off (with :cpp:func:`simgrid::s4u::Host::turn_off`). To boot
 174 up, switch the host on, go into the specific pstate, wait a while and
 175 go to a more regular pstate.
 176
 177 To model the energy dissipation, you need to put the right energy
 178 consumption in your startup/shutdown specific pstate. Remember that
 179 the energy consumed is equal to the instantaneous consumption
 180 multiplied by the time that the host stays in that state. Do the
 181 math, set the right instantaneous consumption for your pstate, and
 182 you'll get the whole boot period to consume the amount of energy that
 183 you want. You may want to have one fictional pstate for the boot
 184 period and another one for the shutdown period.
 185
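     The computation mentioned above is a single division, sketched here in
     Python. The helper name and the figures (a boot lasting 150 seconds that
     should consume 18 kJ overall) are purely hypothetical, and the sketch
     assumes that the consumption stays constant while the host is in the
     fictional pstate.

     .. code-block:: python

        def pstate_power(target_energy_joules, duration_seconds):
            """Instantaneous consumption in watts to declare for the fictional pstate."""
            if duration_seconds <= 0:
                raise ValueError("the boot or shutdown period must last some time")
            # energy = power * time, so power = energy / time
            return target_energy_joules / duration_seconds


        # Hypothetical figures: a 150-second boot that should consume 18,000 J.
        boot_power = pstate_power(18_000, 150)

     In this example, the fictional boot pstate should be declared with a
     consumption of 120 W.
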
 186 Of course, this is only one possible way to model these things. YMMV ;)
 187
 188 .. _howto_parallel_links:
 189
 190 Modeling parallel links
 191 ***********************
 192
 193 Most HPC topologies, such as fat-trees, allow parallel links (a
 194 router A and a router B can be connected by more than one link).
 195 You might be tempted to model this configuration as follows:
 196
 197 .. code-block:: xml
 198
 199     <router id="routerA"/>
 200     <router id="routerB"/>
 201
 202     <link id="link1" bandwidth="10GBps" latency="2us"/>
 203     <link id="link2" bandwidth="10GBps" latency="2us"/>
 204
 205     <route src="routerA" dst="routerB">
 206         <link_ctn id="link1"/>
 207     </route>
 208     <route src="routerA" dst="routerB">
 209         <link_ctn id="link2"/>
 210     </route>
 211
 212 But that will not work, since SimGrid doesn't allow several routes for
 213 a single `{src ; dst}` pair. Instead, what you should do is:
 214
 215   - Use a single route with both links (so both will be traversed
 216     each time a message is exchanged between router A and B)
 217
 218   - Double the bandwidth of one link, to model the total bandwidth of
 219     both links used in parallel. This will make sure no combined
 220     communications between router A and B use more than the bandwidth
 221     of two links
 222
 223   - Assign the other link a `FATPIPE` sharing policy, which will allow
 224     several communications to use the full bandwidth of this link without
 225     having to share it. This will model the fact that individual
 226     communications can use at most this link's bandwidth
 227
 228   - Set the latency of one of the links to 0, so that latency is only
 229     accounted for once (since both links are traversed by each message)
 230
 231 So the final platform for our example becomes:
 232
 233 .. code-block:: xml
 234
 235     <router id="routerA"/>
 236     <router id="routerB"/>
 237
 238     <!-- This link limits the total bandwidth of all parallel communications -->
 239     <link id="link1" bandwidth="20GBps" latency="2us"/>
 240
 241     <!-- This link only limits the bandwidth of individual communications -->
 242     <link id="link2" bandwidth="10GBps" latency="0us" sharing_policy="FATPIPE"/>
 243
 244     <!-- Each message traverses both links -->
 245     <route src="routerA" dst="routerB">
 246         <link_ctn id="link1"/>
 247         <link_ctn id="link2"/>
 248     </route>
 249
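     To see why this combination behaves like two parallel links, the
     bandwidth obtained by each communication can be estimated with a short
     Python sketch. This is a deliberately simplified view: it assumes
     identical concurrent flows that share ``link1`` equally, and it ignores
     latency and every correction applied by the actual network model.
     ``per_flow_bandwidth`` is a hypothetical helper, not part of SimGrid.

     .. code-block:: python

        def per_flow_bandwidth(n_flows, shared_link_bw, fatpipe_link_bw):
            """Bandwidth obtained by each of n_flows identical concurrent flows."""
            # link1 is shared: its bandwidth is divided among the flows.
            # link2 is a FATPIPE: it only caps each flow individually.
            return min(fatpipe_link_bw, shared_link_bw / n_flows)

     With the values of the platform above, expressed in GBps (20 for
     ``link1`` and 10 for ``link2``), one or two flows get 10 GBps each, while
     four flows get 5 GBps each.
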