.. _intro_concepts:

Main Concepts
=============

Typical Study based on SimGrid
------------------------------

.. raw:: html

   <object data="graphical-toc.svg" width="100%" type="image/svg+xml"></object>


Any SimGrid study entails the following components:

 - The studied **Application**. This can be either a distributed
   algorithm described in our simple APIs, or a full-featured real
   parallel application using, for example, the MPI interface
   :ref:`(more info) <application>`.

 - The **Simulated Platform**. This is a description of a given
   distributed system (machines, links, disks, clusters, etc). Most of
   the platform files are written in XML, although a Lua interface is
   under development. SimGrid makes it easy to augment the Simulated
   Platform with a Dynamic Scenario where, for example, the links are
   slowed down (because of external usage) or the machines fail. You
   can even specify the workload that you want to feed to your
   application
   :ref:`(more info) <platform>`.

 - The application's **Deployment Description**. In SimGrid
   terminology, the application is an inert set of source files and
   binaries. To make it run, you have to describe how your application
   should be deployed on the simulated platform. You need to specify
   which process is mapped onto which machine, along with their
   parameters :ref:`(more info) <scenario>`.

 - The **Platform Models**. They describe how the simulated platform
   reacts to the actions of the application. For example, they compute
   the time taken by a given communication on the simulated platform.
   These models are already included in SimGrid, and you only need to
   pick one and maybe tweak its configuration to get your results
   :ref:`(more info) <models>`.
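
To give an intuition of what a platform model computes, here is a
deliberately naive sketch. It is neither SimGrid's API nor one of its
models: the two hosts, their speeds, and the link characteristics are
made-up values stored in plain Python dictionaries, and this "model"
simply charges a communication its latency plus its size divided by
the bandwidth, and a computation its amount of work divided by the
host speed.

.. code-block:: python

   # Hypothetical platform description (made-up values, not SimGrid's format).
   platform = {
       "hosts": {"alice": 1e9, "bob": 2e9},          # speed in flop/s
       "links": {("alice", "bob"): (1e-3, 1.25e8)},  # (latency in s, bandwidth in B/s)
   }

   def comm_time(src, dst, size_bytes):
       """Naive network model: latency + size / bandwidth, no contention."""
       latency, bandwidth = platform["links"][(src, dst)]
       return latency + size_bytes / bandwidth

   def exec_time(host, flops):
       """Naive CPU model: amount of work divided by the host speed."""
       return flops / platform["hosts"][host]

   print(comm_time("alice", "bob", 1.25e7))  # 1 ms latency + 100 ms transfer
   print(exec_time("bob", 4e9))              # 4 Gflop on a 2 Gflop/s host

The models shipped with SimGrid are more refined than this one; for
instance, they account for the bandwidth sharing between concurrent
communications, which this sketch ignores entirely.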

These components are put together to run a **simulation**, that is, an
experiment or a probe. The results of one or many simulations provide
an **outcome** (logs, visualization, or statistical analysis) that helps
answer the **question** targeted by this study.

Here are some questions for which SimGrid is particularly relevant:

 - **Compare an Application to another**. This is the classical use
   case for scientists, who use SimGrid to test how the solution that
   they contribute compares to the existing solutions from the
   literature.

 - **Design the best [Simulated] Platform for a given Application.**
   Tweaking the platform file is much easier than building a new real
   platform for testing purposes. SimGrid also allows for the co-design
   of the platform and the application by modifying both of them.

 - **Debug Real Applications**. With real systems, it is sometimes
   difficult to reproduce the exact run leading to the bug that you
   are tracking. With SimGrid, you are *clairvoyant* about your
   *reproducible experiments*: you can explore every part of the
   system, and your probe will not change the simulated state. It also
   makes it easy to mock some parts of the real system that are not
   under study.

Depending on the context, you may see some parts of this process as
less important, but you should pay close attention if you want to be
confident in the results coming out of your simulations. In
particular, you should not blindly trust your results but always
strive to double-check them. Likewise, you should question the realism
of your input configuration, and we even encourage you to doubt (and
check) the provided performance models.

To ease such questioning, you really should logically separate these
parts in your experimental setup. Merging the application, the
platform, and the deployment all together is considered a very bad
practice. SimGrid is versatile and your mileage may vary, but you
should start with your Application specified as a C++ or Java program,
using one of the provided XML platform files, and with your deployment
in a separate XML file.
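
The separation advocated above can be sketched as follows. This toy
Python code is hypothetical and does not use SimGrid's API: it only
shows that the application functions never mention hosts, while the
platform and the deployment are plain data (which could be loaded from
separate files) combined by a launcher. Changing the deployment thus
requires no change to the application code.

.. code-block:: python

   # Application: plain functions, unaware of the platform.
   def pinger(host, peer, count):
       return [f"{host}: ping {peer} #{i}" for i in range(count)]

   def ponger(host):
       return [f"{host}: waiting for pings"]

   APPLICATION = {"pinger": pinger, "ponger": ponger}

   # Platform and deployment: data that could come from separate files.
   platform = {"hosts": ["node-0", "node-1"]}
   deployment = [
       {"function": "pinger", "host": "node-0", "args": ["node-1", 2]},
       {"function": "ponger", "host": "node-1", "args": []},
   ]

   def launch(application, platform, deployment):
       """Map each deployed process onto its host and call it (no simulation)."""
       logs = []
       for proc in deployment:
           if proc["host"] not in platform["hosts"]:
               raise ValueError(f"unknown host: {proc['host']}")
           function = application[proc["function"]]
           logs.extend(function(proc["host"], *proc["args"]))
       return logs

   for line in launch(APPLICATION, platform, deployment):
       print(line)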

SimGrid Execution Modes
-----------------------

Depending on the intended study, SimGrid can be run in several execution modes.

**Simulation Mode**. This is the most common execution mode, where you want
to study how your application behaves on the simulated platform under
the experimental scenario.

In this mode, SimGrid can provide information about the time taken by
your application, the amount of energy dissipated by the platform to
run your application, and the detailed usage of each resource.
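
As a back-of-the-envelope illustration of how an energy figure can be
derived from the usage of a resource, the sketch below assumes a
linear power model in which a host draws between an idle and a
full-load wattage depending on its load. The wattages are made-up
values, and this plain Python code is not SimGrid's energy plugin.

.. code-block:: python

   def power(load, p_idle=100.0, p_full=200.0):
       """Instantaneous power (W) of a host, for a load between 0 and 1.

       Assumes a linear interpolation between the idle and full-load power.
       """
       return p_idle + (p_full - p_idle) * load

   def energy(phases):
       """Energy (J) of a host, given a list of (duration_s, load) phases."""
       return sum(duration * power(load) for duration, load in phases)

   # 10 s at full load, then 5 s idle: 10 * 200 + 5 * 100 = 2500 J
   print(energy([(10.0, 1.0), (5.0, 0.0)]))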

**Model-Checking Mode**. This can be seen as a sort of exhaustive
testing mode, where every possible outcome of your application is
explored. In some sense, this mode tests your application for all
possible platforms that you could imagine (and more).

You just provide the application and its deployment (number of
processes and parameters), and the model checker will literally
explore all possible outcomes by testing all possible message
interleavings: if at some point a given process can receive either
message A or message B first, depending on the platform
characteristics, the model checker will explore the scenario where A
arrives first, and then rewind to the same point to explore the
scenario where B arrives first.
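
The explore-then-rewind principle can be sketched as a depth-first
search with backtracking. This toy Python code is not SimGrid's model
checker: it only enumerates the orders in which a hypothetical
receiver may get its pending messages, and checks an assertion-like
property (here, a buggy assumption that A always arrives first) in
every explored scenario.

.. code-block:: python

   def explore(pending, delivered, on_complete):
       """Enumerate every delivery order of the pending messages."""
       if not pending:
           on_complete(tuple(delivered))
           return
       for i, message in enumerate(pending):
           delivered.append(message)  # deliver this message next
           explore(pending[:i] + pending[i + 1:], delivered, on_complete)
           delivered.pop()            # rewind to try the other choices

   def receiver_is_correct(order):
       """Hypothetical buggy receiver that assumes A arrives first."""
       return order[0] == "A"

   orders = []
   explore(["A", "B"], [], orders.append)
   violations = [order for order in orders if not receiver_is_correct(order)]
   print(orders)      # [('A', 'B'), ('B', 'A')]
   print(violations)  # [('B', 'A')]

A real model checker deals with the actions of all processes rather
than with a single list of messages, but the principle of trying one
choice and then backtracking to try the others is the same.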

This is a very powerful mode, where you can evaluate the correctness of
your application. It can verify either **safety properties** (asserts)
or **liveness properties** stating, for example, that if a given event
occurs, then another given event will occur within a finite number of
steps. This mode is not only usable with the abstract algorithms
developed on top of the SimGrid APIs, but also with real MPI
applications (to some extent).

The main limit of Model Checking lies in the huge number of scenarios
to explore. SimGrid tries to explore only non-redundant scenarios
thanks to classical reduction techniques (such as DPOR and stateful
exploration), but the exploration may well never finish if you don't
carefully adapt your application to this mode.
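
A quick count shows why the exploration can blow up. If k processes
each perform n steps and nothing is pruned, the number of possible
interleavings is the multinomial coefficient (kn)! / (n!)^k. The plain
Python sketch below (unrelated to SimGrid's code) computes it for a few
small values.

.. code-block:: python

   from math import factorial

   def interleavings(processes, steps):
       """Number of ways to interleave `processes` sequences of `steps`
       actions each, while preserving the order within every sequence."""
       return factorial(processes * steps) // factorial(steps) ** processes

   for k in (2, 3, 4):
       print(k, "processes:", interleavings(k, 4), "interleavings")

With only four steps per process, two processes already yield 70
interleavings, three yield 34,650, and four yield 63,063,000.
Reduction techniques such as DPOR aim at avoiding the exploration of
interleavings that only differ in the order of independent actions.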

A classical trap is that the model checker can only verify whether
your application satisfies the provided properties, which is useless if
your property itself is buggy. Remember also that one way for your
application to never violate a given assert is to not start at all
because of a trivial bug.

Another limit of this mode is that it does not use the performance
models of the simulation mode. Time becomes discrete: you can say, for
example, that the application took 42 steps to run, but there is no way
to know how much time it took or how much energy was dissipated.

Finally, the model checker only explores the interleavings of
computations and communications. Other factors, such as thread
execution interleaving, are not considered by the SimGrid model
checker.

The model checker may well miss existing issues, as it computes the
possible outcomes *from a given initial situation*. There is no way to
prove the correctness of your application in full generality with this
tool.

**Benchmark Recording Mode**. During debug sessions, continuous
integration testing, and other similar use cases, you are often only
interested in the control flow. If your application applies filters to
huge images split into small blocks, the filtered image is probably not
what you are interested in. You are probably looking for a way to run
each computation kernel only once, and save to disk the time it takes
along with some other metadata. This code block can then be skipped in
simulation and replaced by a synthetic block using the cached
information. The simulated platform will take this block into account
without requesting the real hosting machine to benchmark it.
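
The underlying idea can be sketched as a memoization of computation
times. This plain Python code is a hypothetical illustration, not
SimGrid's mechanism: the first call to a kernel runs it for real and
records its duration, and later calls skip it and return the recorded
duration, which a simulator could then charge as the cost of that
computation. For simplicity, the record lives in memory and is keyed on
the kernel name alone, which assumes that the kernel duration does not
depend on its input.

.. code-block:: python

   import time

   _recorded = {}  # kernel name -> measured duration (s); a real tool would persist it

   def run_or_replay(name, kernel, *args):
       """Run `kernel` once and record its duration; replay it afterwards."""
       if name not in _recorded:
           start = time.perf_counter()
           kernel(*args)
           _recorded[name] = time.perf_counter() - start
       return _recorded[name]

   def filter_block(size):
       """Stand-in for an expensive computation kernel."""
       return sum(i * i for i in range(size))

   first = run_or_replay("filter", filter_block, 100_000)   # actually executes
   again = run_or_replay("filter", filter_block, 100_000)   # served from the record
   print(first == again)  # True: the kernel ran only once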

SimGrid Limits
--------------

This framework is by no means the perfect holy grail able to solve
every problem on earth.

**SimGrid's scope is limited to distributed systems.** Real-time
multi-threaded systems are out of scope. You could probably tweak
SimGrid for such studies (or the framework could possibly be extended
in this direction), but another framework specifically targeting such a
use case would probably be better suited.

**There is currently no support for wireless networks**.
The framework could certainly be improved in this direction, but this
still has to be done.

**There is no perfect model, only models adapted to your study.**
The SimGrid models target fast, large-scale studies that still
require realistic results. In particular, our models abstract away
parameters and phenomena that are often irrelevant to realism in our
context.

SimGrid is simply not intended for any study that requires modeling
these abstracted phenomena. Here are some **studies that you should
not do with SimGrid**:

 - Studying the effects of L3 vs. L2 caches on your application
 - Comparing kernel schedulers and policies
 - Comparing variants of TCP
 - Exploring pathological cases where TCP breaks down, resulting in
   abnormal executions.
 - Studying security aspects of your application, in the presence of
   malicious agents.

SimGrid Success Stories
-----------------------

SimGrid has been cited in over 1,500 scientific papers (according to
Google Scholar). Among them,
`over 200 publications <https://simgrid.org/Usages.html>`_
(written by about 300 individuals) use SimGrid as a scientific
instrument to conduct their experimental evaluation. These
numbers do not include the articles contributing to SimGrid.
This instrument has been used in many research communities, such as
`High-Performance Computing <https://hal.inria.fr/inria-00580599/>`_,
`Cloud Computing <http://dx.doi.org/10.1109/CLOUD.2015.125>`_,
`Workflow Scheduling <http://dl.acm.org/citation.cfm?id=2310096.2310195>`_,
`Big Data <https://hal.inria.fr/hal-01199200/>`_ and
`MapReduce <http://dx.doi.org/10.1109/WSCAD-SSC.2012.18>`_,
`Data Grid <http://ieeexplore.ieee.org/document/7515695/>`_,
`Volunteer Computing <http://www.sciencedirect.com/science/article/pii/S1569190X17301028>`_,
`Peer-to-Peer Computing <https://hal.archives-ouvertes.fr/hal-01152469/>`_,
`Network Architecture <http://dx.doi.org/10.1109/TPDS.2016.2613043>`_,
`Fog Computing <http://ieeexplore.ieee.org/document/7946412/>`_, or
`Batch Scheduling <https://hal.archives-ouvertes.fr/hal-01333471>`_
`(more info) <https://simgrid.org/Usages.html>`_.

If your platform description is accurate enough (see
`here <http://hal.inria.fr/hal-00907887>`_ or
`there <https://hal.inria.fr/hal-01523608>`_),
SimGrid can provide high-quality performance predictions. For example,
we determined the speedup achieved by the Tibidabo ARM-based
cluster before its construction
(`paper <http://hal.inria.fr/hal-00919507>`_). In this case,
some differences between the prediction and the real timings were due to
misconfiguration or other problems with the real platform. To some extent,
SimGrid could even be used to debug the real platform :)

SimGrid is also used to debug, improve, and tune several large
applications, such as
`BigDFT <http://bigdft.org>`_ (a massively parallel code
computing the electronic structure of chemical elements, developed by
the CEA), `StarPU <http://starpu.gforge.inria.fr/>`_ (a
Unified Runtime System for Heterogeneous Multicore Architectures
developed by Inria Bordeaux), and
`TomP2P <https://tomp2p.net/dev/simgrid/>`_ (a high-performance
key-value storage library developed at the University of Zurich).
Some of these applications enjoy large user communities themselves.

..  LocalWords:  SimGrid
