docs/source/Introduction.rst

   1 .. _intro_concepts:
   2
   3 Introduction
   4 ============
   5
   6 .. raw:: html
   7
   8    <object data="graphical-toc.svg" type="image/svg+xml"></object>
   9    <br/>
  10    <br/>
  11
  12 Main Concepts
  13 -------------
  14
  15 Typical Study based on SimGrid
  16 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  17
  18 Any SimGrid study entails the following components:
  19
  20  - The studied **application**. This can be either a distributed
  21    algorithm described in our simple APIs or a full-featured real
  22    parallel application using, for example, the MPI interface
  23    :ref:`(more info) <application>`.
  24
  25  - The **simulated platform**. This is a description of a given
  26    distributed system (machines, links, disks, clusters, etc.). Most of
  27    the platform files are written in XML, although a Lua interface is
  28    under development. SimGrid makes it easy to augment the Simulated
  29    Platform with a Dynamic Scenario where, for example, the links are
  30    slowed down (because of external usage) or the machines fail. You
  31    can even specify the applicative workload that you want
  32    to feed to your application
  33    :ref:`(more info) <platform>`.
  34
  35  - The application's **deployment description**. In SimGrid
  36    terminology, the application is an inert set of source files and
  37    binaries. To make it run, you have to describe how your application
  38    should be deployed on the simulated platform. You need to specify
  39    which process is mapped onto which machine, along with their parameters
  40    :ref:`(more info) <scenario>`.
  41
  42  - The **platform models**. They describe how the simulated platform
  43    reacts to the actions of the application. For example, they compute
  44    the time taken by a given communication on the simulated platform.
  45    These models are already included in SimGrid, and you only need to
  46    pick one and maybe tweak its configuration to get your results
  47    :ref:`(more info) <models>`.
  48
  49 These components are put together to run a **simulation**, that is, an
  50 experiment or a probe. Simulations produce **outcomes** (logs,
  51 visualization, or statistical analysis) that help to answer the
  52 **question** targeted by this study.
  53
  54 Here are some questions for which SimGrid is particularly relevant:
  55
  56  - **Compare an Application to another**. This is the classical use
  57    case for scientists, who use SimGrid to test how the solution that
  58    they contribute to compares to the existing solutions from the
  59    literature.
  60
  61  - **Design the best [Simulated] Platform for a given Application.**
  62    Tweaking the platform file is much easier than building a new real
  63    platform for testing purposes. SimGrid also allows for the co-design
  64    of the platform and the application by modifying both of them.
  65
  66  - **Debug Real Applications**. With real systems, it is sometimes
  67    difficult to reproduce the exact run leading to the bug that you
  68    are tracking. With SimGrid, you are *clairvoyant* about your
  69    *reproducible experiments*: you can explore every part of the
  70    system, and your probe will not change the simulated state. It also
  71    makes it easy to mock some parts of the real system that are not
  72    under study.
  73
  74 Depending on the context, you may see some parts of this process as
  75 less important, but you should pay close attention if you want to be
  76 confident in the results coming out of your simulations. In
  77 particular, you should not blindly trust your results but always
  78 strive to double-check them. Likewise, you should question the realism
  79 of your input configuration, and we even encourage you to doubt (and
  80 check) the provided performance models.
  81
  82 To ease such questioning, you really should logically separate these
  83 parts in your experimental setup. It is seen as a very bad practice to
  84 merge the application, the platform, and the deployment all together.
  85 SimGrid is versatile and your mileage may vary, but you should start
  86 with your Application specified as a C++ or Java program, using one of
  87 the provided XML platform files, and with your deployment in a separate
  88 XML file.
  89
  90 SimGrid Execution Modes
  91 ^^^^^^^^^^^^^^^^^^^^^^^
  92
  93 Depending on the intended study, SimGrid can be run in several execution modes.
  94
  95 **Simulation Mode**. This is the most common execution mode, where you want
  96 to study how your application behaves on the simulated platform under
  97 the experimental scenario.
  98
  99 In this mode, SimGrid can provide information about the time taken by
 100 your application, the amount of energy dissipated by the platform to
 101 run your application, and the detailed usage of each resource.
 102
 103 **Model-Checking Mode**. This can be seen as a sort of exhaustive
 104 testing mode, where every possible outcome of your application is
 105 explored. In some sense, this mode tests your application for all
 106 possible platforms that you could imagine (and more).
 107
 108 You just provide the application and its deployment (number of
 109 processes and parameters), and the model checker will
 110 explore all possible outcomes by testing all possible message
 111 interleavings: if at some point a given process can either receive the
 112 message A first or the message B depending on the platform
 113 characteristics, the model checker will explore the scenario where A
 114 arrives first, and then rewind to the same point to explore the
 115 scenario where B arrives first.
 116
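The exploration described above can be sketched in a few lines of Python. This is only a toy illustration and not the SimGrid model checker: the ``receiver`` and ``explore`` functions and the checked property are hypothetical names introduced for this example, and the sketch restarts the receiver from its initial state for each ordering instead of rewinding a saved state.

```python
from itertools import permutations


def receiver(arrival_order):
    """Toy receiver (hypothetical): its final state is the last message received."""
    state = None
    for message in arrival_order:
        state = message
    return state


def explore(messages, safety_property):
    """Try every arrival order, each time starting from the same initial state.

    Returns the list of arrival orders that violate the safety property.
    """
    violations = []
    for order in permutations(messages):
        final_state = receiver(order)
        if not safety_property(final_state):
            violations.append(order)
    return violations


# Hypothetical safety property: the receiver must end up holding message "B".
violations = explore(["A", "B"], lambda state: state == "B")
print(violations)  # [('B', 'A')]
```

Here the ordering where B arrives first violates the property, which a single simulation run could easily miss. Enumerating all permutations grows factorially with the number of messages, which is why the number of scenarios quickly becomes the limiting factor.
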
 117 This is a very powerful mode, where you can evaluate the correctness of
 118 your application. It can verify either **safety properties** (assertions)
 119 or **liveness properties** stating, for example, that if a given event
 120 occurs, then another given event will occur in a finite number of
 121 steps. This mode is not only usable with the abstract algorithms
 122 developed on top of the SimGrid APIs, but also with real MPI
 123 applications (to some extent).
 124
 125 The main limit of Model Checking lies in the huge number of scenarios
 126 to explore. SimGrid tries to explore only non-redundant scenarios
 127 thanks to classical reduction techniques (such as DPOR and stateful
 128 exploration), but the exploration may well never finish if you don't
 129 carefully adapt your application to this mode.
 130
 131 A classical trap is that the Model Checker can only verify whether
 132 your application fits the properties provided, which is useless if you
 133 have a bug in your property. Remember also that one way for your
 134 application to never violate a given assertion is to not start at all,
 135 because of a stupid bug.
 136
 137 Another limit of this mode is that it does not use the performance
 138 models of the simulation mode. Time becomes discrete: you can say,
 139 for example, that the application took 42 steps to run, but there is no way
 140 to know how much time it took or the number of watts that were dissipated.
 141
 142 Finally, the model checker only explores the interleavings of
 143 computations and communications. Other factors such as thread
 144 execution interleaving are not considered by the SimGrid model
 145 checker.
 146
 147 The model checker may well miss existing issues, as it computes the
 148 possible outcomes *from a given initial situation*. There is no way to
 149 prove the correctness of your application in full generality with this
 150 tool.
 151
 152 **Benchmark Recording Mode**. During debug sessions, continuous
 153 integration testing, and other similar use cases, you are often only
 154 interested in the control flow. If your application applies filters to
 155 huge images split into small blocks, the filtered image is probably not
 156 what you are interested in. You are probably looking for a way to run
 157 each computational kernel only once, record the time it takes, and cache
 158 that measurement. This code block can then be skipped in simulation
 159 and replaced by a synthetic block using the cached information. The
 160 simulated platform will take this block into account without requesting
 161 the actual hosting machine to benchmark it.
 162
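The idea of running a kernel once and replaying its recorded duration can be sketched as follows. This is a minimal sketch and not the SimGrid API: ``KernelRecorder``, ``blur_block``, and the ``simulated_clock`` attribute are hypothetical names chosen for this illustration.

```python
import time


class KernelRecorder:
    """Runs each named kernel once, then replays its recorded duration."""

    def __init__(self):
        self.recorded = {}          # kernel name -> measured duration in seconds
        self.simulated_clock = 0.0  # time accumulated from recorded durations

    def run(self, name, kernel, *args):
        if name not in self.recorded:
            start = time.perf_counter()
            kernel(*args)  # real execution, first call only
            self.recorded[name] = time.perf_counter() - start
        # Every call, real or skipped, advances the clock by the cached duration.
        self.simulated_clock += self.recorded[name]
        return self.recorded[name]


calls = []  # records which blocks were really processed


def blur_block(block):
    """Hypothetical image filter applied to one small block."""
    calls.append(block)
    return [pixel * 0.5 for pixel in block]


recorder = KernelRecorder()
for block in ([1, 2], [3, 4], [5, 6]):
    recorder.run("blur", blur_block, block)

print(len(calls))  # 1
```

Only the first block is actually filtered, while the clock accounts for all three blocks. The filtered data of the skipped blocks is never produced, which is acceptable only when you care about the control flow and timings rather than the results.
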
 163 SimGrid Limits
 164 ^^^^^^^^^^^^^^
 165
 166 This framework is by no means the holy grail, able to solve
 167 every problem on Earth.
 168
 169 **SimGrid's scope is limited to distributed systems.** Real-time
 170 multi-threaded systems are out of this scope. You could probably tweak
 171 SimGrid for such studies (or the framework could be extended
 172 in this direction), but another framework specifically targeting such a
 173 use case would probably be better suited.
 174
 175 **There is currently no support for 5G or LoRa networks**.
 176 The framework could certainly be improved in this direction, but this
 177 still has to be done.
 178
 179 **There is no perfect model, only models adapted to your study.** The SimGrid
 180 models target fast and large studies, yet they still aim for realistic results. In
 181 particular, our models abstract away parameters and phenomena that are often
 182 irrelevant in our context.
 183
 184 SimGrid is obviously not intended for a study of any phenomenon that our
 185 abstraction removes. Here are some **studies that you should not do with
 186 SimGrid**:
 187
 188  - Studying the effects of L3 vs. L2 caches on your application
 189  - Comparing kernel schedulers and policies
 190  - Comparing variants of TCP
 191  - Exploring pathological cases where TCP breaks down, resulting in
 192    abnormal executions
 193  - Studying security aspects of your application in the presence of
 194    malicious agents
 195
 196 SimGrid Success Stories
 197 ^^^^^^^^^^^^^^^^^^^^^^^
 198
 199 SimGrid has been cited in over 3,000 scientific papers (according to Google
 200 Scholar). Among them,
 201 `over 500 publications <https://simgrid.org/Usages.html>`_
 202 (written by hundreds of individuals) use SimGrid as a scientific
 203 instrument to conduct their experimental evaluation. These
 204 numbers do not include the articles contributing to SimGrid.
 205 This instrument has been used in many research communities, such as
 206 `High-Performance Computing <https://hal.inria.fr/inria-00580599/>`_,
 207 `Cloud Computing <http://dx.doi.org/10.1109/CLOUD.2015.125>`_,
 208 `Workflow Scheduling <http://dl.acm.org/citation.cfm?id=2310096.2310195>`_,
 209 `Big Data <https://hal.inria.fr/hal-01199200/>`_ and
 210 `MapReduce <http://dx.doi.org/10.1109/WSCAD-SSC.2012.18>`_,
 211 `Data Grid <http://ieeexplore.ieee.org/document/7515695/>`_,
 212 `Volunteer Computing <http://www.sciencedirect.com/science/article/pii/S1569190X17301028>`_,
 213 `Peer-to-Peer Computing <https://hal.archives-ouvertes.fr/hal-01152469/>`_,
 214 `Network Architecture <http://dx.doi.org/10.1109/TPDS.2016.2613043>`_,
 215 `Fog Computing <http://ieeexplore.ieee.org/document/7946412/>`_, or
 216 `Batch Scheduling <https://hal.archives-ouvertes.fr/hal-01333471>`_
 217 `(more info) <https://simgrid.org/Usages.html>`_.
 218
 219 If your platform description is accurate enough (see
 220 `here <http://hal.inria.fr/hal-00907887>`_ or
 221 `there <https://hal.inria.fr/hal-01523608>`_),
 222 SimGrid can provide high-quality performance predictions. For example,
 223 we determined the speedup achieved by the Tibidabo ARM-based
 224 cluster before its construction
 225 (`paper <http://hal.inria.fr/hal-00919507>`_). In this case,
 226 some differences between the prediction and the real timings were due to
 227 misconfigurations of the real platform. To some extent,
 228 SimGrid could even be used to debug the real platform :)
 229
 230 SimGrid is also used to debug, improve, and tune several large
 231 applications, such as
 232 `BigDFT <http://bigdft.org>`_ (a massively parallel code
 233 computing the electronic structure of chemical elements developed by
 234 the CEA), `StarPU <http://starpu.gforge.inria.fr/>`_ (a
 235 Unified Runtime System for Heterogeneous Multicore Architectures
 236 developed by Inria Bordeaux), and
 237 `TomP2P <https://tomp2p.net/dev/simgrid/>`_ (a high-performance
 238 key-value pair storage library developed at the University of Zurich).
 239 Some of these applications enjoy large user communities themselves.
 240
 241 ..  LocalWords:  SimGrid
 242