Typical Study based on SimGrid
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any SimGrid study entails the following components:

- The studied **application**. This can be either a distributed
  algorithm described in our simple APIs or a full-featured real
  parallel application using, for example, the MPI interface
  :ref:`(more info) <application>`.

- The **simulated platform**. This is a description of a given
  distributed system (machines, links, disks, clusters, etc.). Most
  platform files are written in XML, although a Lua interface is
  under development. SimGrid makes it easy to augment the simulated
  platform with a dynamic scenario where, for example, the links are
  slowed down (because of external usage) or the machines fail. You
  can even specify the workload that you want to feed to your
  application
  :ref:`(more info) <platform>`.

- The application's **deployment description**. In SimGrid
  terminology, the application is an inert set of source files and
  binaries. To make it run, you have to describe how your application
  should be deployed on the simulated platform. You need to specify
  which process is mapped onto which machine, along with their parameters
  :ref:`(more info) <scenario>`.

- The **platform models**. They describe how the simulated platform
  reacts to the actions of the application. For example, they compute
  the time taken by a given communication on the simulated platform.
  These models are already included in SimGrid, and you only need to
  pick one and maybe tweak its configuration to get your results
  :ref:`(more info) <models>`.

These components are put together to run a **simulation**, that is, an
experiment or a probe. Simulations produce **outcomes** (logs,
visualization, or statistical analysis) that help to answer the
**question** targeted by this study.
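
As a purely illustrative sketch of how these four components fit together,
the following Python toy keeps the platform, the deployment, the application,
and the model apart. None of the names below belong to the SimGrid API; the
model (latency plus size divided by bandwidth) and all numeric values are
assumptions chosen for illustration, far simpler than what a real study
would use.

```python
# Toy illustration only: none of these names belong to the SimGrid API.

# Simulated platform: one link between two hosts (hypothetical values).
platform = {
    "links": {
        ("alice", "bob"): {"latency_s": 0.01, "bandwidth_Bps": 1e6},
    },
}

# Deployment: which process runs on which host, along with its parameters.
deployment = [
    {"process": "sender", "host": "alice",
     "args": {"peer": "bob", "size_B": 2e6}},
]


def comm_time(platform, src, dst, size_B):
    """Toy platform model: latency plus size divided by bandwidth."""
    link = platform["links"][(src, dst)]
    return link["latency_s"] + size_B / link["bandwidth_Bps"]


def sender(host, args, platform):
    """Application: send one message and report how long it took."""
    return comm_time(platform, host, args["peer"], args["size_B"])


# Registry mapping process names used in the deployment to application code.
applications = {"sender": sender}


def simulate(platform, deployment):
    """Run each deployed process and collect the simulated durations."""
    return {
        d["process"]: applications[d["process"]](d["host"], d["args"], platform)
        for d in deployment
    }


outcome = simulate(platform, deployment)
print(outcome)
```

Because the three inputs are separate values, the link bandwidth or the
process mapping can be changed without touching the application code.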

Here are some questions for which SimGrid is particularly relevant:

- **Compare an Application to another**. This is the classical use
  case for scientists, who use SimGrid to test how the solution that
  they contribute compares to the existing solutions from the
  literature.

- **Design the best [Simulated] Platform for a given Application.**
  Tweaking the platform file is much easier than building a new real
  platform for testing purposes. SimGrid also allows for the co-design
  of the platform and the application by modifying both of them.

- **Debug Real Applications**. With real systems, it is sometimes
  difficult to reproduce the exact run leading to the bug that you
  are tracking. With SimGrid, you are *clairvoyant* about your
  *reproducible experiments*: you can explore every part of the
  system, and your probe will not change the simulated state. It also
  makes it easy to mock some parts of the real system that are not
  under study.

Depending on the context, you may see some parts of this process as
less important, but you should pay close attention if you want to be
confident in the results coming out of your simulations. In
particular, you should not blindly trust your results but always
strive to double-check them. Likewise, you should question the realism
of your input configuration, and we even encourage you to doubt (and
check) the provided performance models.

To ease such questioning, you really should logically separate these
parts in your experimental setup. It is seen as a very bad practice to
merge the application, the platform, and the deployment all together.
SimGrid is versatile and your mileage may vary, but you should start
with your application specified as a C++ or Java program, using one of
the provided XML platform files, and with your deployment in a separate
file.

SimGrid Execution Modes
^^^^^^^^^^^^^^^^^^^^^^^

Depending on the intended study, SimGrid can be run in several execution modes.

**Simulation Mode**. This is the most common execution mode, where you want
to study how your application behaves on the simulated platform under
the experimental scenario.

In this mode, SimGrid can provide information about the time taken by
your application, the amount of energy dissipated by the platform to
run your application, and the detailed usage of each resource.

**Model-Checking Mode**. This can be seen as a sort of exhaustive
testing mode, where every possible outcome of your application is
explored. In some sense, this mode tests your application for all
possible platforms that you could imagine (and more).

You just provide the application and its deployment (number of
processes and parameters), and the model checker will
explore all possible outcomes by testing all possible message
interleavings: if at some point a given process can receive either
message A or message B first depending on the platform
characteristics, the model checker will explore the scenario where A
arrives first, and then rewind to the same point to explore the
scenario where B arrives first.

This is a very powerful mode, where you can evaluate the correctness of
your application. It can verify either **safety properties** (assertions)
or **liveness properties** stating, for example, that if a given event
occurs, then another given event will occur in a finite number of
steps. This mode is not only usable with the abstract algorithms
developed on top of the SimGrid APIs, but also with real MPI
applications (to some extent).
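
The exploration of message interleavings can be pictured with the following
Python toy, which tries every arrival order of two messages and checks a
safety property on each outcome. This is only a brute-force sketch under
simplifying assumptions: the ``receiver`` and ``explore`` functions are
hypothetical names, not part of SimGrid, and nothing here reflects how the
SimGrid model checker is actually implemented.

```python
from itertools import permutations


def receiver(arrival_order):
    """Toy application: the final state is the last message received."""
    state = None
    for message in arrival_order:
        state = message
    return state


def explore(messages, prop):
    """Try every arrival order and return the ones violating the property."""
    violations = []
    for order in permutations(messages):
        # Each iteration restarts the toy application from its initial state.
        if not prop(receiver(order)):
            violations.append(order)
    return violations


# Safety property (an assertion): the receiver must end in state "B".
violations = explore(["A", "B"], lambda final: final == "B")
print(violations)
```

The property holds when A arrives first but fails when B arrives first, so
the exploration reports that second ordering as a counterexample. A plain
test run would exercise only one of the two orderings.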

The main limit of Model Checking lies in the huge number of scenarios
to explore. SimGrid tries to explore only non-redundant scenarios
thanks to classical reduction techniques (such as DPOR and stateful
exploration) but the exploration may well never finish if you don't
carefully adapt your application to this mode.

A classical trap is that the Model Checker can only verify whether
your application fits the properties provided, which is useless if you
have a bug in your property. Remember also that one way for your
application to never violate a given assertion is to not start at all,
because of a stupid bug.

Another limit of this mode is that it does not use the performance
models of the simulation mode. Time becomes discrete: you can say, for
example, that the application took 42 steps to run, but there is no way
to know how much time it took or the number of watts that were dissipated.

Finally, the model checker only explores the interleavings of
computations and communications. Other factors such as thread
execution interleaving are not considered by the SimGrid model
checker.

The model checker may well miss existing issues, as it computes the
possible outcomes *from a given initial situation*. There is no way to
prove the correctness of your application in full generality with this
tool.

**Benchmark Recording Mode**. During debug sessions, continuous
integration testing, and other similar use cases, you are often only
interested in the control flow. If your application applies filters to
huge images split into small blocks, the filtered image is probably not
what you are interested in. You are probably looking for a way to run
each computational kernel only once, and record the time it takes so
that it can be cached. This code block can then be skipped in simulation
and replaced by a synthetic block using the cached information. The
simulated platform will take this block into account without requesting
the actual hosting machine to benchmark it.
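
The idea of running a kernel once and then replaying its recorded duration
can be sketched in a few lines of Python. This is a toy under stated
assumptions: ``KernelCache`` and ``blur_block`` are hypothetical names
invented for this illustration, and the sketch says nothing about how
SimGrid itself records or replays benchmarks.

```python
import time


class KernelCache:
    """Toy stand-in for benchmark recording; not the SimGrid API."""

    def __init__(self):
        self.recorded = {}          # kernel name -> measured duration (s)
        self.simulated_clock = 0.0  # simulated time consumed so far (s)
        self.real_runs = 0          # how often a kernel was really executed

    def run(self, name, kernel):
        if name not in self.recorded:
            # First call: really execute the kernel and record its duration.
            start = time.perf_counter()
            kernel()
            self.recorded[name] = time.perf_counter() - start
            self.real_runs += 1
        # Every call advances the simulated clock by the recorded duration.
        self.simulated_clock += self.recorded[name]


def blur_block():
    """Hypothetical computational kernel applied to one image block."""
    sum(i * i for i in range(100_000))


cache = KernelCache()
for _ in range(1000):  # 1000 blocks, but the kernel really runs only once
    cache.run("blur", blur_block)
print(cache.real_runs, cache.simulated_clock)
```

The control flow still visits all 1000 blocks, and the simulated clock
accounts for each of them, while the host machine pays for a single real
execution.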

SimGrid Limits
^^^^^^^^^^^^^^

This framework is by no means the holy grail, able to solve
every problem on Earth.

**SimGrid scope is limited to distributed systems.** Real-time
multi-threaded systems are out of this scope. You could probably tweak
SimGrid for such studies (or the framework could be extended
in this direction), but another framework specifically targeting such a
use case would probably be better suited.

**There is currently no support for 5G or LoRa networks**.
The framework could certainly be improved in this direction, but this
still has to be done.

**There is no perfect model, only models adapted to your study.** The SimGrid
models target fast and large studies, and yet they target realistic results. In
particular, our models abstract away parameters and phenomena that are often
irrelevant in our context.

SimGrid is obviously not intended for a study of any phenomenon that our
abstraction removes. Here are some **studies that you should not do with
SimGrid**:

- Studying L3 vs. L2 cache effects on your application
- Comparing kernel schedulers and policies
- Comparing variants of TCP
- Exploring pathological cases where TCP breaks down, resulting in
  abnormal executions
- Studying security aspects of your application, in presence of
  malicious agents

SimGrid Success Stories
^^^^^^^^^^^^^^^^^^^^^^^

SimGrid has been cited in over 3,000 scientific papers (according to Google
Scholar). Among them,
`over 500 publications <https://simgrid.org/Usages.html>`_
(written by hundreds of individuals) use SimGrid as a scientific
instrument to conduct their experimental evaluation. These
numbers do not include the articles contributing to SimGrid.
This instrument has been used in many research communities, such as
`High-Performance Computing <https://hal.inria.fr/inria-00580599/>`_,
`Cloud Computing <http://dx.doi.org/10.1109/CLOUD.2015.125>`_,
`Workflow Scheduling <http://dl.acm.org/citation.cfm?id=2310096.2310195>`_,
`Big Data <https://hal.inria.fr/hal-01199200/>`_ and
`MapReduce <http://dx.doi.org/10.1109/WSCAD-SSC.2012.18>`_,
`Data Grid <http://ieeexplore.ieee.org/document/7515695/>`_,
`Volunteer Computing <http://www.sciencedirect.com/science/article/pii/S1569190X17301028>`_,
`Peer-to-Peer Computing <https://hal.archives-ouvertes.fr/hal-01152469/>`_,
`Network Architecture <http://dx.doi.org/10.1109/TPDS.2016.2613043>`_,
`Fog Computing <http://ieeexplore.ieee.org/document/7946412/>`_, or
`Batch Scheduling <https://hal.archives-ouvertes.fr/hal-01333471>`_
`(more info) <https://simgrid.org/Usages.html>`_.

If your platform description is accurate enough (see
`here <http://hal.inria.fr/hal-00907887>`_ or
`there <https://hal.inria.fr/hal-01523608>`_),
SimGrid can provide high-quality performance predictions. For example,
we determined the speedup achieved by the Tibidabo ARM-based
cluster before its construction
(`paper <http://hal.inria.fr/hal-00919507>`_). In this case,
some differences between the prediction and the real timings were due to
misconfigurations of the real platform. To some extent,
SimGrid could even be used to debug the real platform :)

SimGrid is also used to debug, improve, and tune several large
applications, such as
`BigDFT <http://bigdft.org>`_ (a massively parallel code
computing the electronic structure of chemical elements developed by
the CEA), `StarPU <http://starpu.gforge.inria.fr/>`_ (a
Unified Runtime System for Heterogeneous Multicore Architectures
developed by Inria Bordeaux), and
`TomP2P <https://tomp2p.net/dev/simgrid/>`_ (a high-performance
key-value pair storage library developed at the University of Zurich).
Some of these applications enjoy large user communities themselves.