/*! @page getting_started Getting Started: SimGrid Main Concepts

@tableofcontents

SimGrid is a framework to simulate distributed computer systems.

It can be used either to assess abstract algorithms or to profile and
debug real distributed applications. SimGrid enables studies in the
domains of (data-)Grids, IaaS Clouds, Clusters, High Performance
Computing, Volunteer Computing and Peer-to-Peer systems.

Technically speaking, SimGrid is a library. It is neither a graphical
interface nor a command-line simulator running user scripts. You
interact with SimGrid by writing programs that use its exposed
functions to build your own simulator.

SimGrid offers many features, options and possibilities. This
documentation aims at smoothing the learning curve, but nothing is
perfect and this documentation is no exception. Please help us
improve it by reporting any issue that you find and by proposing the
content that is still missing.

SimGrid is Free Software distributed under the LGPL licence. You are
thus welcome to use it as you wish, and even to modify and distribute
your own version (as long as your version is as free as ours). It also
means that SimGrid is developed by a lively community of users and
developers. We hope that you will come and join us!

SimGrid is the result of over 15 years of research by several
groups, both in France and in the USA. It benefited from funding by
various research agencies and institutions, including the ANR, Inria,
CNRS, University of Lorraine, University of Hawai'i at Manoa, ENS
Rennes and many others. Many thanks to our generous sponsors!

@section starting_components Typical Study based on SimGrid

Any SimGrid study entails the following components:

 - The studied **Application**. This can be either a distributed
   algorithm described with our simple APIs, or a full-featured real
   parallel application using, for example, the MPI interface
   @ref application "(more info)".

 - The **Virtual Platform**. This is a description of a given
   distributed system (machines, links, disks, clusters, etc.). Most
   platform files are written in XML, although a Lua interface is
   under development. SimGrid makes it easy to augment the Virtual
   Platform with a Dynamic Scenario where, for example, links are
   slowed down (because of external usage) or machines fail. You can
   even specify the workload that you want to feed to your application
   @ref platform "(more info)".

 - The application's **Deployment Description**. In SimGrid
   terminology, the application is an inert set of source files and
   binaries. To make it run, you have to describe how your application
   should be deployed on the virtual platform: which process is mapped
   onto which host, along with its parameters
   @ref deployment "(more info)".

 - The **Platform Models**. They describe how the virtual platform
   reacts to the actions of the application. For example, they compute
   the time taken by a given communication on the virtual platform.
   These models are already included in SimGrid, and you only need to
   pick one and maybe tweak its configuration to get your results
   @ref models "(more info)".
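
To make the role of a platform model more concrete, here is a minimal
sketch of the simplest network model one could imagine: a message pays
the latency of every link on its route, and is then transferred at the
bandwidth of the slowest link. This is an illustration only, not
SimGrid's model or API: the `Link` struct and the
`communication_time()` function are hypothetical, and the models
shipped with SimGrid are more elaborate (for example, they account for
concurrent flows sharing the bandwidth of a link).

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical description of one link of the virtual platform.
struct Link {
  double latency_s;      // latency, in seconds
  double bandwidth_Bps;  // bandwidth, in bytes per second
};

// Toy network model: the message pays the latency of every link on the
// route, then flows at the bandwidth of the slowest (bottleneck) link.
// Contention with other communications is deliberately ignored.
double communication_time(const std::vector<Link>& route, double size_bytes) {
  assert(!route.empty());
  double total_latency = 0.0;
  double bottleneck_Bps = std::numeric_limits<double>::infinity();
  for (const Link& link : route) {
    assert(link.bandwidth_Bps > 0.0);
    total_latency += link.latency_s;
    bottleneck_Bps = std::min(bottleneck_Bps, link.bandwidth_Bps);
  }
  return total_latency + size_bytes / bottleneck_Bps;
}
```

For instance, sending 10^6 bytes over a route made of a link of 10^9
bytes/s with 100 microseconds of latency, followed by a link of
1.25 x 10^8 bytes/s with 50 microseconds of latency, gives 0.15 ms of
latency plus 8 ms of transfer, about 8.15 ms in total. Ignoring
contention is what makes such a model fast, but also unrealistic as
soon as several communications share a link.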

These components are put together to run a **simulation**, that is, an
experiment or a probe. The results of one or many simulations provide
an **outcome** (logs, visualization, statistical analysis) that helps
answer the **question** targeted by the study.

The questions that SimGrid can help answer include the following:

 - **Compare an Application to another**. This is the classical use
   case for scientists, who use SimGrid to test how the solution that
   they contribute compares to the existing solutions from the
   literature.

 - **Design the best Virtual Platform for a given Application.**
   Tweaking the platform file is much easier than building a new real
   platform for testing purposes. SimGrid also allows co-design of the
   platform and the application by modifying both of them.

 - **Debug Real Applications**. With real systems, it is sometimes
   difficult to reproduce the exact run leading to the bug that you
   are tracking. SimGrid gives you experimental reproducibility and
   clairvoyance (you can inspect every part of the system, and your
   probes do not change the simulated state). It also makes it easy
   to mock the parts of the real system that are not under study.

Depending on the context, you may see some parts of this process as
less important, but you should pay close attention to all of them if
you want to be confident in the results coming out of your
simulations. In particular, you should not blindly trust your results
but always strive to double-check them. Likewise, you should question
the realism of your input configuration, and we even encourage you to
doubt (and check) the provided performance models.

To ease such questioning, you really should logically separate these
parts in your experimental setup. Merging the application, the
platform and the deployment all together is considered a very bad
practice. SimGrid is versatile and your mileage may vary, but you
should start with your Application specified as a C++ or Java
program, using one of the provided XML platform files, and with your
deployment in a separate XML file.
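
The separation advocated above can be illustrated with a small,
self-contained sketch, which is not SimGrid code: the application is a
set of named functions that know nothing about hosts, the platform is
a table of host speeds, and the deployment is plain data binding
functions to hosts. The types and the `launch()` function are
hypothetical; in a real SimGrid study, the platform and the deployment
would come from separate XML files.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Application: process functions, identified by name. They receive the
// speed of the host they run on (flop/s) and their arguments, and return
// their simulated duration in seconds. They know nothing about hosts.
using ProcessFunction =
    std::function<double(double host_speed, const std::vector<std::string>& args)>;
using Application = std::map<std::string, ProcessFunction>;

// Platform: host names and their speed in flop/s (in SimGrid, this would
// be described in its own XML platform file).
using Platform = std::map<std::string, double>;

// Deployment: which function runs on which host, with which arguments
// (in SimGrid, this would be described in a separate XML file).
struct ProcessSpec {
  std::string function;
  std::string host;
  std::vector<std::string> args;
};
using Deployment = std::vector<ProcessSpec>;

// Only here are the three parts combined: each deployed process runs its
// function with the speed of its host. Returns one duration per process.
std::vector<double> launch(const Application& app, const Platform& platform,
                           const Deployment& deployment) {
  std::vector<double> durations;
  for (const ProcessSpec& spec : deployment) {
    assert(app.count(spec.function) == 1);   // the function must exist
    assert(platform.count(spec.host) == 1);  // the host must exist
    durations.push_back(app.at(spec.function)(platform.at(spec.host), spec.args));
  }
  return durations;
}
```

Because the deployment is plain data, running the same application on
other hosts, or with other arguments, only means changing the
deployment, without touching the application or the platform.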

@section starting_gears SimGrid Execution Gears

Depending on the intended study, SimGrid can be run in several gears,
which are different execution modes.

** **Simulation Gear**. This is the most common gear, where you want
to study how your application behaves on the virtual platform under
the experimental scenario.

In this gear, SimGrid can provide information about the time taken by
your application, the amount of energy dissipated by the platform to
run your application and the detailed usage of each resource.
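
As an illustration of how an energy figure can be derived from
resource usage, the following sketch assumes a host whose power draw
grows linearly with its load, from an idle value to a full-load value,
and integrates it over a trace of constant-load intervals. The names
and the trace format are hypothetical, and the linear power model is an
assumption of this sketch; check the documentation of SimGrid's energy
plugin for the model it actually uses.

```cpp
#include <cassert>
#include <vector>

// One interval during which a host runs at a constant load
// (hypothetical trace format).
struct LoadInterval {
  double duration_s;  // length of the interval, in seconds
  double load;        // fraction of the host's capacity in use, in [0, 1]
};

// Energy in joules dissipated by a host whose power draw is assumed to
// grow linearly from idle_W (load 0) to max_W (load 1).
double energy_joules(const std::vector<LoadInterval>& trace, double idle_W,
                     double max_W) {
  assert(idle_W >= 0.0 && max_W >= idle_W);
  double energy_J = 0.0;
  for (const LoadInterval& interval : trace) {
    assert(interval.load >= 0.0 && interval.load <= 1.0);
    const double power_W = idle_W + interval.load * (max_W - idle_W);
    energy_J += power_W * interval.duration_s;  // power x time = energy
  }
  return energy_J;
}
```

For instance, a host drawing 100 W when idle and 200 W at full load,
which stays idle for 10 s, runs at full load for 5 s and then at half
load for 4 s, dissipates 1000 + 1000 + 600 = 2600 J.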

** **Model-Checking Gear**. This can be seen as a sort of exhaustive
testing gear, where every possible outcome of your application is
explored. In some sense, this gear tests your application for all
possible platforms that you could imagine (and more).

You just provide the application and its deployment (number of
processes and their parameters), and the model checker will literally
explore all possible outcomes by testing all possible message
interleavings: if at some point a given process can receive either
message A or message B first, depending on the platform
characteristics, the model checker will explore the scenario where A
arrives first, and then rewind to the same point to explore the
scenario where B arrives first.
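
The idea of exploring every delivery order can be sketched as follows.
This is a naive illustration, not how SimGrid's model checker is
implemented: instead of rewinding the application to a saved point, it
simply replays a toy receiver (the hypothetical `run_receiver()`) from
scratch for each permutation of the pending messages, and reports the
orders that violate a property.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy application: a receiver that keeps the last message it
// gets. Its outcome depends on the order in which messages are delivered.
std::string run_receiver(const std::vector<std::string>& delivery_order) {
  std::string last;
  for (const std::string& message : delivery_order) last = message;
  return last;
}

// Naive exhaustive exploration: run the application once per possible
// delivery order of the pending messages, and collect every order for
// which the property does not hold (the counterexamples).
template <typename Property>
std::vector<std::vector<std::string>> explore(std::vector<std::string> messages,
                                              Property holds) {
  assert(!messages.empty());
  std::vector<std::vector<std::string>> counterexamples;
  // std::next_permutation enumerates all orders when starting from a sorted one.
  std::sort(messages.begin(), messages.end());
  do {
    if (!holds(run_receiver(messages))) counterexamples.push_back(messages);
  } while (std::next_permutation(messages.begin(), messages.end()));
  return counterexamples;
}
```

With the property "the last received message is B", exploring the
messages A and B reports exactly one counterexample: the order where B
arrives before A. A single simulation run would only observe one of the
two orders, whereas the exhaustive exploration covers both.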

This is a very powerful gear, where you can evaluate the correctness
of your application. It can verify either **safety properties**
(asserts) or **liveness properties**, stating for example that if a
given event occurs, then another given event will occur in a finite
number of steps. This gear is not only usable with the abstract
algorithms developed on top of the SimGrid APIs, but also (to some
extent) with real MPI applications.
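
The difference between both kinds of properties can be sketched on a
finite trace of events, using hypothetical event names. A safety
property can be checked at every step, like an assert, while a liveness
property is about something that must eventually happen. The sketch
below is not SimGrid's property language; in particular, a finite trace
only gives a bounded approximation of liveness, since a real liveness
checker must also detect infinite executions (such as cycles) in which
the expected event never occurs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A trace is the sequence of events observed along one explored execution
// (hypothetical event names).
using Trace = std::vector<std::string>;

// Safety: "nothing bad ever happens". It can be checked at every step, like
// an assert. Here: a lock is never released more often than it was acquired.
bool safety_holds(const Trace& trace) {
  int held = 0;
  for (const std::string& event : trace) {
    if (event == "acquire") ++held;
    if (event == "release" && --held < 0) return false;  // violation found
  }
  return true;
}

// Liveness: "something good eventually happens". Here: every request is
// eventually granted. On a finite trace, this is only a bounded
// approximation: the execution may simply have been cut too early.
bool liveness_holds(const Trace& trace) {
  int pending = 0;
  for (const std::string& event : trace) {
    if (event == "request") ++pending;
    if (event == "grant" && pending > 0) --pending;
  }
  return pending == 0;
}
```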

The main limit of Model Checking lies in the huge number of scenarios
to explore. SimGrid tries to explore only non-redundant scenarios
thanks to classical reduction techniques (such as DPOR and stateful
exploration), but the exploration may well never finish if you don't
carefully adapt your application to this gear.
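
To give an idea of the size of the search space, the sketch below
counts the interleavings of p processes that each perform n atomic
steps without interacting, which is the multinomial coefficient
(pn)! / (n!)^p. Two processes of 10 steps already yield 184756
interleavings, and three processes of 5 steps yield 756756. When the
processes never interact, all these interleavings lead to the same
final state: this is the kind of redundancy that reduction techniques
such as DPOR aim to avoid exploring. The function names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, k). After step i, result == C(n - k + i, i),
// so every division is exact.
std::uint64_t binomial(std::uint64_t n, std::uint64_t k) {
  assert(k <= n);
  std::uint64_t result = 1;
  for (std::uint64_t i = 1; i <= k; ++i) result = result * (n - k + i) / i;
  return result;
}

// Number of distinct interleavings of `processes` independent processes
// that each perform `steps` atomic actions, i.e. the multinomial
// coefficient (processes x steps)! / (steps!)^processes, computed as the
// product C(steps, steps) x C(2 steps, steps) x ... to limit overflow.
std::uint64_t interleavings(std::uint64_t processes, std::uint64_t steps) {
  std::uint64_t total = 1;
  for (std::uint64_t p = 1; p <= processes; ++p) total *= binomial(p * steps, steps);
  return total;
}
```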

A classical trap is that the model checker can only verify whether
your application satisfies the provided properties, which is useless
if your property itself is buggy. Remember also that one way for your
application to never violate a given assert is to never start at all
because of a stupid bug.

Another limit of this gear is that it does not use the performance
models of the simulation gear. Time becomes discrete: you can say, for
example, that the application took 42 steps to run, but there is no
way to know how many seconds it took or how much energy it
dissipated.

Finally, the model checker only explores the interleavings of
computations and communications. Other factors such as thread
execution interleaving are not considered by the SimGrid model
checker.

The model checker may well miss existing issues, as it computes the
possible outcomes *from a given initial situation*. There is no way to
prove the correctness of your application in all generality with this
tool.

** **Benchmark Recording Gear**. During debug sessions, continuous
integration testing and other similar use cases, you are often only
interested in the control flow. If your application applies filters
to huge images split into small blocks, the filtered image is probably
not what you are interested in. You are probably looking for a way to
run each computation kernel only once, and save to disk the time it
takes along with some other metadata. This code block can then be
skipped in simulation and replaced by a synthetic block using the
cached information. The virtual platform will take this block into
account without requesting the real hosting machine to benchmark it.
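
The caching idea behind this gear can be sketched in a few lines. The
hypothetical `BenchmarkCache` class below runs and times a kernel the
first time a given block is reached, then skips the kernel and returns
the recorded duration on later calls, which a simulator could inject as
a synthetic computation. This is not SimGrid's interface, and the
sketch keeps the measurements in memory rather than saving them to
disk.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache of measured kernel durations, keyed by a block name.
class BenchmarkCache {
 public:
  // Returns the duration in seconds of the block identified by `key`.
  // The first call really runs and times `kernel`; later calls skip the
  // kernel entirely and return the recorded duration, which a simulator
  // could then inject as a synthetic computation on the virtual platform.
  double run_or_replay(const std::string& key, const std::function<void()>& kernel) {
    const auto cached = durations_.find(key);
    if (cached != durations_.end()) return cached->second;
    assert(kernel);  // a kernel is needed the first time a block is seen
    const auto start = std::chrono::steady_clock::now();
    kernel();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    durations_[key] = elapsed.count();
    return elapsed.count();
  }

  bool has_recorded(const std::string& key) const { return durations_.count(key) == 1; }

 private:
  std::unordered_map<std::string, double> durations_;
};
```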

@section starting_successes SimGrid Success Stories

TBD

- Many publications
- Accurate speedup prediction for the Mont-Blanc cluster
- It has already happened that a divergence between the simulated
  outcome and reality resulted from a testbed misconfiguration. In
  some sense, we fixed reality because it was not getting the result
  that SimGrid correctly computed :)
- StarPU, BigDFT and TomP2P use SimGrid to chase their bugs and improve
  their efficiency.

@section starting_limits SimGrid Limits

This framework is by no means the perfect holy grail able to solve
every problem on earth.

** **SimGrid scope is limited to distributed systems.** Real-time
multithreaded systems are not in scope. You could probably tweak
SimGrid for such studies (or the framework could possibly be extended
in this direction), but another framework specifically targeting this
use case would probably be better suited.

** **There is currently no support for IoT studies and wireless networks**.
The framework could certainly be improved in this direction, but this
is still to be done.

** **There is no perfect model, only models adapted to your study.**
The SimGrid models target fast, large-scale studies that still
require realistic results. In particular, our models abstract away
parameters and phenomena that are often irrelevant to realism in our
context.

SimGrid is simply not intended for any study that depends on the
abstracted phenomena. Here are some **studies that you should not do
with SimGrid**:

 - Studying the effects of L2 vs. L3 caches on your application
 - Comparing variants of TCP
 - Exploring pathological cases where TCP breaks down, resulting in
   abnormal executions
 - Studying security aspects of your application in the presence of
   malicious agents

@section starting_next Where to proceed next?

Now that you know about the basic concepts of SimGrid, you can give it
a try. If you have not done so yet, first @ref install "install it".
Then, proceed to the section on @ref application "describing the application"
that you want to study.

*/