/*! @page introduction Introduction to SimGrid

[SimGrid](http://simgrid.gforge.inria.fr/) is a toolkit
that provides core functionalities for the simulation of distributed
applications in heterogeneous distributed environments.

The specific goal of the project is to facilitate research in the area of
distributed and parallel application scheduling on distributed computing
platforms ranging from simple networks of workstations to computational
grids.

# Scenario
The goal of this practical session is to illustrate various uses of
the MSG interface. To this end, we will use the following simple setting:

> Assume we have a (possibly large) bunch of (possibly large) input files
> to process, which originally reside on a server (a.k.a. master). For
> the sake of simplicity, we assume all input files require the same
> amount of computation. We assume the server can be helped by a
> (possibly large) set of worker machines. What is the best way to
> organize the computations?

Although this looks like a very simple setting, it raises several
interesting questions:

- Which algorithm should the master use to send workload?

    The most obvious algorithm would be to send tasks to workers in a
    round-robin fashion. This is what the initial code we provide does.

    A less obvious but probably more efficient approach would be to set up
    a request mechanism where workers first ask for tasks, which allows
    the master to decide which request to answer and possibly to send
    the tasks to the fastest machines. Maybe you can think of a
    smarter mechanism...

- How many tasks should a worker ask for?

    Indeed, if we set up a request mechanism so that workers only
    send requests when they have no more tasks to process, they are
    likely to be poorly exploited since they will have to wait for the
    master to consider their request and for the input data to be
    transferred. A worker should thus probably request a pool of tasks,
    but if it requests too many tasks, this is likely to lead to poor
    load balancing...

- How does the quality of such an algorithm depend on the platform
  characteristics and on the task characteristics?

    Whenever the input communication time is very small compared to the
    processing time and workers are homogeneous, the round-robin
    algorithm is likely to perform very well. Would this still hold
    when transfer time is not negligible and the platform is, say,
    a volunteer computing system?

- The network topology interconnecting the master and the workers
  may be quite complicated. How does such a topology impact the
  previous results?

    When data transfers are the bottleneck, accurate modeling of the
    platform is likely to become essential. In this case, you may
    want to be able to account for complex platform topologies.

- Do the algorithms depend on a perfect knowledge of this
  topology?

    Should we still use a flat master worker deployment or should we
    use a

- How sensitive is such an algorithm to external workload variations?

    What if bandwidth, latency and computing power can vary with no warning?
    Shouldn't you study whether your algorithm is sensitive to such
    load variations?

- Although an algorithm may be more efficient than another, how
  does it interfere with other applications?

%As you can see, this very simple setting may need to evolve way
beyond what you initially imagined.

<blockquote> Premature optimization is the root of all evil. -- D.E. Knuth</blockquote>

Furthermore, writing your own simulator is much harder than you
may imagine. This is why you should rely on an established and flexible
one.

The following figure is a screenshot of [triva][fn:1] visualizing a [SimGrid
simulation][fn:2] of two master worker applications (one in light gray and
the other in dark gray) running concurrently and showing resource
usage over a long period of time.

![Triva visualization of two concurrent master worker applications](./sc3-description.png)

# Prerequisites

## Tutorials

Plenty of information on how to install and use SimGrid is
provided by the [online documentation][fn:4] and by several tutorials:

- http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-tracing-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-platf-101.pdf

## Installing SimGrid

If you are using [Debian](https://www.debian.org) or a derivative,
such as Ubuntu or Mint, you can install SimGrid by using the provided
packages:

    sudo apt-get install libsimgrid-dev

This tutorial requires at least SimGrid 3.8, so you may need to get
the [Debian packages](http://packages.debian.org/libsimgrid-dev).

Please note that your distribution may ship with an old version of
SimGrid; you may want to use [a newer release](https://gforge.inria.fr/frs/?group_id=12)
or even [clone our git repository](https://gforge.inria.fr/frs/?group_id=12)
(a [GitHub mirror](https://github.com/mquinson/simgrid) is also available).

# Recommended Steps

## Installing Viva

This [software][fn:1] is useful for making fancy graph or treemap
visualizations and getting a better understanding of simulations. You
will first need to install pajeng:

~~~~{.sh}
sudo apt-get install git cmake build-essential libqt4-dev libboost-dev freeglut3-dev
git clone https://github.com/schnorr/pajeng.git
cd pajeng && mkdir -p build && cd build && cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME && make -j install
cd ../../
~~~~

Then you can install viva:

~~~~{.sh}
sudo apt-get install libboost-dev libconfig++-dev libconfig8-dev libgtk2.0-dev freeglut3-dev
git clone https://github.com/schnorr/viva.git
cd viva && mkdir -p build_graph && cd build_graph && cmake ../ -DTUPI_LIBRARY=ON -DVIVA=ON -DCMAKE_INSTALL_PREFIX=$HOME && make -j install
cd ../../
~~~~

## Installing Paje

This [software][fn:5] provides a Gantt-chart visualization.

~~~~{.sh}
sudo apt-get install paje.app
~~~~

## Installing Vite

This [software][fn:6] also provides a Gantt-chart visualization.

~~~~{.sh}
sudo apt-get install vite
~~~~

# Let's Get Started
## Setting up and Compiling

The corresponding archive with all source files and platform files
can be obtained [here](http://simgrid.gforge.inria.fr/tutorials/msg-tuto/msg-tuto.tgz).

~~~~{.sh}
tar zxf msg-tuto.tgz
cd msg-tuto/src
make
~~~~

%As you can see, there is already a nice Makefile that compiles
everything for you. Now that the tiny example has been compiled, it
can easily be run as follows:

~~~~{.sh}
./masterworker0 platforms/platform.xml deployment0.xml 2>&1
~~~~

If you create a single self-contained C file named `foo.c`, the
corresponding program can simply be compiled and linked with
SimGrid by typing:

~~~~{.sh}
make foo
~~~~

For a more "fancy" output, you can try:

~~~~{.sh}
./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer
~~~~

For a really fancy output, you should use [viva/triva][fn:1]:

~~~~{.sh}
./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
    --cfg=tracing/uncategorized:yes --cfg=viva/uncategorized:uncat.plist
LANG=C ; viva simgrid.trace uncat.plist
~~~~

For a more classical Gantt-chart visualization, you can produce a
[Paje][fn:5] trace:

~~~~{.sh}
./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
    --cfg=tracing/msg/process:yes
LANG=C ; Paje simgrid.trace
~~~~

Alternatively, you can use [vite][fn:6]:

~~~~{.sh}
./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
    --cfg=tracing/msg/process:yes --cfg=tracing/basic:yes
vite simgrid.trace
~~~~

## Getting Rid of Workers in the Deployment File

In the previous example, the deployment file `deployment0.xml`
is tightly connected to the platform file `platform.xml`, and a
worker process is launched on each host:

~~~~{.xml}
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <!-- The master process (with some arguments) -->
  <process host="Tremblay" function="master">
     <argument value="20"/>       <!-- Number of tasks -->
     <argument value="50000000"/>  <!-- Computation size of tasks -->
     <argument value="1000000"/>   <!-- Communication size of tasks -->
     <argument value="Jupiter"/>  <!-- First worker -->
     <argument value="Fafard"/>   <!-- Second worker -->
     <argument value="Ginette"/>  <!-- Third worker -->
     <argument value="Bourassa"/> <!-- Last worker -->
     <argument value="Tremblay"/> <!-- Me! I can work too! -->
  </process>
  <!-- The worker processes (with no argument) -->
  <process host="Tremblay" function="worker" on_failure="RESTART"/>
  <process host="Jupiter" function="worker" on_failure="RESTART"/>
  <process host="Fafard" function="worker" on_failure="RESTART"/>
  <process host="Ginette" function="worker" on_failure="RESTART"/>
  <process host="Bourassa" function="worker" on_failure="RESTART"/>
</platform>
~~~~

This is fine as long as the platform is rather small, but it becomes
painful with larger platforms. Instead, modify the simulator
`masterworker0.c` into `masterworker1.c` so that the master
launches a worker process on all the other machines at startup. The
new deployment file `deployment1.xml` should thus simply be:

~~~~{.xml}
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <!-- The master process (with some arguments) -->
  <process host="Tremblay" function="master">
     <argument value="20"/>       <!-- Number of tasks -->
     <argument value="50000000"/>  <!-- Computation size of tasks -->
     <argument value="1000000"/>   <!-- Communication size of tasks -->
  </process>
</platform>
~~~~

To this end, you may need the following MSG functions (click on the links
to see their descriptions):

~~~~{.c}
int MSG_get_host_number(void);
xbt_dynar_t MSG_hosts_as_dynar(void);
void * xbt_dynar_to_array (xbt_dynar_t dynar);
msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
                                 void *data, msg_host_t host);
~~~~

\note
    Not launching a worker on the master host may avoid bugs later on,
    so you probably want to remove it from the host list.

The `data` argument of @ref MSG_process_create can be used to pass
a channel name that will be private to the master
and a given worker (e.g., `master_name:worker_name`). Including the
`master_name` in the channel name makes it easy to have several
masters, with one worker per master on each machine. To this end, you
may need to use the following functions:

~~~~{.c}
msg_host_t MSG_host_self(void);
const char * MSG_host_get_name(msg_host_t host);
msg_process_t MSG_process_self(void);
void * MSG_process_get_data(msg_process_t process);
~~~~

If you are not too familiar with string
manipulation in C, you may want to use the following functions
(see the C reference for details):

~~~~{.c}
char *strcpy(char *dest, const char *src);
char *strcat(char *dest, const char *src);
~~~~

## Setting up a Time Limit Mechanism

In the current version, the number of tasks is defined through the
master arguments. Hence, tasks are created at the very beginning of
the simulation. Instead, create tasks as needed and provide a time
limit indicating when the master stops sending tasks. To this end, you will
obviously need to know what time it is:

~~~~{.c}
double MSG_get_clock(void);
~~~~

Besides, a quite effective way of terminating the simulation
would be to use some of the following functions:

~~~~{.c}
void MSG_process_kill(msg_process_t process);
int MSG_process_killall(int reset_PIDs);
~~~~

In any case, the new deployment file `deployment2.xml` should look
like this:

~~~~{.xml}
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <process host="Tremblay" function="master">
     <argument value="3600"/>      <!-- Simulation timeout -->
     <argument value="50000000"/>  <!-- Computation size of tasks -->
     <argument value="1000000"/>   <!-- Communication size of tasks -->
  </process>
</platform>
~~~~

It may also be a good idea to turn most of the `XBT_INFO` calls into
`XBT_DEBUG` ones (e.g., keep as `XBT_INFO` only the information on the
total number of tasks processed). These debug messages can be activated as follows:

~~~~{.sh}
./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug
~~~~

## Using the Tracing Mechanism

SimGrid can trace all resource consumption, and the outcome can be
displayed with viva as illustrated in the section "Setting up and Compiling". However, when several
masters are deployed, as in the following deployment file, it is hard to understand what happens.

~~~~{.xml}
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <process host="Tremblay" function="master">
     <argument value="3600"/>      <!-- Simulation timeout -->
     <argument value="50000000"/>  <!-- Computation size of tasks -->
     <argument value="10"/>   <!-- Communication size of tasks -->
  </process>
  <process host="Fafard" function="master">
     <argument value="3600"/>      <!-- Simulation timeout -->
     <argument value="50000000"/>  <!-- Computation size of tasks -->
     <argument value="10"/>   <!-- Communication size of tasks -->
  </process>
  <process host="Jupiter" function="master">
     <argument value="3600"/>      <!-- Simulation timeout -->
     <argument value="50000000"/>  <!-- Computation size of tasks -->
     <argument value="10"/>   <!-- Communication size of tasks -->
  </process>
</platform>
~~~~

So let's use categories to track more precisely who does what and when:

~~~~{.c}
void TRACE_category(const char *category);
void MSG_task_set_category (msg_task_t task, const char *category);
~~~~

The outcome can then be visualized as follows:

~~~~{.sh}
./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
    --cfg=tracing/categorized:yes --cfg=viva/categorized:viva_cat.plist
LANG=C; viva simgrid.trace viva_cat.plist
~~~~

Right now, you should realize that nothing behaves as you
expect. Most workers are idle even though input data are ridiculously small
and there are several masters deployed on the platform. Using a
Gantt-chart visualization may help:

~~~~{.sh}
./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
    --cfg=tracing/msg/process:yes
LANG=C; Paje simgrid.trace
~~~~

OK, so it should now be obvious that round robin is actually
very bad.

## Improving the Scheduling

Instead of a round-robin scheduling, let's implement a first-come
first-served mechanism. To this end, workers need to send a tiny
request first. A possible way to implement such a request with MSG
is to send, on a specific channel (e.g., one named after the master),
a task with payload 0 whose attached data is the worker
name. This way, the master can keep track of which workers are idle
and willing to work.

To know whether it has pending requests, the master can use the
following [function][fn:7]:

~~~~{.c}
int MSG_task_listen(const char *alias);
~~~~

If so, it should get the request and push the corresponding host
into a dynar so that it can later be retrieved when sending a
real [task][fn:7].

~~~~{.c}
xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
                          void_f_pvoid_t const free_f);
void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
unsigned long xbt_dynar_length(const xbt_dynar_t dynar);
~~~~

As you will soon realize, such simple mechanisms quickly lead to
deadlocks. They can easily be removed with a polling mechanism,
hence the need for the following
[function][fn:7]:

~~~~{.c}
msg_error_t MSG_process_sleep(double nb_sec);
~~~~

As you should quickly realize, on the simple previous example, this
will double the throughput of the platform, but it will be quite
ineffective when the input size of the tasks is no longer negligible.

From this, many things can easily be added. For example, you could:
- add a performance measurement mechanism;
- enable the master to make smart scheduling choices using
  measurement information;
- allow workers to have several pending requests so as to overlap
  communication and computations as much as possible;
- ...

## Using More Elaborate Platforms

SimGrid offers a rather powerful platform modeling mechanism. The
`src/platforms/` directory comprises a variety of platforms, ranging
from simple ones to quite elaborate ones. Combined with a good
visualization tool to ensure your simulation is meaningful, they
can allow you to study to what extent your algorithm scales...

What is the largest number of tasks requiring 50e6 flops and 1e5
bytes that you manage to distribute and process in one hour on
`g5k.xml` (you should use `deployment_general.xml`)?

# Points to Improve for the Next Time

- Propose equivalent exercises and skeletons in Java.
- Propose a VirtualBox image with everything (SimGrid, Paje, Viva,
  ...) already set up.
- Ease the installation on Mac OS X (binary installer) and
  Windows.
- Explain that programming in C or Java and having a working
  development environment is a prerequisite.

[fn:1]: http://triva.gforge.inria.fr/index.html
[fn:2]: http://hal.inria.fr/inria-00529569
[fn:3]: http://hal.inria.fr/hal-00738321
[fn:4]: http://simgrid.gforge.inria.fr/documentation.html
[fn:5]: http://paje.sourceforge.net/
[fn:6]: http://vite.gforge.inria.fr/

*/