doc/doxygen/tutorial.doc

   1 /*! @page tutorial SimGrid First Tutorial
   2
   3 SimGrid is a toolkit providing the core functionalities for the
   4 simulation of distributed applications in heterogeneous distributed
   5 environments.
   6
   7 The project goal is both to facilitate research and to help improving
   8 real applications in the area of distributed and parallel systems,
   9 ranging from simple network of workstations to Computational Grids to
  10 Clouds and to supercomputers.
  11
  12 \tableofcontents
  13
  14 \section  Scenario
  15 The goal of this practical session is to illustrate various usage of
  16 the MSG interface. To this end we will use the following simple setting:
  17
  18 > Assume we have a (possibly large) bunch of (possibly large) data to
  19 > process and which originally reside on a server (a.k.a. master). For
  20 > sake of simplicity, we assume all input file require the same amount
  21 > of computation. We assume the server can be helped by a (possibly
  22 > large) set of worker machines. What is the best way to organize the
  23 > computations ?
  24
  25 Although this looks like a very simple setting it raises several
  26 interesting questions:
  27
  28 - Which algorithm should the master use to send workload?
  29
  30     The most obvious algorithm would be to send tasks to workers in a
  31     round-robin fashion. This is the initial code we provide you.
  32
  33     A less obvious but probably more efficient approach would be to set up
  34     a request mechanism where a client first ask for tasks, which allows
  35     the server to decide which request to answer and possibly to send
  36     the tasks to the fastest machines. Maybe you can think of a
  37     smarter mechanism...
  38
  39 - How many tasks should the client ask for?
  40
  41     Indeed, if we set up a request mechanism so that workers only
  42     send request whenever they have no more task to process, they are
  43     likely to be poorly exploited since they will have to wait for the
  44     master to consider their request and for the input data to be
  45     transferred. A client should thus probably request a pool of tasks
  46     but if it requests too many tasks, it is likely to lead to a poor
  47     load-balancing...
  48
  49 - How is the quality of such algorithm dependent on the platform
  50     characteristics and on the task characteristics?
  51
  52     Whenever the input communication time is very small compared to
  53     processing time and workers are homogeneous, it is likely that the
  54     round-robin algorithm performs very well. Would it still hold true
  55     when transfer time is not negligible and the platform is, say,
  56     a volunteer computing system ?
  57
  58 - The network topology interconnecting the master and the workers
  59   may be quite complicated. How does such a topology impact the
  60   previous result?
  61
  62     When data transfers are the bottleneck, it is likely that a good
  63     modeling of the platform becomes essential. In this case, you may
  64     want to be able to account for complex platform topologies.
  65
  66 - Do the algorithms depend on a perfect knowledge of this
  67   topology?
  68
  69     Should we still use a flat master worker deployment or should we
  70     use a
  71
  72 - How is such an algorithm sensitive to external workload variation?
  73
  74     What if bandwidth, latency and power can vary with no warning?
  75     Shouldn't you study whether your algorithm is sensitive to such
  76     load variations?
  77
  78 - Although an algorithm may be more efficient than another, how
  79   does it interfere with other applications?
  80
  81     %As you can see, this very simple setting may need to evolve way
  82     beyond what you initially imagined.
  83
  84     <blockquote> Premature optimization is  the root of all evil. -- D.E.Knuth</blockquote>
  85
  86     Furthermore, writing your own simulator is much harder than you
  87     may imagine. This is why you should rely on an established and flexible
  88     one.
  89
  90 The following figure is a screenshot of [triva][fn:1] visualizing a [SimGrid
  91 simulation][fn:2] of two master worker applications (one in light gray and
  92 the other in dark gray) running in concurrence and showing resource
  93 usage over a long period of time.
  94
  95 ![Test](./sc3-description.png)
  96
  97 \section Prerequisites
  98
  99 Of course, you need to install SimGrid before taking this tutorial.
 100 Please refer to the relevant Section: \ref install.
 101
 102 ## Tutorials
 103
 104 A lot of information on how to install and use Simgrid are
 105 provided by the [online documentation][fn:4] and by several tutorials:
 106
 107 - http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf
 108 - http://simgrid.gforge.inria.fr/tutorials/simgrid-tracing-101.pdf
 109 - http://simgrid.gforge.inria.fr/tutorials/simgrid-platf-101.pdf
 110
 111 \section intro_recommendation Recommended Steps
 112
 113 ## Installing Viva
 114
 115 This [software][fn:1] will be useful to make fancy graph or treemap
 116 visualizations and get a better understanding of simulations. You
 117 will first need to install pajeng:
 118
 119 ~~~~{.sh}
 120 sudo apt-get install git cmake build-essential libqt4-dev  libboost-dev freeglut3-dev ;
 121 git clone https://github.com/schnorr/pajeng.git
 122 cd pajeng && mkdir -p build &&  cd build && cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME &&  make -j install
 123 cd ../../
 124 ~~~~
 125
 126 Then you can install viva.
 127
 128 ~~~~{.sh}
 129 sudo apt-get install libboost-dev libconfig++-dev libconfig8-dev libgtk2.0-dev freeglut3-dev
 130 git clone https://github.com/schnorr/viva.git
 131 cd viva && mkdir -p build_graph &&  cd build_graph && cmake ../ -DTUPI_LIBRARY=ON -DVIVA=ON -DCMAKE_INSTALL_PREFIX=$HOME &&  make -j install
 132 cd ../../
 133 ~~~~
 134
 135 ## Installing Paje
 136
 137 This [software][fn:5] provides a Gantt-chart visualization.
 138
 139 ~~~~{.sh}
 140 sudo apt-get install paje.app
 141 ~~~~
 142
 143 ## Installing Vite
 144
 145 This software provides a [Gantt-chart visualization][fn:6].
 146
 147 ~~~~{.sh}
 148 sudo apt-get install vite
 149 ~~~~
 150
 151 \section intro_start Let's get started
 152
 153 \anchor intro_setup
 154 ## Setting up and Compiling
 155
 156 The corresponding archive with all source files can be obtained
 157 [here](http://simgrid.gforge.inria.fr/tutorials/msg-tuto/msg-tuto.tgz),
 158 while the simgrid archive contains
 159 [several platform files](https://github.com/mquinson/simgrid/tree/master/examples/platforms)
 160 (click on the "Raw" button of files you want to download from GitHub).
 161
 162 ~~~~{.sh}
 163 tar zxf msg-tuto.tgz
 164 cd msg-tuto/src
 165 make
 166 ~~~~
 167
 168 %As you can see, there is already a nice Makefile that compiles
 169 everything for you. Now the tiny example has been compiled and it
 170 can be easily run as follows:
 171
 172 ~~~~{.sh}
 173 ./masterworker0 platforms/platform.xml deployment0.xml 2>&1
 174 ~~~~
 175
 176 If you create a single self-content C-file named foo.c, the
 177 corresponding program will be simply compiled and linked with
 178 SimGrid by typing:
 179
 180 ~~~~{.sh}
 181 make foo
 182 ~~~~
 183
 184 For a more "fancy" output, you can try:
 185
 186 ~~~~{.sh}
 187 ./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer
 188 ~~~~
 189
 190 For a really fancy output, you should use [viva/triva][fn:1]:
 191
 192 ~~~~{.sh}
 193 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
 194     --cfg=tracing/uncategorized:yes --cfg=viva/uncategorized:uncat.plist
 195 LANG=C ; viva simgrid.trace uncat.plist
 196 ~~~~
 197
 198 For a more classical Gantt-Chart visualization, you can produce a
 199 [Paje][fn:5] trace:
 200
 201 ~~~~{.sh}
 202 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
 203     --cfg=tracing/msg/process:yes
 204 LANG=C ; Paje simgrid.trace
 205 ~~~~
 206
 207 Alternatively, you can use [vite][fn:6].
 208
 209 ~~~~{.sh}
 210 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
 211     --cfg=tracing/msg/process:yes --cfg=tracing/basic:yes
 212 vite simgrid.trace
 213 ~~~~
 214
 215 ## Getting Rid of Workers in the Deployment File
 216
 217 In the previous example, the deployment file `deployment0.xml`
 218 is tightly connected to the platform file `platform.xml` and a
 219 worker process is launched on each host:
 220
 221 ~~~~{.xml}
 222 <?xml version='1.0'?>
 223 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 224 <platform version="3">
 225   <!-- The master process (with some arguments) -->
 226   <process host="Tremblay" function="master">
 227      <argument value="20"/>       <!-- Number of tasks -->
 228      <argument value="50000000"/>  <!-- Computation size of tasks -->
 229      <argument value="1000000"/>   <!-- Communication size of tasks -->
 230      <argument value="Jupiter"/>  <!-- First worker -->
 231      <argument value="Fafard"/>   <!-- Second worker -->
 232      <argument value="Ginette"/>  <!-- Third worker -->
 233      <argument value="Bourassa"/> <!-- Last worker -->
 234      <argument value="Tremblay"/> <!-- Me! I can work too! -->
 235   </process>
 236   <!-- The worker process (with no argument) -->
 237   <process host="Tremblay" function="worker" on_failure="RESTART"/>
 238   <process host="Jupiter" function="worker" on_failure="RESTART"/>
 239   <process host="Fafard" function="worker" on_failure="RESTART"/>
 240   <process host="Ginette" function="worker" on_failure="RESTART"/>
 241   <process host="Bourassa" function="worker" on_failure="RESTART"/>
 242 </platform>
 243 ~~~~
 244
 245 This is ok as the platform is rather small but will be painful when
 246 using larger platforms. Instead, modify the simulator
 247 `masterworker0.c` into `masterworker1.c` so that the master
 248 launches a worker process on all the other machines at startup. The
 249 new deployment file `deployment1.xml` should thus now simply be:
 250
 251 ~~~~{.xml}
 252 <?xml version='1.0'?>
 253 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 254 <platform version="3">
 255   <!-- The master process (with some arguments) -->
 256   <process host="Tremblay" function="master">
 257      <argument value="20"/>       <!-- Number of tasks -->
 258      <argument value="50000000"/>  <!-- Computation size of tasks -->
 259      <argument value="1000000"/>   <!-- Communication size of tasks -->
 260   </process>
 261 </platform>
 262 ~~~~
 263
 264 To this end you may need the following MSG functions (click on the links
 265 to see their descriptions):
 266
 267 ~~~~{.c}
 268 int MSG_get_host_number(void);
 269 xbt_dynar_t MSG_hosts_as_dynar(void);
 270 void * xbt_dynar_to_array (xbt_dynar_t dynar);
 271 msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
 272                                  void *data, msg_host_t host);
 273 ~~~~
 274
 275 \note
 276     It may avoid bugs later to avoid launching a worker on
 277     the master host so you probably want to remove it from the host
 278     list.
 279
 280 The `data` field of the @ref MSG_process_create can be used to pass
 281 a channel name that will be private between master
 282 and workers (e.g., `master_name:worker_name`). Adding the
 283 `master_name` in the channel name will allow to easily have several
 284 masters and a worker per master on each machine. To this end, you
 285 may need to use the following functions:
 286
 287 ~~~~{.c}
 288 msg_host_t MSG_host_self(void);
 289 const char * MSG_host_get_name(msg_host_t host);
 290 msg_process_t MSG_process_self(void);
 291 void * MSG_process_get_data(msg_process_t process);
 292 ~~~~
 293
 294 If you are not too familiar with string
 295 manipulation in C, you may want to use the following functions
 296 (see the C reference for details):
 297
 298 ~~~~{.c}
 299 char *strcpy(char *dest, const char *src);
 300 char *strcat(char *dest, const char *src);
 301 ~~~~
 302
 303 ## Setting up a Time Limit Mechanism
 304
 305 In the current version, the number of tasks is defined through the
 306 worker arguments. Hence, tasks are created at the very beginning of
 307 the simulation. Instead, create tasks as needed and provide a time
 308 limit indicating when it stops sending tasks. To this end, you will
 309 obviously need to know what time it is:
 310
 311 ~~~~{.c}
 312 double MSG_get_clock(void);
 313 ~~~~
 314
 315 Otherwise, a quite effective way of terminating the simulation
 316 would be to use some of the following functions:
 317
 318 ~~~~{.c}
 319 void MSG_process_kill(msg_process_t process);
 320 int MSG_process_killall(int reset_PIDs);
 321 ~~~~
 322
 323 Anyway, the new deployment `deployment2.xml` file should thus look
 324 like this:
 325
 326 ~~~~{.xml}
 327 <?xml version='1.0'?>
 328 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 329 <platform version="3">
 330   <process host="Tremblay" function="master">
 331      <argument value="3600"/>      <!-- Simulation timeout -->
 332      <argument value="50000000"/>  <!-- Computation size of tasks -->
 333      <argument value="1000000"/>   <!-- Communication size of tasks -->
 334   </process>
 335 </platform>
 336 ~~~~
 337
 338 It may also be a good idea to transform most of the `XBT_INFO` into
 339 `XBT_DEBUG` (e.g., keep the information on the total number of
 340 tasks processed). These debug messages can be activated as follows:
 341
 342 ~~~~{.sh}
 343 ./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug
 344 ~~~~
 345
 346 ## Using the Tracing Mechanism
 347
 348 SimGrid can trace all resource consumption and the outcome can be
 349 displayed with viva as illustrated in the section \ref intro_setup. However, when several
 350 masters are deployed, it is hard to understand what happens.
 351
 352 ~~~~{.xml}
 353 <?xml version='1.0'?>
 354 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 355 <platform version="3">
 356   <process host="Tremblay" function="master">
 357      <argument value="3600"/>      <!-- Simulation timeout -->
 358      <argument value="50000000"/>  <!-- Computation size of tasks -->
 359      <argument value="10"/>   <!-- Communication size of tasks -->
 360   </process>
 361   <process host="Fafard" function="master">
 362      <argument value="3600"/>      <!-- Simulation timeout -->
 363      <argument value="50000000"/>  <!-- Computation size of tasks -->
 364      <argument value="10"/>   <!-- Communication size of tasks -->
 365   </process>
 366   <process host="Jupiter" function="master">
 367      <argument value="3600"/>      <!-- Simulation timeout -->
 368      <argument value="50000000"/>  <!-- Computation size of tasks -->
 369      <argument value="10"/>   <!-- Communication size of tasks -->
 370   </process>
 371 </platform>
 372 ~~~~
 373
 374 So let's use categories to track more precisely who does what and when:
 375
 376 ~~~~{.c}
 377 void TRACE_category(const char *category);
 378 void MSG_task_set_category (msg_task_t task, const char *category);
 379 ~~~~
 380
 381 The outcome can then be visualized as follows:
 382
 383 ~~~~{.sh}
 384 ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes\
 385     --cfg=tracing/categorized:yes --cfg=viva/categorized:viva_cat.plist
 386 LANG=C; viva simgrid.trace viva_cat.plist
 387 ~~~~
 388
 389 Right now, you should realize that nothing is behaving like you
 390 expect. Most workers are idle even though input data are ridiculous
 391 and there are several masters deployed on the platform. Using a
 392 Gantt-chart visualization may help:
 393
 394 ~~~~{.sh}
 395 ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
 396     --cfg=tracing/msg/process:yes
 397 LANG=C; Paje simgrid.trace
 398 ~~~~
 399
 400 OK, so it should now be obvious that round robin is actually
 401 very bad.
 402
 403 ## Improving the Scheduling
 404
 405 Instead of a round-robin scheduling, let's implement a first-come
 406 first-served mechanism. To this end, workers need to send a tiny
 407 request first. A possible way to implement such a request with MSG
 408 is to send on a specific channel (e.g., the name of the master
 409 name) a task with payload 0 and whose attached data is the worker
 410 name. This way, the master can keep track of which workers are idle
 411 and willing to work.
 412
 413 To know whether it has pending requests, the master can use the
 414 following [function][fn:7]:
 415
 416 ~~~~{.c}
 417 int MSG_task_listen(const char *alias);
 418 ~~~~
 419
 420 If so, it should get the request and push the corresponding host
 421 into a dynar so that they can later be retrieved when sending a
 422 real [task][fn:7].
 423
 424 ~~~~{.c}
 425 xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
 426                           void_f_pvoid_t const free_f);
 427 void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
 428 void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
 429 unsigned long xbt_dynar_length(const xbt_dynar_t dynar);
 430 ~~~~
 431
 432 %As you will soon realize, with such simple mechanisms, simple
 433 deadlocks will soon appear. They can easily be removed with a
 434 simple polling mechanism, hence the need for the following
 435 [function][fn:7]:
 436
 437 ~~~~{.c}
 438 msg_error_t MSG_process_sleep(double nb_sec);
 439 ~~~~
 440
 441 %As you should quickly realize, on the simple previous example, it
 442 will double the throughput of the platform but will be quite
 443 ineffective when input size of the tasks is not negligible anymore.
 444
 445 From this, many things can easily be added. For example, you could:
 446 - add a performance measurement mechanism;
 447 - enable the master to make smart scheduling choices using
 448   measurement information;
 449 - allow workers to have several pending requests so as to overlap
 450   communication and computations as much as possible;
 451 - ...
 452
 453 ## Using More Elaborate Platforms
 454
 455 SimGrid offers a rather powerful platform modeling mechanism. The
 456 `src/examples/platforms/` repository comprises a variety of platforms ranging
 457 from simple to elaborate. Associated to a good
 458 visualization tool to ensure your simulation is meaningful, they
 459 can allow you to study to which extent your algorithm scales...
 460
 461 What is the largest number of tasks requiring 50e6 flops and 1e5
 462 bytes that you manage to distribute and process in one hour on
 463 `g5k.xml` (you should use `deployment_general.xml`)?
 464
 465 \section intro_todo TODO: Points to improve for the next time
 466
 467 - Propose equivalent exercises and skeleton in java.
 468 - Propose a virtualbox image with everything (simgrid, paje, viva,
 469   ...) already set up.
 470 - Ease the installation on mac OS X (binary installer) and
 471   windows.
 472 - Explain that programming in C or java and having a working
 473   development environment is a prerequisite.
 474
 475 [fn:1]: http://triva.gforge.inria.fr/index.html
 476 [fn:2]: http://hal.inria.fr/inria-00529569
 477 [fn:3]: http://hal.inria.fr/hal-00738321
 478 [fn:4]: http://simgrid.gforge.inria.fr/documentation.html
 479 [fn:5]: http://paje.sourceforge.net/
 480 [fn:6]: http://vite.gforge.inria.fr/
 481
 482
 483
 484
 485
 486 */