doc/doxygen/introduction.doc

   1 /*! @page introduction Introduction to SimGrid
   2
   3
   4 [SimGrid](http://simgrid.gforge.inria.fr/) is a toolkit
   5 that provides core functionalities for the simulation of distributed
   6 applications in heterogeneous distributed environments.
   7
   8 The specific goal of the project is to facilitate research in the area of
   9 distributed and parallel application scheduling on distributed computing
  10 platforms ranging from simple network of workstations to Computational
  11 Grids.
  12
  13 \tableofcontents
  14
  15 \section  Scenario
  16 The goal of this practical session is to illustrate various usage of
  17 the MSG interface. To this end we will use the following simple setting:
  18
  19 > Assume we have a (possibly large) bunch of (possibly large) data to
  20 > process and which originally reside on a server (a.k.a. master). For
  21 > sake of simplicity, we assume all input file require the same amount
  22 > of computation. We assume the server can be helped by a (possibly
  23 > large) set of worker machines. What is the best way to organize the
  24 > computations ?
  25
  26 Although this looks like a very simple setting it raises several
  27 interesting questions:
  28
  29 - Which algorithm should the master use to send workload?
  30
  31     The most obvious algorithm would be to send tasks to workers in a
  32     round-robin fashion. This is the initial code we provide you.
  33
  34     A less obvious but probably more efficient approach would be to set up
  35     a request mechanism where a client first ask for tasks, which allows
  36     the server to decide which request to answer and possibly to send
  37     the tasks to the fastest machines. Maybe you can think of a
  38     smarter mechanism...
  39
  40 - How many tasks should the client ask for?
  41
  42     Indeed, if we set up a request mechanism so that workers only
  43     send request whenever they have no more task to process, they are
  44     likely to be poorly exploited since they will have to wait for the
  45     master to consider their request and for the input data to be
  46     transferred. A client should thus probably request a pool of tasks
  47     but if it requests too many tasks, it is likely to lead to a poor
  48     load-balancing...
  49
  50 - How is the quality of such algorithm dependent on the platform
  51     characteristics and on the task characteristics?
  52
  53     Whenever the input communication time is very small compared to
  54     processing time and workers are homogeneous, it is likely that the
  55     round-robin algorithm performs very well. Would it still hold true
  56     when transfer time is not negligible and the platform is, say,
  57     a volunteer computing system ?
  58
  59 - The network topology interconnecting the master and the workers
  60   may be quite complicated. How does such a topology impact the
  61   previous result?
  62
  63     When data transfers are the bottleneck, it is likely that a good
  64     modeling of the platform becomes essential. In this case, you may
  65     want to be able to account for complex platform topologies.
  66
  67 - Do the algorithms depend on a perfect knowledge of this
  68   topology?
  69
  70     Should we still use a flat master worker deployment or should we
  71     use a
  72
  73 - How is such an algorithm sensitive to external workload variation?
  74
  75     What if bandwidth, latency and power can vary with no warning?
  76     Shouldn't you study whether your algorithm is sensitive to such
  77     load variations?
  78
  79 - Although an algorithm may be more efficient than another, how
  80   does it interfere with other applications?
  81
  82     %As you can see, this very simple setting may need to evolve way
  83     beyond what you initially imagined.
  84
  85     <blockquote> Premature optimization is  the root of all evil. -- D.E.Knuth</blockquote>
  86
  87     Furthermore, writing your own simulator is much harder than you
  88     may imagine. This is why you should rely on an established and flexible
  89     one.
  90
  91 The following figure is a screenshot of [triva][fn:1] visualizing a [SimGrid
  92 simulation][fn:2] of two master worker applications (one in light gray and
  93 the other in dark gray) running in concurrence and showing resource
  94 usage over a long period of time.
  95
  96 ![Test](./sc3-description.png)
  97
  98 \section Prerequisites
  99
 100 Of course, you need to install SimGrid before taking this tutorial.
 101 Please refer to the relevant Section: \ref install.
 102
 103 ## Tutorials
 104
 105 A lot of information on how to install and use Simgrid are
 106 provided by the [online documentation][fn:4] and by several tutorials:
 107
 108 - http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf
 109 - http://simgrid.gforge.inria.fr/tutorials/simgrid-tracing-101.pdf
 110 - http://simgrid.gforge.inria.fr/tutorials/simgrid-platf-101.pdf
 111
 112 \section intro_recommendation Recommended Steps
 113
 114 ## Installing Viva
 115
 116 This [software][fn:1] will be useful to make fancy graph or treemap
 117 visualizations and get a better understanding of simulations. You
 118 will first need to install pajeng:
 119
 120 ~~~~{.sh}
 121 sudo apt-get install git cmake build-essential libqt4-dev  libboost-dev freeglut3-dev ;
 122 git clone https://github.com/schnorr/pajeng.git
 123 cd pajeng && mkdir -p build &&  cd build && cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME &&  make -j install
 124 cd ../../
 125 ~~~~
 126
 127 Then you can install viva.
 128
 129 ~~~~{.sh}
 130 sudo apt-get install libboost-dev libconfig++-dev libconfig8-dev libgtk2.0-dev freeglut3-dev
 131 git clone https://github.com/schnorr/viva.git
 132 cd viva && mkdir -p build_graph &&  cd build_graph && cmake ../ -DTUPI_LIBRARY=ON -DVIVA=ON -DCMAKE_INSTALL_PREFIX=$HOME &&  make -j install
 133 cd ../../
 134 ~~~~
 135
 136 ## Installing Paje
 137
 138 This [software][fn:5] provides a Gantt-chart visualization.
 139
 140 ~~~~{.sh}
 141 sudo apt-get install paje.app
 142 ~~~~
 143
 144 ## Installing Vite
 145
 146 This software provides a [Gantt-chart visualization][fn:6].
 147
 148 ~~~~{.sh}
 149 sudo apt-get install vite
 150 ~~~~
 151
 152 \section intro_start Let's get started
 153
 154 \anchor intro_setup
 155 ## Setting up and Compiling
 156
 157 The corresponding archive with all source files can be obtained
 158 [here](http://simgrid.gforge.inria.fr/tutorials/msg-tuto/msg-tuto.tgz),
 159 while the simgrid archive contains
 160 [several platform files](https://github.com/mquinson/simgrid/tree/master/examples/platforms)
 161 (click on the "Raw" button of files you want to download from GitHub).
 162
 163 ~~~~{.sh}
 164 tar zxf msg-tuto.tgz
 165 cd msg-tuto/src
 166 make
 167 ~~~~
 168
 169 %As you can see, there is already a nice Makefile that compiles
 170 everything for you. Now the tiny example has been compiled and it
 171 can be easily run as follows:
 172
 173 ~~~~{.sh}
 174 ./masterworker0 platforms/platform.xml deployment0.xml 2>&1
 175 ~~~~
 176
 177 If you create a single self-content C-file named foo.c, the
 178 corresponding program will be simply compiled and linked with
 179 SimGrid by typing:
 180
 181 ~~~~{.sh}
 182 make foo
 183 ~~~~
 184
 185 For a more "fancy" output, you can try:
 186
 187 ~~~~{.sh}
 188 ./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer
 189 ~~~~
 190
 191 For a really fancy output, you should use [viva/triva][fn:1]:
 192
 193 ~~~~{.sh}
 194 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
 195     --cfg=tracing/uncategorized:yes --cfg=viva/uncategorized:uncat.plist
 196 LANG=C ; viva simgrid.trace uncat.plist
 197 ~~~~
 198
 199 For a more classical Gantt-Chart visualization, you can produce a
 200 [Paje][fn:5] trace:
 201
 202 ~~~~{.sh}
 203 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
 204     --cfg=tracing/msg/process:yes
 205 LANG=C ; Paje simgrid.trace
 206 ~~~~
 207
 208 Alternatively, you can use [vite][fn:6].
 209
 210 ~~~~{.sh}
 211 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
 212     --cfg=tracing/msg/process:yes --cfg=tracing/basic:yes
 213 vite simgrid.trace
 214 ~~~~
 215
 216 ## Getting Rid of Workers in the Deployment File
 217
 218 In the previous example, the deployment file `deployment0.xml`
 219 is tightly connected to the platform file `platform.xml` and a
 220 worker process is launched on each host:
 221
 222 ~~~~{.xml}
 223 <?xml version='1.0'?>
 224 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 225 <platform version="3">
 226   <!-- The master process (with some arguments) -->
 227   <process host="Tremblay" function="master">
 228      <argument value="20"/>       <!-- Number of tasks -->
 229      <argument value="50000000"/>  <!-- Computation size of tasks -->
 230      <argument value="1000000"/>   <!-- Communication size of tasks -->
 231      <argument value="Jupiter"/>  <!-- First worker -->
 232      <argument value="Fafard"/>   <!-- Second worker -->
 233      <argument value="Ginette"/>  <!-- Third worker -->
 234      <argument value="Bourassa"/> <!-- Last worker -->
 235      <argument value="Tremblay"/> <!-- Me! I can work too! -->
 236   </process>
 237   <!-- The worker process (with no argument) -->
 238   <process host="Tremblay" function="worker" on_failure="RESTART"/>
 239   <process host="Jupiter" function="worker" on_failure="RESTART"/>
 240   <process host="Fafard" function="worker" on_failure="RESTART"/>
 241   <process host="Ginette" function="worker" on_failure="RESTART"/>
 242   <process host="Bourassa" function="worker" on_failure="RESTART"/>
 243 </platform>
 244 ~~~~
 245
 246 This is ok as the platform is rather small but will be painful when
 247 using larger platforms. Instead, modify the simulator
 248 `masterworker0.c` into `masterworker1.c` so that the master
 249 launches a worker process on all the other machines at startup. The
 250 new deployment file `deployment1.xml` should thus now simply be:
 251
 252 ~~~~{.xml}
 253 <?xml version='1.0'?>
 254 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 255 <platform version="3">
 256   <!-- The master process (with some arguments) -->
 257   <process host="Tremblay" function="master">
 258      <argument value="20"/>       <!-- Number of tasks -->
 259      <argument value="50000000"/>  <!-- Computation size of tasks -->
 260      <argument value="1000000"/>   <!-- Communication size of tasks -->
 261   </process>
 262 </platform>
 263 ~~~~
 264
 265 To this end you may need the following MSG functions (click on the links
 266 to see their descriptions):
 267
 268 ~~~~{.c}
 269 int MSG_get_host_number(void);
 270 xbt_dynar_t MSG_hosts_as_dynar(void);
 271 void * xbt_dynar_to_array (xbt_dynar_t dynar);
 272 msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
 273                                  void *data, msg_host_t host);
 274 ~~~~
 275
 276 \note
 277     It may avoid bugs later to avoid launching a worker on
 278     the master host so you probably want to remove it from the host
 279     list.
 280
 281 The `data` field of the @ref MSG_process_create can be used to pass
 282 a channel name that will be private between master
 283 and workers (e.g., `master_name:worker_name`). Adding the
 284 `master_name` in the channel name will allow to easily have several
 285 masters and a worker per master on each machine. To this end, you
 286 may need to use the following functions:
 287
 288 ~~~~{.c}
 289 msg_host_t MSG_host_self(void);
 290 const char * MSG_host_get_name(msg_host_t host);
 291 msg_process_t MSG_process_self(void);
 292 void * MSG_process_get_data(msg_process_t process);
 293 ~~~~
 294
 295 If you are not too familiar with string
 296 manipulation in C, you may want to use the following functions
 297 (see the C reference for details):
 298
 299 ~~~~{.c}
 300 char *strcpy(char *dest, const char *src);
 301 char *strcat(char *dest, const char *src);
 302 ~~~~
 303
 304 ## Setting up a Time Limit Mechanism
 305
 306 In the current version, the number of tasks is defined through the
 307 worker arguments. Hence, tasks are created at the very beginning of
 308 the simulation. Instead, create tasks as needed and provide a time
 309 limit indicating when it stops sending tasks. To this end, you will
 310 obviously need to know what time it is:
 311
 312 ~~~~{.c}
 313 double MSG_get_clock(void);
 314 ~~~~
 315
 316 Otherwise, a quite effective way of terminating the simulation
 317 would be to use some of the following functions:
 318
 319 ~~~~{.c}
 320 void MSG_process_kill(msg_process_t process);
 321 int MSG_process_killall(int reset_PIDs);
 322 ~~~~
 323
 324 Anyway, the new deployment `deployment2.xml` file should thus look
 325 like this:
 326
 327 ~~~~{.xml}
 328 <?xml version='1.0'?>
 329 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 330 <platform version="3">
 331   <process host="Tremblay" function="master">
 332      <argument value="3600"/>      <!-- Simulation timeout -->
 333      <argument value="50000000"/>  <!-- Computation size of tasks -->
 334      <argument value="1000000"/>   <!-- Communication size of tasks -->
 335   </process>
 336 </platform>
 337 ~~~~
 338
 339 It may also be a good idea to transform most of the `XBT_INFO` into
 340 `XBT_DEBUG` (e.g., keep the information on the total number of
 341 tasks processed). These debug messages can be activated as follows:
 342
 343 ~~~~{.sh}
 344 ./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug
 345 ~~~~
 346
 347 ## Using the Tracing Mechanism
 348
 349 SimGrid can trace all resource consumption and the outcome can be
 350 displayed with viva as illustrated in the section \ref intro_setup. However, when several
 351 masters are deployed, it is hard to understand what happens.
 352
 353 ~~~~{.xml}
 354 <?xml version='1.0'?>
 355 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
 356 <platform version="3">
 357   <process host="Tremblay" function="master">
 358      <argument value="3600"/>      <!-- Simulation timeout -->
 359      <argument value="50000000"/>  <!-- Computation size of tasks -->
 360      <argument value="10"/>   <!-- Communication size of tasks -->
 361   </process>
 362   <process host="Fafard" function="master">
 363      <argument value="3600"/>      <!-- Simulation timeout -->
 364      <argument value="50000000"/>  <!-- Computation size of tasks -->
 365      <argument value="10"/>   <!-- Communication size of tasks -->
 366   </process>
 367   <process host="Jupiter" function="master">
 368      <argument value="3600"/>      <!-- Simulation timeout -->
 369      <argument value="50000000"/>  <!-- Computation size of tasks -->
 370      <argument value="10"/>   <!-- Communication size of tasks -->
 371   </process>
 372 </platform>
 373 ~~~~
 374
 375 So let's use categories to track more precisely who does what and when:
 376
 377 ~~~~{.c}
 378 void TRACE_category(const char *category);
 379 void MSG_task_set_category (msg_task_t task, const char *category);
 380 ~~~~
 381
 382 The outcome can then be visualized as follows:
 383
 384 ~~~~{.sh}
 385 ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes\
 386     --cfg=tracing/categorized:yes --cfg=viva/categorized:viva_cat.plist
 387 LANG=C; viva simgrid.trace viva_cat.plist
 388 ~~~~
 389
 390 Right now, you should realize that nothing is behaving like you
 391 expect. Most workers are idle even though input data are ridiculous
 392 and there are several masters deployed on the platform. Using a
 393 Gantt-chart visualization may help:
 394
 395 ~~~~{.sh}
 396 ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
 397     --cfg=tracing/msg/process:yes
 398 LANG=C; Paje simgrid.trace
 399 ~~~~
 400
 401 OK, so it should now be obvious that round robin is actually
 402 very bad.
 403
 404 ## Improving the Scheduling
 405
 406 Instead of a round-robin scheduling, let's implement a first-come
 407 first-served mechanism. To this end, workers need to send a tiny
 408 request first. A possible way to implement such a request with MSG
 409 is to send on a specific channel (e.g., the name of the master
 410 name) a task with payload 0 and whose attached data is the worker
 411 name. This way, the master can keep track of which workers are idle
 412 and willing to work.
 413
 414 To know whether it has pending requests, the master can use the
 415 following [function][fn:7]:
 416
 417 ~~~~{.c}
 418 int MSG_task_listen(const char *alias);
 419 ~~~~
 420
 421 If so, it should get the request and push the corresponding host
 422 into a dynar so that they can later be retrieved when sending a
 423 real [task][fn:7].
 424
 425 ~~~~{.c}
 426 xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
 427                           void_f_pvoid_t const free_f);
 428 void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
 429 void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
 430 unsigned long xbt_dynar_length(const xbt_dynar_t dynar);
 431 ~~~~
 432
 433 %As you will soon realize, with such simple mechanisms, simple
 434 deadlocks will soon appear. They can easily be removed with a
 435 simple polling mechanism, hence the need for the following
 436 [function][fn:7]:
 437
 438 ~~~~{.c}
 439 msg_error_t MSG_process_sleep(double nb_sec);
 440 ~~~~
 441
 442 %As you should quickly realize, on the simple previous example, it
 443 will double the throughput of the platform but will be quite
 444 ineffective when input size of the tasks is not negligible anymore.
 445
 446 From this, many things can easily be added. For example, you could:
 447 - add a performance measurement mechanism;
 448 - enable the master to make smart scheduling choices using
 449   measurement information;
 450 - allow workers to have several pending requests so as to overlap
 451   communication and computations as much as possible;
 452 - ...
 453
 454 ## Using More Elaborate Platforms
 455
 456 SimGrid offers a rather powerful platform modeling mechanism. The
 457 `src/examples/platforms/` repository comprises a variety of platforms ranging
 458 from simple to elaborate. Associated to a good
 459 visualization tool to ensure your simulation is meaningful, they
 460 can allow you to study to which extent your algorithm scales...
 461
 462 What is the largest number of tasks requiring 50e6 flops and 1e5
 463 bytes that you manage to distribute and process in one hour on
 464 `g5k.xml` (you should use `deployment_general.xml`)?
 465
 466 \section intro_todo TODO: Points to improve for the next time
 467
 468 - Propose equivalent exercises and skeleton in java.
 469 - Propose a virtualbox image with everything (simgrid, paje, viva,
 470   ...) already set up.
 471 - Ease the installation on mac OS X (binary installer) and
 472   windows.
 473 - Explain that programming in C or java and having a working
 474   development environment is a prerequisite.
 475
 476 [fn:1]: http://triva.gforge.inria.fr/index.html
 477 [fn:2]: http://hal.inria.fr/inria-00529569
 478 [fn:3]: http://hal.inria.fr/hal-00738321
 479 [fn:4]: http://simgrid.gforge.inria.fr/documentation.html
 480 [fn:5]: http://paje.sourceforge.net/
 481 [fn:6]: http://vite.gforge.inria.fr/
 482
 483
 484
 485
 486
 487 */