/*! @page introduction Introduction to SimGrid

[SimGrid](http://simgrid.gforge.inria.fr/) is a toolkit
that provides core functionalities for the simulation of distributed
applications in heterogeneous distributed environments.

The specific goal of the project is to facilitate research in the area of
distributed and parallel application scheduling on distributed computing
platforms ranging from a simple network of workstations to Computational
The goal of this practical session is to illustrate various uses of
the MSG interface. To this end we will use the following simple setting:

> Assume we have a (possibly large) bunch of (possibly large) data to
> process, which originally resides on a server (a.k.a. master). For
> the sake of simplicity, we assume all input files require the same amount
> of computation. We assume the server can be helped by a (possibly
> large) set of worker machines. What is the best way to organize the
Although this looks like a very simple setting, it raises several
interesting questions:

- Which algorithm should the master use to send workload?

    The most obvious algorithm would be to send tasks to workers in a
    round-robin fashion. This is the initial code we provide you.

    A less obvious but probably more efficient approach would be to set up
    a request mechanism where a client first asks for tasks, which allows
    the server to decide which request to answer and possibly to send
    the tasks to the fastest machines. Maybe you can think of a
- How many tasks should the client ask for?

    Indeed, if we set up a request mechanism so that workers only
    send a request whenever they have no more tasks to process, they are
    likely to be poorly exploited since they will have to wait for the
    master to consider their request and for the input data to be
    transferred. A client should thus probably request a pool of tasks,
    but if it requests too many tasks, it is likely to lead to a poor
- How does the quality of such an algorithm depend on the platform
  characteristics and on the task characteristics?

    Whenever the input communication time is very small compared to
    processing time and workers are homogeneous, it is likely that the
    round-robin algorithm performs very well. Would it still hold true
    when transfer time is not negligible and the platform is, say,
    a volunteer computing system?
- The network topology interconnecting the master and the workers
  may be quite complicated. How does such a topology impact the

    When data transfers are the bottleneck, it is likely that good
    modeling of the platform becomes essential. In this case, you may
    want to be able to account for complex platform topologies.
- Do the algorithms depend on a perfect knowledge of this

    Should we still use a flat master-worker deployment or should we

- How sensitive is such an algorithm to external workload variation?

    What if bandwidth, latency and power can vary with no warning?
    Shouldn't you study whether your algorithm is sensitive to such
- Although an algorithm may be more efficient than another, how
  does it interfere with other applications?

%As you can see, this very simple setting may need to evolve way
beyond what you initially imagined.

<blockquote> Premature optimization is the root of all evil. -- D. E. Knuth</blockquote>

Furthermore, writing your own simulator is much harder than you
may imagine. This is why you should rely on an established and flexible
The following figure is a screenshot of [triva][fn:1] visualizing a [SimGrid
simulation][fn:2] of two master-worker applications (one in light gray and
the other in dark gray) running concurrently and showing resource
usage over a long period of time.

![Test](./sc3-description.png)
\section Prerequisites

Of course, you need to install SimGrid before taking this tutorial.
Please refer to the relevant section: \ref install.

A lot of information on how to install and use SimGrid is
provided by the [online documentation][fn:4] and by several tutorials:

- http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-tracing-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-platf-101.pdf
\section intro_recommendation Recommended Steps

This [software][fn:1] will be useful to make fancy graph or treemap
visualizations and get a better understanding of simulations. You
will first need to install pajeng:

    sudo apt-get install git cmake build-essential libqt4-dev libboost-dev freeglut3-dev
    git clone https://github.com/schnorr/pajeng.git
    cd pajeng && mkdir -p build && cd build && cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME && make -j install

Then you can install viva.

    sudo apt-get install libboost-dev libconfig++-dev libconfig8-dev libgtk2.0-dev freeglut3-dev
    git clone https://github.com/schnorr/viva.git
    cd viva && mkdir -p build_graph && cd build_graph && cmake ../ -DTUPI_LIBRARY=ON -DVIVA=ON -DCMAKE_INSTALL_PREFIX=$HOME && make -j install
[Paje][fn:5] provides a Gantt-chart visualization.

    sudo apt-get install paje.app

[Vite][fn:6] also provides a Gantt-chart visualization.

    sudo apt-get install vite
\section intro_start Let's get started

## Setting up and Compiling

The corresponding archive with all source files and platform files
can be obtained [here](http://simgrid.gforge.inria.fr/tutorials/msg-tuto/msg-tuto.tgz).

%As you can see, there is already a nice Makefile that compiles
everything for you. Now the tiny example has been compiled and it
can be easily run as follows:

    ./masterworker0 platforms/platform.xml deployment0.xml 2>&1

If you create a single self-contained C file named foo.c, the
corresponding program will be simply compiled and linked with
For a more "fancy" output, you can try:

    ./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer

For a really fancy output, you should use [viva/triva][fn:1]:

    ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
      --cfg=tracing/uncategorized:yes --cfg=viva/uncategorized:uncat.plist
    LANG=C ; viva simgrid.trace uncat.plist

For a more classical Gantt-Chart visualization, you can produce a

    ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
      --cfg=tracing/msg/process:yes
    LANG=C ; Paje simgrid.trace

Alternatively, you can use [vite][fn:6].

    ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
      --cfg=tracing/msg/process:yes --cfg=tracing/basic:yes
## Getting Rid of Workers in the Deployment File

In the previous example, the deployment file `deployment0.xml`
is tightly connected to the platform file `platform.xml` and a
worker process is launched on each host:

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <!-- The master process (with some arguments) -->
      <process host="Tremblay" function="master">
        <argument value="20"/> <!-- Number of tasks -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="1000000"/> <!-- Communication size of tasks -->
        <argument value="Jupiter"/> <!-- First worker -->
        <argument value="Fafard"/> <!-- Second worker -->
        <argument value="Ginette"/> <!-- Third worker -->
        <argument value="Bourassa"/> <!-- Last worker -->
        <argument value="Tremblay"/> <!-- Me! I can work too! -->
      </process>
      <!-- The worker process (with no argument) -->
      <process host="Tremblay" function="worker" on_failure="RESTART"/>
      <process host="Jupiter" function="worker" on_failure="RESTART"/>
      <process host="Fafard" function="worker" on_failure="RESTART"/>
      <process host="Ginette" function="worker" on_failure="RESTART"/>
      <process host="Bourassa" function="worker" on_failure="RESTART"/>
    </platform>
This is fine as long as the platform is rather small, but it becomes painful
with larger platforms. Instead, modify the simulator
`masterworker0.c` into `masterworker1.c` so that the master
launches a worker process on all the other machines at startup. The
new deployment file `deployment1.xml` should thus now simply be:

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <!-- The master process (with some arguments) -->
      <process host="Tremblay" function="master">
        <argument value="20"/> <!-- Number of tasks -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="1000000"/> <!-- Communication size of tasks -->
      </process>
    </platform>
To this end you may need the following MSG functions (click on the links
to see their descriptions):

    int MSG_get_host_number(void);
    xbt_dynar_t MSG_hosts_as_dynar(void);
    void * xbt_dynar_to_array (xbt_dynar_t dynar);
    msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
                                     void *data, msg_host_t host);

Not launching a worker on the master host may avoid bugs later,
so you probably want to remove it from the host
The `data` argument of @ref MSG_process_create can be used to pass
a channel name that will be private between the master
and each worker (e.g., `master_name:worker_name`). Including
`master_name` in the channel name makes it easy to have several
masters and one worker per master on each machine. To this end, you
may need to use the following functions:

    msg_host_t MSG_host_self(void);
    const char * MSG_host_get_name(msg_host_t host);
    msg_process_t MSG_process_self(void);
    void * MSG_process_get_data(msg_process_t process);

If you are not too familiar with string
manipulation in C, you may want to use the following functions
(see the C reference for details):

    char *strcpy(char *dest, const char *src);
    char *strcat(char *dest, const char *src);
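As a minimal sketch of the channel naming described above, the helper below builds `master_name:worker_name` in a caller-provided buffer. The name `build_channel_name` is hypothetical and not part of MSG, and it relies on `snprintf` rather than `strcpy`/`strcat` so that a buffer that is too small is reported instead of overflowed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MSG): writes "master:worker" into dest.
 * Returns 0 on success, -1 if dest cannot hold the whole name. */
int build_channel_name(char *dest, size_t dest_size,
                       const char *master_name, const char *worker_name)
{
  assert(dest != NULL && master_name != NULL && worker_name != NULL);
  int written = snprintf(dest, dest_size, "%s:%s", master_name, worker_name);
  if (written < 0 || (size_t)written >= dest_size)
    return -1; /* encoding error or truncated output */
  return 0;
}
```

In a worker, the two names would presumably come from the process data and from `MSG_host_get_name(MSG_host_self())`.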
## Setting up a Time Limit Mechanism

In the current version, the number of tasks is defined through the
master arguments. Hence, tasks are created at the very beginning of
the simulation. Instead, create tasks as needed and provide a time
limit indicating when the master stops sending tasks. To this end, you will
obviously need to know what time it is:

    double MSG_get_clock(void);
Otherwise, a quite effective way of terminating the simulation
would be to use some of the following functions:

    void MSG_process_kill(msg_process_t process);
    int MSG_process_killall(int reset_PIDs);

Anyway, the new deployment file `deployment2.xml` should thus look

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <process host="Tremblay" function="master">
        <argument value="3600"/> <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="1000000"/> <!-- Communication size of tasks -->
      </process>
    </platform>
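The time-limit loop can be sketched without SimGrid as follows. This is only an illustration under simplifying assumptions: `count_tasks_before_timeout` and `send_cost` are hypothetical names, and the clock is a plain variable advanced by hand, whereas a real master would read it with `MSG_get_clock()`.

```c
#include <assert.h>

/* Hypothetical stand-in for the master loop. `send_cost` is the simulated
 * time that dispatching one task takes; in a real MSG master the clock
 * would advance by itself while the task is being sent. */
int count_tasks_before_timeout(double timeout, double send_cost)
{
  assert(send_cost > 0.0);
  double now = 0.0; /* simulated clock */
  int sent = 0;
  while (now < timeout) {
    sent++;           /* create and send one task on demand */
    now += send_cost; /* simulated time moves forward */
  }
  return sent;
}
```

The point of the design is that no task exists before it is needed: the loop condition, not a task count, decides when the master stops.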
It may also be a good idea to turn most of the `XBT_INFO` calls into
`XBT_DEBUG` (keeping as `XBT_INFO`, e.g., the information on the total
number of tasks processed). These debug messages can be activated as follows:

    ./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug
## Using the Tracing Mechanism

SimGrid can trace all resource consumption and the outcome can be
displayed with viva as illustrated in the section \ref intro_setup. However, when several
masters are deployed, it is hard to understand what happens.

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <process host="Tremblay" function="master">
        <argument value="3600"/> <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="10"/> <!-- Communication size of tasks -->
      </process>
      <process host="Fafard" function="master">
        <argument value="3600"/> <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="10"/> <!-- Communication size of tasks -->
      </process>
      <process host="Jupiter" function="master">
        <argument value="3600"/> <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="10"/> <!-- Communication size of tasks -->
      </process>
    </platform>
So let's use categories to track more precisely who does what and when:

    void TRACE_category(const char *category);
    void MSG_task_set_category (msg_task_t task, const char *category);

The outcome can then be visualized as follows:

    ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
      --cfg=tracing/categorized:yes --cfg=viva/categorized:viva_cat.plist
    LANG=C; viva simgrid.trace viva_cat.plist
Right now, you should realize that nothing is behaving as you
expect. Most workers are idle even though input data are ridiculously small
and there are several masters deployed on the platform. Using a
Gantt-chart visualization may help:

    ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
      --cfg=tracing/msg/process:yes
    LANG=C; Paje simgrid.trace

OK, so it should now be obvious that round robin is actually
## Improving the Scheduling

Instead of a round-robin scheduling, let's implement a first-come
first-served mechanism. To this end, workers need to send a tiny
request first. A possible way to implement such a request with MSG
is to send on a specific channel (e.g., the name of the
master) a task with payload 0 and whose attached data is the worker
name. This way, the master can keep track of which workers are idle
To know whether it has pending requests, the master can use the
following function:

    int MSG_task_listen(const char *alias);

If so, it should get the request and push the corresponding host
into a dynar so that they can later be retrieved when sending a

    xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
                              void_f_pvoid_t const free_f);
    void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
    void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
    unsigned long xbt_dynar_length(const xbt_dynar_t dynar);
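The bookkeeping of pending requests boils down to a first-in first-out queue. The sketch below shows that behavior with a hypothetical fixed-size queue (`request_queue_t`, `queue_init`, `queue_push` and `queue_shift` are illustration names, not SimGrid API); in the tutorial code a dynar with `xbt_dynar_push` and `xbt_dynar_shift` would play this role.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64 /* arbitrary capacity chosen for this sketch */

/* Hypothetical FIFO of worker names, standing in for a dynar. */
typedef struct {
  const char *names[MAX_PENDING];
  size_t head;  /* index of the oldest request */
  size_t count; /* number of queued requests */
} request_queue_t;

void queue_init(request_queue_t *q)
{
  q->head = 0;
  q->count = 0;
}

/* Append a request at the tail; returns -1 when the queue is full. */
int queue_push(request_queue_t *q, const char *worker_name)
{
  assert(q != NULL);
  if (q->count == MAX_PENDING)
    return -1;
  q->names[(q->head + q->count) % MAX_PENDING] = worker_name;
  q->count++;
  return 0;
}

/* Remove and return the oldest request, or NULL when the queue is empty. */
const char *queue_shift(request_queue_t *q)
{
  assert(q != NULL);
  if (q->count == 0)
    return NULL;
  const char *name = q->names[q->head];
  q->head = (q->head + 1) % MAX_PENDING;
  q->count--;
  return name;
}
```

Shifting from the head while pushing at the tail is what makes the policy first-come first-served: the worker that has waited longest gets the next task.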
%As you will soon realize, with such simple mechanisms, simple
deadlocks will soon appear. They can easily be removed with a
simple polling mechanism, hence the need for the following

    msg_error_t MSG_process_sleep(double nb_sec);

%As you should quickly realize, on the simple previous example, it
will double the throughput of the platform but will be quite
ineffective when the input size of the tasks is no longer negligible.
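To get a feel for why serving requests in arrival order can beat round-robin on heterogeneous workers, here is a toy model that ignores communication entirely. The function names, `MAX_WORKERS` and the numbers in the usage note are assumptions made for illustration; this is not a substitute for running the simulation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 64 /* arbitrary limit for this sketch */

/* Makespan when tasks are dealt out in round-robin order.
 * speeds[i] is the (positive) speed of worker i in flop/s and
 * every task costs task_flops. */
double makespan_round_robin(const double *speeds, size_t n_workers,
                            size_t n_tasks, double task_flops)
{
  assert(n_workers > 0);
  double makespan = 0.0;
  for (size_t w = 0; w < n_workers; w++) {
    /* Worker w receives tasks w, w + n_workers, w + 2 * n_workers, ... */
    size_t share = n_tasks / n_workers + (w < n_tasks % n_workers ? 1 : 0);
    double finish = (double)share * task_flops / speeds[w];
    if (finish > makespan)
      makespan = finish;
  }
  return makespan;
}

/* Makespan when each task goes to the worker that becomes idle first,
 * which is what serving requests in arrival order amounts to when
 * communication is free. */
double makespan_on_demand(const double *speeds, size_t n_workers,
                          size_t n_tasks, double task_flops)
{
  assert(n_workers > 0 && n_workers <= MAX_WORKERS);
  double free_at[MAX_WORKERS] = {0.0}; /* time at which each worker is idle */
  double makespan = 0.0;
  for (size_t t = 0; t < n_tasks; t++) {
    size_t best = 0;
    for (size_t w = 1; w < n_workers; w++)
      if (free_at[w] < free_at[best])
        best = w;
    free_at[best] += task_flops / speeds[best];
    if (free_at[best] > makespan)
      makespan = free_at[best];
  }
  return makespan;
}
```

With three workers of speeds 1, 1 and 4 flop/s and twelve tasks of 4 flops each, round-robin finishes at time 16 while the on-demand policy finishes at time 8. Once transfer times matter, this model no longer applies, which is exactly the limitation mentioned above.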
From this, many things can easily be added. For example, you could:
- add a performance measurement mechanism;
- enable the master to make smart scheduling choices using
  measurement information;
- allow workers to have several pending requests so as to overlap
  communication and computations as much as possible;
## Using More Elaborate Platforms

SimGrid offers a rather powerful platform modeling mechanism. The
`src/examples/platforms/` directory contains a variety of platforms ranging
from simple to elaborate. Combined with a good
visualization tool to ensure your simulation is meaningful, they
let you study to what extent your algorithm scales...

What is the largest number of tasks requiring 50e6 flops and 1e5
bytes that you manage to distribute and process in one hour on
`g5k.xml` (you should use `deployment_general.xml`)?
\section intro_todo TODO: Points to improve for the next time

- Propose equivalent exercises and skeleton in Java.
- Propose a VirtualBox image with everything (simgrid, paje, viva,
- Ease the installation on Mac OS X (binary installer) and
- Explain that programming in C or Java and having a working
  development environment is a prerequisite.
[fn:1]: http://triva.gforge.inria.fr/index.html
[fn:2]: http://hal.inria.fr/inria-00529569
[fn:3]: http://hal.inria.fr/hal-00738321
[fn:4]: http://simgrid.gforge.inria.fr/documentation.html
[fn:5]: http://paje.sourceforge.net/
[fn:6]: http://vite.gforge.inria.fr/