/*! @page tutorial_msg SimGrid Tutorial with MSG

SimGrid is a toolkit providing the core functionalities for the
simulation of distributed applications in heterogeneous distributed
environments.
The goal of the project is both to facilitate research and to help improve
real applications in the area of distributed and parallel systems,
ranging from simple networks of workstations to computational grids,
clouds, and supercomputers.

The goal of this practical session is to illustrate various uses of
the MSG interface. To this end, we will use the following simple setting:

> Assume we have a (possibly large) set of (possibly large) data files to
> process, which originally reside on a server (a.k.a. master). For the
> sake of simplicity, we assume all input files require the same amount
> of computation. We assume the server can be helped by a (possibly
> large) set of worker machines. What is the best way to organize the
> computations?

Although this looks like a very simple setting, it raises several
interesting questions:

- Which algorithm should the master use to send workload?

  The most obvious algorithm would be to send tasks to workers in a
  round-robin fashion. This is the initial code we provide you.

  A less obvious but probably more efficient approach would be to set up
  a request mechanism where a client first asks for tasks, which allows
  the server to decide which request to answer and possibly to send
  the tasks to the fastest machines. Maybe you can think of a

- How many tasks should the client ask for?

  Indeed, if we set up a request mechanism so that workers only
  send a request when they have no more tasks to process, they are
  likely to be poorly exploited since they will have to wait for the
  master to consider their request and for the input data to be
  transferred. A client should thus probably request a pool of tasks,
  but if it requests too many tasks, it is likely to lead to a poor

- How does the quality of such an algorithm depend on the platform
  characteristics and on the task characteristics?

  Whenever the input communication time is very small compared to the
  processing time and workers are homogeneous, it is likely that the
  round-robin algorithm performs very well. Would it still hold true
  when the transfer time is not negligible and the platform is, say,
  a volunteer computing system?

- The network topology interconnecting the master and the workers
  may be quite complicated. How does such a topology impact the

  When data transfers are the bottleneck, it is likely that a good
  modeling of the platform becomes essential. In this case, you may
  want to be able to account for complex platform topologies.

- Do the algorithms depend on a perfect knowledge of this

  Should we still use a flat master worker deployment or should we

- How sensitive is such an algorithm to external workload variations?

  What if bandwidth, latency and computing power can vary with no warning?
  Shouldn't you study whether your algorithm is sensitive to such

- Although an algorithm may be more efficient than another, how
  does it interfere with other applications?

As you can see, this very simple setting may need to evolve way
beyond what you initially imagined.

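
As a point of reference for the first question in the list above, the round-robin policy can be sketched in plain C, without any SimGrid call. The function name `round_robin_assign` and its array-based interface are hypothetical and only illustrate the policy; the provided `masterworker0.c` may organize this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MSG): assign nb_tasks tasks to
 * nb_workers workers in round-robin order. assignment[i] receives the
 * index of the worker that gets task i. Returns -1 if there is no
 * worker to assign to, 0 otherwise. */
int round_robin_assign(size_t nb_tasks, size_t nb_workers, size_t *assignment)
{
  assert(assignment != NULL || nb_tasks == 0);
  if (nb_workers == 0)
    return -1;
  for (size_t i = 0; i < nb_tasks; i++)
    assignment[i] = i % nb_workers; /* worker 0, 1, ..., n-1, 0, 1, ... */
  return 0;
}
```

Note that this policy looks at neither the speed nor the current load of the workers, which is what the request mechanism discussed above tries to address.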

<blockquote> Premature optimization is the root of all evil. -- D. E. Knuth</blockquote>

Furthermore, writing your own simulator is much harder than you
may imagine. This is why you should rely on an established and flexible

The following figure is a screenshot of [triva][fn:1] visualizing a [SimGrid
simulation][fn:2] of two master worker applications (one in light gray and
the other in dark gray) running concurrently and showing resource
usage over a long period of time.

![Test](./sc3-description.png)

\section Prerequisites

Of course, you need to install SimGrid before taking this tutorial.
Please refer to the relevant section: \ref install.

A lot of information on how to install and use SimGrid is
provided by the [online documentation][fn:4] and by several tutorials:

- http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-tracing-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-platf-101.pdf

## Installing the Visualization Software

Several tools can be used to visualize the results of SimGrid
simulations and get a better understanding of them.

- [pajeng][fn:5] provides a Gantt-chart visualization.
- [Vite][fn:6] also provides a Gantt-chart visualization.

Under Debian or Ubuntu, installing them is really easy with apt-get, while you
may have to install from source on other systems. Check the
documentation of each tool for more details.

```sh
sudo apt-get install pajeng vite
```

\section intro_start Let's get started

## Setting up and Compiling

The corresponding source files can be obtained
[online on GitLab](https://gitlab.inria.fr/simgrid/simgrid/tree/master/doc/msg-tuto-src).
Using the download button at the top right of the interface, you can download the whole
directory as a single archive file. If you wish, you can find other platform files in
[this GitLab directory](https://gitlab.inria.fr/simgrid/simgrid/tree/master/examples/platforms).

As you can see, there is already a little Makefile that compiles
everything for you. If you struggle with the compilation, then you should double check
your @ref install "SimGrid installation".
If needed, please refer to the @ref install_yours_trouble section.

Once the tiny example has been compiled, it can easily be run as follows:

```sh
./masterworker0 platforms/platform.xml deployment0.xml
```

For a more "fancy" output, you can use simgrid-colorizer.

```sh
./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer
```

If you installed SimGrid to a non-standard path, you may have to
specify the full path to simgrid-colorizer on the above line, such as
\c /opt/simgrid/bin/simgrid-colorizer. If you did not install it at all,
you can find it in `<simgrid_root_directory>/bin/colorize`.

For a classical Gantt-chart visualization, you can produce a [Paje][fn:5] trace:

```sh
./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
    --cfg=tracing/msg/process:yes
```

Alternatively, you can use [vite][fn:6].

```sh
./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
    --cfg=tracing/msg/process:yes --cfg=tracing/basic:yes
```

## Getting Rid of Workers in the Deployment File

In the previous example, the deployment file `deployment0.xml`
is tightly connected to the platform file `platform.xml` and a
worker process is launched on each host:

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <!-- The master process (with some arguments) -->
  <process host="Tremblay" function="master">
    <argument value="20"/>       <!-- Number of tasks -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="1000000"/>  <!-- Communication size of tasks -->
    <argument value="Jupiter"/>  <!-- First worker -->
    <argument value="Fafard"/>   <!-- Second worker -->
    <argument value="Ginette"/>  <!-- Third worker -->
    <argument value="Bourassa"/> <!-- Last worker -->
    <argument value="Tremblay"/> <!-- Me! I can work too! -->
  </process>
  <!-- The worker process (with no argument) -->
  <process host="Tremblay" function="worker" on_failure="RESTART"/>
  <process host="Jupiter" function="worker" on_failure="RESTART"/>
  <process host="Fafard" function="worker" on_failure="RESTART"/>
  <process host="Ginette" function="worker" on_failure="RESTART"/>
  <process host="Bourassa" function="worker" on_failure="RESTART"/>
</platform>
```

This is fine as long as the platform is rather small, but it becomes painful
with larger platforms. Instead, modify the simulator
`masterworker0.c` into `masterworker1.c` so that the master
launches a worker process on all the other machines at startup. The
new deployment file `deployment1.xml` should thus simply be:

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <!-- The master process (with some arguments) -->
  <process host="Tremblay" function="master">
    <argument value="20"/>       <!-- Number of tasks -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="1000000"/>  <!-- Communication size of tasks -->
  </process>
</platform>
```

To this end you may need the following MSG functions (click on the links
to see their descriptions):

```c
int MSG_get_host_number(void);
xbt_dynar_t MSG_hosts_as_dynar(void);
void * xbt_dynar_to_array (xbt_dynar_t dynar);
msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
                                 void *data, msg_host_t host);
```

Not launching a worker on the master host may avoid bugs later,
so you probably want to remove it from the host list.

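
The filtering step can be sketched independently of SimGrid. In the minimal example below, plain strings stand in for `msg_host_t` values and the helper name `remove_master_host` is hypothetical; in the simulator, the names would presumably come from `MSG_host_get_name()` applied to each entry of the host array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compact hosts[] in place, dropping every entry
 * whose name equals master_name. Returns the new number of entries.
 * Plain strings stand in for msg_host_t values in this sketch. */
size_t remove_master_host(const char **hosts, size_t nb_hosts, const char *master_name)
{
  assert(hosts != NULL && master_name != NULL);
  size_t kept = 0;
  for (size_t i = 0; i < nb_hosts; i++) {
    if (strcmp(hosts[i], master_name) != 0)
      hosts[kept++] = hosts[i]; /* keep every host but the master's */
  }
  return kept;
}
```

The master would then launch one worker on each of the `kept` remaining hosts.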

The `data` argument of @ref MSG_process_create can be used to pass
a channel name that is private to a master
and one of its workers (e.g., `master_name:worker_name`). Including
`master_name` in the channel name makes it easy to have several
masters and one worker per master on each machine. To this end, you
may need to use the following functions:

```c
msg_host_t MSG_host_self(void);
const char * MSG_host_get_name(msg_host_t host);
msg_process_t MSG_process_self(void);
void * MSG_process_get_data(msg_process_t process);
```

If you are not too familiar with string
manipulation in C, you may want to use the following functions
(see the C reference for details):

```c
char *strcpy(char *dest, const char *src);
char *strcat(char *dest, const char *src);
```

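
Since `strcpy` and `strcat` do not check the size of the destination buffer, a bounded variant may be safer. The sketch below builds a `master_name:worker_name` channel name with the standard `snprintf`; the helper name `build_channel_name` is hypothetical and is not part of MSG.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "master:worker" into dest (of capacity size).
 * Returns 0 on success, or -1 if the result would not fit, in which case
 * dest is left untouched. */
int build_channel_name(char *dest, size_t size, const char *master, const char *worker)
{
  assert(dest != NULL && master != NULL && worker != NULL);
  /* +1 for the ':' separator and +1 for the terminating '\0' */
  if (strlen(master) + strlen(worker) + 2 > size)
    return -1;
  snprintf(dest, size, "%s:%s", master, worker);
  return 0;
}
```

The resulting string could then be passed as the `data` argument when creating the worker, provided the buffer outlives the worker process.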

## Setting up a Time Limit Mechanism

In the current version, the number of tasks is defined through the
master arguments. Hence, tasks are created at the very beginning of
the simulation. Instead, create tasks as needed and provide a time
limit indicating when the master stops sending tasks. To this end, you will
obviously need to know what time it is:

```c
double MSG_get_clock(void);
```

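
The control flow of such a time-limited master can be illustrated without SimGrid. In the sketch below, the local variable `now` is a simulated clock standing in for the value returned by `MSG_get_clock()`, and `send_duration` is an assumed constant cost for sending one task; both are simplifications for illustration only.

```c
#include <assert.h>

/* Hypothetical sketch: count how many tasks a master sends before the
 * time limit. `now` stands in for MSG_get_clock(); `send_duration` is an
 * assumed constant cost (in simulated seconds) for sending one task. */
long tasks_sent_before(double timeout, double send_duration)
{
  assert(send_duration > 0.0); /* otherwise the loop would never end */
  double now = 0.0;
  long nb_sent = 0;
  while (now < timeout) {  /* in MSG: while (MSG_get_clock() < timeout) */
    nb_sent++;             /* create and send one task on demand */
    now += send_duration;  /* simulated time advances during the send */
  }
  return nb_sent;
}
```

With this loop structure, the last task is started before the time limit but may complete after it, so the workers still need a way to learn that no more tasks will come.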

Alternatively, a quite effective way of terminating the simulation
is to use one of the following functions:

```c
void MSG_process_kill(msg_process_t process);
int MSG_process_killall(int reset_PIDs);
```

Anyway, the new deployment file `deployment2.xml` should thus look
like this:

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <process host="Tremblay" function="master">
    <argument value="3600"/>     <!-- Simulation timeout -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="1000000"/>  <!-- Communication size of tasks -->
  </process>
</platform>
```

It may also be a good idea to turn most of the `XBT_INFO` calls into
`XBT_DEBUG` ones (keeping, e.g., the information on the total number of
tasks processed). These debug messages can be activated as follows:

```sh
./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug
```

## Using the Tracing Mechanism

SimGrid can trace all resource consumption and the outcome can be
displayed as illustrated in the section \ref intro_setup. However, when several
masters are deployed, it is hard to understand what happens.

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <process host="Tremblay" function="master">
    <argument value="3600"/>     <!-- Simulation timeout -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="10"/>       <!-- Communication size of tasks -->
  </process>
  <process host="Fafard" function="master">
    <argument value="3600"/>     <!-- Simulation timeout -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="10"/>       <!-- Communication size of tasks -->
  </process>
  <process host="Jupiter" function="master">
    <argument value="3600"/>     <!-- Simulation timeout -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="10"/>       <!-- Communication size of tasks -->
  </process>
</platform>
```

So let's use categories to track more precisely who does what and when:

```c
void TRACE_category(const char *category);
void MSG_task_set_category (msg_task_t task, const char *category);
```

The outcome can then be visualized as a Gantt-chart as follows:

```sh
./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
    --cfg=tracing/msg/process:yes
```

At this point, you should realize that nothing behaves as you expected. Most
workers are idle even though the input data are ridiculously small and several
masters are deployed on the platform. So it should now be obvious that round robin
is actually very bad.

## Improving the Scheduling

Instead of a round-robin scheduling, let's implement a first-come
first-served mechanism. To this end, workers need to send a tiny
request first. A possible way to implement such a request with MSG
is to send, on a specific channel (e.g., the name of the master),
a task with payload 0 and whose attached data is the worker
name. This way, the master can keep track of which workers are idle.

To know whether it has pending requests, the master can use the
following [function][fn:7]:

```c
int MSG_task_listen(const char *alias);
```

If so, it should get the request and push the corresponding host
into a dynar so that it can later be retrieved when sending a
task.

```c
xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
                          void_f_pvoid_t const free_f);
void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
unsigned long xbt_dynar_length(const xbt_dynar_t dynar);
```

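
The first-come first-served bookkeeping boils down to a FIFO of pending requests. The sketch below uses a small fixed-size ring buffer of worker names in place of a dynar, so that it compiles without SimGrid; the type `idle_queue_t`, its functions and the capacity of 64 are all hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define IDLE_QUEUE_CAPACITY 64 /* arbitrary bound chosen for this sketch */

/* Hypothetical fixed-size FIFO of worker names, standing in for a dynar. */
typedef struct {
  const char *items[IDLE_QUEUE_CAPACITY];
  size_t head;   /* index of the oldest pending request */
  size_t length; /* number of pending requests */
} idle_queue_t;

void idle_queue_init(idle_queue_t *q)
{
  assert(q != NULL);
  q->head = 0;
  q->length = 0;
}

/* Append a request at the tail; returns -1 if the queue is full. */
int idle_queue_push(idle_queue_t *q, const char *worker)
{
  if (q->length == IDLE_QUEUE_CAPACITY)
    return -1;
  q->items[(q->head + q->length) % IDLE_QUEUE_CAPACITY] = worker;
  q->length++;
  return 0;
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
const char *idle_queue_shift(idle_queue_t *q)
{
  if (q->length == 0)
    return NULL;
  const char *worker = q->items[q->head];
  q->head = (q->head + 1) % IDLE_QUEUE_CAPACITY;
  q->length--;
  return worker;
}
```

Pushing at the tail and shifting from the head mirrors the roles that `xbt_dynar_push` and `xbt_dynar_shift` would play in the simulator: the worker that asked first is served first.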

As you will soon realize, such simple mechanisms quickly lead to
simple deadlocks. They can easily be removed with a
simple polling mechanism, hence the need for the following
function:

```c
msg_error_t MSG_process_sleep(double nb_sec);
```

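
The polling idea is to check for a pending request, sleep for a short period if there is none, and check again, rather than blocking indefinitely. The sketch below simulates this loop in plain C: the predicate type `has_request_fn` plays the role of `MSG_task_listen()`, and incrementing `waited` stands in for a call to `MSG_process_sleep(period)`. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate type: returns non-zero when a request is pending.
 * In the simulator this role would be played by MSG_task_listen(). */
typedef int (*has_request_fn)(void *ctx);

/* Poll every `period` simulated seconds until a request is pending.
 * Returns the simulated time spent waiting, or -1.0 if max_wait would be
 * exceeded. In MSG, `waited += period` would be MSG_process_sleep(period). */
double poll_until_request(has_request_fn has_request, void *ctx,
                          double period, double max_wait)
{
  assert(has_request != NULL && period > 0.0);
  double waited = 0.0;
  while (!has_request(ctx)) {
    if (waited + period > max_wait)
      return -1.0; /* give up instead of waiting forever */
    waited += period;
  }
  return waited;
}

/* Example predicate for this sketch: a request "arrives" after *ctx polls. */
int request_after_n_polls(void *ctx)
{
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return 0;
  }
  return 1;
}
```

The polling period is a trade-off: a short period reacts quickly but generates many wake-ups, while a long one leaves workers idle between checks.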

As you should quickly realize, on the previous simple example this
mechanism doubles the throughput of the platform, but it becomes quite
ineffective when the input size of the tasks is no longer negligible.

From this, many things can easily be added. For example, you could:
- add a performance measurement mechanism;
- enable the master to make smart scheduling choices using
  measurement information;
- allow workers to have several pending requests so as to overlap
  communication and computations as much as possible.

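
As one possible reading of the second suggestion in the list above, the master could prefer the idle worker with the best measured performance. The sketch below assumes the master maintains, for each worker, a measured speed (for instance, tasks completed per second so far) and an idle flag; the helper `pick_fastest_idle` and both arrays are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: among workers flagged idle, return the index of
 * the one with the highest measured speed, or -1 if no worker is idle.
 * Ties are resolved in favor of the lowest index. */
int pick_fastest_idle(const double *measured_speed, const int *is_idle, int nb_workers)
{
  assert(measured_speed != NULL && is_idle != NULL);
  int best = -1;
  for (int i = 0; i < nb_workers; i++) {
    if (is_idle[i] && (best < 0 || measured_speed[i] > measured_speed[best]))
      best = i;
  }
  return best;
}
```

Whether this greedy choice actually helps depends on the platform and on how reliable the measurements are, which is exactly what the simulation lets you study.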

## Using More Elaborate Platforms

SimGrid offers a rather powerful platform modeling mechanism. The
`src/examples/platforms/` directory comprises a variety of platforms ranging
from simple to elaborate. Combined with a good
visualization tool to ensure your simulation is meaningful, they
allow you to study to what extent your algorithm scales.

What is the largest number of tasks requiring 50e6 flops and 1e5
bytes that you manage to distribute and process in one hour on
`g5k.xml` (you should use `deployment_general.xml`)?

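
Before running the experiment, a back-of-the-envelope upper bound can serve as a sanity check on your results. The sketch below assumes that all input data leaves the master through a single link of known bandwidth and ignores latency; the function name and its parameters are hypothetical, and the values you pass must come from your own reading of the platform file.

```c
#include <assert.h>

/* Hypothetical back-of-the-envelope bound, not a SimGrid function.
 * total_speed:      sum of the worker speeds, in flop/s
 * master_bandwidth: bandwidth of the master's outgoing link, in bytes/s
 * Returns an upper bound on the number of tasks that can be processed in
 * `duration` seconds, assuming all input data leaves the master through
 * that single link and ignoring latency. */
double max_tasks_upper_bound(double total_speed, double master_bandwidth,
                             double duration, double task_flops, double task_bytes)
{
  assert(task_flops > 0.0 && task_bytes > 0.0);
  double compute_bound = total_speed * duration / task_flops;
  double transfer_bound = master_bandwidth * duration / task_bytes;
  return compute_bound < transfer_bound ? compute_bound : transfer_bound;
}
```

For instance, with a made-up aggregate speed of 1e9 flop/s and a made-up master bandwidth of 1.25e8 bytes/s, one hour of tasks requiring 50e6 flops and 1e5 bytes gives a compute bound of 72000 tasks and a much larger transfer bound, so the computation would be the limiting factor.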

\section intro_todo TODO: Points to improve for the next time

- Propose equivalent exercises and skeleton in Java.
- Propose a VirtualBox image with everything (simgrid, pajeng, ...) already set
  up.
- Ease the installation on Mac OS X (binary installer) and
- Explain that programming in C or Java and having a working
  development environment is a prerequisite.

[fn:1]: http://triva.gforge.inria.fr/index.html
[fn:2]: http://hal.inria.fr/inria-00529569
[fn:3]: http://hal.inria.fr/hal-00738321
[fn:4]: http://simgrid.gforge.inria.fr/simgrid/latest/doc/
[fn:5]: https://github.com/schnorr/pajeng/
[fn:6]: http://vite.gforge.inria.fr/