/*! @page FAQ MSG Frequently Asked Questions

This document is the FAQ of the MSG interface. Some entries are getting old and should be refreshed at some point.

@section faq_simgrid I'm new to SimGrid. I have some questions. Where should I start?

You are in the right place... To understand what you can and
cannot do with SimGrid, you should read the
<a href="https://simgrid.org/tutorials.html">tutorial
slides</a> from the SimGrid website. You may find more up-to-date
material on the
<a href="http://people.irisa.fr/Martin.Quinson/blog/SimGrid/">blog of
Martin Quinson</a>.

Another great source of inspiration can be found in the @ref s4u_examples.

If you are stuck at any point and this FAQ cannot help you, please send an
email to the user mailing list: <simgrid-user@lists.gforge.inria.fr>.

@subsection faq_visualization Visualizing and analyzing the results

It is sometimes convenient to "see" how the agents are behaving. If you
like colors, you can use <tt>tools/MSG_visualization/colorize.pl</tt>
as a filter on your MSG outputs. It works directly with XBT_INFO(). Beware:
XBT_INFO() prints on stderr, so do not forget to redirect it if you want to
filter it (e.g. with bash):

@verbatim
./msg_test small_platform.xml small_deployment.xml 2>&1 | ../../tools/MSG_visualization/colorize.pl
@endverbatim

We also have a more graphical output. Have a look at section @ref options_tracing.

@section faq_howto Feature related questions

@subsection faq_MIA "Could you please add (your favorite feature here) to SimGrid?"

Here is the deal. The whole SimGrid project (MSG, SURF, ...) is
meant to be kept as simple and generic as possible. We cannot add
functions for everybody's needs when these functions can easily be
built from the ones already in the API. Most of the time, it is
possible, and when it was not, we have always upgraded the API
accordingly. When somebody asks us a question like "How to do that?
Is there a function in the API to simply do this?", we're always glad
to answer and help. However, if we don't need this code for our own
needs, there is no chance we're going to write it... it's your job! :)
The counterpart to our answers is that once you come up with a neat
implementation of this feature (task duplication, RPC, thread
synchronization, ...), you should send it to us and we will be glad to
add it to the distribution. Thus, other people will take advantage of
it (and we won't have to answer this question again and again ;).

You'll find in this section a few "Missing In Action" features. Many
people have asked about them and we have given hints on how to simply
implement them with MSG. Feel free to contribute...

@subsection faq_MIA_MSG MSG features

@subsubsection faq_MIA_thread_synchronization How to synchronize my user processes?

It depends on why you want to synchronize them. If you just want to
have a shared state between your processes, then you probably don't
need to do anything. User processes never get forcefully interrupted
in SimGrid (unless you explicitly request the parallel execution of
user processes; see @ref options_virt_parallel).

Even if several processes are executed at the exact same time within
the simulation, they are linearized in reality by default: each process
always runs exclusively, atomically and uninterrupted until it issues
a simcall (that is, until it asks a service from the simulation kernel).
This is surprising at first, but things are much easier this way, both
for the user (who doesn't have to protect her shared data) and for the
kernel (which avoids many synchronization issues too). Processes are
executed concurrently in the simulated realm, but you don't need to
bother with this in the real realm.

If you really need to synchronize your processes (because it's what
you are studying or to create an atomic section that spans over
several simcalls), you obviously cannot use regular synchronization
mechanisms (pthread mutexes in C or the synchronized keyword in Java).
This is because the SimGrid kernel locks all processes and unlocks them
one after the other when they are supposed to run, until they give the
control back in their simcall. If one of them gets locked by the OS
before returning the control to the kernel, that's definitely a
deadlock.

Instead, you should use the synchronization mechanisms provided by the
simulation kernel: a SimGrid mutex, a SimGrid condition variable or a
SimGrid semaphore, as described in @ref msg_synchro (in Java, only
semaphores are available). But actually, many synchronization patterns
can be encoded with communication on mailboxes. Typically, if you need
one process to notify another one, you could use a condition variable
or a semaphore, but sending a message to a specific mailbox does the
trick in most cases.

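To make the no-preemption argument above concrete, here is a toy model in plain C. It is not how SimGrid is implemented, and `worker_step()` and `run_toy_kernel()` are hypothetical names. Each process is a step function that the toy kernel runs without preemption until it returns, and returning stands for issuing a simcall. The read-modify-write on the unprotected shared counter can therefore never be interleaved with another process, so no update is lost.

```c
// Toy model only (plain C, not SimGrid code): each "process" is a step
// function that the toy kernel calls in turn. A step runs to completion,
// without preemption; returning from it stands for issuing a simcall.

#define NB_PROCS 2
#define NB_STEPS 1000

static int shared_counter = 0;   // shared state, deliberately unprotected
static int steps_left[NB_PROCS];

// One step of a user process: an unprotected read-modify-write.
// Returns 0 once the process has no step left.
static int worker_step(int pid)
{
  int tmp = shared_counter;      // read...
  tmp = tmp + 1;                 // ...modify...
  shared_counter = tmp;          // ...write: no other process can run in between
  return --steps_left[pid] > 0;
}

// Toy kernel: runs every live process in turn until all of them are
// finished, then returns the final value of the shared counter.
int run_toy_kernel(void)
{
  int running[NB_PROCS];
  int alive = NB_PROCS;

  shared_counter = 0;
  for (int pid = 0; pid < NB_PROCS; pid++) {
    steps_left[pid] = NB_STEPS;
    running[pid] = 1;
  }
  while (alive > 0) {
    for (int pid = 0; pid < NB_PROCS; pid++) {
      if (running[pid] && !worker_step(pid)) {
        running[pid] = 0;        // the process terminated
        alive--;
      }
    }
  }
  return shared_counter;         // always NB_PROCS * NB_STEPS
}
```

Here `run_toy_kernel()` returns 2000. If the same unprotected code ran in preemptive OS threads, concurrent updates could be lost.
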
@subsubsection faq_MIA_communication_time How can I get the *real* communication time?

Communications are synchronous. If you simply read the clock before and
after a communication, the measured duration therefore includes not only
the actual transfer time but also the time spent waiting for the other
party to be ready. However, getting the *real* communication time is not
really hard either. The following solution is a good starting point.

@code
// Sender: stamp the task with the date at which the transfer is initiated
m_task_t task = MSG_task_create("Task", task_comp_size, task_comm_size,
                                calloc(1, sizeof(double)));
*((double*) task->data) = MSG_get_clock();
MSG_task_put(task, workers[i % workers_count], PORT_22);
XBT_INFO("Send completed");

// Receiver: the transfer cannot start before both parties are ready
m_task_t task = NULL;
double time1, time2;

time1 = MSG_get_clock();
MSG_task_get(&task, PORT_22);
time2 = MSG_get_clock();
if (time1 < *((double *) task->data)) // the sender was the last one ready
  time1 = *((double *) task->data);
XBT_INFO("Communication time : \"%f\" ", time2 - time1);

free(task->data); // release the timestamp allocated by the sender
MSG_task_destroy(task);
@endcode

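The date arithmetic done by the receiver above can be isolated in a plain C function. This is a sketch with no MSG call; `real_comm_time()` and its parameter names are illustrative only. The key point is that the transfer starts at the later of the two "ready" dates.

```c
// Date arithmetic of the receiver above, isolated in plain C (no MSG call).
// All arguments are simulated dates, in seconds:
//   send_stamp: date stored in the task by the sender right before its put
//   before_get: date at which the receiver starts its get
//   after_get:  date at which the get returns
// The transfer only starts once both parties are ready, that is at the
// later of send_stamp and before_get, so waiting time is not counted.
double real_comm_time(double send_stamp, double before_get, double after_get)
{
  double start = (before_get < send_stamp) ? send_stamp : before_get;
  return after_get - start;
}
```

Consider a sender that stamps 2.0, a receiver that posts its get at 1.0, and a get that returns at 4.0. The naive measurement gives 3.0, while `real_comm_time(2.0, 1.0, 4.0)` gives 2.0. If the receiver posts at 5.0 instead and the get returns at 7.5, both measurements give 2.5, because only the sender waited.
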
@subsection faq_MIA_SimDag SimDag related questions

@subsubsection faq_SG_comm Implementing communication delays between tasks.

A classic question of SimDag newcomers is how to express a
communication delay between tasks. The thing is that in SimDag, both
computation and communication are seen as tasks. So, if you want to
model a data dependency between two DAG tasks t1 and t2, you have to
create 3 SD_tasks (t1, t2 and c) and add dependencies in the following
way:

@code
SD_task_dependency_add(t1, c);
SD_task_dependency_add(c, t2);
@endcode

This way, task t2 cannot start before the termination of communication c,
which in turn cannot start before t1 ends.

When creating task c, you have to associate with it an amount of data (in bytes)
corresponding to what has to be sent by t1 to t2.

Finally, to schedule the communication task c, you have to build a list
comprising the workstations on which t1 and t2 are scheduled (w1 and w2,
for example) and a communication matrix whose only non-zero entry is the
amount of data sent from w1 to w2.

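Here is a minimal sketch of such a matrix in plain C. It assumes the layout used for the communication amounts of `SD_task_schedule()`: a flat, row-major array of `host_count * host_count` doubles, where entry `[i * host_count + j]` is the number of bytes sent from host i to host j of the workstation list. Please check this layout against the SimDag documentation of your SimGrid version. `make_comm_matrix()` is a hypothetical helper, not part of the API.

```c
#include <stdlib.h>

// Builds the communication matrix of a SimDag communication task, ASSUMING
// the layout of SD_task_schedule()'s communication amounts: a flat, row-major
// array of host_count * host_count doubles where entry [i * host_count + j]
// is the number of bytes sent from host i to host j of the workstation list.
// Returns NULL if the allocation fails; the caller must free() the array.
double *make_comm_matrix(int host_count, int src, int dst, double bytes)
{
  double *matrix = calloc((size_t)host_count * (size_t)host_count, sizeof(double));
  if (matrix != NULL)
    matrix[src * host_count + dst] = bytes;   // all other entries stay 0
  return matrix;
}
```

With the workstation list {w1, w2}, `make_comm_matrix(2, 0, 1, bytes)` yields {0, bytes, 0, 0}. The only non-zero entry is the transfer from w1 to w2.
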
@subsubsection faq_SG_DAG How to implement a distributed dynamic scheduler of DAGs.

Distributed is somehow "contagious". If you start making distributed
decisions, there is no way to handle DAGs directly anymore (unless I
am missing something). You have to encode your DAGs in terms of
communicating processes to make the whole scheduling process
distributed. Here is an example of how you could do that. Assume T1
has to be done before T2.

@code
int your_agent(int argc, char *argv[]) {
  m_task_t T1 = MSG_task_create(...);
  m_task_t T2 = MSG_task_create(...);
  ...
  if (cond) MSG_task_execute(T1);
  ...
  if ((MSG_task_get_remaining_computation(T1) == 0.0) && (you_re_in_a_good_mood))
    MSG_task_execute(T2);
  else {
    // do something else
  }
  ...
}
@endcode

If you decide that the distributed part is not that important and that
DAG is really the level of abstraction you want to work with, then you should
give @ref SD_API a try.

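Going back to the distributed approach, here is a toy model in plain C of encoding the precedence T1 before T2 with communicating processes. The names are hypothetical and there is no MSG call. The owner of T2 never consults a DAG. It simply waits until the owner of T1 has posted a completion message in its mailbox. The round-robin loop stands for the simulation kernel and deliberately schedules the owner of T2 first. In MSG, the notification would typically be a small task sent to a mailbox.

```c
#include <stdbool.h>
#include <stddef.h>

// Toy model only (plain C, not the MSG API): the owners of T1 and T2 are two
// agents run in turn by a round-robin loop that stands for the simulation
// kernel. The precedence T1 -> T2 is enforced by a message, not by a DAG.

typedef struct {
  bool has_msg;                    // a notification is waiting in the mailbox
} toy_mailbox;

static const char *exec_log[2];    // names of the executed tasks, in order
static size_t exec_count = 0;

static void toy_execute(const char *task) { exec_log[exec_count++] = task; }

// Owner of T1: executes T1, then notifies the owner of T2.
// Returns true once the agent has nothing left to do.
static bool agent_t1_step(toy_mailbox *to_t2)
{
  toy_execute("T1");
  to_t2->has_msg = true;           // "send" the completion notification
  return true;
}

// Owner of T2: only executes T2 once the notification has arrived.
static bool agent_t2_step(toy_mailbox *inbox)
{
  if (!inbox->has_msg)
    return false;                  // nothing received yet: yield
  inbox->has_msg = false;          // consume the notification
  toy_execute("T2");
  return true;
}

// Runs both agents to completion; the owner of T2 is deliberately scheduled
// first to show that it waits. Returns the number of executed tasks.
size_t run_toy_dag(void)
{
  toy_mailbox box = {false};
  bool done1 = false, done2 = false;

  exec_count = 0;
  while (!done1 || !done2) {
    if (!done2)
      done2 = agent_t2_step(&box);
    if (!done1)
      done1 = agent_t1_step(&box);
  }
  return exec_count;
}
```

Although the owner of T2 runs first, the execution log always reads T1 then T2.
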
@subsection faq_MIA_generic Generic features

@subsubsection faq_MIA_batch_scheduler Is there native support for batch schedulers in SimGrid?

No, there is no native support for batch schedulers and none is
planned, because this is a very specific need (and doing it in a
generic way is thus very hard). However, some people have implemented
their own batch schedulers. Vincent Garonne wrote one during his PhD
and put his code in the contrib directory of our SVN so that others can
keep working on it. You may find inspiring ideas in it.

@subsection faq_platform Platform building and Dynamic resources

@subsubsection faq_platform_example Where can I find SimGrid platform files?

There are several small examples in the archive, in the examples/platforms
directory. From time to time, we are asked for other files, but we
don't have much at hand right now.

You should refer to the Platform Description Archive
(http://pda.gforge.inria.fr) project to see the other platform files we
have available, as well as the Simulacrum simulator, meant to generate
SimGrid platforms using all classical generation algorithms.

@subsubsection faq_platform_synthetic Generating synthetic but realistic platforms

Another possibility to get a platform file is to generate synthetic
platforms. Getting a realistic result is not a trivial task, and
moreover, nobody is really able to define what "realistic" means when
speaking of topology files. You can find some more thoughts on this
topic in these
<a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>.

If you are looking for an actual tool, we have a small tool to
annotate Tiers-generated topologies. This Perl script is in the
<tt>tools/platform_generation/</tt> directory of the SVN. Dinda et al.
released a very comparable tool, called GridG.

On a multicore host (e.g. one declared with <tt>core="6"</tt> in its host
tag), the specified computing power is available to up to 6 sequential
tasks without sharing. If more tasks are placed on this host, the
resource is shared accordingly. For example, if you schedule 12
tasks on the host, each will get half of the computing power. Please
note that although sound, this model was never scientifically
assessed. Please keep this fact in mind when using it.

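The arithmetic of the example above (6 cores, 12 tasks, half of the computing power each) can be written as follows. This is our reading of the model for identical sequential tasks, with `speed` being the computing power of one core. It is not code taken from SimGrid, and `per_task_speed()` is a hypothetical name.

```c
// Sharing rule described above, as a plain C sketch (not SimGrid code),
// assuming identical sequential tasks and a per-core computing power `speed`:
// each task gets a full core while ntasks <= cores, and a fair share of the
// cores beyond that.
double per_task_speed(double speed, int cores, int ntasks)
{
  if (ntasks <= cores)
    return speed;                      // one core per task, no sharing
  return speed * cores / ntasks;       // cores shared evenly among the tasks
}
```

For instance, `per_task_speed(1e9, 6, 12)` gives 5e8, half of the per-core power.
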
@section faq_troubleshooting Troubleshooting

@subsection faq_trouble_compil User code compilation problems

@subsubsection faq_trouble_err_logcat "gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"

This is because you are using the log mechanism, but you didn't create
any default category in this file. You should refer to @ref XBT_log
for all the details, but you simply forgot to call one of
XBT_LOG_NEW_DEFAULT_CATEGORY() or XBT_LOG_NEW_DEFAULT_SUBCATEGORY().

@subsection faq_trouble_errors Runtime error messages

@subsubsection faq_trouble_errors_big_fat_warning I'm told that my XML files are too old.

The format of the XML platform description files is improved from time
to time. For example, we decided to change the units used in SimGrid
from MBytes, MFlops and seconds to Bytes, Flops and seconds to ease
the simulation of small messages. We also reworked the route
descriptions to allow more compact descriptions.

That is why the XML files are versioned using the 'version' attribute
of the root tag. Currently, it should read:

@verbatim
<platform version="4">
@endverbatim

If your files are too old, you can use the simgrid_update_xml.pl
script, which can be found in the tools directory of the archive.

@subsection faq_trouble_debug Debugging SMPI applications

In order to debug SMPI programs, you can use the following options of
<tt>smpirun</tt>:

- <b>-wrapper 'gdb --args'</b>: this option runs the SMPI process through
  the given wrapper command. Good candidates for this option
  are "gdb --args", "valgrind", "rr record", "strace", etc.;

- <b>-foreground</b>: this option gives the debugger access to the terminal,
  which is needed in order to use an interactive debugger.

Both options are needed in order to run the SMPI process under GDB.

@subsection faq_deadlock There is a deadlock in my code!!!

Unfortunately, we cannot debug every code written in SimGrid. We
furthermore believe that the framework provides enough information for
you to debug such issues yourself. If the textual output
is not enough, make sure to check the @ref faq_visualization FAQ entry to see
how to get a graphical one.

Now, if you come up with a really simple example that deadlocks and
you're absolutely convinced that it should not, you can ask on the
list. Just be aware that you'll be severely punished if the mistake is
on your side... We have plenty of FAQ entries to write and new
features to implement for the impenitent! ;)

@subsection faq_surf_network_latency I get weird timings when I play with the latencies.

OK, first of all, remember that units should be Bytes, Flops and
Seconds. If you don't use such units, some SimGrid constants (e.g. the
SG_TCP_CTE_GAMMA constant used in most network models) won't have the
right unit and you'll end up with weird results.

Here is what happens with a single transfer of size L on a link
(bw, lat) when nothing else happens.

@verbatim
0-----lat--------------------------------------------------t
|-----|**** real_bw =min(bw,SG_TCP_CTE_GAMMA/(2*lat)) *****|
@endverbatim

In more complex situations, this min is the solution of a complex
max-min linear system. Have a look
<a href="http://lists.gforge.inria.fr/pipermail/simgrid-devel/2006-April/thread.html">here</a>
and read the two threads "Bug in SURF?" and "Surf bug not
fixed?". You'll find a few other examples of such computations. You
can also read "A Network Model for Simulation of Grid Application" by
Henri Casanova and Loris Marchal to have all the details. The fact
that real_bw cannot exceed bw is easy to understand. The fact
that real_bw cannot exceed SG_TCP_CTE_GAMMA/(2*lat) is due to the
window-based mechanism of TCP: at most one window of SG_TCP_CTE_GAMMA
bytes can be in flight during a round trip of 2*lat seconds. With TCP,
you can't exploit a huge network capacity if the round-trip time is
long, because the sender has to wait for the acks...

Anyway, what you get is t = lat + L/min(bw, SG_TCP_CTE_GAMMA/(2*lat)).

- if you set (bw,lat) = (100 000 000, 0.00001), you get t = 1.00001 (you fully use your link);
- if you set (bw,lat) = (100 000 000, 0.0001), you get t = 1.0001 (you're on the limit);
- if you set (bw,lat) = (100 000 000, 0.001), you get t = 10.001 (ouch!).

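The formula above can be checked with a few lines of plain C. This is a sketch, not SURF code, and `transfer_time()` is a hypothetical name. The three figures of the list are reproduced with L = 100 000 000 bytes and SG_TCP_CTE_GAMMA = 20 000. These two values are our assumption, chosen because they match the list; check the actual default of SG_TCP_CTE_GAMMA in your SimGrid version.

```c
// Single-flow transfer time from the formula above (plain C, not SURF code):
//   t = lat + L / min(bw, gamma / (2 * lat))
// gamma plays the role of SG_TCP_CTE_GAMMA (the maximal TCP window size):
// at most one window can be in flight per round trip of 2 * lat seconds.
// Units: bytes, bytes per second, seconds.
double transfer_time(double size, double bw, double lat, double gamma)
{
  double window_bound = gamma / (2.0 * lat);
  double real_bw = (bw < window_bound) ? bw : window_bound;
  return lat + size / real_bw;
}
```

With lat = 0.001, the window bound drops to 20 000 / 0.002 = 10 000 000 bytes/s, ten times below bw. This is why `transfer_time(1e8, 1e8, 0.001, 20000)` gives 10.001 seconds.
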
This bound on the effective bandwidth of a flow is not the only thing
that may make your results unexpected. For example, two flows
competing on a saturated link receive an amount of bandwidth inversely
proportional to their round-trip times.

@subsection faq_bugrepport So I've found a bug in SimGrid. How do I report it?

We do our best to hammer away any bugs in SimGrid, but this is
still an academic project, so please be patient if/when you find bugs in it.
If you do, the best solution is to send an email to either the simgrid-user
or the simgrid-devel mailing list and tell us about the issue. You can
also decide to open a formal bug report using the
<a href="https://framagit.org/simgrid/simgrid/issues">relevant
interface</a>. You need to log in on the server to be able to submit
bug reports.

We will do our best to solve any problem reported, but you need to help us
find the issue. Just saying "it segfaults" isn't enough. Saying "It
segfaults when running the attached simulator" doesn't really help either.
You may find the following article interesting to learn how to write
informative bug reports:
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html (it is not SimGrid
specific at all, but it's full of good advice).