/*! @page FAQ MSG Frequently Asked Questions

@tableofcontents

This document is the FAQ of the MSG interface. Some entries are aging and should be refreshed at some point.

@section faq_simgrid I'm new to SimGrid. I have some questions. Where should I start?

You are at the right place. To understand what you can and
cannot do with SimGrid, you should read the
<a href="https://simgrid.org/tutorials.html">tutorial
slides</a> from the SimGrid website. You may find more up-to-date
material on the
<a href="http://people.irisa.fr/Martin.Quinson/blog/SimGrid/">blog of
Martin Quinson</a>.

Another great source of inspiration can be found in the @ref s4u_examples.

If you are stuck at any point and this FAQ cannot help you, please send an
email to the user mailing list: <simgrid-user@lists.gforge.inria.fr>.

@subsection faq_visualization Visualizing and analyzing the results

It is sometimes convenient to "see" how the agents are behaving. If you
like colors, you can use <tt>tools/MSG_visualization/colorize.pl</tt>
as a filter on your MSG outputs. It works directly with INFO. Beware,
INFO() prints on stderr. Do not forget to redirect it if you want to
filter (e.g. with bash):
@verbatim
./msg_test small_platform.xml small_deployment.xml 2>&1 | ../../tools/MSG_visualization/colorize.pl
@endverbatim

We also have a more graphical output. Have a look at section @ref options_tracing.

@section faq_howto Feature related questions

@subsection faq_MIA "Could you please add (your favorite feature here) to SimGrid?"

Here is the deal. The whole SimGrid project (MSG, SURF, ...) is
meant to be kept as simple and generic as possible. We cannot add
functions for everybody's needs when these functions can easily be
built from the ones already in the API. Most of the time, this is
possible, and when it was not, we have always upgraded the API
accordingly. When somebody asks us a question like "How to do that?
Is there a function in the API to simply do this?", we're always glad
to answer and help. However, if we don't need this code for our own
purposes, there is no chance we're going to write it... it's your job! :)
The counterpart to our answers is that once you come up with a neat
implementation of this feature (task duplication, RPC, thread
synchronization, ...), you should send it to us and we will be glad to
add it to the distribution. That way, other people can take advantage of
it (and we don't have to answer this question again and again ;).

You'll find in this section a few "Missing In Action" features. Many
people have asked about them and we have given hints on how to
implement them simply with MSG. Feel free to contribute...

@subsection faq_MIA_MSG MSG features

@subsubsection faq_MIA_thread_synchronization How to synchronize my user processes?

It depends on why you want to synchronize them. If you just want to
have a shared state between your processes, then you probably don't
need to do anything. User processes never get forcefully interrupted
in SimGrid (unless you explicitly request the parallel execution of
user processes, see @ref options_virt_parallel).

Even if several processes are executed at the exact same time within
the simulation, they are linearized in reality by default: one process
always runs in an exclusive, atomic and uninterrupted manner until it does
a simcall (that is, until it asks a service from the simulation kernel). This is
surprising at first, but things are much easier this way, both for the
user (who does not have to protect her shared data) and for the kernel
(which avoids many synchronization issues too). Processes are executed
concurrently in the simulated realm, but you don't need to bother with
this in the real realm.

If you really need to synchronize your processes (because it's what
you are studying or to create an atomic section that spans
several simcalls), you obviously cannot use regular synchronization
mechanisms (pthread mutexes in C or the synchronized keyword in Java).
This is because the SimGrid kernel locks all processes and unlocks them
one after the other when they are supposed to run, until they give the
control back in their simcall. If one of them gets locked by the OS
before returning the control to the kernel, that's definitely a
deadlock.

Instead, you should use the synchronization mechanisms provided by the
simulation kernel. This could be a SimGrid mutex, a SimGrid
condition variable or a SimGrid semaphore, as described in @ref
msg_synchro (in Java, only semaphores are available). But actually,
many synchronization patterns can be encoded with communications on
mailboxes. Typically, if you need one process to notify another one,
you could use a condition variable or a semaphore, but sending a
message to a specific mailbox does the trick in most cases.

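The linearized execution described above can be pictured with a toy
model. The sketch below is plain C, not SimGrid code: `actor_step_fn`,
`increment_step` and `run_linearized` are hypothetical names, and each
call to a step function stands for the code that a process runs between
two simcalls. Since a step always runs to completion before the next one
starts, the unprotected read-modify-write on the shared counter never
loses an update.

```c
#include <assert.h>
#include <stddef.h>

// Toy model only: a "process" is a step function that runs without
// interruption and returns where the real process would do a simcall.
typedef void (*actor_step_fn)(long *shared);

// One step: a read-modify-write on the shared state, with no lock.
void increment_step(long *shared)
{
  long tmp = *shared; // read
  tmp += 1;           // modify
  *shared = tmp;      // write back: nothing else ran in between
}

// Hypothetical scheduler: runs the processes one after the other, `rounds`
// times, the way a sequential simulation kernel linearizes them.
long run_linearized(const actor_step_fn *actors, size_t actor_count, int rounds)
{
  long shared = 0;
  assert(actors != NULL || actor_count == 0);
  for (int r = 0; r < rounds; r++)
    for (size_t i = 0; i < actor_count; i++)
      actors[i](&shared); // runs to completion before the next one starts
  return shared;
}
```

With three such processes and 1000 rounds, the counter ends at 3000,
which is what a lock would have to guarantee in a truly concurrent setting.
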
@subsubsection faq_MIA_communication_time How can I get the *real* communication time?

Communications are synchronous, so if you simply read the clock
before and after a communication, the difference includes not only the
transmission time but also the time spent waiting for the other party
to be ready. However, getting the *real* communication time is not
hard either. The following solution is a good starting point: the
sender stamps the task with its send date, and the receiver measures
from the later of that date and the date at which it started waiting.

@code
// task_comp_size, task_comm_size, workers, workers_count, i and PORT_22
// are assumed to be defined elsewhere in your program.
int sender(int argc, char *argv[])
{
  // Attach the send date to the task as its user data.
  m_task_t task = MSG_task_create("Task", task_comp_size, task_comm_size,
                                  calloc(1, sizeof(double)));
  *((double*) task->data) = MSG_get_clock();
  MSG_task_put(task, workers[i % workers_count], PORT_22);
  XBT_INFO("Send completed");
  return 0;
}

int receiver(int argc, char *argv[])
{
  m_task_t task = NULL;
  double time1, time2;

  time1 = MSG_get_clock();   // date at which we start waiting
  if (MSG_task_get(&task, PORT_22) != MSG_OK)
    return 1;
  time2 = MSG_get_clock();   // date at which the transfer ends
  // The transfer cannot have started before the sender was ready.
  if (time1 < *((double *) task->data))
    time1 = *((double *) task->data);
  XBT_INFO("Communication time: \"%f\"", time2 - time1);
  free(task->data);
  MSG_task_destroy(task);
  return 0;
}
@endcode

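The arithmetic performed by the receiver can be isolated in a small
helper. This is a minimal sketch in plain C with no SimGrid dependency;
`real_comm_time` and its parameter names are hypothetical, and all dates
are assumed to be simulated clock values in seconds.

```c
#include <assert.h>

// Returns the time spent actually communicating: the transfer starts
// when both parties are ready, i.e. at the later of the two start dates.
double real_comm_time(double recv_wait_start, double send_start, double recv_end)
{
  double start = recv_wait_start;
  assert(recv_end >= recv_wait_start && recv_end >= send_start);
  if (start < send_start)
    start = send_start; // the receiver was waiting for the sender
  return recv_end - start;
}
```

For a receiver that starts waiting at date 1, a sender that is ready at
date 3 and a reception that completes at date 5, the helper returns 2
rather than the 4 seconds observed on the receiver side.
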
@subsection faq_MIA_SimDag SimDag related questions

@subsubsection faq_SG_comm Implementing communication delays between tasks.

A classic question from SimDag newcomers is how to express a
communication delay between tasks. The thing is that in SimDag, both
computations and communications are seen as tasks. So, if you want to
model a data dependency between two DAG tasks t1 and t2, you have to
create 3 SD_tasks (t1, t2 and c) and add dependencies in the following
way:

@code
SD_task_dependency_add(t1, c);
SD_task_dependency_add(c, t2);
@endcode

This way, task t2 cannot start before the termination of communication c,
which in turn cannot start before t1 ends.

When creating task c, you have to associate with it an amount of data (in bytes)
corresponding to what has to be sent by t1 to t2.

Finally, to schedule the communication task c, you have to build a list
comprising the workstations on which t1 and t2 are scheduled (w1 and w2
for example) and build a communication matrix that should look like
[0; amount; 0; 0].

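As an illustration, such a matrix can be built as a flat array. The
sketch below is plain C and makes two assumptions that you should check
against the SimDag documentation: the matrix is stored in row-major
order, and entry (i, j) holds the bytes sent from the i-th to the j-th
workstation of the list. `make_comm_matrix` is a hypothetical helper,
not part of SimGrid.

```c
#include <assert.h>
#include <stdlib.h>

// Builds a host_count x host_count matrix (row-major) that is zero
// everywhere except for the src -> dst entry. Returns NULL if the
// allocation fails; the caller is responsible for freeing the result.
double *make_comm_matrix(size_t host_count, size_t src, size_t dst, double bytes)
{
  double *matrix;
  assert(src < host_count && dst < host_count);
  matrix = calloc(host_count * host_count, sizeof *matrix);
  if (matrix != NULL)
    matrix[src * host_count + dst] = bytes;
  return matrix;
}
```

With the list (w1, w2), calling `make_comm_matrix(2, 0, 1, amount)`
yields the four values 0, amount, 0, 0.
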
@subsubsection faq_SG_DAG How to implement a distributed dynamic scheduler of DAGs.

Distribution is somewhat "contagious". If you start making distributed
decisions, there is no way to handle DAGs directly anymore (unless I
am missing something). You have to encode your DAGs in terms of
communicating processes to make the whole scheduling process
distributed. Here is an example of how you could do that. Assume T1
has to be done before T2.

@code
 int your_agent(int argc, char *argv[]) {
   ...
   T1 = MSG_task_create(...);
   T2 = MSG_task_create(...);
   ...
   while(1) {
     ...
     if(cond) MSG_task_execute(T1);
     ...
     if((MSG_task_get_remaining_computation(T1) == 0.0) && (you_re_in_a_good_mood))
        MSG_task_execute(T2);
     else {
        /* do something else */
     }
   }
 }
@endcode

If you decide that the distributed part is not that important and that
the DAG is really the level of abstraction you want to work with, then you should
give @ref SD_API a try.

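The dependency test used in the loop above can be written as a small
predicate. This is only a sketch in plain C under simplifying
assumptions: `struct dag_task` and `dag_task_can_start` are hypothetical
and not SimGrid types, and each task has at most one predecessor.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical local view of one DAG task (not a SimGrid type).
struct dag_task {
  double remaining;                   // remaining computation, 0.0 once done
  const struct dag_task *predecessor; // NULL when the task has no dependency
};

// Returns 1 if the task still has work to do and its predecessor, if
// any, has completed; returns 0 otherwise.
int dag_task_can_start(const struct dag_task *task)
{
  assert(task != NULL);
  if (task->remaining <= 0.0)
    return 0; // already done
  return task->predecessor == NULL || task->predecessor->remaining <= 0.0;
}
```

Each process would evaluate such a predicate on the tasks it knows
about, and exchange messages with the other processes to keep the
`remaining` values up to date.
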
@subsection faq_MIA_generic Generic features

@subsubsection faq_MIA_batch_scheduler Is there a native support for batch schedulers in SimGrid?

No, there is no native support for batch schedulers and none is
planned because this is a very specific need (and doing it in a
generic way is thus very hard). However, some people have implemented
their own batch schedulers. Vincent Garonne wrote one during his PhD
and put his code in the contrib directory of our SVN so that others can
keep working on it. You may find inspiring ideas in it.

@subsection faq_platform Platform building and Dynamic resources

@subsubsection faq_platform_synthetic Generating synthetic but realistic platforms

Another possibility to get a platform file is to generate synthetic
platforms. Getting a realistic result is not a trivial task, and
moreover, nobody is really able to define what "realistic" means when
speaking of topology files. You can find some more thoughts on this
topic in these
<a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>.

If you are looking for an actual tool, we have a small one to
annotate Tiers-generated topologies. This Perl script is in the
<tt>tools/platform_generation/</tt> directory of the SVN. Dinda et al.
released a very comparable tool, called GridG.

The specified computing power will be available to up to 6 sequential
tasks without sharing. If more tasks are placed on this host, the
resource will be shared accordingly. For example, if you schedule 12
tasks on the host, each will get half of the computing power. Please
note that although sound, this model was never scientifically
assessed. Please keep this fact in mind when using it.

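The sharing rule above can be sketched as follows. This is plain C
rather than SimGrid code: `per_task_power` is a hypothetical helper, and
it assumes that the specified power is what a single task gets when no
sharing occurs and that the excess load is shared evenly.

```c
#include <assert.h>

// Power obtained by each task when `tasks` sequential tasks run on a
// host that can serve `cores` of them at full `power` without sharing.
double per_task_power(double power, int cores, int tasks)
{
  assert(power >= 0.0 && cores > 0 && tasks > 0);
  if (tasks <= cores)
    return power;               // no sharing needed
  return power * cores / tasks; // even sharing beyond that point
}
```

With 6 as the limit, 4 tasks each get the full power while 12 tasks each
get half of it, as in the example above.
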
@section faq_troubleshooting Troubleshooting

@subsection faq_surf_network_latency I get weird timings when I play with the latencies.

OK, first of all, remember that units should be Bytes, Flops and
Seconds. If you don't use such units, some SimGrid constants (e.g. the
SG_TCP_CTE_GAMMA constant used in most network models) won't have the
right unit and you'll end up with weird results.

Here is what happens with a single transfer of size L on a link
(bw,lat) when nothing else happens.

@verbatim
0-----lat--------------------------------------------------t
|-----|**** real_bw =min(bw,SG_TCP_CTE_GAMMA/(2*lat)) *****|
@endverbatim

In more complex situations, this min is the solution of a complex
max-min linear system. Have a look
<a href="http://lists.gforge.inria.fr/pipermail/simgrid-devel/2006-April/thread.html">here</a>
and read the two threads "Bug in SURF?" and "Surf bug not
fixed?". You'll find a few other examples of such computations there. You
can also read "A Network Model for Simulation of Grid Application" by
Henri Casanova and Loris Marchal to have all the details. The fact
that real_bw cannot exceed bw is easy to understand. The fact
that real_bw cannot exceed SG_TCP_CTE_GAMMA/(2*lat) is due to the
window-based congestion mechanism of TCP. With TCP, you can't exploit
your huge network capacity if you don't have a good round-trip time,
because of the acknowledgments.

Anyway, what you get is t = lat + L/min(bw, SG_TCP_CTE_GAMMA/(2*lat)).
The timings below are consistent with a transfer of L = 100 000 000
bytes and SG_TCP_CTE_GAMMA = 20 000:

  * if you set (bw,lat)=(100 000 000, 0.00001), you get t =  1.00001 (you fully use your link)
  * if you set (bw,lat)=(100 000 000, 0.0001),  you get t =  1.0001 (you're on the limit)
  * if you set (bw,lat)=(100 000 000, 0.001),   you get t = 10.001  (ouch!)

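The formula can be checked with a few lines of C. This is a minimal
sketch, not SimGrid code: `effective_bandwidth` and `transfer_time` are
hypothetical helpers, and the value of gamma is passed as a parameter.
Using 20000 for gamma and 100 000 000 bytes for the size is an
assumption, chosen because it reproduces the three timings listed above.

```c
#include <assert.h>

// Bandwidth really obtained by a single flow on a link (bw, lat):
// the link bandwidth, capped by the TCP window bound gamma / (2 * lat).
double effective_bandwidth(double bw, double lat, double gamma)
{
  double tcp_bound;
  assert(bw > 0.0 && lat > 0.0 && gamma > 0.0);
  tcp_bound = gamma / (2.0 * lat);
  return bw < tcp_bound ? bw : tcp_bound;
}

// Duration of a single transfer of `size` bytes when nothing else happens.
double transfer_time(double size, double bw, double lat, double gamma)
{
  return lat + size / effective_bandwidth(bw, lat, gamma);
}
```

For instance, `transfer_time(1e8, 1e8, 0.001, 20000.0)` gives about
10.001 seconds: the TCP bound (about 10 000 000 bytes per second) is the
limiting factor, not the link bandwidth.
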
This bound on the effective bandwidth of a flow is not the only thing
that may lead to unexpected results. For example, two flows
competing on a saturated link receive an amount of bandwidth inversely
proportional to their round-trip time.

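This sharing rule can be illustrated with a simplified computation. The
sketch below is plain C and is not the solver used by SimGrid:
`share_by_rtt` is a hypothetical helper that assumes the link is fully
used and that the only constraint is the capacity of that single link.

```c
#include <assert.h>
#include <stddef.h>

// Writes into share[i] the bandwidth of flow i, proportional to 1/rtt[i],
// so that the shares add up to the link capacity.
void share_by_rtt(double capacity, const double *rtt, double *share, size_t flow_count)
{
  double weight_sum = 0.0;
  assert(capacity >= 0.0);
  for (size_t i = 0; i < flow_count; i++) {
    assert(rtt[i] > 0.0);
    weight_sum += 1.0 / rtt[i];
  }
  for (size_t i = 0; i < flow_count; i++)
    share[i] = capacity * (1.0 / rtt[i]) / weight_sum;
}
```

With a capacity of 100 and round-trip times of 0.01 and 0.03, the first
flow gets about 75 and the second about 25: a round-trip time three
times longer means a share three times smaller.
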
*/