doc/doxygen/uhood_switch.doc

   1 /*! @page uhood_switch Process Synchronizations and Context Switching
   2
   3 @section uhood_switch_DES SimGrid as an Operating System
   4
   5 SimGrid is a discrete event simulator of distributed systems: it does
   6 not simulate the world by small fixed-size steps but determines the
   7 date of the next event (such as the end of a communication, the end of
   8 a computation) and jumps to this date.
   9
  10 A number of actors executing user-provided code run on top of the
  11 simulation kernel. The interactions between these actors and the
  12 simulation kernel are very similar to the ones between the system
  13 processes and the Operating System (except that the actors and
  14 simulation kernel share the same address space in a single OS
  15 process).
  16
When an actor needs to interact with the outer world (e.g. to start a
communication), it issues a <i>simcall</i> (simulation call), just
like a system process issues a <i>syscall</i> to interact with its
environment through the Operating System. Any <i>simcall</i> freezes
the actor until it is woken up by the simulation kernel (e.g. when the
communication is finished).
  23
Mimicking the OS behavior may seem over-engineered here, but it is
mandatory for the model checker. The simcalls, representing the actors'
actions, are the transitions of the formal system. Verifying the
system requires manipulating these transitions explicitly. This also
allows the actors to run safely in parallel, even if this is less
commonly used by our users.
  30
  31 So, the key ideas here are:
  32
  33  - The simulator is a discrete event simulator (event-driven).
  34
  35  - An actor can issue a blocking simcall and will be suspended until
  36    it is woken up by the simulation kernel (when the operation is
  37    completed).
  38
  39  - In order to move forward in (simulated) time, the simulation kernel
  40    needs to know which actions the actors want to do.
  41
  42  - The simulated time will only move forward when all the actors are
  43    blocked, waiting on a simcall.
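The event-driven loop behind these key ideas can be sketched as follows. This is a minimal toy under stated assumptions, not SimGrid's actual kernel: the names `ToyEvent`, `ToySimulator`, `schedule_at()` and `run()` are hypothetical. Instead of advancing by fixed-size steps, the loop pops the earliest pending event and jumps the clock straight to its date.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A pending event: an action to run at a given simulated date.
struct ToyEvent {
  double date;
  std::function<void()> action;
};

// Orders the priority queue so that the earliest date comes out first.
struct ToyLaterFirst {
  bool operator()(const ToyEvent& a, const ToyEvent& b) const
  {
    return a.date > b.date;
  }
};

// Hypothetical toy simulator (not the SimGrid API).
class ToySimulator {
public:
  double now() const { return now_; }

  void schedule_at(double date, std::function<void()> action)
  {
    assert(date >= now_ && "cannot schedule an event in the past");
    queue_.push(ToyEvent{date, std::move(action)});
  }

  // No fixed-size time steps: jump directly to the date of the next event.
  void run()
  {
    while (!queue_.empty()) {
      ToyEvent event = queue_.top();
      queue_.pop();
      now_ = event.date;
      event.action();
    }
  }

private:
  double now_ = 0.0;
  std::priority_queue<ToyEvent, std::vector<ToyEvent>, ToyLaterFirst> queue_;
};
```

An action may itself schedule new events (for example the end of a communication it has just started), which is how the simulated time keeps moving forward.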
  44
  45 This leads to some very important consequences:
  46
  47  - An actor cannot synchronize with another actor using OS-level primitives
  48    such as `pthread_mutex_lock()` or `std::mutex`. The simulation kernel
  49    would wait for the actor to issue a simcall and would deadlock. Instead it
  50    must use simulation-level synchronization primitives
  51    (such as `simcall_mutex_lock()`).
  52
 - Similarly, an actor cannot sleep using
   `std::this_thread::sleep_for()`, which waits in the real world. It
   must instead use `simgrid::s4u::this_actor::sleep_for()`, which waits
   in the simulation.
  58
  59  - The simulation kernel cannot block.
  60    Only the actors can block (using simulation primitives).
  61
  62 @section uhood_switch_futures Futures and Promises
  63
  64 @subsection uhood_switch_futures_what What is a future?
  65
Futures are a classical programming abstraction, present in many
languages.  Wikipedia defines a
[future](https://en.wikipedia.org/wiki/Futures_and_promises) as an
object that acts as a proxy for a result that is initially unknown,
usually because the computation of its value is not yet complete. This
concept is thus well suited to representing, in the kernel, the
asynchronous operations corresponding to the actors' simcalls.
  73
  74
Futures can be manipulated using two kinds of APIs:
  76
  77  - a <b>blocking API</b> where we wait for the result to be available
  78    (`res = f.get()`);
  79
 - a <b>continuation-based API</b> where we say what should be done
   with the result when the operation completes
   (`future.then(something_to_do_with_the_result)`). This is heavily
   used in ECMAScript, which exhibits the same kind of never-blocking
   asynchronous model as our discrete event simulator.
  85
  86 C++11 includes a generic class (`std::future<T>`) which implements a
  87 blocking API.  The continuation-based API is not available in the
  88 standard (yet) but is [already
  89 described](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0159r0.html#futures.unique_future.6)
  90 in the Concurrency Technical Specification.
  91
  92 `Promise`s are the counterparts of `Future`s: `std::future<T>` is used
  93 <em>by the consumer</em> of the result. On the other hand,
  94 `std::promise<T>` is used <em>by the producer</em> of the result. The
  95 producer calls `promise.set_value(42)` or `promise.set_exception(e)`
  96 in order to <em>set the result</em> which will be made available to
  97 the consumer by `future.get()`.
  98
  99 ### Which future do we need?
 100
 101 The blocking API provided by the standard C++11 futures does not suit
 102 our needs since the simulation kernel <em>cannot</em> block, and since
 103 we want to explicitly schedule the actors.  Instead, we need to
 104 reimplement a continuation-based API to be used in our event-driven
 105 simulation kernel.
 106
 107 Our futures are based on the C++ Concurrency Technical Specification
 108 API, with a few differences:
 109
 110  - The simulation kernel is single-threaded so we do not need
 111    inter-thread synchronization for our futures.
 112
 113  - As the simulation kernel cannot block, `f.wait()` is not meaningful
 114    in this context.
 115
 116  - Similarly, `future.get()` does an implicit wait. Calling this method in the
 117    simulation kernel only makes sense if the future is already ready. If the
 118    future is not ready, this would deadlock the simulator and an error is
 119    raised instead.
 120
 - We always call the continuations in the simulation loop (and not
   inside the `future.then()` or `promise.set_value()` calls). That
   way, we don't have to fear problems like invariants not being
   restored when the callbacks are called :fearful: or stack overflows
   triggered by deeply nested continuation chains :cold_sweat:. The
   continuations are all called in a nice and predictable place in the
   simulator with a nice and predictable state :relieved:.
 128
 129  - Some features of the standard (such as shared futures) are not
 130    needed in our context, and thus not considered here.
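The choice of running continuations from the main loop only can be sketched as follows. This is a simplified, single-threaded toy under stated assumptions: `ToyLoop` and `ToyFutureState` are hypothetical names, and exceptions, `void` and reference results are left out. Neither `set_value()` nor `set_continuation()` ever runs the continuation itself; they only post a job that the loop runs later.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Stand-in for the simulation loop: jobs are only ever run from run().
class ToyLoop {
public:
  void post(std::function<void()> job) { jobs_.push_back(std::move(job)); }

  void run()
  {
    while (!jobs_.empty()) {
      std::function<void()> job = std::move(jobs_.front());
      jobs_.pop_front();
      job();
    }
  }

private:
  std::deque<std::function<void()>> jobs_;
};

// Hypothetical, heavily simplified shared state of a future.
template <class T>
class ToyFutureState {
public:
  explicit ToyFutureState(ToyLoop& loop) : loop_(loop) {}

  void set_value(T value)
  {
    assert(!value_ && "value already set");
    value_ = std::make_shared<T>(std::move(value));
    schedule_if_possible();
  }

  void set_continuation(std::function<void(T)> continuation)
  {
    assert(!continuation_ && "continuation already set");
    continuation_ = std::move(continuation);
    schedule_if_possible();
  }

private:
  // Once both the value and the continuation are known, defer the call
  // to the loop instead of calling the continuation from here.
  void schedule_if_possible()
  {
    if (!value_ || !continuation_)
      return;
    std::shared_ptr<T> value = std::move(value_);
    std::function<void(T)> continuation = std::move(continuation_);
    continuation_ = nullptr;
    loop_.post([value, continuation] { continuation(std::move(*value)); });
  }

  ToyLoop& loop_;
  std::shared_ptr<T> value_;
  std::function<void(T)> continuation_;
};
```

Whatever the order in which the value and the continuation are set, the continuation only runs once the loop reaches the posted job, with the caller's stack already unwound.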
 131
 132 ### Implementing `Future` and `Promise`
 133
 134 The `simgrid::kernel::Future` and `simgrid::kernel::Promise` use a
 135 shared state defined as follows:
 136
 137 @code{cpp}
 138 enum class FutureStatus {
 139   not_ready,
 140   ready,
 141   done,
 142 };
 143
 144 class FutureStateBase : private boost::noncopyable {
 145 public:
 146   void schedule(simgrid::xbt::Task<void()>&& job);
 147   void set_exception(std::exception_ptr exception);
 148   void set_continuation(simgrid::xbt::Task<void()>&& continuation);
 149   FutureStatus get_status() const;
 150   bool is_ready() const;
 151   // [...]
 152 private:
 153   FutureStatus status_ = FutureStatus::not_ready;
 154   std::exception_ptr exception_;
 155   simgrid::xbt::Task<void()> continuation_;
 156 };
 157
 158 template<class T>
 159 class FutureState : public FutureStateBase {
 160 public:
 161   void set_value(T value);
 162   T get();
 163 private:
 164   boost::optional<T> value_;
 165 };
 166
 167 template<class T>
 168 class FutureState<T&> : public FutureStateBase {
 169   // ...
 170 };
 171 template<>
 172 class FutureState<void> : public FutureStateBase {
 173   // ...
 174 };
 175 @endcode
 176
 177 Both `Future` and `Promise` have a reference to the shared state:
 178
 179 @code{cpp}
 180 template<class T>
 181 class Future {
 182   // [...]
 183 private:
 184   std::shared_ptr<FutureState<T>> state_;
 185 };
 186
 187 template<class T>
 188 class Promise {
 189   // [...]
 190 private:
 191   std::shared_ptr<FutureState<T>> state_;
 192   bool future_get_ = false;
 193 };
 194 @endcode
 195
 196 The crux of `future.then()` is:
 197
 198 @code{cpp}
 199 template<class T>
 200 template<class F>
 201 auto simgrid::kernel::Future<T>::thenNoUnwrap(F continuation)
 202 -> Future<decltype(continuation(std::move(*this)))>
 203 {
 204   typedef decltype(continuation(std::move(*this))) R;
 205
 206   if (state_ == nullptr)
 207     throw std::future_error(std::future_errc::no_state);
 208
 209   auto state = std::move(state_);
 210   // Create a new future...
 211   Promise<R> promise;
 212   Future<R> future = promise.get_future();
 213   // ...and when the current future is ready...
 214   state->set_continuation(simgrid::xbt::makeTask(
 215     [](Promise<R> promise, std::shared_ptr<FutureState<T>> state,
 216          F continuation) {
 217       // ...set the new future value by running the continuation.
 218       Future<T> future(std::move(state));
 219       simgrid::xbt::fulfillPromise(promise,[&]{
 220         return continuation(std::move(future));
 221       });
 222     },
 223     std::move(promise), state, std::move(continuation)));
 224   return std::move(future);
 225 }
 226 @endcode
 227
 228 We added a (much simpler) `future.then_()` method which does not
 229 create a new future:
 230
 231 @code{cpp}
 232 template<class T>
 233 template<class F>
 234 void simgrid::kernel::Future<T>::then_(F continuation)
 235 {
 236   if (state_ == nullptr)
 237     throw std::future_error(std::future_errc::no_state);
 238   // Give shared-ownership to the continuation:
 239   auto state = std::move(state_);
 240   state->set_continuation(simgrid::xbt::makeTask(
 241     std::move(continuation), state));
 242 }
 243 @endcode
 244
 245 The `.get()` delegates to the shared state. As we mentioned previously, an
 246 error is raised if the future is not ready:
 247
@code{cpp}
template<class T>
T simgrid::kernel::Future<T>::get()
{
  if (state_ == nullptr)
    throw std::future_error(std::future_errc::no_state);
  // The future gives up its shared state: get() can only be called once.
  std::shared_ptr<FutureState<T>> state = std::move(state_);
  return state->get();
}

template<class T>
T simgrid::kernel::FutureState<T>::get()
{
  // The kernel cannot block: asking for a result which is not ready
  // is an error.
  if (status_ != FutureStatus::ready)
    xbt_die("Deadlock: this future is not ready");
  status_ = FutureStatus::done;
  if (exception_) {
    std::exception_ptr exception = std::move(exception_);
    std::rethrow_exception(std::move(exception));
  }
  xbt_assert(this->value_);
  auto result = std::move(this->value_.get());
  this->value_ = boost::optional<T>();
  return std::move(result);
}
@endcode
 274
 275 ## Generic simcalls
 276
 277 ### Motivation
 278
 279 Simcalls are not so easy to understand and adding a new one is not so easy
 280 either. In order to add one simcall, one has to first
 281 add it to the [list of simcalls](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/simcalls.in)
 282 which looks like this:
 283
 284 @code{cpp}
 285 # This looks like C++ but it is a basic IDL-like language
 286 # (one definition per line) parsed by a python script:
 287
 288 void process_kill(smx_process_t process);
 289 void process_killall(int reset_pid);
 290 void process_cleanup(smx_process_t process) [[nohandler]];
 291 void process_suspend(smx_process_t process) [[block]];
 292 void process_resume(smx_process_t process);
 293 void process_set_host(smx_process_t process, sg_host_t dest);
 294 int  process_is_suspended(smx_process_t process) [[nohandler]];
 295 int  process_join(smx_process_t process, double timeout) [[block]];
 296 int  process_sleep(double duration) [[block]];
 297
 298 smx_mutex_t mutex_init();
 299 void        mutex_lock(smx_mutex_t mutex) [[block]];
 300 int         mutex_trylock(smx_mutex_t mutex);
 301 void        mutex_unlock(smx_mutex_t mutex);
 302
 303 [...]
 304 @endcode
 305
 306 At runtime, a simcall is represented by a structure containing a simcall
 307 number and its arguments (among some other things):
 308
 309 @code{cpp}
 310 struct s_smx_simcall {
 311   // Simcall number:
 312   e_smx_simcall_t call;
 313   // Issuing actor:
 314   smx_process_t issuer;
 315   // Arguments of the simcall:
 316   union u_smx_scalar args[11];
 317   // Result of the simcall:
 318   union u_smx_scalar result;
 319   // Some additional stuff:
 320   smx_timer_t timer;
 321   int mc_value;
 322 };
 323 @endcode
 324
with a scalar union type:
 326
 327 @code{cpp}
 328 union u_smx_scalar {
 329   char            c;
 330   short           s;
 331   int             i;
 332   long            l;
 333   long long       ll;
 334   unsigned char   uc;
 335   unsigned short  us;
 336   unsigned int    ui;
 337   unsigned long   ul;
 338   unsigned long long ull;
 339   double          d;
 340   void*           dp;
 341   FPtr            fp;
 342 };
 343 @endcode
 344
Then one has to call (manually :cry:) a
[Python script](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/simcalls.py)
which generates a bunch of C++ files:
 348
 349 * an enum of all the [simcall numbers](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_enum.h#L19);
 350
* [user-side wrappers](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_bodies.cpp)
  responsible for wrapping the parameters in the `struct s_smx_simcall`
  and unwrapping the result;

* [accessors](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_accessors.h)
  to get/set values of `struct s_smx_simcall`;
 357
 358 * a simulation-kernel-side [big switch](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_generated.cpp#L106)
 359   handling all the simcall numbers.
 360
 361 Then one has to write the code of the kernel side handler for the simcall
 362 and the code of the simcall itself (which calls the code-generated
 363 marshaling/unmarshaling stuff):sob:.
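To give an idea of what the generated code does, here is a minimal hand-written sketch of the same pattern. All the names (`ToyScalar`, `ToyCall`, `ToySimcall`, `toy_handle()`, `toy_simcall_*()`) are hypothetical and much simpler than the real generated files: a user-side wrapper marshals its arguments into the scalar union array, the kernel-side switch dispatches on the simcall number, and the wrapper unmarshals the result.

```cpp
#include <cassert>

// Simplified scalar union and simcall structure (hypothetical).
union ToyScalar {
  int i;
  double d;
};

enum class ToyCall { add_ints, scale };

struct ToySimcall {
  ToyCall call;      // simcall number
  ToyScalar args[2]; // marshalled arguments
  ToyScalar result;  // marshalled result
};

// Kernel side: one big switch handling all the simcall numbers.
void toy_handle(ToySimcall& simcall)
{
  switch (simcall.call) {
    case ToyCall::add_ints:
      simcall.result.i = simcall.args[0].i + simcall.args[1].i;
      break;
    case ToyCall::scale:
      simcall.result.d = simcall.args[0].d * simcall.args[1].d;
      break;
    default:
      assert(false && "unknown simcall number");
  }
}

// User side: wrap the parameters, issue the simcall, unwrap the result.
// In this toy the handler is called directly; the real code would first
// switch to the simulation kernel context.
int toy_simcall_add_ints(int a, int b)
{
  ToySimcall simcall;
  simcall.call      = ToyCall::add_ints;
  simcall.args[0].i = a;
  simcall.args[1].i = b;
  toy_handle(simcall);
  return simcall.result.i;
}

double toy_simcall_scale(double value, double factor)
{
  ToySimcall simcall;
  simcall.call      = ToyCall::scale;
  simcall.args[0].d = value;
  simcall.args[1].d = factor;
  toy_handle(simcall);
  return simcall.result.d;
}
```

Every new simcall needs a new enum value, a new wrapper and a new `case`, which is the boilerplate the generic simcalls are meant to avoid.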
 364
 365 In order to simplify this process, we added two generic simcalls which can be
 366 used to execute a function in the simulation kernel:
 367
 368 @code{cpp}
 369 # This one should really be called run_immediate:
 370 void run_kernel(std::function<void()> const* code) [[nohandler]];
 371 void run_blocking(std::function<void()> const* code) [[block,nohandler]];
 372 @endcode
 373
 374 ### Immediate simcall
 375
 376 The first one (`simcall_run_kernel()`) executes a function in the simulation
 377 kernel context and returns immediately (without blocking the actor):
 378
 379 @code{cpp}
 380 void simcall_run_kernel(std::function<void()> const& code)
 381 {
 382   simcall_BODY_run_kernel(&code);
 383 }
 384
 385 template<class F> inline
 386 void simcall_run_kernel(F& f)
 387 {
 388   simcall_run_kernel(std::function<void()>(std::ref(f)));
 389 }
 390 @endcode
 391
 392 On top of this, we add a wrapper which can be used to return a value of any
 393 type and properly handles exceptions:
 394
 395 @code{cpp}
 396 template<class F>
 397 typename std::result_of<F()>::type kernelImmediate(F&& code)
 398 {
 399   // If we are in the simulation kernel, we take the fast path and
 400   // execute the code directly without simcall
 401   // marshalling/unmarshalling/dispatch:
 402   if (SIMIX_is_maestro())
 403     return std::forward<F>(code)();
 404
 405   // If we are in the application, pass the code to the simulation
 406   // kernel which executes it for us and reports the result:
 407   typedef typename std::result_of<F()>::type R;
 408   simgrid::xbt::Result<R> result;
 409   simcall_run_kernel([&]{
 410     xbt_assert(SIMIX_is_maestro(), "Not in maestro");
 411     simgrid::xbt::fulfillPromise(result, std::forward<F>(code));
 412   });
 413   return result.get();
 414 }
 415 @endcode
 416
 417 where [`Result<R>`](#result) can store either a `R` or an exception.
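The value-or-exception transport used by this wrapper can be sketched independently of SimGrid. In the toy below, `ToyResult`, `toy_run_kernel()` and `toy_kernel_immediate()` are hypothetical names; `toy_run_kernel()` simply calls the closure where the real simcall would run it in the kernel context, and only non-`void`, non-reference result types are handled.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical simplified holder: either a value or an exception.
template <class R>
class ToyResult {
public:
  void set_value(R value) { value_.reset(new R(std::move(value))); }
  void set_exception(std::exception_ptr e) { exception_ = e; }

  R get()
  {
    if (exception_)
      std::rethrow_exception(exception_);
    assert(value_ && "ToyResult is empty");
    return std::move(*value_);
  }

private:
  std::unique_ptr<R> value_;
  std::exception_ptr exception_;
};

// Stand-in for the simcall: it only accepts a type-erased void() closure,
// so neither a return value nor an exception can cross it directly.
void toy_run_kernel(const std::function<void()>& code)
{
  code();
}

template <class F>
auto toy_kernel_immediate(F&& code) -> decltype(code())
{
  typedef decltype(code()) R;
  ToyResult<R> result;
  toy_run_kernel([&result, &code] {
    try {
      result.set_value(code());
    } catch (...) {
      // Carry the exception back to the caller's side.
      result.set_exception(std::current_exception());
    }
  });
  // Either returns the value or rethrows the captured exception:
  return result.get();
}
```

The caller sees the same behavior as a direct call: a value on success, the original exception on failure.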
 418
 419 Example of usage:
 420
 421 @code{cpp}
 422 xbt_dict_t Host::properties() {
 423   return simgrid::simix::kernelImmediate([&] {
 424     simgrid::surf::HostImpl* surf_host =
 425       this->extension<simgrid::surf::HostImpl>();
 426     return surf_host->getProperties();
 427   });
 428 }
 429 @endcode
 430
 431 ### Blocking simcall
 432
 433 The second generic simcall (`simcall_run_blocking()`) executes a function in
 434 the SimGrid simulation kernel immediately but does not wake up the calling actor
 435 immediately:
 436
 437 @code{cpp}
 438 void simcall_run_blocking(std::function<void()> const& code);
 439
 440 template<class F>
 441 void simcall_run_blocking(F& f)
 442 {
 443   simcall_run_blocking(std::function<void()>(std::ref(f)));
 444 }
 445 @endcode
 446
The `f` function is expected to set up some callbacks in the simulation
kernel which will wake up the actor (with
`simgrid::simix::unblock(actor)`) when the operation is completed.
 450
 451 This is wrapped in a higher-level primitive as well. The
 452 `kernelSync()` function expects a function-object which is executed
 453 immediately in the simulation kernel and returns a `Future<T>`.  The
 454 simulator blocks the actor and resumes it when the `Future<T>` becomes
 455 ready with its result:
 456
 457 @code{cpp}
 458 template<class F>
 459 auto kernelSync(F code) -> decltype(code().get())
 460 {
 461   typedef decltype(code().get()) T;
 462   if (SIMIX_is_maestro())
 463     xbt_die("Can't execute blocking call in kernel mode");
 464
 465   smx_process_t self = SIMIX_process_self();
 466   simgrid::xbt::Result<T> result;
 467
 468   simcall_run_blocking([&result, self, &code]{
 469     try {
 470       auto future = code();
 471       future.then_([&result, self](simgrid::kernel::Future<T> value) {
 472         // Propagate the result from the future
 473         // to the simgrid::xbt::Result:
 474         simgrid::xbt::setPromise(result, value);
 475         simgrid::simix::unblock(self);
 476       });
 477     }
 478     catch (...) {
 479       // The code failed immediately. We can wake up the actor
 480       // immediately with the exception:
 481       result.set_exception(std::current_exception());
 482       simgrid::simix::unblock(self);
 483     }
 484   });
 485
 486   // Get the result of the operation (which might be an exception):
 487   return result.get();
 488 }
 489 @endcode
 490
 491 A contrived example of this would be:
 492
 493 @code{cpp}
 494 int res = simgrid::simix::kernelSync([&] {
 495   return kernel_wait_until(30).then(
 496     [](simgrid::kernel::Future<void> future) {
 497       return 42;
 498     }
 499   );
 500 });
 501 @endcode
 502
 503 ### Asynchronous operations
 504
We can write the related `kernelAsync()` which wakes up the actor immediately
and returns a future to the actor. As this future is used in the actor context,
it is a different future
(`simgrid::simix::Future` instead of `simgrid::kernel::Future`)
which implements a C++11 `std::future` wait-based API:
 510
 511 @code{cpp}
 512 template <class T>
 513 class Future {
 514 public:
 515   Future() {}
 516   Future(simgrid::kernel::Future<T> future) : future_(std::move(future)) {}
 517   bool valid() const { return future_.valid(); }
 518   T get();
 519   bool is_ready() const;
 520   void wait();
 521 private:
 522   // We wrap an event-based kernel future:
 523   simgrid::kernel::Future<T> future_;
 524 };
 525 @endcode
 526
 527 The `future.get()` method is implemented as[^getcompared]:
 528
 529 @code{cpp}
 530 template<class T>
 531 T simgrid::simix::Future<T>::get()
 532 {
 533   if (!valid())
 534     throw std::future_error(std::future_errc::no_state);
 535   smx_process_t self = SIMIX_process_self();
 536   simgrid::xbt::Result<T> result;
 537   simcall_run_blocking([this, &result, self]{
 538     try {
 539       // When the kernel future is ready...
 540       this->future_.then_(
 541         [this, &result, self](simgrid::kernel::Future<T> value) {
 542           // ... wake up the process with the result of the kernel future.
 543           simgrid::xbt::setPromise(result, value);
 544           simgrid::simix::unblock(self);
 545       });
 546     }
 547     catch (...) {
 548       result.set_exception(std::current_exception());
 549       simgrid::simix::unblock(self);
 550     }
 551   });
 552   return result.get();
 553 }
 554 @endcode
 555
 556 `kernelAsync()` simply :wink: calls `kernelImmediate()` and wraps the
 557 `simgrid::kernel::Future` into a `simgrid::simix::Future`:
 558
 559 @code{cpp}
 560 template<class F>
 561 auto kernelAsync(F code)
 562   -> Future<decltype(code().get())>
 563 {
 564   typedef decltype(code().get()) T;
 565
 566   // Execute the code in the simulation kernel and get the kernel future:
 567   simgrid::kernel::Future<T> future =
 568     simgrid::simix::kernelImmediate(std::move(code));
 569
 570   // Wrap the kernel future in a user future:
 571   return simgrid::simix::Future<T>(std::move(future));
 572 }
 573 @endcode
 574
 575 A contrived example of this would be:
 576
@code{cpp}
// kernelAsync() returns immediately with a future for the actor:
simgrid::simix::Future<int> future = simgrid::simix::kernelAsync([&] {
  return kernel_wait_until(30).then(
    [](simgrid::kernel::Future<void> future) {
      return 42;
    }
  );
});
do_some_stuff();
// Block the actor (with a simcall) until the result is available:
int res = future.get();
@endcode
 588
 589 `kernelSync()` could be rewritten as:
 590
 591 @code{cpp}
 592 template<class F>
 593 auto kernelSync(F code) -> decltype(code().get())
 594 {
 595   return kernelAsync(std::move(code)).get();
 596 }
 597 @endcode
 598
The semantics are equivalent but this form would require two simcalls
instead of one to do the same job (one in `kernelAsync()` and one in
`.get()`).
 602
 603 ## Representing the simulated time
 604
 605 SimGrid uses `double` for representing the simulated time:
 606
 607 * durations are expressed in seconds;
 608
 609 * timepoints are expressed as seconds from the beginning of the simulation.
 610
In contrast, all the C++ APIs use `std::chrono::duration` and
`std::chrono::time_point`. They are used in:

* `std::this_thread::sleep_for()` and `std::this_thread::sleep_until()`;

* `future.wait_for()` and `future.wait_until()`;

* `condvar.wait_for()` and `condvar.wait_until()`.
 619
 620 We can define `future.wait_for(duration)` and `future.wait_until(timepoint)`
 621 for our futures but for better compatibility with standard C++ code, we might
 622 want to define versions expecting `std::chrono::duration` and
 623 `std::chrono::time_point`.
 624
For time points, we need to define a clock working in the simulated time
(which meets the
[TrivialClock](http://en.cppreference.com/w/cpp/concept/TrivialClock)
requirements, see
[`[time.clock.req]`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=642)
in the C++14 standard):
 630
 631 @code{cpp}
 632 struct SimulationClock {
 633   using rep        = double;
 634   using period     = std::ratio<1>;
 635   using duration   = std::chrono::duration<rep, period>;
 636   using time_point = std::chrono::time_point<SimulationClock, duration>;
 637   static constexpr bool is_steady = true;
 638   static time_point now()
 639   {
 640     return time_point(duration(SIMIX_get_clock()));
 641   }
 642 };
 643 @endcode
 644
 645 A time point in the simulation is a time point using this clock:
 646
 647 @code{cpp}
 648 template<class Duration>
 649 using SimulationTimePoint =
 650   std::chrono::time_point<SimulationClock, Duration>;
 651 @endcode
 652
 653 This is used for example in `simgrid::s4u::this_actor::sleep_for()` and
 654 `simgrid::s4u::this_actor::sleep_until()`:
 655
 656 @code{cpp}
 657 void sleep_for(double duration)
 658 {
 659   if (duration > 0)
 660     simcall_process_sleep(duration);
 661 }
 662
 663 void sleep_until(double timeout)
 664 {
 665   double now = SIMIX_get_clock();
 666   if (timeout > now)
 667     simcall_process_sleep(timeout - now);
 668 }
 669
 670 template<class Rep, class Period>
 671 void sleep_for(std::chrono::duration<Rep, Period> duration)
 672 {
 673   auto seconds =
 674     std::chrono::duration_cast<SimulationClockDuration>(duration);
 675   this_actor::sleep_for(seconds.count());
 676 }
 677
 678 template<class Duration>
 679 void sleep_until(const SimulationTimePoint<Duration>& timeout_time)
 680 {
 681   auto timeout_native =
 682     std::chrono::time_point_cast<SimulationClockDuration>(timeout_time);
 683   this_actor::sleep_until(timeout_native.time_since_epoch().count());
 684 }
 685 @endcode
 686
This means it is possible to write (since C++14):

@code{cpp}
using namespace std::chrono_literals;
simgrid::s4u::this_actor::sleep_for(42s);
@endcode
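The conversions involved can be tried outside of SimGrid with a small stand-alone sketch. Everything here is a hypothetical stand-in: `toy_now` replaces `SIMIX_get_clock()`, `ToyClock` mirrors `SimulationClock`, and the `toy_sleep_*()` functions simply advance `toy_now` instead of issuing a simcall.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Hypothetical simulated date in seconds (stand-in for SIMIX_get_clock()).
static double toy_now = 0.0;

struct ToyClock {
  using rep        = double;
  using period     = std::ratio<1>;
  using duration   = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<ToyClock, duration>;
  static constexpr bool is_steady = true;
  static time_point now() { return time_point(duration(toy_now)); }
};

// Native version taking seconds as a double: "sleeping" advances the clock.
void toy_sleep_for(double seconds)
{
  const double before = toy_now;
  if (seconds > 0)
    toy_now += seconds;
  assert(toy_now >= before); // simulated time never goes backwards
}

// std::chrono version: convert any duration to (double) seconds.
template <class Rep, class Period>
void toy_sleep_for(std::chrono::duration<Rep, Period> duration)
{
  toy_sleep_for(
    std::chrono::duration_cast<ToyClock::duration>(duration).count());
}

// std::chrono version taking a time point of the simulated clock.
template <class Duration>
void toy_sleep_until(
  const std::chrono::time_point<ToyClock, Duration>& timeout_time)
{
  const double target =
    std::chrono::time_point_cast<ToyClock::duration>(timeout_time)
      .time_since_epoch().count();
  if (target > toy_now)
    toy_now = target;
}
```

Because the time point type is tied to `ToyClock`, passing a `std::chrono::system_clock` time point to `toy_sleep_until()` does not compile, which prevents mixing real and simulated dates by accident.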
 693
 694 ## Mutexes and condition variables
 695
### Mutexes
 697
 698 SimGrid has had a C-based API for mutexes and condition variables for
 699 some time.  These mutexes are different from the standard
 700 system-level mutex (`std::mutex`, `pthread_mutex_t`, etc.) because
 701 they work at simulation-level.  Locking on a simulation mutex does
 702 not block the thread directly but makes a simcall
 703 (`simcall_mutex_lock()`) which asks the simulation kernel to wake the calling
 704 actor when it can get ownership of the mutex. Blocking directly at the
 705 OS level would deadlock the simulation.
 706
 707 Reusing the C++ standard API for our simulation mutexes has many
 708 benefits:
 709
 710  * it makes it easier for people familiar with the `std::mutex` to
 711    understand and use SimGrid mutexes;
 712
 713  * we can benefit from a proven API;
 714
 * we can reuse generic library code in SimGrid.
 716
 717 We defined a reference-counted `Mutex` class for this (which supports
 718 the [`Lockable`](http://en.cppreference.com/w/cpp/concept/Lockable)
 719 requirements, see
 720 [`[thread.req.lockable.req]`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=1175)
 721 in the C++14 standard):
 722
 723 @code{cpp}
 724 class Mutex {
 725   friend ConditionVariable;
 726 private:
 727   friend simgrid::simix::Mutex;
 728   simgrid::simix::Mutex* mutex_;
 729   Mutex(simgrid::simix::Mutex* mutex) : mutex_(mutex) {}
 730 public:
 731
 732   friend void intrusive_ptr_add_ref(Mutex* mutex);
 733   friend void intrusive_ptr_release(Mutex* mutex);
 734   using Ptr = boost::intrusive_ptr<Mutex>;
 735
 736   // No copy:
 737   Mutex(Mutex const&) = delete;
 738   Mutex& operator=(Mutex const&) = delete;
 739
 740   static Ptr createMutex();
 741
 742 public:
 743   void lock();
 744   void unlock();
 745   bool try_lock();
 746 };
 747 @endcode
 748
 749 The methods are simply wrappers around existing simcalls:
 750
 751 @code{cpp}
 752 void Mutex::lock()
 753 {
 754   simcall_mutex_lock(mutex_);
 755 }
 756 @endcode
 757
 758 Using the same API as `std::mutex` (`Lockable`) means we can use existing
 759 C++-standard code such as `std::unique_lock<Mutex>` or
 760 `std::lock_guard<Mutex>` for exception-safe mutex handling[^lock]:
 761
 762 @code{cpp}
 763 {
 764   std::lock_guard<simgrid::s4u::Mutex> lock(*mutex);
 765   sum += 1;
 766 }
 767 @endcode
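The standard lock wrappers only rely on the `lock()`, `unlock()` and `try_lock()` member functions, so any class providing them can be used. The following stand-alone sketch shows this with a hypothetical `ToyMutex` which is not a real mutex: it is single-threaded, never blocks and only records what the wrappers do to it.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical toy type meeting the Lockable requirements.
class ToyMutex {
public:
  void lock()
  {
    assert(!locked_ && "a toy mutex cannot block");
    locked_ = true;
    ++lock_count_;
  }

  bool try_lock()
  {
    if (locked_)
      return false;
    lock();
    return true;
  }

  void unlock()
  {
    assert(locked_ && "unlocking a mutex which is not locked");
    locked_ = false;
  }

  bool is_locked() const { return locked_; }
  int lock_count() const { return lock_count_; }

private:
  bool locked_    = false;
  int lock_count_ = 0;
};

// Generic standard code works unchanged with our type: the guard locks
// in its constructor and unlocks in its destructor, even on exceptions.
int guarded_increment(ToyMutex& mutex, int& counter)
{
  std::lock_guard<ToyMutex> guard(mutex);
  return ++counter;
}
```

`std::unique_lock<ToyMutex>` works the same way, including its `std::try_to_lock` constructor which goes through `try_lock()`.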
 768
 769 ### Condition Variables
 770
 771 Similarly SimGrid already had simulation-level condition variables
 772 which can be exposed using the same API as `std::condition_variable`:
 773
 774 @code{cpp}
 775 class ConditionVariable {
 776 private:
 777   friend s_smx_cond;
 778   smx_cond_t cond_;
 779   ConditionVariable(smx_cond_t cond) : cond_(cond) {}
 780 public:
 781
 782   ConditionVariable(ConditionVariable const&) = delete;
 783   ConditionVariable& operator=(ConditionVariable const&) = delete;
 784
 785   friend void intrusive_ptr_add_ref(ConditionVariable* cond);
 786   friend void intrusive_ptr_release(ConditionVariable* cond);
 787   using Ptr = boost::intrusive_ptr<ConditionVariable>;
 788   static Ptr createConditionVariable();
 789
 790   void wait(std::unique_lock<Mutex>& lock);
 791   template<class P>
 792   void wait(std::unique_lock<Mutex>& lock, P pred);
 793
 794   // Wait functions taking a plain double as time:
 795
 796   std::cv_status wait_until(std::unique_lock<Mutex>& lock,
 797     double timeout_time);
 798   std::cv_status wait_for(
 799     std::unique_lock<Mutex>& lock, double duration);
 800   template<class P>
 801   bool wait_until(std::unique_lock<Mutex>& lock,
 802     double timeout_time, P pred);
 803   template<class P>
 804   bool wait_for(std::unique_lock<Mutex>& lock,
 805     double duration, P pred);
 806
 807   // Wait functions taking a std::chrono time:
 808
 809   template<class Rep, class Period, class P>
 810   bool wait_for(std::unique_lock<Mutex>& lock,
 811     std::chrono::duration<Rep, Period> duration, P pred);
 812   template<class Rep, class Period>
 813   std::cv_status wait_for(std::unique_lock<Mutex>& lock,
 814     std::chrono::duration<Rep, Period> duration);
 815   template<class Duration>
 816   std::cv_status wait_until(std::unique_lock<Mutex>& lock,
 817     const SimulationTimePoint<Duration>& timeout_time);
 818   template<class Duration, class P>
 819   bool wait_until(std::unique_lock<Mutex>& lock,
 820     const SimulationTimePoint<Duration>& timeout_time, P pred);
 821
 822   // Notify:
 823
 824   void notify_one();
 825   void notify_all();
 826
 827 };
 828 @endcode
 829
 830 We currently accept both `double` (for simplicity and consistency with
 831 the current codebase) and `std::chrono` types (for compatibility with
 832 C++ code) as durations and timepoints. One important thing to notice here is
 833 that `cond.wait_for()` and `cond.wait_until()` work in the simulated time,
 834 not in the real time.
 835
 836 The simple `cond.wait()` and `cond.wait_for()` delegate to
 837 pre-existing simcalls:
 838
 839 @code{cpp}
 840 void ConditionVariable::wait(std::unique_lock<Mutex>& lock)
 841 {
 842   simcall_cond_wait(cond_, lock.mutex()->mutex_);
 843 }
 844
 845 std::cv_status ConditionVariable::wait_for(
 846   std::unique_lock<Mutex>& lock, double timeout)
 847 {
 848   // The simcall uses -1 for "any timeout" but we don't want this:
 849   if (timeout < 0)
 850     timeout = 0.0;
 851
 852   try {
 853     simcall_cond_wait_timeout(cond_, lock.mutex()->mutex_, timeout);
 854     return std::cv_status::no_timeout;
 855   }
 856   catch (xbt_ex& e) {
 857
 858     // If the exception was a timeout, we have to take the lock again:
 859     if (e.category == timeout_error) {
 860       try {
 861         lock.mutex()->lock();
 862         return std::cv_status::timeout;
 863       }
 864       catch (...) {
 865         std::terminate();
 866       }
 867     }
 868
 869     std::terminate();
 870   }
 871   catch (...) {
 872     std::terminate();
 873   }
 874 }
 875 @endcode
 876
 877 Other methods are simple wrappers around those two:
 878
 879 @code{cpp}
 880 template<class P>
 881 void ConditionVariable::wait(std::unique_lock<Mutex>& lock, P pred)
 882 {
 883   while (!pred())
 884     wait(lock);
 885 }
 886
 887 template<class P>
 888 bool ConditionVariable::wait_until(std::unique_lock<Mutex>& lock,
 889   double timeout_time, P pred)
 890 {
 891   while (!pred())
 892     if (this->wait_until(lock, timeout_time) == std::cv_status::timeout)
 893       return pred();
 894   return true;
 895 }
 896
 897 template<class P>
 898 bool ConditionVariable::wait_for(std::unique_lock<Mutex>& lock,
 899   double duration, P pred)
 900 {
 901   return this->wait_until(lock,
 902     SIMIX_get_clock() + duration, std::move(pred));
 903 }
 904 @endcode
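The predicate loop used by these wrappers can be isolated and exercised without a simulator. In this sketch, `toy_wait_until()` and `ScriptedWaiter` are hypothetical: the single timed wait is replaced by any callable returning a `std::cv_status`, so that wake-ups and timeouts can be scripted.

```cpp
#include <cassert>
#include <condition_variable> // only for std::cv_status
#include <cstddef>
#include <utility>
#include <vector>

// Same loop as the predicate-based wait_until(): keep waiting while the
// predicate is false; on timeout, report the final state of the predicate.
template <class WaitOnce, class P>
bool toy_wait_until(WaitOnce wait_once, P pred)
{
  while (!pred())
    if (wait_once() == std::cv_status::timeout)
      return pred();
  return true;
}

// Hypothetical stand-in for the timed wait: replays a fixed script.
class ScriptedWaiter {
public:
  explicit ScriptedWaiter(std::vector<std::cv_status> script)
    : script_(std::move(script))
  {
  }

  std::cv_status operator()()
  {
    assert(next_ < script_.size() && "script exhausted");
    return script_[next_++];
  }

private:
  std::vector<std::cv_status> script_;
  std::size_t next_ = 0;
};
```

A wake-up which leaves the predicate false just triggers another wait, and the return value tells the caller whether the predicate held when the function returned.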


## Conclusion

We wrote two future implementations based on the `std::future` API:

* the first one is a non-blocking event-based (`future.then(stuff)`)
  future used inside our (non-blocking event-based) simulation kernel;

* the second one is a wait-based (`future.get()`) future used in the actors
  which waits using a simcall.

These futures are used to implement `kernelSync()` and `kernelAsync()` which
expose asynchronous operations in the simulation kernel to the actors.

In addition, we wrote variations of some other C++ standard library
classes (`SimulationClock`, `Mutex`, `ConditionVariable`) which work in
the simulation:

  * using simulated time;

  * using simcalls for synchronisation.

Reusing the same API as the C++ standard library is very useful because:

  * we use a proven API with clearly defined semantics;

  * people already familiar with these APIs can easily use ours;

  * users can rely on documentation, examples and tutorials made by other
    people;

  * we can reuse generic code with our types (`std::unique_lock`,
    `std::lock_guard`, etc.).
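
As a small illustration of the last point, any type which provides `lock()`
and `unlock()` can be used with the standard lock wrappers. The sketch below
uses a hypothetical `RecordingLock` type (not a SimGrid class, and not a real
mutex) which only records the calls it receives:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical lock type: it provides no mutual exclusion, it only
// records calls. lock() and unlock() are enough for BasicLockable.
class RecordingLock {
public:
  void lock()
  {
    assert(!locked_);
    locked_ = true;
    ++acquisitions_;
  }
  void unlock()
  {
    assert(locked_);
    locked_ = false;
  }
  bool locked() const { return locked_; }
  int acquisitions() const { return acquisitions_; }
private:
  bool locked_ = false;
  int acquisitions_ = 0;
};

// Returns true if the lock was held inside both guarded scopes.
bool guarded_scopes_hold_lock(RecordingLock& m)
{
  bool held = true;
  {
    std::lock_guard<RecordingLock> guard(m);  // calls m.lock()
    held = held && m.locked();
  }                                           // calls m.unlock()
  {
    std::unique_lock<RecordingLock> lock(m);  // calls m.lock()
    held = held && lock.owns_lock() && m.locked();
  }                                           // calls m.unlock()
  return held;
}
```

The standard wrappers never need to know that the lock is not a
`std::mutex`; the same reasoning applies to a mutex implemented with simcalls.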

This type of approach might be useful for other libraries which define
their own contexts. An example of this is
[Mordor](https://github.com/mozy/mordor), an I/O library using fibers
(cooperative scheduling): it implements a cooperative/fiber
[mutex](https://github.com/mozy/mordor/blob/4803b6343aee531bfc3588ffc26a0d0fdf14b274/mordor/fibersynchronization.h#L70)
and a [recursive
mutex](https://github.com/mozy/mordor/blob/4803b6343aee531bfc3588ffc26a0d0fdf14b274/mordor/fibersynchronization.h#L105),
which are compatible with the
[`BasicLockable`](http://en.cppreference.com/w/cpp/concept/BasicLockable)
requirements (see
[`[thread.req.lockable.basic]`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=1175)
in the C++14 standard).

## Appendix: useful helpers

### `Result`

`Result` is like a mix of `std::future` and `std::promise` in a
single object, without shared state or synchronisation:

@code{cpp}
template<class T>
class Result {
  enum class ResultStatus {
    invalid,
    value,
    exception,
  };
public:
  Result();
  ~Result();
  Result(Result const& that);
  Result& operator=(Result const& that);
  Result(Result&& that);
  Result& operator=(Result&& that);
  bool is_valid() const;
  void reset();
  void set_exception(std::exception_ptr e);
  void set_value(T&& value);
  void set_value(T const& value);
  T get();
private:
  ResultStatus status_ = ResultStatus::invalid;
  union {
    T value_;
    std::exception_ptr exception_;
  };
};
@endcode
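
To make the intended behaviour more concrete, here is a minimal sketch of
such a class. It is not the SimGrid implementation: `SimpleResult` and
`get_or()` are hypothetical names, the value is stored in a `std::unique_ptr`
instead of a union to keep the code short, and we assume that `get()`
consumes the stored result (as `std::future::get()` does) and throws
`std::logic_error` when nothing is stored.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical simplified variant of Result<T>. Assumption: the value is
// heap-allocated (std::unique_ptr) instead of living in a union.
template<class T>
class SimpleResult {
public:
  bool is_valid() const { return value_ != nullptr || exception_ != nullptr; }

  void reset()
  {
    value_.reset();
    exception_ = nullptr;
  }

  void set_value(T value)
  {
    reset();
    value_.reset(new T(std::move(value)));
  }

  void set_exception(std::exception_ptr e)
  {
    reset();
    exception_ = std::move(e);
  }

  // Assumption: like std::future::get(), this consumes the stored result.
  T get()
  {
    if (exception_ != nullptr) {
      std::exception_ptr e = exception_;
      reset();
      std::rethrow_exception(e);
    }
    if (value_ == nullptr)
      throw std::logic_error("invalid result");
    T value = std::move(*value_);
    reset();
    return value;
  }

private:
  std::unique_ptr<T> value_;
  std::exception_ptr exception_;
};

// Returns the stored value, or `fallback` if a std::runtime_error was stored.
int get_or(SimpleResult<int>& result, int fallback)
{
  try {
    return result.get();
  }
  catch (std::runtime_error const&) {
    return fallback;
  }
}

void result_example()
{
  SimpleResult<int> result;
  assert(!result.is_valid());
  result.set_value(42);
  assert(get_or(result, -1) == 42);
  assert(!result.is_valid()); // get() consumed the value
  result.set_exception(std::make_exception_ptr(std::runtime_error("boom")));
  assert(get_or(result, -1) == -1);
}
```

No shared state is involved: the producer and the consumer work on the same
object, which is sufficient when both run in the same thread of control.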

### Promise helpers

These helpers are useful for dealing with generic future-based code:

@code{cpp}
template<class R, class F>
auto fulfillPromise(R& promise, F&& code)
-> decltype(promise.set_value(code()))
{
  try {
    promise.set_value(std::forward<F>(code)());
  }
  catch(...) {
    promise.set_exception(std::current_exception());
  }
}

template<class P, class F>
auto fulfillPromise(P& promise, F&& code)
-> decltype(promise.set_value())
{
  try {
    std::forward<F>(code)();
    promise.set_value();
  }
  catch(...) {
    promise.set_exception(std::current_exception());
  }
}

template<class P, class F>
void setPromise(P& promise, F&& future)
{
  fulfillPromise(promise, [&]{ return std::forward<F>(future).get(); });
}
@endcode
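
The two `fulfillPromise()` overloads are selected by SFINAE on the
`set_value()` signature of the promise. The sketch below illustrates this
with hypothetical promise-like and future-like types (`RecordingPromise` and
`ReadyFuture`, which are not SimGrid or standard classes) which only record
what they receive. The helpers are repeated so that the example is
self-contained.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

template<class R, class F>
auto fulfillPromise(R& promise, F&& code)
-> decltype(promise.set_value(code()))
{
  try { promise.set_value(std::forward<F>(code)()); }
  catch(...) { promise.set_exception(std::current_exception()); }
}

template<class P, class F>
auto fulfillPromise(P& promise, F&& code)
-> decltype(promise.set_value())
{
  try { std::forward<F>(code)(); promise.set_value(); }
  catch(...) { promise.set_exception(std::current_exception()); }
}

template<class P, class F>
void setPromise(P& promise, F&& future)
{
  fulfillPromise(promise, [&]{ return std::forward<F>(future).get(); });
}

// Hypothetical promise-like type which only records what it was given.
template<class T>
struct RecordingPromise {
  bool has_value = false;
  T value{};
  std::exception_ptr exception;
  void set_value(T v) { value = std::move(v); has_value = true; }
  void set_exception(std::exception_ptr e) { exception = std::move(e); }
};

template<>
struct RecordingPromise<void> {
  bool has_value = false;
  std::exception_ptr exception;
  void set_value() { has_value = true; }
  void set_exception(std::exception_ptr e) { exception = std::move(e); }
};

// Hypothetical future-like type whose result is already available.
struct ReadyFuture {
  int get() { return 5; }
};

void promise_example()
{
  RecordingPromise<int> with_value;
  fulfillPromise(with_value, [] { return 6 * 7; });      // first overload
  assert(with_value.has_value && with_value.value == 42);

  bool ran = false;
  RecordingPromise<void> without_value;
  fulfillPromise(without_value, [&ran] { ran = true; }); // second overload
  assert(ran && without_value.has_value);

  RecordingPromise<int> failed;
  fulfillPromise(failed, []() -> int { throw std::runtime_error("failed"); });
  assert(!failed.has_value && failed.exception != nullptr);

  RecordingPromise<int> forwarded;
  setPromise(forwarded, ReadyFuture{});                  // forwards get()
  assert(forwarded.has_value && forwarded.value == 5);
}
```

In every case the caller does not need to know whether the code returns a
value, returns nothing or throws: the outcome always ends up in the promise.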

### Task

`Task<R(F...)>` is a type-erased callable object similar to
`std::function<R(F...)>` but works for move-only types. It is similar to
`std::packaged_task<R(F...)>` but does not wrap the result in a `std::future<R>`
(it is not <i>packaged</i>).

|               |`std::function` |`std::packaged_task`|`simgrid::xbt::Task`
|---------------|----------------|--------------------|--------------------------
|Copyable       | Yes            | No                 | No
|Movable        | Yes            | Yes                | Yes
|Call           | `const`        | non-`const`        | non-`const`
|Callable       | multiple times | once               | once
|Sets a promise | No             | Yes                | No

It could be implemented as:

@code{cpp}
template<class T>
class Task {
private:
  std::packaged_task<T> task_;
public:

  template<class F>
  Task(F f) :
    task_(std::forward<F>(f))
  {}

  template<class... ArgTypes>
  auto operator()(ArgTypes... args)
  -> decltype(task_.get_future().get())
  {
    task_(std::forward<ArgTypes>(args)...);
    return task_.get_future().get();
  }

};
@endcode

but we don't need a shared state.

This is useful in order to bind move-only type arguments:

@code{cpp}
template<class F, class... Args>
class TaskImpl {
private:
  F code_;
  std::tuple<Args...> args_;
  typedef decltype(simgrid::xbt::apply(
    std::move(code_), std::move(args_))) result_type;
public:
  TaskImpl(F code, std::tuple<Args...> args) :
    code_(std::move(code)),
    args_(std::move(args))
  {}
  result_type operator()()
  {
    // simgrid::xbt::apply is C++17 std::apply:
    return simgrid::xbt::apply(std::move(code_), std::move(args_));
  }
};

template<class F, class... Args>
auto makeTask(F code, Args... args)
-> Task< decltype(code(std::move(args)...))() >
{
  TaskImpl<F, Args...> task(
    std::move(code), std::make_tuple(std::move(args)...));
  return std::move(task);
}
@endcode
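
The same idea can be tried without SimGrid. The following C++14 sketch is
only an approximation of the code above: `MoveOnlyTask`, `apply_moved()` and
`make_bound_task()` are hypothetical names, the type erasure is done with a
virtual base class, and a `std::index_sequence` expansion stands in for
`simgrid::xbt::apply`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical minimal move-only, type-erased callable returning R.
template<class R>
class MoveOnlyTask {
  struct Base {
    virtual ~Base() = default;
    virtual R call() = 0;
  };
  template<class F>
  struct Impl final : Base {
    explicit Impl(F f) : code(std::move(f)) {}
    R call() override { return code(); }
    F code;
  };
  std::unique_ptr<Base> impl_;
public:
  template<class F>
  explicit MoveOnlyTask(F f) : impl_(new Impl<F>(std::move(f))) {}
  R operator()() { return impl_->call(); }
};

// Calls code(args...), moving the stored arguments into the call
// (a C++14 spelling of what std::apply does in C++17).
template<class F, class Tuple, std::size_t... I>
decltype(auto) apply_moved(F& code, Tuple& args, std::index_sequence<I...>)
{
  return std::move(code)(std::move(std::get<I>(args))...);
}

// Binds the arguments to the code; the resulting task must be called
// at most once because the arguments are moved out of it.
template<class F, class... Args>
auto make_bound_task(F code, Args... args)
-> MoveOnlyTask<decltype(code(std::move(args)...))>
{
  using R = decltype(code(std::move(args)...));
  auto bound = [code = std::move(code),
                stored = std::make_tuple(std::move(args)...)]() mutable -> R {
    return apply_moved(code, stored, std::index_sequence_for<Args...>{});
  };
  return MoveOnlyTask<R>(std::move(bound));
}

void task_example()
{
  // std::unique_ptr is move-only, so the bound callable is move-only too
  // and could not be stored in a std::function.
  auto task = make_bound_task(
    [](std::unique_ptr<int> p) { return *p + 1; },
    std::make_unique<int>(41));
  assert(task() == 42);
}
```

As in the table above, such a task is movable but not copyable, and it is
meant to be called only once.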


## Notes

[^getcompared]:

    You might want to compare this method with `simgrid::kernel::Future::get()`
    we showed previously: the method of the kernel future does not block and
    raises an error if the future is not ready; the method of the actor future
    blocks after having set a continuation to wake the actor when the future
    is ready.

[^lock]:

    `std::lock()` might work too, but it may not be such a good idea to
    use it as it may use a [<q>deadlock avoidance algorithm such as
    try-and-back-off</q>](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=1199).
    A backoff would probably uselessly wait in real time instead of simulated
    time. The deadlock avoidance algorithm might also add non-determinism
    to the simulation, which we would like to avoid.
    `std::try_lock()` should be safe to use though.

*/