Document more changes, and update the Release Notes

[simgrid.git] / docs / source / Release_Notes.rst
diff --git a/docs/source/Release_Notes.rst b/docs/source/Release_Notes.rst

index bc0f7d0..55ab2e5 100644 (file)
--- a/docs/source/Release_Notes.rst
+++ b/docs/source/Release_Notes.rst
@@ -19,7 +19,7 @@ The Half Release, a.k.a. the Zealous Easter Trim:
  
    * Introducing v4 of the XML platform format (many long-due cleanups)
    * MSG examples fully reorganized (in C and Java)
-  * The S4U interface is rising, toward SimGrid 4:|br| All host manipulations now done in S4U, 
+  * The S4U interface is rising, toward SimGrid 4:|br| All host manipulations now done in S4U,
      SimDag was mostly rewritten on top of S4U but MSG & SimDag interfaces mostly unchanged.
  
  Version 3.14 (Dec 25. 2016)
@@ -183,7 +183,7 @@ interfaces and converting them to snake_case() is one release goal of v3.20. But
  Once the S4U interface stabilizes, we will provide C bindings on top of it, along with Java and Python ones. Maybe in 3.21 or 3.22.
  
  All this is not contradictory with the fact that MSG as a whole is deprecated, because this deprecation only means that new projects
-should go for S4U instead of MSG to benefit of the future. Despite this deprecation, old MSG projects should still be usable with no 
+should go for S4U instead of MSG to benefit of the future. Despite this deprecation, old MSG projects should still be usable with no
  change, if we manage to. This is a matter of scientific reproducibility to us.
  
  Version 3.20 (June 25 2018)
@@ -314,7 +314,7 @@ This is the "Palindrome Day" release (today is 02 02 2020).
       The plan is to completely remove MSG by 2020Q4 or 2021Q1.
  
     * SimDAG++: Automatic dependencies on S4U activities (experimental). |br|
-     This implements some features of SimDAG within S4U, but not all of them: you cannot block an activity until it's scheduled on a resource 
+     This implements some features of SimDAG within S4U, but not all of them: you cannot block an activity until it's scheduled on a resource
       and there is no heterogeneous wait_any() that would mix Exec/Comm/Io activities. See ``examples/s4u/{io,exec,comm}-dependent`` for what's already there.
  
  Since last fall, we continued to push toward the future SimGrid4 release. This requires to remove MSG and SimDAG once all users have
@@ -461,45 +461,43 @@ new feature are:
     of years.
   * The removal of SimDag led us to also remove the export to Jedule files that was tightly coupled to SimDag. The instrumentation of DAG simulation
     is still possible through the regular instrumentation API based on the Paje format.
- 
-On the bindings front, we dropped the Lua bindings to create new platforms, as the C++ and Python interfaces are much better to that extend. 
-Also, the algorithm tutorial can now be taken in Python, for those of you allergic to C++.
  
-Finally, on the SMPI front, we introduced a new documentation section on calibrating the SMPI models from your measurements and fixed some issues
-with the replay mechanism.
+On the bindings front, we dropped the Lua bindings to create new platforms, as the C++ and Python interfaces are much better to that extend.
+Also, the algorithm tutorial can now be taken in Python, for those of you allergic to C++.
  
-Version 3.31 (not released yet)
--------------------------------
+Finally, on the SMPI front, we introduced a :ref:`new documentation section <models_calibration>` on calibrating the SMPI models from your
+measurements and fixed some issues with the replay mechanism.
  
-Expected: spring 2022
+Version 3.31 (March 22. 2022)
+-----------------------------
  
  **On the model checking front**, the long awaited big bang finally occurred, greatly simplifying future evolution.
  
  A formal verification with Mc SimGrid implies two processes: a verified application that is an almost regular SimGrid simulation, and a checker that
  is an external process guiding the verified application to ensure that it explores every possible execution scenario. When formal verification was
  initially introduced in SimGrid 15 years ago, both processes were intertwined in the same system process, but the mandated system tricks made it
-impossible to use gdb or valgrind on that Frankenstein process. Having more than one heap in your process is not usually supported.
+impossible to use gdb or valgrind on that Frankenstein process. Having two heaps in one process is not usually supported.
  
  The design was simplified in v3.12 (2015) by splitting the application and the checker in separate system processes. But both processes remained tightly
-coupled: when the checker needed some information (for example the mailbox implied in a send operation to compute whether this operation `commutes
+coupled: when the checker needed some information (such as the mailbox implied in a send operation, to compute whether this operation `commutes
  with another one <https://en.wikipedia.org/wiki/Partial_order_reduction>`_), the checker was directly reading the memory of the other system process.
  This was efficient and nice in C, but it prevented us from using C++ features such as opaque ``std::function`` data types. As such, it hindered the
  ongoing SimDAG++ code reorganization toward SimGrid4, where all activity classes should be homogeneously written in modern C++.
  
-This release introduces a new design, where the simcalls are given object-oriented Observers that can serialize the relevant information over the wire. 
-This information is used on the checker side to build Transition objects, representing the application simcalls. This explanation may not be crystal 
-clear, but the checker code is now much easier to work with as the formal logic is not spoiled with system-level tricks to retrieve the needed information.
-
-This cleaned design allowed us to finally implement the support for mutexes, semaphores and barriers in the model-checker. In particular, this should 
-enable the verification of RMA primitives with Mc SimGrid, as their implementation in SMPI is based on mutexes and barriers. Simix, a central element of 
-the SimGrid 3 design, was also finally removed: the last bits are deprecated and will be removed in 3.35. We also replaced the old, non-free ISP test suite 
-by the one from the `MPI Bug Initiative <https://hal.archives-ouvertes.fr/hal-03474762>`_, but not all tests are activated yet. This should eventually 
-help improving the robustness of Mc SimGrid.
+This release introduces a new design, where the simcalls are given object-oriented ``Observers`` that can serialize the relevant information over the wire.
+This information is used on the checker side to build ``Transition`` objects that the application simcalls. The checker code is now much simpler, as the
+formal logic is not spoiled with system-level tricks to retrieve the needed information.
  
-Future work on the model checker include: support for condition variables (that are still not handled), implementation of another exploration algorithm 
-based on event unfoldings (`The Anh Pham's thesis <https://tel.archives-ouvertes.fr/tel-02462074/document>`_ was defended in 2019), and the exploration 
-of scenarios where the actors get killed and communications timeout. Many things that were long dreamed of now become technically possible in this code base.
+This cleaned design allowed us to finally implement the support for mutexes, semaphores and barriers in the model-checker (condition variables are still
+missing). This enables in particular the verification of RMA primitives with Mc SimGrid, as their implementation in SMPI is based on mutexes and barriers.
+Simix, a central element of the SimGrid 3 design, was also finally removed: the last bits are deprecated and will be removed in 3.35. We also replaced the
+old, non-free ISP test suite by the one from the `MPI Bug Initiative <https://hal.archives-ouvertes.fr/hal-03474762>`_ (not all tests are activated yet).
+This will eventually help improving the robustness of Mc SimGrid.
  
+These changes unlock the future of Mc SimGrid. For the next releases, we plan to implement another exploration algorithm based on event unfoldings (using
+`The Anh Pham's thesis <https://tel.archives-ouvertes.fr/tel-02462074>`_), the exploration of scenarios where the actors get killed and/or where
+communications timeout, and the addition of a `wrapper to pthreads <https://hal.inria.fr/hal-02449080>`_, opening the path to the verification classical
+multithreaded applications.
  
  
  **On the model front,** we continued our quest for the modeling of parallel tasks (ptasks for short). Parallel tasks are intended to be an extension
@@ -508,7 +506,7 @@ computationnal kernel with computations and communications, or a video stream wi
  specify the amount of computation for each involved host, the amount of data to transfer between each host pair, and you're set. The model will
  identify bottleneck resources and fairly share them across activities within a ptask. From a user-level perspective, SimGrid handles ptasks just like
  every other activity except that the usual SimGrid models (LV08 or SMPI) rely on an optimized algorithm that cannot handle ptasks. You must
-:ref:`activate the L07 model <options_model_select>` on the command line. This "model" remains a sort of hack since its introduction 15 years ago, as
+activate :ref:`the L07 model <s4u_ex_ptasks>` on :ref:`the command line <options_model_select>`. This "model" remains a sort of hack since its introduction 15 years ago, as
  it has never been well defined. We never succeded to unify L07 and max-min based models: Fairness is still to be defined in this context that mixes
  flops and communicated bytes. The resulting activity rates are then specific to ptasks. Furthermore, unlike our network models, this model were not
  thoroughly validated with respect to real experiments before `the thesis of Adrien Faure <https://tel.archives-ouvertes.fr/tel-03155702>`_ (and the
@@ -518,9 +516,9 @@ of the microscopic dynamic model to a macroscopic equilibrium, but this converge
  no known algorithm to efficiently compute a BMF!
  
  L07 should still be avoided as we have exhibited simple scenarios where its solution is irrelevant to the BMF one (that is mathematically sound). This
-release thus introduces a new BMF model to finally unify both classical tasks and parallel tasks, but this is still ongoing work. The implemented
+release thus introduces a new BMF model to finally unify both classical and parallel tasks, but this is still ongoing work. The implemented
  heuristic works very well for most SimGrid tests, but we have found some (not so prevalent) corner cases where our code fails to solve the sharing
-problem in over 10 minutes... So all this should still be considered an ongoing research effort. We expect to have a better understanding of this issue
+problem in over 10 minutes... So this all should still be considered an ongoing research effort. We expect to have a better understanding of this issue
  by the next release.
  
  On a related topic, this release introduces :cpp:func:`simgrid::s4u::this_actor::thread_execute`, which allows creating a computation that comprises
@@ -528,6 +526,104 @@ several threads, and thus capable of utilizing more cores than a classical :cpp:
  it straightforward to model multithreaded computational kernels, and it comes with an illustrating example. It can be seen as a simplified ptask, but
  since it does not mix bytes and flops and has a homogeneous consumption over a single CPU, it perfectly fits with the classical SimGrid model.
  
+This release also introduces steadily progress **on the bindings front**, introducing in particular the Mutex, Barrier and Semaphore to your python scripts.
+
+Version 3.32 (October 3. 2022)
+------------------------------
+
+The Wiedervereinigung release. Germany was reunited 32 years ago.
+
+This release introduces tons of bugs fixes overall, and many small usability improvements contributed by the community.
+
+**On the bindings front**, we further completed the Python bindings: the whole C++ API of Comms is now accessible (and exemplified) in Python, while a
+few missing functions have been added to Engine and Mailboxes. It is also possible to manipulate ptasks from Python.
+
+The Python platform generation has also been improved. In particular, user's errors should now raise an exception instead of killing the interpreter.
+Various small improvements have been done to the graphicator tool so that you can now use jupyter to generate your platforms semi-interactively.
+
+**On the model checking front**, we did many refactoring operations behind the scene (the deprecated ``mc::api`` namespace was for example emptied and removed),
+but there are almost no user-level changes. The internal work is twofold.
+
+First, we'd like to make optional all the complexity that liveness properties require to explore the application state (dwarf, libunwind, mmalloc,
+etc) and instead only rely on fork to explore all the executions when liveness is not used. This would allow us to run the verified application under valgrind to
+ease its debugging. Some progress was made towards that goal, but we are still rather far from this goal.
+
+Second, we'd like to simplify the protocol between the model-checker and the application, to make it more robust and hopefully simplify the
+model-checker code. After release v3.31, the model-checker can properly observe the simcall of a given actor through the protocol instead of reading
+the application memory directly, but retrieving the list of actors still requires to read the remote memory, which in turn requires the aforementioned tricks on state
+introspection that we are trying to remove. This goal is much harder to achieve than it may sound in the current code base, but we
+note steady improvements in that direction.
+
+In addition to these refactoring, this version introduces ``sthread``, a tool to intercept pthread operations at run time. The goal is to use it
+together with the model-checker, but it's not working yet: we get a segfault during the initialization phase, and we failed to debug it so far. If
+only we could use valgrind on the verified application, this would probably be much easier.
+
+But we feel that it's probably better to not delay this release any further, as this tangled web will probably take time to get solved. So ``sthread``
+is included in the source even if it's not usable in MC mode yet.
+
+**On the interface front**, small API fixes and improvements have been done in S4U (in particular about virtual machines), while the support for MPI
+IO has been improved in SMPI. We also hope that ``sthread`` will help simulating OpenMP applications at some point, but it's not usable for that either.
+Hopefully in the next release.
+
+Finally, this release mostly entails maintenance work **on the model front**: a bug was fixed when using ptasks on multicore hosts, and the legacy
+stochastic generator of external load has been reintroduced.
+
+Version 3.33 (not released yet)
+-------------------------------
+
+**On the maintainance front,** we removed the ancient MSG interface which end-of-life was scheduled for 2020, the Java bindings
+that was MSG-only, support for native builds on Windows (WSL is now required) and support for 32 bits platforms. Keeping SimGrid
+alive while adding new features require to remove old, unused stuff. The very rare users impacted by these removals are urged to
+move to the new API and systems.
+
+We also conducted many internal refactorings to remove any occurence of "surf" and "simix". SimGrid v3.12 used a layered design
+where simix was providing synchronizations to actors, on top of surf which was computing the models. These features are now
+provided in modules, not layers. Surf became the kernel::{lmm, resource, routing, timer, xml} modules while simix became
+the kernel::{activity, actor, context} modules.
+
+**On the model front,** we realized an idea that has been on the back of our minds for quite some time. The question
+was: could we use something in the line of the ptask model, that mixes computations and network transfers in a single
+fluid activity, to simulate a *fluid I/O stream activity* that would consume both disk and network resources? This
+remained an open question for years, mainly because the implementation of the ptask does not rely on the LMM solver as
+the other models do. The *fair bottleneck* solver is convenient, but with less solid theoretical bases and the
+development of its replacement (the *bmf solver*) is still ongoing. However, this combination of I/Os and
+communications seemed easier as these activities share the same unit (bytes).
+
+After a few tentatives, we opted for a simple, slightly unperfect, yet convenient way to implement such I/O streams
+at the kernel level. It doesn't require a new model, just that the default HostModels implements a new function which
+creates a classical NetworkAction, but add some I/O-related constraints to it. A couple little hacks here and there,
+and done! A single activity mixing I/Os and communications can be created whose progress is limited by the resource
+(Disk or Link) of least bandwidth value.
+
+We also modified the Wi-Fi model so that the total capacity of a link depends on the amout of flows on that link, accordingly to
+the result of some ns-3 experiments. This model can be more accurate for congestioned Wi-Fi links, but its calibration is more
+demanding, as shown in the `example
+<https://framagit.org/simgrid/simgrid/tree/master/teshsuite/models/wifi_usage_decay/wifi_usage_decay.cpp>`_ and in the `research
+paper <https://hal.archives-ouvertes.fr/hal-03777726>`_.
+
+We also worked on the usability of our models, by actually writing the long overdue documentation of our TCP models and by renaming
+some options for clarity (old names are still accepted as aliases). A new function ``s4u::Engine::flatify_platform()`` dumps an
+XML representation that is inefficient (all zones are flatified) but easier to read (routes are explicitely defined). You should
+not use the output as a regular input file, but it will prove useful to double-check the your platform.
+
+**On the interface front**, the new ``Io::streamto()`` function has been inspired by the existing ``Comm::sendto()``
+function (which also derives from the ptask model). The user can specify a ``src_disk`` on a ``src_host`` and a
+``dst_disk`` on a ``dst_host`` to stream data of a given ``size``. Note that disks are optional, allowing users to
+simulate some kind of "disk-to-memory" or "memory-to-disk" I/O streams.
+
+As usual on that front, some functions were deprecated and will be removed in 4 versions, while some old deprecated functions
+were removed in this version.
+
+**On the model checking front**, we are almost done with the ongoing refactoring to ensure that the model-checker don't read
+directly the memory of the application beside checkpoint/restore and state equality. Instead, the network protocol is used to
+retrieve the information, which makes the code much easier to read and understand. We fixed a bug in the DPOR reduction which
+resulted in some failures to be missed by the exploration. We started implementing the UDPOR (Unfoldings DPOR) reduction
+algorithm, and it will certainly be part of the next release.
+
+We also extended the sthread module, which allows to intercept simple code that use pthread mutex and semaphores to simulate and
+verify it. You do not even need to recompile your code, as it uses LD_PRELOAD to intercept on the target functions. This module
+is still rather young, but it could already reveal useful to verify the code written by students in a class on UNIX IPC and
+synchronization. Check `the examples <https://framagit.org/simgrid/simgrid/tree/master/examples/sthread>`_.
  
  .. |br| raw:: html