ptask_BMF: High-level documentation of BMF and the algorithm

diff --git a/ChangeLog b/ChangeLog

index 13e0736..ebf5686 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,18 +1,183 @@
+SimGrid (3.30.1) NOT RELEASED YET (v3.31 expected March 20. 2022, 15:33 UTC)
+
+MC:
+ - Rework the internals, for simpler and more modern code. This shall unlock many future improvements.
+ - You can now define plugins for the SafetyChecker (a simple DFS explorer), using the declared signals.
+   See CommunicationDeterminism for an example.
+ - Support mutex, semaphore and barrier in DPOR reduction
+ - Seems to work on Arm64 architectures too.
+ - Display a nice error message when ptrace is not usable.
+ - New test suite, imported from the MPI Bugs Initiative (MBI). Not all MBI generators are integrated yet.
+ - Remove the ISP test suite: it's not free software, and it's superseded by MBI.
+
+SMPI:
+ - fix for FG#100 by ensuring small asynchronous messages never overtake larger
+   ones, conforming to the standard.
+ - replay: fix waitall behaviour to avoid forgetting requests and leaking
+   their handles.
+ - tracing: ensure that we dump the TI traces continuously during execution and
+   not just at the end, reducing memory cost and performance hit.
+ - Update the OpenMPI collectives selection logic to match the current one (4.1.2).
+
+S4U:
+ - New signal: Engine::on_simulation_start_cb()
+ - Native reimplementation of barriers.
+   Previously, they were implemented on top of s4u::Mutex and s4u::ConditionVariable.
+   The new version should be faster (and can be used in the model-checker).
+
+MSG:
+ - MSG_barrier_destroy now expects a non-const msg_barrier parameter.
+
+New plugin: the Chaos Monkey (killing actors at any time)
+ - Along with the new simgrid-monkey script, it tests whether your simulation
+   resists resource failures at any possible timestamp in your simulation.
+ - It is mostly intended to test the simgrid core in extreme conditions,
+   but users may find it interesting too.
+
+Models:
+ - New model for parallel task: ptask_BMF.
+   - More realistic sharing of heterogeneous resources compared to ptask_L07.
+   - Implements BMF (Bottleneck Max Fairness) resource sharing.
+   - Improved resource sharing for parallel tasks with sub-flows (parallel
+     communications between the same source and destination inside the ptask).
+   - Parameters:
+     - "--cfg=host/model:ptask_BMF": enable the model.
+     - "--cfg=bmf/max-iterations:<N>": maximum number of iterations performed
+       by the BMF solver (default: 1000).
+     - "--cfg=bmf/selective-update:<true/false>": enable/disable the
+       selective-update optimization, which only invalidates and recomputes the
+       modified parts of the system of inequations. May speed up simulations
+       when resource utilization is sparse (default: false).
+
+XBT:
+ - Drop xbt_dynar_shrink().
+
+Python:
+ - Added the following bindings: Comm.wait_for() and Comm.wait_any_for()
+   Example: examples/python/comm-waitfor/
+
+Fixed bugs (FG#.. -> FramaGit bugs; FG!.. -> FG merge requests)
+ (FG: issues on Framagit; GH: issues on GitHub)
+ - FG#57: Mc SimGrid should test whether ptrace is usable
+ - FG#87: Smpi scripts fail with spaces in paths
+ - FG#100: [SMPI] Order of the message matching is not guaranteed
+ - FG#101: LGPL 2.1 is deprecated license
+ - GH#151: Missing mutexes for DPOR.
+
  ----------------------------------------------------------------------------
  
-SimGrid (3.28.1) NOT RELEASED YET (v3.29 expected September 22. 2021, 19:21 UTC)
+SimGrid (3.30) January 30. 2022.
+
+The Sunday Bloody Sunday release.
+
+Main user-visible changes:
+ - The SimDag API for the simulation of the scheduling of Directed Acyclic
+   Graphs has been dropped. It had been marked as deprecated for a couple of years.
+   We finally completed the implementation of what has been called SimDag++
+   internally, i.e., porting the different features of SimDag on top of S4U.
+   The new way to simulate the execution of dependent activities directly by
+   maestro (without any other actor) is detailed in the examples/cpp/dag-* series
+   of examples.
+ - The removal of SimDag led us to also remove the export to Jedule files that
+   was tightly coupled to SimDag. The instrumentation of DAG simulation is still
+   possible through the regular instrumentation API based on the Paje format.
+ - We also dropped the old and clumsy Lua bindings to create platforms in a
+   programmatic way. This can now be done much more cleanly in C++, which
+   motivated this removal.
+
+S4U:
+ - Introduce on_X_cb() functions for all signals, to attach a new
+   callback to the signal X. The signal variables are now hidden and
+   only these functions should be used.
+   Rationale: this enables the usual deprecation scheme where functions
+   remain for 4 releases if we need to modify the signals, while the
+   current code with the signal variables directly visible prevents any
+   smooth transition.
+ - New function: Engine::run_until(date), to split the simulation.
+ - New signal: Activity::on_veto, to detect when an activity fails to start.
+ - Signal change: Comm::on_start(Comm&, bool) has been replaced by
+   Comm::on_send and Comm::on_recv. These two signals respectively correspond to
+   when the sending or receiving side of a Comm is ready. They are raised at
+   the same locations as the former Comm::on_start signal.
+ - New function: Engine::track_vetoed_activities() to interrupt run()
+   when an activity fails to start, and to keep track of such activities.
+   Please see the corresponding example for more info.
+ - New functions: s4u::Comm::{sendto_init, set_source, set_destination} to enable
+   the use of vetoers with direct host-to-host communications. Both source and
+   destination have to be set for a comm to start. Each call to these setters checks
+   whether all vetoes are satisfied. When that is the case, the comm starts. A use case of
+   these functions is given in examples/cpp/dag-scheduling.
+ - New functions: {Exec, Io}::update_priority allow you to modify the priority of
+   these kinds of activities during their execution. Behavior is detailed in
+   examples/cpp/io-priority/.
+
+SMPI:
+ - Dynamic costs for MPI operations: New API to allow users to dynamically
+   change injected costs for MPI_Recv, MPI_Send and MPI_Isend operations.
+   This is an alternative to the smpi/or, smpi/os and smpi/ois configuration options.
+ - Fix some issues with the replay mechanism.
+
+XBT:
+ - Function xbt::Extendable::get_data() is now templated with the type of the
+   pointee. Untyped function is deprecated. Use get_data<void>() if you still
+   want to retrieve void*.
+
+Documentation:
+ - New section: "SimGrid MPI calibration of a Grid5000 cluster"
+   presenting how to properly calibrate MPI communications in SimGrid.
+ - Complete and reword the platform section.
+
+Python:
+ - Thread contexts are used by default with Python bindings. Other kinds of
+   contexts proved unstable, especially starting with pybind11 v2.8.0.
+
+Fixed bugs (FG#.. -> FramaGit bugs; FG!.. -> FG merge requests)
+ (FG: issues on Framagit; GH: issues on GitHub)
+ - FG#95: Wrong computation time for multicore execution after pstate change
+ - FG#97: Wrong computation time for ptask+multicore+pstates
+ - FG#98: SMPI offline simulation is inconsistent with the online simulation
+          (deadlocks / message truncation)
+ - FG#99: Weird segfault when not sealing an host
+
+----------------------------------------------------------------------------
+
+SimGrid (3.29) October 7. 2021
+
+The "Ask a stupid question" release.
+
+To celebrate, we would like every user to ask one question about SimGrid,
+on Mattermost, on Stack Overflow or using the issue tracker.
+
  
  New modeling features:
- - Non-linear resource sharing for decay models:
-   - The total capacity may depend on the number of concurrent usages
-   - For that, resources can take a callback that computes the capacity 
-     depending on the idle capacity and the number of concurrent usages
+ - Non-linear resource sharing, modeling resources whose performance heavily degrades with contention:
+   - The total capacity may be updated dynamically through a callback
+     and depends mainly on the number of concurrent flows.
     - Examples (both cpp and python): io-degradation, network-nonlinear, exec-cpu-nonlinear
  
- - Dynamic factors for CPU and disk: similarly to dynamic network factors,
-   allows the user to set a callback which can affect the progress of activities
-   (multiplicative factor applied when updating the amount of work remaining).
-   - Example: examples/cpp/exec-cpu-factors
+ - Dynamic factors: model variability in the speed of activities
+    - Each action can now have a factor that affects its progression.
+      This multiplicative factor is applied when updating the amount of work
+      remaining, so an activity with factor=0.5 only makes use of half of the
+      instantaneous power/bandwidth allocated to it and progresses twice as
+      slowly as its resource consumption would suggest.
+    - This can be used to model an overhead (e.g., there is a 20-byte
+      header in a 480-byte TCP packet, so the factor is 460/480, i.e., about
+      0.9583), but the novelty is that this factor can now easily be adjusted
+      depending on the characteristics of the activity and of the resources.
+    - This already existed for the network (e.g., the effective bandwidth depends
+      on the message size in the SMPI piecewise-linear network model) but it is now
+      more general (the factor may depend on the source and destination and
+      thus account for different behaviors of intra-node and inter-node
+      communications) and is available for CPUs (e.g., if you
+      want to model an affinity as in the "Unrelated Machines" problem in
+      scheduling) and disks (e.g., if you want to model a stochastic
+      capacity) too.
+    - For that, resources can be provided with a callback that computes
+      the activity factor when creating the action.
+    - Example: examples/cpp/exec-cpu-factors
+    - The same mechanism is also available for the latency, which
+      makes it easy to introduce complex variability patterns.
  
  Python:
   - Added support to programmatic platform creation in Python.
@@ -28,6 +193,9 @@ SMPI:
     - scan/excan can now be replayed
     - wait action now uses ranks and not pid, as the other ones.
     - smpi/init and smpi/finalization-barrier are now valid for replays.
+ - exit() is now intercepted by SMPI to avoid a premature shutdown of the
+   simulation. The first non-zero return code is returned as the simulation's
+   return code.
  
  Documentation:
    * New section "Release Notes" documenting recent and current developments.
@@ -38,11 +206,12 @@ ns-3 model:
   - Make wifi creation compatible with ns-3 version 3.34 too.
  
  Fixed bugs (FG#.. -> FramaGit bugs; FG!.. -> FG merge requests)
- (FG: issues on Framagit; GF: issues on GForge; GH: issues on GitHub)
+ (FG: issues on Framagit; GH: issues on GitHub)
+ - FG#77: Search feature of doc is broken (update sphinx theme version)
   - FG#78: Multiple fixes for SMPI replay:
      - TI tracing of allotallv/w was outputting wrong values
      - MPI_LOGICAL in fortran is actually 32 bits wide, and not 8.
- - FG#77: Search feature of doc is broken (update sphinx theme version)
+
  ----------------------------------------------------------------------------
  
  SimGrid (3.28) July 14. 2021
@@ -160,7 +329,7 @@ Simix:
   - Legacy functions deprecated in this release: SIMIX_get_clock(), SIMIX_run().
  
  Fixed bugs (FG#.. -> FramaGit bugs; FG!.. -> FG merge requests)
- (FG: issues on Framagit; GF: issues on GForge; GH: issues on GitHub)
+ (FG: issues on Framagit; GH: issues on GitHub)
   - FG#47: Complete and fix tests from teshuite/s4u/activity-lifecycle
   - FG#64: Configuring smpi/IB-penalty-factors
   - FG#67: Running computation concurrently with MPI_Iallreduce
@@ -264,7 +433,7 @@ C binding and interface:
     available as sg_actor_start_() and sg_actor_create_().
  
  Fixed bugs (FG#.. -> FramaGit bugs; FG!.. -> FG merge requests)
- (FG: issues on Framagit; GF: issues on GForge; GH: issues on GitHub)
+ (FG: issues on Framagit; GH: issues on GitHub)
   - FG#37: Parallel tasks are limited to 1 core per host
   - FG#62: Running "smpirun -replay" on large networks
   - FG!46: Fix a few potential memory leaks in SMPI colls