..........
Let's start with the code of the master. It is represented by the
-*master* function below. This simple function takes 4 parameters,
-given as a vector of strings:
-
- - the number of workers managed by the master.
- - the number of tasks to dispatch
- - the computational size (in flops to compute) of each task
- - the communication size (in bytes to exchange) of each task
+*master* function below. This simple function takes at least 3
+parameters (the number of tasks to dispatch, their computational size
+in flops to compute and their communication size in bytes to
+exchange). Every parameter after the third one must be the name of a
+host on which a worker is waiting for something to compute.
Then, the tasks are sent one after the other, each on a mailbox named
-"worker-XXX" where XXX is the number of an existing worker. On the
-other side, a given worker (which code is given below) wait for
-incoming tasks on its own mailbox. Notice how this mailbox mechanism
-allow the actors to find each other without having all information:
-the master don't have to know the actors nor even where they are, it
-simply pushes the messages on mailbox which name is predetermined.
+after the worker's host. On the other side, a given worker (whose
+code is given below) waits for incoming tasks on its own
+mailbox.
+
+
At the end, once all tasks are dispatched, the master dispatches
another task per worker, but this time with a negative amount of flops
:start-after: master-begin
:end-before: master-end
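
The parameter layout described above can be sketched in plain C++, independently of SimGrid. This is only an illustration: the ``MasterConfig`` struct and the ``parse_master_args`` helper are hypothetical names, and the sketch assumes that ``args[0]`` holds the function name (as ``argv[0]`` does in ``main()``) and that at least one worker host must be given.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the master's parameters (not a SimGrid type).
struct MasterConfig {
  long tasks_count;                      // number of tasks to dispatch
  double compute_cost;                   // flops to compute per task
  double communicate_cost;               // bytes to exchange per task
  std::vector<std::string> worker_hosts; // every parameter after the third one
};

// Assumption: args[0] is the actor function name, so the parameters start at args[1].
MasterConfig parse_master_args(const std::vector<std::string>& args)
{
  if (args.size() < 5)
    throw std::invalid_argument("expected 3 parameters followed by at least one worker host");

  MasterConfig cfg;
  cfg.tasks_count      = std::stol(args[1]);
  cfg.compute_cost     = std::stod(args[2]);
  cfg.communicate_cost = std::stod(args[3]);
  cfg.worker_hosts.assign(args.begin() + 4, args.end());
  return cfg;
}
```
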
-Here comes the code of the worker actors. This function expects only one
-parameter from its vector of strings: its identifier so that it knows
-on which mailbox its incoming tasks will arrive. Its code is very
-simple: as long as it gets valid computation requests (whose
-compute_amount is positive), it compute this task and waits for the
-next one.
+Here comes the code of the worker actors. This function expects no
+parameter from its vector of strings. Its code is very simple: it
+expects messages on the mailbox that is named after its own host. As
+long as it gets valid computation requests (whose compute_amount is
+positive), it computes this task and waits for the next one.
+
+The worker retrieves its own host with
+:cpp:func:`simgrid::s4u::this_actor::get_host`. The
+:ref:`simgrid::s4u::this_actor <namespace_simgrid__s4u__this_actor>`
+namespace contains many such helper functions.
.. literalinclude:: ../../examples/s4u/app-masterworkers/s4u-app-masterworkers-fun.cpp
:language: c++
.......................
And this is it. In only a few lines, we defined the algorithm of our
-master/workers examples. Well, this is true, but an algorithm alone is
-not enough to define a simulation.
+master/workers examples.
-First, SimGrid is a library, not a program. So you need to define your
-own `main()` function, as follows. This function is in charge of
+That being said, an algorithm alone is not enough to define a
+simulation: SimGrid is a library, not a program. So you need to define
+your own ``main()`` function as follows. This function is in charge of
creating a SimGrid simulation engine (on line 3), register the actor
functions to the engine (on lines 7 and 8), load the virtual platform
from its description file (on line 11), map actors onto that platform
:end-before: main-end
:linenos:
-After that, the missing pieces are the platform and deployment
-files.
+As you can see, this also requires a platform file and a deployment
+file.
Platform File
.............
only an excerpts of the full ``examples/platforms/small_platform.xml``
file. For example, most routing information are missing, and only the
route between the hosts Tremblay and Fafard is given. This path
-traverses 6 links (4, 3, 2, 0, 1 and 8). The full file, along with
-other examples, can be found in the archive under
-``examples/platforms``.
+traverses 6 links (named 4, 3, 2, 0, 1 and 8). There are several
+examples of platforms in the archive under ``examples/platforms``.
.. |api_s4u_NetZone| image:: /images/extlink.png
:align: middle
.. literalinclude:: ../../examples/platforms/small_platform.xml
:language: xml
- :lines: 1-10,12-20,56-63,192-
+ :lines: 1-10,12-20,56-62,192-
:caption: (excerpts of the small_platform.xml file)
Deployment File
**SimGrid was invented to answer such questions.** Do not believe the
fools saying that all you need to study such settings is a simple
discrete event simulator. Do you really want to reinvent the wheel,
-debug your own tool, optimize it and validate its models against real
+debug and optimize your own tool, and validate its models against real
settings for ages, or do you prefer to sit on the shoulders of a
giant? With SimGrid, you can focus on your algorithm. The whole
simulation mechanism is already working.
Exercise 1: Simplifying the deployment file
...........................................
-In the provided example, the deployment file is tightly connected to
-the platform file ``small_platform.xml`` and adding more workers
-quickly becomes a pain: You need to start them (at the bottom of the
-file), add to inform the master that they are available by increasing
-the right parameter.
-
-Instead, modify the simulator ``master-workers.c`` into
-``master-workers-exo1.c`` so that the master launches a worker process
-on `all` the other machines at startup. The new deployment file should
-be as simple as:
-
-.. code-block:: xml
-
- <?xml version='1.0'?>
- <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
- <platform version="4.1">
- <actor host="Tremblay" function="master">
- <argument value="20"/> <!-- Number of tasks -->
- <argument value="50000000"/> <!-- Computation size of tasks -->
- <argument value="1000000"/> <!-- Communication size of tasks -->
- </actor>
- </platform>
+In the provided example, adding more workers quickly becomes a pain:
+You need to start them (at the bottom of the file), and to inform the
+master of their availability with an extra parameter. This is mandatory
+if you want to inform the master of where the workers are running. But
+actually, the master does not need to have this information.
+
+We could leverage the flexibility of the mailbox mechanism, and use a sort of
+yellow pages system: Instead of sending data to the worker running on
+Fafard, the master could send data to the third worker. That is, instead of
+using the worker locations (which must be filled in two places),
+we could use their IDs (which must be filled in one place
+only).
+
+This could be done with the following deployment file. It's clearly
+not shorter than the previous one, but it's still simpler because the
+information is only written once. It thus follows the `DRY
+<https://en.wikipedia.org/wiki/Don't_repeat_yourself>`_ `SPOT
+<http://wiki.c2.com/?SinglePointOfTruth>`_ design principle.
+
+.. literalinclude:: tuto_s4u/deployment1.xml
+ :language: xml
+
+
+Copy your ``master-workers.cpp`` into ``master-workers-exo1.cpp`` and
+add a new executable into ``CMakeLists.txt``. Then modify your worker
+function so that it gets its mailbox name not from the name of its
+host, but from the string passed as ``args[1]``. The master will send
+messages to all workers based on their number, for example as follows:
+
+.. code-block:: cpp
+
+ for (int i = 0; i < tasks_count; i++) {
+ std::string worker_rank = std::to_string(i % workers_count);
+ std::string mailbox_name = std::string("worker-") + worker_rank;
+ simgrid::s4u::MailboxPtr mailbox = simgrid::s4u::Mailbox::by_name(mailbox_name);
+
+ mailbox->put(...);
+
+ ...
+ }
+
+
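
The round-robin naming used in the loop above can be isolated in a small helper, shown here as a plain C++ sketch without any SimGrid call. The name ``mailbox_name_for_task`` is hypothetical; the ``"worker-"`` prefix and the modulo on the number of workers are taken from the snippet above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: name of the mailbox that receives task number `task`
// when tasks are dealt round-robin to `workers_count` workers.
std::string mailbox_name_for_task(int task, int workers_count)
{
  return "worker-" + std::to_string(task % workers_count);
}
```

With 3 workers, tasks 0, 1, 2 go to ``worker-0``, ``worker-1``, ``worker-2``, and task 3 wraps around to ``worker-0``.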
+Wrap up
+^^^^^^^
+
+The mailboxes are a very powerful mechanism in SimGrid, allowing many
+interesting application settings. They may feel surprising if you are
+used to BSD sockets or other classical systems, but you will soon
+appreciate their power. They are only used to match the
+communications, but have no impact on the communication
+timing. ``put()`` and ``get()`` are matched regardless of their
+initiators' location and then the real communication occurs between
+the involved parties.
+
+Please refer to the full `API of Mailboxes
+<api/classsimgrid_1_1s4u_1_1Mailbox.html#class-documentation>`_
+|api_s4u_Mailbox|_ for more details.
+
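
The matching can be pictured with a toy model in plain C++. This is not SimGrid code: ``ToyMailbox`` is a hypothetical class that only records the host of each pending request, and it assumes the first come, first served policy described in the Mailbox API documentation. It shows that a ``put()`` and a ``get()`` are paired by the mailbox alone, and that the hosts only become relevant once the pair is formed.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// (sender host, receiver host) of a matched communication.
using Match = std::pair<std::string, std::string>;

// Toy model of a mailbox: pending requests are all sends or all receives.
class ToyMailbox {
  std::deque<std::string> pending_; // hosts of the waiting requests, oldest first
  bool pending_are_sends_ = false;  // kind of the requests stored in pending_

  std::optional<Match> post(const std::string& host, bool is_send)
  {
    if (!pending_.empty() && pending_are_sends_ != is_send) {
      // An opposite request is waiting: match with the oldest one.
      std::string peer = pending_.front();
      pending_.pop_front();
      return is_send ? Match(host, peer) : Match(peer, host);
    }
    // Nothing to match with: enqueue this request.
    pending_are_sends_ = is_send;
    pending_.push_back(host);
    return std::nullopt;
  }

public:
  std::optional<Match> put(const std::string& sender_host) { return post(sender_host, true); }
  std::optional<Match> get(const std::string& receiver_host) { return post(receiver_host, false); }
};
```
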
+
+Exercise 2: Using the whole platform
+....................................
+
+It is now easier to add a new worker, but you still have to do it
+manually. It would be much easier if the master could start the
+workers on its own, one per available host in the platform. The new
+deployment file should be as simple as:
+
+.. literalinclude:: tuto_s4u/deployment2.xml
+ :language: xml
+
Creating the workers from the master
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this exercise, we reduced the amount of configuration that our
simulator requests. This is both a good idea, and a dangerous
-trend. This simplification is an application of the good old DRY/SPOT
-programming principle (Don't Repeat Yourself / Single Point Of Truth
--- `more on wikipedia
+trend. This simplification is another application of the good old DRY/SPOT
+programming principle (`Don't Repeat Yourself / Single Point Of Truth
<https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_), and you
really want your programming artefacts to follow these software
engineering principles.
namespace s4u {
/** @brief Mailboxes: Network rendez-vous points.
- * @ingroup s4u_api
*
- * @tableofcontents
- *
- * @section s4u_mb_what What are mailboxes
+ * <b>What are mailboxes?</b>
*
* Rendez-vous point for network communications, similar to URLs on
* which you could post and retrieve data. Actually, the mailboxes are
* find the receiver. The twitter hashtag, which help senders and
* receivers to find each others. In TCP, the pair {host name, host
* port} to which you can connect to find your interlocutor. In HTTP,
- * URLs through which the clients can connect to the servers.
+ * URLs through which the clients can connect to the servers. In ZeroMQ
+ * and other queuing systems, the queues are used to match senders
+ * and receivers.
*
- * One big difference with most of these systems is that usually, no
- * actor is the exclusive owner of a mailbox, neither in sending nor
- * in receiving. Many actors can send into and/or receive from the
+ * One big difference with most of these systems is that no actor is
+ * the exclusive owner of a mailbox, neither in sending nor in
+ * receiving. Many actors can send into and/or receive from the
* same mailbox. This is a big difference to the socket ports for
* example, that are definitely exclusive in receiving.
*
+ * Mailboxes can optionally have a @i receiver with `simgrid::s4u::Mailbox::set_receiver()`.
+ * It means that the data exchange starts as soon as the sender has
+ * done the `put()`, even before the corresponding `get()`
+ * (usually, it starts as soon as both `put()` and `get()` are posted).
+ * This is closer to the BSD semantics and can thus help to improve
+ * the timing accuracy, but this is not mandatory at all.
+ *
* A big difference with twitter hashtags is that SimGrid does not
* offer easy support to broadcast a given message to many
* receivers. So that would be like a twitter tag where each message
* is consumed by the first coming receiver.
*
+ * A big difference with the ZeroMQ queues is that you cannot filter
+ * on the data you want to get from the mailbox. To model such settings
+ * in SimGrid, you'd have one mailbox per potential topic, and subscribe
+ * to each topic individually with a `get_async()` on each mailbox.
+ * Then, use `Comm::wait_any()` to get the first message on any of the
+ * mailboxes you are subscribed to.
+ *
* The mailboxes are not located on the network, and you can access
* them without any latency. The network delay are only related to the
* location of the sender and receiver once the match between them is
* done on the mailbox. This is just like the phone number that you
* can use locally, and the geographical distance only comes into play
- * once you start the communication by dialling this number.
+ * once you start the communication by dialing this number.
*
- * @section s4u_mb_howto How to use mailboxes?
+ * <b>How to use mailboxes?</b>
*
* Any existing mailbox can be retrieve from its name (which are
* unique strings, just like with twitter tags). This results in a
*
* For something close to classical socket communications, use
* "hostname:port" as mailbox names, and make sure that only one actor
- * reads into that mailbox. It's hard to build a prefectly realistic
+ * reads into that mailbox. It's hard to build a perfectly realistic
* model of the TCP sockets, but most of the time, this system is too
* cumbersome for your simulations anyway. You probably want something
* simpler, that turns our to be easy to build with the mailboxes.
* the first relevant actor that can deal with the request will handle
* it.
*
- * @section s4u_mb_matching How are sends and receives matched?
+ * <b>How are sends and receives matched?</b>
*
* The matching algorithm is as simple as a first come, first
* serve. When a new send arrives, it matches the oldest enqueued
- * receive. If no receive is currently enqueued, then the incomming
+ * receive. If no receive is currently enqueued, then the incoming
* send is enqueued. As you can see, the mailbox cannot contain both
* send and receive requests: all enqueued requests must be of the
* same sort.
*
- * @section s4u_mb_receiver Declaring a receiving actor
+ * <b>Declaring a receiving actor</b>
*
* The last twist is that by default in the simulator, the data starts
* to be exchanged only when both the sender and the receiver are
* start as soon as possible, and the data will already be there on
* the receiver host when the receiver actor posts its receive().
*
- * @section s4u_mb_api The API
+ * <b>The API</b>
+ *
*/
class XBT_PUBLIC Mailbox {
#ifndef DOXYGEN
*
* It means that the communications sent to this mailbox will start flowing to
* its host even before he does a recv(). This models the real behavior of TCP
- * and MPI communications, amongst other.
+ * and MPI communications, amongst others. It will improve the accuracy of
+ * predictions, in particular if your application exhibits swarms of small messages.
+ *
+ * SimGrid does not enforce any kind of ownership over the mailbox. Even if a receiver
+ * was declared, any other actor can still get() data from the mailbox. The timings
+ * will then probably be off track, so you should strive on your side to not get data
+ * from someone else's mailbox.
*/
void set_receiver(ActorPtr actor);