From: Martin Quinson
Date: Tue, 9 Oct 2018 23:31:38 +0000 (+0200)
Subject: Merge branch 'master' of scm.gforge.inria.fr:/gitroot/simgrid/simgrid
X-Git-Tag: v3_22~917
X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/commitdiff_plain/fcfc285a211242ff365a365ffdcd3fab0f426a29?hp=4299696f4a0ac31e2758ee601f09c607ed42676a

Merge branch 'master' of scm.gforge.inria.fr:/gitroot/simgrid/simgrid
---

diff --git a/.gitignore b/.gitignore
index 5032bee886..205647803a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -192,6 +192,7 @@ examples/s4u/replay-storage/s4u-replay-storage
 examples/s4u/routing-get-clusters/s4u-routing-get-clusters
 examples/s4u/synchro-barrier/s4u-synchro-barrier
 examples/s4u/synchro-mutex/s4u-synchro-mutex
+examples/s4u/synchro-semaphore/s4u-synchro-semaphore
 examples/s4u/trace-platform/s4u-trace-platform
 examples/simdag/availability/sd_availability
 examples/simdag/dag-dotload/sd_dag-dotload
diff --git a/ChangeLog b/ChangeLog
index 3fe5f937a2..622c291e43 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 SimGrid (3.22) NOT RELEASED (Release Target: December 21. 2018, 22:23 UTC)
 
+Fixed bugs:
+ - #261: Document the parameters of parallel execution's constructor
+
+----------------------------------------------------------------------------
+
 SimGrid (3.21) October 3. 2018
 
 The Restarting Documentation (TRD) Release.
diff --git a/docs/source/app_s4u.rst b/docs/source/app_s4u.rst
index 0c26d1809e..a3c853cf79 100644
--- a/docs/source/app_s4u.rst
+++ b/docs/source/app_s4u.rst
@@ -123,6 +123,9 @@ provides many helper functions to simplify the code of actors.
 
 .. |API_s4u_Mailbox| replace:: **Mailbox**
 
+.. |API_s4u_Mailboxes| replace:: **Mailboxes**
+.. _API_s4u_Mailboxes: #s4u-mailbox
+
 .. |API_s4u_NetZone| replace:: **NetZone**
 
 .. |API_s4u_Barrier| replace:: **Barrier**
@@ -213,6 +216,119 @@ Sometimes, you want to change the setting of an activity before it even starts.
 
 .. todo:: write this section
 
+.. _s4u_mailbox:
+
+Mailboxes
+*********
+
+Please also refer to the :ref:`API reference for s4u::Mailbox
+`.
+
+===================
+What are Mailboxes?
+===================
+
+|API_s4u_Mailboxes|_ are rendez-vous points for network communications,
+similar to URLs on which you could post and retrieve data. Actually,
+the mailboxes are not involved in the communication once it starts,
+but only to find the contact with which you want to communicate.
+
+They are similar to many common things: The phone number, which allows
+the caller to find the receiver. The twitter hashtag, which helps
+senders and receivers to find each other. In TCP, the pair
+``{host name, host port}`` to which you can connect to find your peer.
+In HTTP, URLs through which the clients can connect to the servers.
+In ZeroMQ, the queues are used to match senders and receivers.
+
+One big difference with most of these systems is that no actor is the
+exclusive owner of a mailbox, neither in sending nor in receiving.
+Many actors can send into and/or receive from the same mailbox. TCP
+socket ports for example are shared on the sender side but exclusive
+on the receiver side (only one process can receive from a given socket
+at a given point of time).
+
+A big difference with TCP sockets or MPI communications is that
+communications do not start right away after a
+:cpp:func:`Mailbox::put() `, but wait
+for the corresponding :cpp:func:`Mailbox::get() `.
+You can change this by :ref:`declaring a receiving actor `.
+
+A big difference with twitter hashtags is that SimGrid does not
+offer easy support to broadcast a given message to many
+receivers. So that would be like a twitter tag where each message
+is consumed by the first receiver.
+
+A big difference with the ZeroMQ queues is that you cannot filter
+on the data you want to get from the mailbox. To model such settings
+in SimGrid, you'd have one mailbox per potential topic, and subscribe
+to each topic individually with a
+:cpp:func:`get_async() ` on each mailbox.
+Then, use :cpp:func:`Comm::wait_any() `
+to get the first message on any of the mailboxes you are subscribed to.
+
+The mailboxes are not located on the network, and you can access
+them without any latency. The network delays are only related to the
+location of the sender and receiver once the match between them is
+done on the mailbox. This is just like the phone number that you
+can use locally, and the geographical distance only comes into play
+once you start the communication by dialing this number.
+
+=====================
+How to use Mailboxes?
+=====================
+
+You can retrieve any existing mailbox from its name (which is a
+unique string, just like a twitter tag). This results in a
+versatile mechanism that can be used to build many different
+situations.
+
+To model classical socket communications, use "hostname:port" as
+mailbox names, and make sure that only one actor reads into a given
+mailbox. This does not make it easy to build a perfectly realistic
+model of the TCP sockets, but in most cases, this system is too
+cumbersome for your simulations anyway. You probably want something
+simpler, that turns out to be easy to build with the mailboxes.
+
+Many SimGrid examples use a sort of yellow page system where the
+mailbox names are the name of the service (such as "worker",
+"master" or "reducer"). That way, you don't have to know where your
+peer is located to contact it. You don't even need its name. Its
+function is enough for that. This also gives you some sort of load
+balancing for free if more than one actor pulls from the mailbox:
+the first actor that can deal with the request will handle it.
+
+=========================================
+How are put() and get() Requests Matched?
+=========================================
+
+The matching algorithm is simple: first come, first served. When a new
+send arrives, it matches the oldest enqueued receive. If no receive is
+currently enqueued, then the incoming send is enqueued. As you can
+see, the mailbox cannot contain both send and receive requests: all
+enqueued requests must be of the same sort.
+
+.. _s4u_receiving_actor:
+
+===========================
+Declaring a Receiving Actor
+===========================
+
+The last twist is that by default in the simulator, the data starts
+to be exchanged only when both the sender and the receiver are
+declared (it waits until both :cpp:func:`put() `
+and :cpp:func:`get() ` are posted).
+In TCP, since you establish connections beforehand, the data starts to
+flow as soon as the sender posts it, even if the receiver did not post
+its :cpp:func:`recv() ` yet.
+
+To model this in SimGrid, you can declare a specific receiver to a
+given mailbox (with the function
+:cpp:func:`set_receiver() `).
+That way, any :cpp:func:`put() `
+posted to that mailbox will start as soon as possible, and the data
+will already be there on the receiver host when the receiver actor
+posts its :cpp:func:`get() `.
+
 Memory Management
 *****************
 
@@ -371,6 +487,8 @@ s4u::Link
 s4u::Mailbox
 ============
 
+Please also refer to the :ref:`full doc on s4u::Mailbox `.
+
 .. doxygentypedef:: MailboxPtr
 
 .. doxygenclass:: simgrid::s4u::Mailbox
diff --git a/examples/s4u/README.rst b/examples/s4u/README.rst
index f672f35432..b1b996da8b 100644
--- a/examples/s4u/README.rst
+++ b/examples/s4u/README.rst
@@ -140,6 +140,8 @@ Communications on the Network
 
 .. todo:: add the `ready` example here
 
+.. _s4u_ex_execution:
+
 Executions on the CPU
 ---------------------
 
@@ -171,9 +173,10 @@ Executions on the CPU
     |br| `examples/s4u/exec-dvfs/s4u-exec-dvfs.cpp `_
     |br| `examples/platforms/energy_platform.xml `_
 
-  - **Parallel tasks:**
+  - **Parallel executions:**
     These objects are convenient abstractions of parallel
-    computational kernels that span over several machines.
+    computational kernels that span over several machines, such as a
+    PDGEMM and the other ScaLAPACK routines.
     |br| `examples/s4u/exec-ptask/s4u-exec-ptask.cpp `_
 
 I/O on Disks and Files
diff --git a/include/simgrid/msg.h b/include/simgrid/msg.h
index 2abf905374..1092890a09 100644
--- a/include/simgrid/msg.h
+++ b/include/simgrid/msg.h
@@ -237,8 +237,6 @@ typedef struct msg_task* msg_task_t;
 #define MSG_TASK_UNINITIALIZED NULL
 
 /** @brief Return code of most MSG functions
-    @ingroup msg_simulation
-    @{ */
 /* Keep these code as binary values: java bindings manipulate | of these values */
 typedef enum {
   MSG_OK = 0, /**< @brief Everything is right. Keep on going this way ! */
@@ -251,7 +249,6 @@ typedef enum {
                           return now !*/
   MSG_TASK_CANCELED = 8 /**< @brief Canceled task. This task has been canceled by somebody!*/
 } msg_error_t;
-/** @} */
 
 /************************** Global ******************************************/
 /** @brief set a configuration variable
diff --git a/include/simgrid/s4u/Actor.hpp b/include/simgrid/s4u/Actor.hpp
index a951265671..5cc2bca8ee 100644
--- a/include/simgrid/s4u/Actor.hpp
+++ b/include/simgrid/s4u/Actor.hpp
@@ -410,8 +410,56 @@ XBT_PUBLIC void execute(double flop);
  * An execution of priority 2 computes twice as fast as an execution at priority 1.
  */
 XBT_PUBLIC void execute(double flop, double priority);
-XBT_PUBLIC void parallel_execute(int host_nb, sg_host_t* host_list, double* flops_amount, double* bytes_amount);
-XBT_PUBLIC void parallel_execute(int host_nb, sg_host_t* host_list, double* flops_amount, double* bytes_amount,
+/** Block the actor until the built parallel execution terminates
+ *
+ * \rst
+ * .. _API_s4u_parallel_execute:
+ *
+ * Parallel executions are convenient abstractions of parallel computational kernels that span over several machines,
+ * such as a PDGEMM and the other ScaLAPACK routines. If you are interested in the effects of such parallel kernel
+ * on the platform (e.g. to schedule them wisely), there is no need to model them in all details of their internal
+ * execution and communications. It is much more convenient to model them as a single execution activity that spans
+ * over several hosts. This is exactly what s4u's Parallel Executions are.
+ *
+ * To build such an object, you need to provide a list of hosts that are involved in the parallel kernel (the
+ * actor's own host may or may not be in this list) and specify the amount of computations that should be done by
+ * each host, using a vector of flops amount. Then, you should specify the amount of data exchanged between each
+ * pair of hosts during the parallel kernel. For that, a matrix of values is expected.
+ *
+ * For example, if your list of hosts is ``[host0, host1]``, passing a vector ``[1000, 2000]`` as a `flops_amount`
+ * vector means that `host0` should compute 1000 flops while `host1` will compute 2000 flops. A matrix of
+ * communications' sizes of ``[0, 1, 2, 3]`` specifies the following data exchanges:
+ *
+ * +-----------+-------+------+
+ * |from \\ to | host0 | host1|
+ * +===========+=======+======+
+ * |host0      |   0   |  1   |
+ * +-----------+-------+------+
+ * |host1      |   2   |  3   |
+ * +-----------+-------+------+
+ *
+ * - From host0 to host0: 0 bytes are exchanged
+ * - From host0 to host1: 1 byte is exchanged
+ * - From host1 to host0: 2 bytes are exchanged
+ * - From host1 to host1: 3 bytes are exchanged
+ *
+ * In a parallel execution, all parts (all executions on each host, all communications) progress exactly at the
+ * same pace, so they all terminate at the exact same time. If one part is slow because of a slow resource or
+ * because of contention, this slows down the parallel execution as a whole.
+ *
+ * These objects are somewhat surprising from a modeling point of view. For example, the unit of their speed is
+ * somewhere between flop/sec and byte/sec. It is **strongly advised** to only use the LV08 host model when using
+ * parallel executions. Note that you can mix regular executions and communications with parallel executions,
+ * provided that the platform model is LV08.
+ *
+ * \endrst
+ */
+
+XBT_PUBLIC void parallel_execute(int host_nb, s4u::Host* host_list, double* flops_amount, double* bytes_amount);
+/** \rst
+ * Block the actor until the built :ref:`parallel execution ` completes, or until the timeout.
+ * \endrst*/
+XBT_PUBLIC void parallel_execute(int host_nb, s4u::Host* host_list, double* flops_amount, double* bytes_amount,
                                  double timeout);
 
 XBT_PUBLIC ExecPtr exec_init(double flops_amounts);
diff --git a/include/simgrid/s4u/Mailbox.hpp b/include/simgrid/s4u/Mailbox.hpp
index 5fbcf2f7f7..197b2426f9 100644
--- a/include/simgrid/s4u/Mailbox.hpp
+++ b/include/simgrid/s4u/Mailbox.hpp
@@ -14,107 +14,7 @@
 namespace simgrid {
 namespace s4u {
 
-/** @brief Mailboxes: Network rendez-vous points.
- *
- * What are mailboxes?
- *
- * Rendez-vous point for network communications, similar to URLs on
- * which you could post and retrieve data. Actually, the mailboxes are
- * not involved in the communication once it starts, but only to find
- * the contact with which you want to communicate.
-
- * Here are some mechanisms similar to the mailbox in other
- * communication systems: The phone number, which allows the caller to
- * find the receiver. The twitter hashtag, which help senders and
- * receivers to find each others. In TCP, the pair {host name, host
- * port} to which you can connect to find your interlocutor. In HTTP,
- * URLs through which the clients can connect to the servers. In ZeroMQ
- * and other queuing systems, the queues are used to match senders
- * and receivers.
- *
- * One big difference with most of these systems is that no actor is
- * the exclusive owner of a mailbox, neither in sending nor in
- * receiving. Many actors can send into and/or receive from the
- * same mailbox. This is a big difference to the socket ports for
- * example, that are definitely exclusive in receiving.
- *
- * Mailboxes can optionally have a @i receiver with `simgrid::s4u::Mailbox::set_receiver()`.
- * It means that the data exchange starts as soon as the sender has
- * done the `put()`, even before the corresponding `get()`
- * (usually, it starts as soon as both `put()` and `get()` are posted).
- * This is closer to the BSD semantic and can thus help to improve
- * the timing accuracy, but this is not mandatory at all.
- *
- * A big difference with twitter hashtags is that SimGrid does not
- * offer easy support to broadcast a given message to many
- * receivers. So that would be like a twitter tag where each message
- * is consumed by the first coming receiver.
- *
- * A big difference with the ZeroMQ queues is that you cannot filter
- * on the data you want to get from the mailbox. To model such settings
- * in SimGrid, you'd have one mailbox per potential topic, and subscribe
- * to each topic individually with a `get_async()` on each mailbox.
- * Then, use `Comm::wait_any()` to get the first message on any of the
- * mailbox you are subscribed onto.
- *
- * The mailboxes are not located on the network, and you can access
- * them without any latency. The network delay are only related to the
- * location of the sender and receiver once the match between them is
- * done on the mailbox. This is just like the phone number that you
- * can use locally, and the geographical distance only comes into play
- * once you start the communication by dialing this number.
- *
- * How to use mailboxes?
- *
- * Any existing mailbox can be retrieve from its name (which are
- * unique strings, just like with twitter tags). This results in a
- * versatile mechanism that can be used to build many different
- * situations.
- *
- * For something close to classical socket communications, use
- * "hostname:port" as mailbox names, and make sure that only one actor
- * reads into that mailbox. It's hard to build a perfectly realistic
- * model of the TCP sockets, but most of the time, this system is too
- * cumbersome for your simulations anyway. You probably want something
- * simpler, that turns our to be easy to build with the mailboxes.
- *
- * Many SimGrid examples use a sort of yellow page system where the
- * mailbox names are the name of the service (such as "worker",
- * "master" or "reducer"). That way, you don't have to know where your
- * peer is located to contact it. You don't even need its name. Its
- * function is enough for that. This also gives you some sort of load
- * balancing for free if more than one actor pulls from the mailbox:
- * the first relevant actor that can deal with the request will handle
- * it.
- *
- * How are sends and receives matched?
- *
- * The matching algorithm is as simple as a first come, first
- * serve. When a new send arrives, it matches the oldest enqueued
- * receive. If no receive is currently enqueued, then the incoming
- * send is enqueued. As you can see, the mailbox cannot contain both
- * send and receive requests: all enqueued requests must be of the
- * same sort.
- *
- * Declaring a receiving actor
- *
- * The last twist is that by default in the simulator, the data starts
- * to be exchanged only when both the sender and the receiver are
- * declared while in real systems (such as TCP or MPI), the data
- * starts to flow as soon as the sender posts it, even if the receiver
- * did not post its recv() yet. This can obviously lead to bad
- * simulation timings, as the simulated communications do not start at
- * the exact same time than the real ones.
- *
- * If the simulation timings are very important to you, you can
- * declare a specific receiver to a given mailbox (with the function
- * setReceiver()). That way, any send() posted to that mailbox will
- * start as soon as possible, and the data will already be there on
- * the receiver host when the receiver actor posts its receive().
- *
- * The API
- *
- */
+/** @brief Mailboxes: Network rendez-vous points. */
 class XBT_PUBLIC Mailbox {
   friend simgrid::s4u::Comm;
   friend simgrid::kernel::activity::MailboxImpl;
diff --git a/src/msg/msg_task.cpp b/src/msg/msg_task.cpp
index 7a51afd370..07e6db979b 100644
--- a/src/msg/msg_task.cpp
+++ b/src/msg/msg_task.cpp
@@ -59,6 +59,12 @@ msg_task_t MSG_task_create(const char *name, double flop_amount, double message_
 /** @brief Creates a new #msg_task_t (a parallel one....).
  *
  * A constructor for #msg_task_t taking six arguments and returning the corresponding object.
+ *
+ * \rst
+ * See :cpp:func:`void simgrid::s4u::this_actor::parallel_execute(int, s4u::Host*, double*, double*)` for
+ * the exact semantics of the parameters.
+ * \endrst
+ *
  * @param name a name for the object. It is for user-level information and can be nullptr.
  * @param host_nb the number of hosts implied in the parallel task.
  * @param host_list an array of @p host_nb msg_host_t.
@@ -67,9 +73,7 @@ msg_task_t MSG_task_create(const char *name, double flop_amount, double message_
  * @param bytes_amount an array of @p host_nb* @p host_nb doubles.
  * @param data a pointer to any data may want to attach to the new object.
  *             It is for user-level information and can be nullptr.
- *             It can be retrieved with the function @ref MSG_task_get_data.
- * @see msg_task_t
- * @return The new corresponding object.
+ *             It can be retrieved with the function @ref MSG_task_get_data().
  */
 msg_task_t MSG_parallel_task_create(const char *name, int host_nb, const msg_host_t * host_list,
                                     double *flops_amount, double *bytes_amount, void *data)
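
Reviewer note: the `bytes_amount` parameter documented in this patch is a flat array of `host_nb * host_nb` doubles, and the table added to `Actor.hpp` suggests a row-major "from \ to" layout. The following is only a minimal sketch of that indexing, assuming the layout is exactly the one shown in the table. The helper name `bytes_between` is hypothetical and is not part of SimGrid.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (NOT a SimGrid function): returns the number of bytes
// sent from host index `from` to host index `to`, assuming `bytes_amount` is
// a host_nb x host_nb matrix flattened row by row (row = sender, column = receiver).
double bytes_between(const std::vector<double>& bytes_amount, std::size_t host_nb,
                     std::size_t from, std::size_t to)
{
  // Reject inconsistent sizes or indices instead of reading out of bounds.
  if (bytes_amount.size() != host_nb * host_nb || from >= host_nb || to >= host_nb)
    throw std::out_of_range("bytes_amount must hold host_nb * host_nb values");
  return bytes_amount[from * host_nb + to];
}
```

With the documented example matrix `[0, 1, 2, 3]` and two hosts, `bytes_between(m, 2, 1, 0)` reads the "host1 to host0" cell of the table.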