.. _howto:
Modeling Hints
##############
There is no perfect model, only models that are adapted to the
specific study that you want to do. SimGrid provides several advanced
mechanisms that you can adapt to model the situation that you are
interested in, and it is often difficult to see where to start.
This page collects several hints and tricks on modeling situations.
Even if you are looking for a very advanced, specific use case, these
examples may help you to design the solution you need.
.. _howto_science:
Doing Science with SimGrid
**************************
Many users rely on SimGrid as a scientific instrument for their
research. This tool was indeed invented to that end, and we strive
to streamline this kind of usage. But SimGrid is no magical tool, and
it is your responsibility to ensure that the tool actually provides sensible
results. Fortunately, there is a vast literature on how to avoid
Modeling & Simulation pitfalls. We review here some specific works.
In *An Integrated Approach to Evaluating Simulation Credibility*, the authors
provide a methodology enabling the users to increase their confidence
in the simulation tools they use. First of all, you must know what you
actually expect to discover, and whether the tool actually covers your
needs. Then, as they say, "a fool with a tool is still a fool", so you
need to think about your methodology before you submit your articles.
*Towards a Credibility Assessment of Models and Simulations*
gives a formal methodology to assess the credibility of your
simulation results.
*Seven Pitfalls in Modeling and Simulation Research* is even more
specific. Here are the listed pitfalls: (1) Don't know whether it's
modeling or simulation, (2) No separation of concerns, (3) No clear
scientific question, (4) Implementing everything from scratch, (5)
Unsupported claims, (6) Toy duck approach, and (7) The tunnel view. As
you can see, this article is a must-read. It's a pity that it's not
freely available, though.
.. _howto_churn:
Modeling Churn (e.g., in P2P)
*****************************
One of the biggest challenges in P2P settings is to cope with
churn, meaning that resources keep appearing and disappearing. In
SimGrid, you can always change the state of each host manually, for
example with :cpp:func:`simgrid::s4u::Host::turn_on`. To reduce the burden when
the churn is high, you can also attach a **state profile** to the host
directly.
This can be done through the XML file, using the ``state_file``
attribute of :ref:`pf_tag_host`, :ref:`pf_tag_cluster` or
:ref:`pf_tag_link`. Every line (but the last) of such a file describes
a timed event of the form "date value". Example:

.. code-block:: text

   1 0
   2 1
   LOOPAFTER 8

- At time t = 1, the host is turned off (a zero value means OFF)
- At time t = 2, the host is turned back on (any other value than zero means ON)
- At time t = 10, the profile is reset (as we are 8 seconds after the last event). Then the host will be turned off
  again at time t = 11.
If your profile does not contain any LOOPAFTER line, then it will be executed only once and not in a repetitive way.
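
The looping rule described above can be checked with a short Python sketch. It is not SimGrid's own parser: ``expand_state_profile`` is a hypothetical helper that only mirrors the "date value" and ``LOOPAFTER`` semantics stated here, and it assumes that the dates are listed in increasing order.

.. code-block:: python

   def expand_state_profile(text, horizon):
       """List the (date, is_on) events of a state profile up to `horizon`."""
       events = []
       loop_after = None
       for line in text.splitlines():
           fields = line.split()
           if not fields:
               continue  # skip blank lines
           if fields[0] == "LOOPAFTER":
               loop_after = float(fields[1])
           else:
               # "date value": a zero value means OFF, anything else means ON
               events.append((float(fields[0]), float(fields[1]) != 0))
       if not events:
           return []

       timeline = []
       offset = 0.0
       while True:
           for date, is_on in events:
               if offset + date > horizon:
                   return timeline
               timeline.append((offset + date, is_on))
           if loop_after is None:
               return timeline  # no LOOPAFTER line: the profile runs only once
           period = events[-1][0] + loop_after
           if period <= 0:
               raise ValueError("a looping profile needs a positive period")
           # The profile restarts `loop_after` seconds after its last event.
           offset += period

With the profile shown above, ``expand_state_profile("1 0\n2 1\nLOOPAFTER 8", 12)`` lists an OFF event at t = 1, ON at t = 2, OFF again at t = 11 and ON at t = 12.
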
Another possibility is to use the
:cpp:func:`simgrid::s4u::Host::set_state_profile()` or
:cpp:func:`simgrid::s4u::Link::set_state_profile()` functions. These
functions take a profile, which can be a fixed profile exhaustively
listing the events, or something else if you wish.
.. _howto_multicore:
Modeling Multicore Machines
***************************
Default Model
=============
Multicore machines are very complex, and there are many ways to model
them. The default models of SimGrid are coarse-grained and capture some
elements of this reality. Here is how to declare simple multicore hosts:

.. code-block:: xml

   <host id="mymachine" speed="8Gf" core="4"/>

It declares a 4-core host called "mymachine", each core computing 8
Gflop per second. If you put one activity of 8 Gflop on this host, it
will be computed in 1 second (by default, activities are
single-threaded and cannot leverage the computing power of more than
one core). If you run two such activities simultaneously, they will still be
computed in one second, and so on up to 4 activities. If you start 5 activities,
they will share the total computing power, and each activity will be
computed in 5/4 = 1.25 seconds. This is a very simple model, but that is
all that you get by default from SimGrid.
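
The sharing rule described above can be written as a small Python sketch. It is an illustration of the arithmetic only, not SimGrid code: ``completion_time`` is a hypothetical helper, and it assumes that all activities have the same size and start at the same time.

.. code-block:: python

   def completion_time(n_activities, flops_per_activity=8e9,
                       core_speed=8e9, cores=4):
       """Duration, in seconds, of n identical activities started together."""
       if n_activities <= 0:
           raise ValueError("at least one activity is needed")
       # An activity is single-threaded, so it never runs faster than one core.
       # Beyond `cores` activities, the total computing power is shared fairly.
       rate = min(core_speed, core_speed * cores / n_activities)
       return flops_per_activity / rate

   for n in (1, 2, 4, 5):
       print(n, completion_time(n))

Up to 4 activities, each one gets a full core and finishes in 1 second. With 5 activities, each one gets 4/5 of a core and finishes in 1.25 seconds.
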
Pinning tasks to cores
======================
The default model does not account for task pinning, where you
manually select on which core each of the existing activities should
execute. The best solution to model this is probably to model your
4-core processor as 4 distinct hosts, and to assign the activities to
cores by migrating them to the declared hosts. In some sense, this
takes the whole Network-On-Chip idea really seriously.
Some extra complications may arise here. If you have more activities than
cores, you'll have to schedule your activities
yourself on the cores (so you'd better avoid this complexity). Since
you cannot have more than one network model in a given SimGrid
simulation, you will end up with a TCP connection between your cores. A
possible workaround is to never start any simulated communication
between the cores and to have the same routes from each core to the
rest of the external network.
Modeling a multicore CPU as a set of SimGrid hosts may seem strange
and unconvincing, but some users achieved very realistic simulations
of multicore and GPU machines this way.
Modeling machine boot and shutdown periods
********************************************
When a physical host boots up, a lot of things happen. Booting takes time
during which the machine is not usable but dissipates energy, and
programs actually die and restart during a reboot. Since there are many
ways to model this, SimGrid does not make any modeling choice for you beyond
the most obvious ones.
Any actor (or process in MSG) running on a host that is shut down
will be killed and all its activities (tasks in MSG) will be
automatically canceled. If the killed actor was marked as
auto-restartable (with
:cpp:func:`simgrid::s4u::Actor::set_auto_restart` or with
:cpp:func:`MSG_process_auto_restart_set`), it will start anew with the
same parameters when the host boots back up.
By default, shutdowns and boots are instantaneous. If you want to
add an extra delay, you have to do that yourself, for example from a
`controller` actor that runs on another host. The best way to do so is
to declare a fictional pstate where the CPU delivers 0 flop per
second (so every activity on that host will be frozen when the host is
in this pstate). When you want to switch the host off, your controller
switches the host to that specific pstate (with
:cpp:func:`simgrid::s4u::Host::set_pstate`), waits for the amount of
time that you decided necessary for your host to shut down, and turns
the host off (with :cpp:func:`simgrid::s4u::Host::turn_off`). To boot
up, switch the host on, go into the specific pstate, wait a while and
go to a more regular pstate.
To model the energy dissipation, you need to put the right energy
consumption in your startup/shutdown specific pstate. Remember that
the energy consumed is equal to the instantaneous consumption
multiplied by the time that the host spends in that state. Do the
maths, and set the right instantaneous consumption to your pstate, and
you'll get the whole boot period to consume the amount of energy that
you want. You may want to have one fictional pstate for the boot
period and another one for the shutdown period.
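
As a worked example of "doing the maths", the Python sketch below derives the instantaneous consumption to give to a fictional pstate. The helper ``pstate_power_watts`` and the figures (a 150-second boot that should consume 18,000 joules) are hypothetical and only illustrate the arithmetic.

.. code-block:: python

   def pstate_power_watts(target_energy_joules, duration_seconds):
       """Instantaneous consumption needed to spend a given energy in a given time."""
       if duration_seconds <= 0:
           raise ValueError("the boot or shutdown period must last some time")
       # energy = instantaneous consumption * time spent in the pstate
       return target_energy_joules / duration_seconds

   # Hypothetical figures: the boot lasts 150 s and should cost 18,000 J.
   print(pstate_power_watts(18_000, 150))

Under these assumptions, the boot pstate should be declared with a consumption of 120 watts. The same computation applies to a separate shutdown pstate.
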
Of course, this is only one possible way to model these things. YMMV ;)
.. _understanding_lv08:
Understanding the default TCP model
***********************************
When simulating a data transfer between two hosts, you may be surprised
by the simulated time that you obtain. Let's consider the following platform:
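
.. code-block:: xml

   <!-- The host speeds are arbitrary: they do not affect the transfer time -->
   <host id="A" speed="1Gf" />
   <host id="B" speed="1Gf" />

   <link id="link1" latency="10ms" bandwidth="1Mbps" />

   <route src="A" dst="B">
     <link_ctn id="link1" />
   </route>
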
If host `A` sends `100kB` (a hundred kilobytes) to host `B`, one could expect
that this communication would take `0.81` seconds to complete according to a
simple latency-plus-size-divided-by-bandwidth model (0.01 + 8e5/1e6 = 0.81).
However, the default TCP model of SimGrid is a bit more complex than that. It
accounts for three phenomena that directly impact the simulation time even
on such a simple example:
- The size of a message at the application level (i.e., 100kB in this
  example) is not the size that will actually be transferred over the
  network. To mimic the fact that TCP and IP headers are added to each packet of
  the original payload, the TCP model of SimGrid empirically considers that
  `only 97% of the nominal bandwidth` is available. In other words, the
  size of your message is increased by a few percent, whatever its size.
- In the real world, the TCP protocol is not able to fully exploit the
  bandwidth of a link from the emission of the first packet. To reflect this
  `slow start` phenomenon, the latency declared in the platform file is
  multiplied by `a factor of 13.01`. Here again, this is an empirically
  determined value that may not correspond to every TCP implementation on
  every network. It can be tuned when more realistic simulated times for
  short messages are needed, though.
- When data is transferred from A to B, some TCP ACK messages travel in the
  opposite direction. To reflect the impact of this `cross-traffic`, SimGrid
  simulates a flow from B to A that represents an additional bandwidth
  consumption of `0.05`. The route from B to A is implicitly declared in the
  platform file and uses the same link `link1` as if the two hosts were
  connected through a communication bus. The bandwidth share allocated to the
  flow from A to B is then the available bandwidth of `link1` (i.e., 97% of
  the nominal bandwidth of 1Mb/s) divided by 1.05 (i.e., the total consumption).
  This feature, activated by default, can be disabled by adding the
  `--cfg=network/crosstraffic:0` flag to the command line.
As a consequence, the time to transfer 100kB from A to B as simulated by the
default TCP model of SimGrid is not 0.81 seconds but

.. code-block:: text

   0.01 * 13.01 + 800000 / ((0.97 * 1e6) / 1.05) = 0.996079 seconds.

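
The same computation can be wrapped in a small Python function to try other message sizes or link characteristics. This is only a sketch of the three corrections listed above for a single flow on an otherwise idle link, not the actual SimGrid implementation. The function name ``tcp_transfer_time`` and its parameters are hypothetical.

.. code-block:: python

   def tcp_transfer_time(size_bytes, latency_s, bandwidth_bps,
                         latency_factor=13.01, bandwidth_factor=0.97,
                         crosstraffic=True):
       """Transfer time, in seconds, of one flow alone on one link."""
       # Only 97% of the nominal bandwidth is available to the payload.
       share = bandwidth_factor * bandwidth_bps
       if crosstraffic:
           # The reverse ACK flow adds a consumption of 0.05 on the same link.
           share /= 1.05
       # The declared latency is multiplied to reflect the TCP slow start.
       return latency_factor * latency_s + 8 * size_bytes / share

   naive = 0.01 + 8 * 100_000 / 1e6
   simulated = tcp_transfer_time(100_000, 0.01, 1e6)
   print(f"naive: {naive:.2f} s, default TCP model: {simulated:.6f} s")

Passing ``crosstraffic=False`` mimics the effect of the `--cfg=network/crosstraffic:0` flag in this simplified setting.
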