Arnaud Giersch [Wed, 19 Sep 2018 21:46:58 +0000 (23:46 +0200)]
Revert "Dlopen privatization should be okay now for TSan."
This reverts commit
67e587e01b533cbe388602107fdd5ea6e8970513.
Arnaud Giersch [Wed, 19 Sep 2018 20:13:27 +0000 (22:13 +0200)]
Revert "Remove usage of RTLD_DEEPBIND."
It's in fact needed for starpu and some of the smpi proxy apps.
This reverts commit
f257ec7c9ab6e14b11ea63378065db42105882b5.
Arnaud Giersch [Wed, 19 Sep 2018 12:15:50 +0000 (14:15 +0200)]
Target_libs may be multiple.
Martin Quinson [Tue, 18 Sep 2018 20:14:45 +0000 (22:14 +0200)]
tuto smpi: start lab 3 (Execution Sampling)
But I'm stuck because that version of the NAS benchmarks is in
Fortran, and I fear that our macros don't work in this case...
Martin Quinson [Tue, 18 Sep 2018 20:04:08 +0000 (22:04 +0200)]
smpi tuto: Lab2 (Tracing and Replay)
Martin Quinson [Tue, 18 Sep 2018 19:23:15 +0000 (21:23 +0200)]
smpi tuto: Lab 1 on vizu
Martin Quinson [Tue, 18 Sep 2018 17:55:49 +0000 (19:55 +0200)]
smpi tuto: finish the Lab0
Martin Quinson [Tue, 18 Sep 2018 17:42:38 +0000 (19:42 +0200)]
fix ns3, again
Martin Quinson [Tue, 18 Sep 2018 16:18:33 +0000 (18:18 +0200)]
Merge branch 'master' of scm.gforge.inria.fr:/gitroot/simgrid/simgrid
Martin Quinson [Tue, 18 Sep 2018 16:17:45 +0000 (18:17 +0200)]
Unify the host names in cluster description files
This will allow having only one hostfile for all of them.
Particularly useful for the tuto.
Arnaud Giersch [Tue, 18 Sep 2018 13:23:21 +0000 (15:23 +0200)]
Dlopen privatization should be okay now for TSan.
Arnaud Giersch [Tue, 18 Sep 2018 13:16:37 +0000 (15:16 +0200)]
Remove usage of RTLD_DEEPBIND.
It does not seem to be mandatory, and sanitizers are complaining.
Let's see if it passes on CI servers.
Arnaud Giersch [Tue, 18 Sep 2018 13:38:25 +0000 (15:38 +0200)]
Move check for null pointer before dereference.
Martin Quinson [Tue, 18 Sep 2018 14:29:14 +0000 (16:29 +0200)]
tuto smpi: Lab0 (hello world) drafted
Martin Quinson [Tue, 18 Sep 2018 13:16:09 +0000 (15:16 +0200)]
smpirun: make sure that <cluster is on its own line when computing the hostfile automatically
Martin Quinson [Tue, 18 Sep 2018 12:40:21 +0000 (14:40 +0200)]
dockerfiles: install our files under /source/ and refresh images
Martin Quinson [Tue, 18 Sep 2018 10:58:27 +0000 (12:58 +0200)]
tuto smpi: finish (for now) the platform section; draft the install section
Martin Quinson [Tue, 18 Sep 2018 07:50:32 +0000 (09:50 +0200)]
cosmetics
Martin Quinson [Tue, 18 Sep 2018 07:41:57 +0000 (09:41 +0200)]
cosmetics in graphical representations of cluster descriptions
Martin Quinson [Tue, 18 Sep 2018 06:39:59 +0000 (08:39 +0200)]
hopefully fix the NS3 test
Martin Quinson [Tue, 18 Sep 2018 05:39:18 +0000 (07:39 +0200)]
fix make distcheck, as usual :(
Martin Quinson [Tue, 18 Sep 2018 00:03:38 +0000 (02:03 +0200)]
ignore a directory generated by sphinx
Martin Quinson [Tue, 18 Sep 2018 00:02:56 +0000 (02:02 +0200)]
docs: sphinx 1.8.0 was released, so use it
Martin Quinson [Mon, 17 Sep 2018 23:58:32 +0000 (01:58 +0200)]
Merge branch 'master' of github.com:simgrid/simgrid
Martin Quinson [Mon, 17 Sep 2018 22:47:16 +0000 (00:47 +0200)]
Rename cluster.xml to cluster_backbone.xml
also, fix the make dist and some cosmetics.
Martin Quinson [Mon, 17 Sep 2018 22:30:18 +0000 (00:30 +0200)]
cleanups in the cluster platform files
Martin Quinson [Mon, 17 Sep 2018 22:16:46 +0000 (00:16 +0200)]
docs: prefer svg to png, and inclusion to copy/paste
Martin Quinson [Mon, 17 Sep 2018 21:46:10 +0000 (23:46 +0200)]
cosmetics on the graphical TOC
Martin Quinson [Mon, 17 Sep 2018 07:54:46 +0000 (09:54 +0200)]
Merge branch 'master' of scm.gforge.inria.fr:/gitroot/simgrid/simgrid
Martin Quinson [Fri, 14 Sep 2018 20:54:04 +0000 (22:54 +0200)]
fix sectioning and one typo
Arnaud Legrand [Fri, 14 Sep 2018 09:17:09 +0000 (11:17 +0200)]
Graphical representation of example platforms
Martin Quinson [Thu, 13 Sep 2018 22:31:12 +0000 (00:31 +0200)]
Merge pull request #292 from kovin/master
Cover with a test Mailbox::ready() method introduced in commit
1ed0e64dc40
Martin Quinson [Thu, 13 Sep 2018 20:00:17 +0000 (22:00 +0200)]
Merge branch 'master' into master
Martin Quinson [Tue, 11 Sep 2018 23:53:17 +0000 (01:53 +0200)]
SMPI tuto: Start stealing content from SMPI courseware
Martin Quinson [Tue, 11 Sep 2018 23:17:36 +0000 (01:17 +0200)]
tuto smpi: add a picture explaining how it works
Martin Quinson [Tue, 11 Sep 2018 23:16:33 +0000 (01:16 +0200)]
allow hidden/shown code blocks in the doc
Arnaud Giersch [Tue, 11 Sep 2018 20:35:20 +0000 (22:35 +0200)]
Add an assert/fixme around Actor::set_auto_restart.
Arnaud Giersch [Tue, 11 Sep 2018 20:27:20 +0000 (22:27 +0200)]
Use a std::vector for actors_at_boot_.
Several actors may use the same name (e.g. app-masterworker-multicore).
Also fixes a memory leak.
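A minimal sketch of why duplicate names matter here, assuming the earlier container was keyed by actor name (the commit does not say so explicitly). `BootActor`, `kept_by_name` and `kept_in_sequence` are hypothetical names for illustration, not SimGrid code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a boot-time actor description (not SimGrid's type).
struct BootActor {
  std::string name;
  int core;
};

// Name-keyed storage: emplace() inserts nothing when the key already exists,
// so a second actor with the same name is silently lost.
std::size_t kept_by_name(const std::vector<BootActor>& actors)
{
  std::map<std::string, BootActor> by_name;
  for (const BootActor& a : actors)
    by_name.emplace(a.name, a);
  return by_name.size();
}

// Sequence storage: every actor is kept, whatever its name.
std::size_t kept_in_sequence(const std::vector<BootActor>& actors)
{
  std::vector<BootActor> at_boot(actors.begin(), actors.end());
  return at_boot.size();
}

// Two actors named "worker" on different cores, plus one "master".
void demo_duplicate_names()
{
  const std::vector<BootActor> actors = {{"worker", 0}, {"worker", 1}, {"master", 0}};
  assert(kept_by_name(actors) == 2);     // one "worker" was dropped
  assert(kept_in_sequence(actors) == 3); // all three survive
}
```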
Martin Quinson [Tue, 11 Sep 2018 16:37:58 +0000 (18:37 +0200)]
start the SMPI tuto
Arnaud Giersch [Fri, 31 Aug 2018 11:19:34 +0000 (13:19 +0200)]
Typo.
Martin Quinson [Mon, 10 Sep 2018 21:30:38 +0000 (23:30 +0200)]
tuto: don't speak of s4u processes (but actors)
Martin Quinson [Mon, 10 Sep 2018 21:17:30 +0000 (23:17 +0200)]
docs: simplify and document that file
Martin Quinson [Mon, 10 Sep 2018 21:01:01 +0000 (23:01 +0200)]
killing trailing whitespace on png files is not clever
Martin Quinson [Mon, 10 Sep 2018 20:33:39 +0000 (22:33 +0200)]
DTD: remove the last occurrence of <gpu>
Martin Quinson [Mon, 10 Sep 2018 20:30:49 +0000 (22:30 +0200)]
tesh: informative message for another error condition
Martin Quinson [Mon, 10 Sep 2018 19:58:04 +0000 (21:58 +0200)]
Fix the DTD to forbid mixing internal node content with leaf content in a given zone
Fix https://github.com/simgrid/simgrid/issues/296
Martin Quinson [Mon, 10 Sep 2018 14:19:17 +0000 (16:19 +0200)]
fix the SMPI tests that mandate smpi/wtime == 0
Martin Quinson [Mon, 10 Sep 2018 13:03:49 +0000 (15:03 +0200)]
align doc and code on a more sensible value
Martin Quinson [Mon, 10 Sep 2018 12:42:52 +0000 (14:42 +0200)]
Merge branch 'master' of framagit.org:simgrid/simgrid
Martin Quinson [Mon, 10 Sep 2018 12:39:55 +0000 (14:39 +0200)]
Merge branch 'master' of scm.gforge.inria.fr:/gitroot/simgrid/simgrid
Martin Quinson [Mon, 10 Sep 2018 12:35:57 +0000 (14:35 +0200)]
Improve option smpi/wtime
- Set default value to 1ms instead of 0. This default setting may
lead to slower simulations, but it works in more situations.
- Also apply this delay in gettimeofday() and clock_gettime()
- Improve the documentation.
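The effect of a non-zero delay can be sketched as follows. This is an illustration under stated assumptions, not SMPI's implementation: `SimulatedClock` and `polls_until` are hypothetical names, and whether the real delay is applied before or after the time is read is not stated in the commit.

```cpp
#include <cassert>

// Hypothetical simulated clock: each read advances simulated time by a fixed
// cost, playing the role that the smpi/wtime value plays for time queries.
class SimulatedClock {
  double now_ = 0.0;
  double query_cost_;

public:
  explicit SimulatedClock(double query_cost) : query_cost_(query_cost) {}
  double read()
  {
    now_ += query_cost_;
    return now_;
  }
};

// Application-style busy wait: poll the clock until `deadline` is reached.
// Returns the number of polls used, or -1 if `max_polls` was exhausted first.
long polls_until(SimulatedClock& clock, double deadline, long max_polls)
{
  for (long polls = 1; polls <= max_polls; ++polls)
    if (clock.read() >= deadline)
      return polls;
  return -1;
}

void demo_wtime()
{
  SimulatedClock frozen(0.0); // cost of 0: simulated time never advances
  assert(polls_until(frozen, 1.0, 100000) == -1);

  SimulatedClock ticking(0.001); // cost of 1ms: the loop ends after about 1000 polls
  assert(polls_until(ticking, 1.0, 100000) > 0);
}
```

With a cost of 0, the polling loop would spin forever without the `max_polls` guard. A small positive cost makes it terminate, at the price of more simulated clock reads.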
Augustin Degomme [Mon, 10 Sep 2018 11:39:29 +0000 (13:39 +0200)]
Allow insertion of time inside gettimeofday and clock_gettime
Done with --cfg=smpi/wtime, which was previously only for MPI_Wtime.
This should avoid some infinite loops. Keep 0 as default for now.
Martin Quinson [Mon, 10 Sep 2018 11:02:22 +0000 (13:02 +0200)]
move smpi_mpi_wtime near to the other time-related functions
Martin Quinson [Thu, 6 Sep 2018 19:39:26 +0000 (21:39 +0200)]
don't use send/receive on mailboxes, but put/get
FREDERIC SUTER [Wed, 5 Sep 2018 10:56:09 +0000 (12:56 +0200)]
Update app_s4u.rst
FREDERIC SUTER [Wed, 5 Sep 2018 10:17:03 +0000 (12:17 +0200)]
Update application.rst
Martin Quinson [Mon, 3 Sep 2018 19:41:38 +0000 (21:41 +0200)]
try to fix windows builds
ContextJava uses ContextThread as a superclass now, but they are not
in the same lib, so ContextThread must be exported as public.
FREDERIC SUTER [Mon, 3 Sep 2018 12:37:59 +0000 (14:37 +0200)]
Update intro_yours.rst
Augustin Degomme [Wed, 29 Aug 2018 12:31:17 +0000 (14:31 +0200)]
Multiply memset size by size of element in umpire.
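A generic illustration of this class of bug, not the umpire code itself: `memset` takes a byte count, so passing an element count clears only part of an array whose elements are wider than one byte. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Buggy form: `count` is treated as a byte count, so only the first
// `count` bytes are cleared, not `count` elements.
void clear_bytes_only(int* data, std::size_t count)
{
  std::memset(data, 0, count);
}

// Fixed form: multiply the element count by the element size.
void clear_elements(int* data, std::size_t count)
{
  std::memset(data, 0, count * sizeof(int));
}

void demo_memset()
{
  int partial[4] = {1, 2, 3, 4};
  clear_bytes_only(partial, 4);
  assert(partial[3] == 4); // the last element was never reached (sizeof(int) > 1)

  int full[4] = {1, 2, 3, 4};
  clear_elements(full, 4);
  assert(full[0] == 0 && full[3] == 0);
}
```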
FREDERIC SUTER [Mon, 3 Sep 2018 12:00:55 +0000 (14:00 +0200)]
Update intro_install.rst
FREDERIC SUTER [Mon, 3 Sep 2018 11:17:08 +0000 (13:17 +0200)]
Update intro_concepts.rst
Martin Quinson [Mon, 3 Sep 2018 07:34:38 +0000 (09:34 +0200)]
fix make distcheck
Martin Quinson [Mon, 3 Sep 2018 07:20:56 +0000 (09:20 +0200)]
Somehow fix the killing of actors in Java
Things are somehow fixed, as all tests seem to pass, but the situation
is still very messy after this commit. Contents:
- Reimplement ContextJava as subclass of ContextThread to reduce duplication.
- Don't send the StopRequest exception on host failure if we are in
Java because *some* of the actors don't catch it well, resulting in
simulation failure.
- Forcefully kill the process ("exit(0)" in C) after MSG_run() because
dead actors are sometimes not completely killed, preventing the
simulation from ending.
See the comment in ActorImpl for a better understanding of this mess
and how to fix it in the future.
Martin Quinson [Sun, 2 Sep 2018 19:35:09 +0000 (21:35 +0200)]
cosmetics while debugging backtraces
Martin Quinson [Sun, 2 Sep 2018 00:17:06 +0000 (02:17 +0200)]
java: obey our coding standard
Martin Quinson [Sun, 2 Sep 2018 00:09:27 +0000 (02:09 +0200)]
don't catch an exception that is never thrown
xbt_os_thread_create() asserts that it succeeds; it does not throw
anything. So put the documentation in the doc instead of displaying it
when that non-existent exception is received.
Martin Quinson [Sun, 2 Sep 2018 00:02:21 +0000 (02:02 +0200)]
java: cosmetics
Martin Quinson [Sat, 1 Sep 2018 23:11:54 +0000 (01:11 +0200)]
that was converted to sphinx
Martin Quinson [Sat, 1 Sep 2018 20:56:32 +0000 (22:56 +0200)]
Remove the deprecated 'state' attribute from the doc
This fixes https://github.com/simgrid/simgrid/issues/295
Martin Quinson [Sat, 1 Sep 2018 20:53:51 +0000 (22:53 +0200)]
docs: write the overall section of 'Applications'
Martin Quinson [Fri, 31 Aug 2018 15:58:58 +0000 (17:58 +0200)]
sphinx: one warning less
Martin Quinson [Thu, 30 Aug 2018 09:37:40 +0000 (11:37 +0200)]
Bummer. Really fix out of tree builds (I hope)
Martin Quinson [Thu, 30 Aug 2018 07:38:36 +0000 (09:38 +0200)]
fix out of tree builds
Martin Quinson [Wed, 29 Aug 2018 21:11:37 +0000 (23:11 +0200)]
fix maestro-set
Martin Quinson [Wed, 29 Aug 2018 20:50:07 +0000 (22:50 +0200)]
disable the platform-failure tests for now, sorry
I fail to debug such complex tests, I need smaller ones such as the
activity-lifecycle that I'm currently growing.
But broken tests in the git prevent everybody from working, including
me. I broke msg-maestro-set-thread at some point and did not even
notice :(
Sorry for breaking the failure platform tests in the first place.
Martin Quinson [Wed, 29 Aug 2018 20:31:09 +0000 (22:31 +0200)]
kill a superseded sub-test, and fix another one
Processes on a failing host are killed right away, so they cannot report
that the host failed as expected.
This whole test should be converted to activity-lifecycle.
Martin Quinson [Wed, 29 Aug 2018 20:13:26 +0000 (22:13 +0200)]
fix make dist
Martin Quinson [Wed, 29 Aug 2018 20:11:31 +0000 (22:11 +0200)]
this test is superseded by activity-lifecycle
Martin Quinson [Wed, 29 Aug 2018 20:04:11 +0000 (22:04 +0200)]
simplify the actor finalization a tiny bit by using a callback
This is part of the removal of all trace-related pimpl all over the
code of MSG (my goal is to kill MSG_process_cleanup_from_SIMIX()
altogether).
Note that I changed from Container::by_name() to
Container::by_name_or_null. It seems that not all actors have a
container by their name, not sure why.
Martin Quinson [Wed, 29 Aug 2018 19:24:26 +0000 (21:24 +0200)]
Convert all xbt_ex(network_error) throwing locations
Martin Quinson [Wed, 29 Aug 2018 19:19:40 +0000 (21:19 +0200)]
typo
Martin Quinson [Wed, 29 Aug 2018 13:19:17 +0000 (15:19 +0200)]
sonar
Martin Quinson [Wed, 29 Aug 2018 13:18:47 +0000 (15:18 +0200)]
woops
Martin Quinson [Wed, 29 Aug 2018 12:17:35 +0000 (14:17 +0200)]
fix 32b builds
Martin Quinson [Wed, 29 Aug 2018 11:14:01 +0000 (13:14 +0200)]
please sonar on rethrow
Martin Quinson [Wed, 29 Aug 2018 09:35:10 +0000 (11:35 +0200)]
Display a msg when contexts are killed by uncaught exceptions
and when I want to really kill an actor (e.g. when its host is turned
off), I launch an uncatchable kernel::Context::StopRequest instead of
a catchable simgrid::HostFailureException (which will be used in case
of remote exec and similar)
Maybe there should be a config flag to decide if we want to kill the
simulation when an actor fails. The current setting forces the user to
add try/catch (simgrid::Exception) around their main functions. That's
not a bad thing either, not sure.
Martin Quinson [Wed, 29 Aug 2018 00:10:12 +0000 (02:10 +0200)]
Let's exhaustively test the activity lifecycle
This test is not complete yet. It aims at being as exhaustive and
paranoid as possible, just like cloud-sharing even if I didn't find a
good DSL to specify the tests this time.
Martin Quinson [Wed, 29 Aug 2018 00:08:18 +0000 (02:08 +0200)]
improve debug messages and error reporting
Martin Quinson [Tue, 28 Aug 2018 23:59:17 +0000 (01:59 +0200)]
Properly kill the context on HostFailureException
Before, simix was kinda thinking that the actor was dead, but the
context was still running, leading to a Holy Big Mess!
Augustin Degomme [Tue, 28 Aug 2018 15:46:08 +0000 (17:46 +0200)]
update doc
Augustin Degomme [Tue, 28 Aug 2018 15:39:33 +0000 (17:39 +0200)]
Switch to ompi for umpire tests.
MPICH changes brought SMP-aware algorithms, which MC does not really like.
I guess the init_smp is the culprit here, as it uses various collectives badly.
Augustin Degomme [Tue, 28 Aug 2018 15:37:50 +0000 (17:37 +0200)]
update ompi selector as well with "recent" version
Augustin Degomme [Tue, 28 Aug 2018 14:32:41 +0000 (16:32 +0200)]
Requalify automatic tesh, as another algorithm is used in init_smp now.
Augustin Degomme [Tue, 28 Aug 2018 14:29:22 +0000 (16:29 +0200)]
update doc with new algo
Augustin Degomme [Tue, 28 Aug 2018 14:23:24 +0000 (16:23 +0200)]
Upgrade MPICH collective selector to 3.3.
Add SMP variants of some algorithms, and protect against side effects.
Martin Quinson [Sun, 26 Aug 2018 23:49:51 +0000 (01:49 +0200)]
circleci: do not optimise builds, you're supposed to be as fast as hell
Augustin Degomme [Tue, 28 Aug 2018 08:19:07 +0000 (10:19 +0200)]
Fix https://github.com/simgrid/simgrid/issues/294
Martin Quinson [Sun, 26 Aug 2018 23:42:53 +0000 (01:42 +0200)]
Not sure of why it helps now
Martin Quinson [Sun, 26 Aug 2018 23:42:32 +0000 (01:42 +0200)]
fix travis builds
Martin Quinson [Sun, 26 Aug 2018 22:37:52 +0000 (00:37 +0200)]
strengthen this test