X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/bf4e9e11d5a5851f8adea9bdaa8f4cfef417add3..b84a3c964b5b1bb8922c1ca1a7eb11e2786ee6c7:/examples/s4u/platform-failures/s4u-platform-failures.tesh

diff --git a/examples/s4u/platform-failures/s4u-platform-failures.tesh b/examples/s4u/platform-failures/s4u-platform-failures.tesh
index e51427bf26..007a7c1cd3 100644
--- a/examples/s4u/platform-failures/s4u-platform-failures.tesh
+++ b/examples/s4u/platform-failures/s4u-platform-failures.tesh
@@ -3,52 +3,215 @@
 p Testing a simple master/worker example application handling failures TCP crosstraffic DISABLED
 ! output sort 19
-$ $SG_TEST_EXENV ${bindir:=.}/s4u-platform-failures$EXEEXT --log=xbt_cfg.thres:critical --log=no_loc ${platfdir}/small_platform_with_failures.xml ${bindir}/../app-masterworker/s4u-app-masterworker_d.xml --cfg=path:${srcdir} --cfg=network/crosstraffic:0 "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n"
-> [ 0.000000] (0:maestro@) Cannot launch process 'worker' on failed host 'Fafard'
+$ $SG_TEST_EXENV ${bindir:=.}/s4u-platform-failures$EXEEXT --log=xbt_cfg.thres:critical --log=no_loc ${platfdir}/small_platform_failures.xml ${srcdir:=.}/s4u-platform-failures_d.xml --cfg=path:${srcdir} --cfg=network/crosstraffic:0 "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n" --log=surf_cpu.t:verbose
+> [ 0.000000] (0:maestro@) Cannot launch actor 'worker' on failed host 'Fafard'
+> [ 0.000000] (0:maestro@) Deployment includes some initially turned off Hosts ... nevermind.
 > [ 0.000000] (1:master@Tremblay) Got 5 workers and 20 tasks to process
 > [ 0.000000] (1:master@Tremblay) Send a message to worker-0
 > [ 0.010309] (1:master@Tremblay) Send to worker-0 completed
+> [ 0.010309] (2:worker@Tremblay) Start execution...
 > [ 0.000000] (2:worker@Tremblay) Waiting a message on worker-0
 > [ 0.000000] (3:worker@Jupiter) Waiting a message on worker-1
 > [ 0.000000] (4:worker@Ginette) Waiting a message on worker-3
 > [ 0.000000] (5:worker@Bourassa) Waiting a message on worker-4
 > [ 0.010309] (1:master@Tremblay) Send a message to worker-1
 > [ 1.000000] (0:maestro@) Restart processes on host Fafard
-> [ 1.000000] (1:master@Tremblay) Mmh. Something went wrong with 'worker-1'. Nevermind. Let's keep going!
+> [ 1.000000] (6:worker@Fafard) Waiting a message on worker-2
+> [ 1.000000] (1:master@Tremblay) Mmh. The communication with 'worker-1' failed. Nevermind. Let's keep going!
 > [ 1.000000] (1:master@Tremblay) Send a message to worker-2
 > [ 1.000000] (3:worker@Jupiter) Gloups. The cpu on which I'm running just turned off!. See you!
+> [ 2.000000] (1:master@Tremblay) Mmh. The communication with 'worker-2' failed. Nevermind. Let's keep going!
+> [ 2.000000] (6:worker@Fafard) Gloups. The cpu on which I'm running just turned off!. See you!
 > [ 2.000000] (0:maestro@) Restart processes on host Jupiter
+> [ 2.000000] (1:master@Tremblay) Send a message to worker-3
+> [ 2.000000] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 2.010309] (2:worker@Tremblay) Execution complete.
 > [ 2.010309] (2:worker@Tremblay) Waiting a message on worker-0
-> [ 11.000000] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
-> [ 11.000000] (1:master@Tremblay) Send a message to worker-3
-> [ 12.030928] (1:master@Tremblay) Send to worker-3 completed
-> [ 12.030928] (1:master@Tremblay) Send a message to worker-4
-> [ 13.061856] (1:master@Tremblay) Send to worker-4 completed
-> [ 13.061856] (1:master@Tremblay) Send a message to worker-0
-> [ 13.072165] (1:master@Tremblay) Send to worker-0 completed
-> [ 13.072165] (1:master@Tremblay) Send a message to worker-1
-> [ 14.103093] (1:master@Tremblay) Send to worker-1 completed
-> [ 24.103093] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
-> [ 24.103093] (1:master@Tremblay) Mmh. Something went wrong with 'worker-3'. Nevermind. Let's keep going!
-> [ 24.103093] (4:worker@Ginette) Mmh. Something went wrong. Nevermind. Let's keep going!
-> [ 25.134021] (1:master@Tremblay) Send completed
-> [ 25.144330] (1:master@Tremblay) Send completed
-> [ 26.175258] (1:master@Tremblay) Send completed
-> [ 36.175258] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
-> [ 37.206186] (1:master@Tremblay) Send completed
-> [ 37.206186] (1:master@Tremblay) Mmh. Something went wrong with 'worker-4'. Nevermind. Let's keep going!
-> [ 37.206186] (5:worker@Bourassa) Mmh. Something went wrong. Nevermind. Let's keep going!
-> [ 38.247423] (1:master@Tremblay) Send completed
-> [ 48.247423] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
-> [ 49.278351] (1:master@Tremblay) Send completed
-> [ 50.000000] (4:worker@Ginette) Gloups. The cpu on which I'm running just turned off!. See you!
-> [ 50.309278] (1:master@Tremblay) Send completed
-> [ 50.309278] (1:master@Tremblay) All tasks have been dispatched. Let's tell everybody the computation is over.
-> [ 50.309278] (2:worker@Tremblay) I'm done. See you!
-> [ 50.309278] (6:worker@Jupiter) I'm done. See you!
-> [ 51.309278] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
-> [ 52.309278] (0:maestro@) Simulation time 52.3093
-> [ 52.309278] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-3'. Nevermind. Let's keep going!
-> [ 52.309278] (1:master@Tremblay) Goodbye now!
-> [ 52.309278] (5:worker@Bourassa) I'm done. See you!
+> [ 3.030928] (1:master@Tremblay) Send to worker-3 completed
+> [ 3.030928] (1:master@Tremblay) Send a message to worker-4
+> [ 3.030928] (4:worker@Ginette) Start execution...
+> [ 4.061856] (1:master@Tremblay) Send to worker-4 completed
+> [ 4.061856] (1:master@Tremblay) Send a message to worker-0
+> [ 4.061856] (5:worker@Bourassa) Start execution...
+> [ 4.072165] (1:master@Tremblay) Send to worker-0 completed
+> [ 4.072165] (1:master@Tremblay) Send a message to worker-1
+> [ 4.072165] (2:worker@Tremblay) Start execution...
+> [ 5.030928] (4:worker@Ginette) Execution complete.
+> [ 5.030928] (4:worker@Ginette) Waiting a message on worker-3
+> [ 5.103093] (1:master@Tremblay) Send to worker-1 completed
+> [ 5.103093] (1:master@Tremblay) Send a message to worker-2
+> [ 5.103093] (7:worker@Jupiter) Start execution...
+> [ 6.061856] (5:worker@Bourassa) Execution complete.
+> [ 6.061856] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 6.072165] (2:worker@Tremblay) Execution complete.
+> [ 6.072165] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 7.103093] (7:worker@Jupiter) Execution complete.
+> [ 7.103093] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 15.103093] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 15.103093] (1:master@Tremblay) Send a message to worker-3
+> [ 15.103093] (1:master@Tremblay) Mmh. The communication with 'worker-3' failed. Nevermind. Let's keep going!
+> [ 15.103093] (1:master@Tremblay) Send a message to worker-4
+> [ 15.103093] (4:worker@Ginette) Mmh. Something went wrong. Nevermind. Let's keep going!
+> [ 15.103093] (4:worker@Ginette) Waiting a message on worker-3
+> [ 16.134021] (1:master@Tremblay) Send to worker-4 completed
+> [ 16.134021] (1:master@Tremblay) Send a message to worker-0
+> [ 16.134021] (5:worker@Bourassa) Start execution...
+> [ 16.144330] (1:master@Tremblay) Send to worker-0 completed
+> [ 16.144330] (1:master@Tremblay) Send a message to worker-1
+> [ 16.144330] (2:worker@Tremblay) Start execution...
+> [ 17.175258] (1:master@Tremblay) Send to worker-1 completed
+> [ 17.175258] (1:master@Tremblay) Send a message to worker-2
+> [ 17.175258] (7:worker@Jupiter) Start execution...
+> [ 18.134021] (5:worker@Bourassa) Execution complete.
+> [ 18.134021] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 18.144330] (2:worker@Tremblay) Execution complete.
+> [ 18.144330] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 19.175258] (7:worker@Jupiter) Execution complete.
+> [ 19.175258] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 27.175258] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 27.175258] (1:master@Tremblay) Send a message to worker-3
+> [ 28.206186] (1:master@Tremblay) Send to worker-3 completed
+> [ 28.206186] (1:master@Tremblay) Send a message to worker-4
+> [ 28.206186] (1:master@Tremblay) Mmh. The communication with 'worker-4' failed. Nevermind. Let's keep going!
+> [ 28.206186] (1:master@Tremblay) Send a message to worker-0
+> [ 28.206186] (4:worker@Ginette) Start execution...
+> [ 28.206186] (5:worker@Bourassa) Mmh. Something went wrong. Nevermind. Let's keep going!
+> [ 28.206186] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 28.216495] (1:master@Tremblay) Send to worker-0 completed
+> [ 28.216495] (1:master@Tremblay) Send a message to worker-1
+> [ 28.216495] (2:worker@Tremblay) Start execution...
+> [ 29.247423] (1:master@Tremblay) Send to worker-1 completed
+> [ 29.247423] (1:master@Tremblay) Send a message to worker-2
+> [ 29.247423] (7:worker@Jupiter) Start execution...
+> [ 30.206186] (4:worker@Ginette) Execution complete.
+> [ 30.206186] (4:worker@Ginette) Waiting a message on worker-3
+> [ 30.216495] (2:worker@Tremblay) Execution complete.
+> [ 30.216495] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 31.247423] (7:worker@Jupiter) Execution complete.
+> [ 31.247423] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 39.247423] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 39.247423] (1:master@Tremblay) Send a message to worker-3
+> [ 40.278351] (1:master@Tremblay) Send to worker-3 completed
+> [ 40.278351] (1:master@Tremblay) Send a message to worker-4
+> [ 40.278351] (4:worker@Ginette) Start execution...
+> [ 41.000000] (4:worker@Ginette) Gloups. The cpu on which I'm running just turned off!. See you!
+> [ 41.309278] (1:master@Tremblay) Send to worker-4 completed
+> [ 41.309278] (1:master@Tremblay) All tasks have been dispatched. Let's tell everybody the computation is over.
+> [ 41.309278] (2:worker@Tremblay) I'm done. See you!
+> [ 41.309278] (5:worker@Bourassa) Start execution...
+> [ 41.309278] (7:worker@Jupiter) I'm done. See you!
+> [ 42.309278] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 43.309278] (0:maestro@) Simulation time 43.3093
+> [ 43.309278] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-3'. Nevermind. Let's keep going!
+> [ 43.309278] (1:master@Tremblay) Goodbye now!
+> [ 43.309278] (5:worker@Bourassa) Execution complete.
+> [ 43.309278] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 43.309278] (5:worker@Bourassa) I'm done. See you!
+p Testing a simple master/worker example application handling failures. TCP crosstraffic ENABLED
+
+! output sort 19
+$ $SG_TEST_EXENV ${bindir:=.}/s4u-platform-failures$EXEEXT --log=xbt_cfg.thres:critical --log=no_loc ${platfdir}/small_platform_failures.xml ${srcdir:=.}/s4u-platform-failures_d.xml --cfg=path:${srcdir} "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n" --log=surf_cpu.t:verbose
+> [ 0.000000] (0:maestro@) Cannot launch actor 'worker' on failed host 'Fafard'
+> [ 0.000000] (0:maestro@) Deployment includes some initially turned off Hosts ... nevermind.
+> [ 0.000000] (1:master@Tremblay) Got 5 workers and 20 tasks to process
+> [ 0.000000] (1:master@Tremblay) Send a message to worker-0
+> [ 0.000000] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 0.000000] (3:worker@Jupiter) Waiting a message on worker-1
+> [ 0.000000] (4:worker@Ginette) Waiting a message on worker-3
+> [ 0.000000] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 0.010825] (2:worker@Tremblay) Start execution...
+> [ 0.010825] (1:master@Tremblay) Send to worker-0 completed
+> [ 0.010825] (1:master@Tremblay) Send a message to worker-1
+> [ 1.000000] (0:maestro@) Restart processes on host Fafard
+> [ 1.000000] (6:worker@Fafard) Waiting a message on worker-2
+> [ 1.000000] (1:master@Tremblay) Mmh. The communication with 'worker-1' failed. Nevermind. Let's keep going!
+> [ 1.000000] (1:master@Tremblay) Send a message to worker-2
+> [ 1.000000] (3:worker@Jupiter) Gloups. The cpu on which I'm running just turned off!. See you!
+> [ 2.000000] (0:maestro@) Restart processes on host Jupiter
+> [ 2.000000] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 2.000000] (1:master@Tremblay) Mmh. The communication with 'worker-2' failed. Nevermind. Let's keep going!
+> [ 2.000000] (1:master@Tremblay) Send a message to worker-3
+> [ 2.000000] (6:worker@Fafard) Gloups. The cpu on which I'm running just turned off!. See you!
+> [ 2.010825] (2:worker@Tremblay) Execution complete.
+> [ 2.010825] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 3.082474] (4:worker@Ginette) Start execution...
+> [ 3.082474] (1:master@Tremblay) Send to worker-3 completed
+> [ 3.082474] (1:master@Tremblay) Send a message to worker-4
+> [ 4.164948] (5:worker@Bourassa) Start execution...
+> [ 4.164948] (1:master@Tremblay) Send to worker-4 completed
+> [ 4.164948] (1:master@Tremblay) Send a message to worker-0
+> [ 4.175773] (2:worker@Tremblay) Start execution...
+> [ 4.175773] (1:master@Tremblay) Send to worker-0 completed
+> [ 4.175773] (1:master@Tremblay) Send a message to worker-1
+> [ 5.082474] (4:worker@Ginette) Execution complete.
+> [ 5.082474] (4:worker@Ginette) Waiting a message on worker-3
+> [ 5.258247] (7:worker@Jupiter) Start execution...
+> [ 5.258247] (1:master@Tremblay) Send to worker-1 completed
+> [ 5.258247] (1:master@Tremblay) Send a message to worker-2
+> [ 6.164948] (5:worker@Bourassa) Execution complete.
+> [ 6.164948] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 6.175773] (2:worker@Tremblay) Execution complete.
+> [ 6.175773] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 7.258247] (7:worker@Jupiter) Execution complete.
+> [ 7.258247] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 15.258247] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 15.258247] (1:master@Tremblay) Send a message to worker-3
+> [ 15.258247] (4:worker@Ginette) Mmh. Something went wrong. Nevermind. Let's keep going!
+> [ 15.258247] (4:worker@Ginette) Waiting a message on worker-3
+> [ 15.258247] (1:master@Tremblay) Mmh. The communication with 'worker-3' failed. Nevermind. Let's keep going!
+> [ 15.258247] (1:master@Tremblay) Send a message to worker-4
+> [ 16.340722] (5:worker@Bourassa) Start execution...
+> [ 16.340722] (1:master@Tremblay) Send to worker-4 completed
+> [ 16.340722] (1:master@Tremblay) Send a message to worker-0
+> [ 16.351546] (2:worker@Tremblay) Start execution...
+> [ 16.351546] (1:master@Tremblay) Send to worker-0 completed
+> [ 16.351546] (1:master@Tremblay) Send a message to worker-1
+> [ 17.434021] (7:worker@Jupiter) Start execution...
+> [ 17.434021] (1:master@Tremblay) Send to worker-1 completed
+> [ 17.434021] (1:master@Tremblay) Send a message to worker-2
+> [ 18.340722] (5:worker@Bourassa) Execution complete.
+> [ 18.340722] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 18.351546] (2:worker@Tremblay) Execution complete.
+> [ 18.351546] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 19.434021] (7:worker@Jupiter) Execution complete.
+> [ 19.434021] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 27.434021] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 27.434021] (1:master@Tremblay) Send a message to worker-3
+> [ 28.516495] (4:worker@Ginette) Start execution...
+> [ 28.516495] (1:master@Tremblay) Send to worker-3 completed
+> [ 28.516495] (1:master@Tremblay) Send a message to worker-4
+> [ 28.516495] (5:worker@Bourassa) Mmh. Something went wrong. Nevermind. Let's keep going!
+> [ 28.516495] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 28.516495] (1:master@Tremblay) Mmh. The communication with 'worker-4' failed. Nevermind. Let's keep going!
+> [ 28.516495] (1:master@Tremblay) Send a message to worker-0
+> [ 28.527320] (2:worker@Tremblay) Start execution...
+> [ 28.527320] (1:master@Tremblay) Send to worker-0 completed
+> [ 28.527320] (1:master@Tremblay) Send a message to worker-1
+> [ 29.609794] (7:worker@Jupiter) Start execution...
+> [ 29.609794] (1:master@Tremblay) Send to worker-1 completed
+> [ 29.609794] (1:master@Tremblay) Send a message to worker-2
+> [ 30.516495] (4:worker@Ginette) Execution complete.
+> [ 30.516495] (4:worker@Ginette) Waiting a message on worker-3
+> [ 30.527320] (2:worker@Tremblay) Execution complete.
+> [ 30.527320] (2:worker@Tremblay) Waiting a message on worker-0
+> [ 31.609794] (7:worker@Jupiter) Execution complete.
+> [ 31.609794] (7:worker@Jupiter) Waiting a message on worker-1
+> [ 39.609794] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 39.609794] (1:master@Tremblay) Send a message to worker-3
+> [ 40.692268] (4:worker@Ginette) Start execution...
+> [ 40.692268] (1:master@Tremblay) Send to worker-3 completed
+> [ 40.692268] (1:master@Tremblay) Send a message to worker-4
+> [ 41.000000] (4:worker@Ginette) Gloups. The cpu on which I'm running just turned off!. See you!
+> [ 41.774742] (5:worker@Bourassa) Start execution...
+> [ 41.774742] (1:master@Tremblay) Send to worker-4 completed
+> [ 41.774742] (1:master@Tremblay) All tasks have been dispatched. Let's tell everybody the computation is over.
+> [ 41.774742] (2:worker@Tremblay) I'm done. See you!
+> [ 41.774742] (7:worker@Jupiter) I'm done. See you!
+> [ 42.774742] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-2'. Nevermind. Let's keep going!
+> [ 43.774742] (5:worker@Bourassa) Execution complete.
+> [ 43.774742] (5:worker@Bourassa) Waiting a message on worker-4
+> [ 43.774742] (1:master@Tremblay) Mmh. Got timeouted while speaking to 'worker-3'. Nevermind. Let's keep going!
+> [ 43.774742] (5:worker@Bourassa) I'm done. See you!
+> [ 43.774742] (1:master@Tremblay) Goodbye now!
+> [ 43.774742] (0:maestro@) Simulation time 43.7747