From: Martin Quinson
Date: Mon, 10 Dec 2012 11:27:36 +0000 (+0100)
Subject: document an ugly pimple in the code that was added for autorestart to work
X-Git-Tag: v3_9_rc1~86^2~148
X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/commitdiff_plain/3afb7fcd5847dbd9cf2005e28d45eadcf96a399b

document an ugly pimple in the code that was added for autorestart to work
---
diff --git a/src/surf/surf.c b/src/surf/surf.c
index 5309c43bb7..fbf2ae3c71 100644
--- a/src/surf/surf.c
+++ b/src/surf/surf.c
@@ -608,11 +608,11 @@ double surf_solve(double max_date)
           min = MAX(next_event_date - NOW, min);
       }
 
-      XBT_DEBUG("Run for NS3 at most %f", min);
+      XBT_DEBUG("Run for network at most %f", min);
       // run until min or next flow
       model_next_action_end = surf_network_model->model_private->share_resources(min);
 
-      XBT_DEBUG("Min for NS3 : %f", model_next_action_end);
+      XBT_DEBUG("Min for network : %f", model_next_action_end);
       if(model_next_action_end>=0.0)
         min = model_next_action_end;
     }
@@ -738,6 +738,33 @@ void surf_set_nthreads(int nthreads) {
   surf_nthreads = nthreads;
 }
 
+/* This function is a pimple that we ought to fix. But it won't be easy.
+ *
+ * The surf_solve() function does not properly return the set of actions that changed.
+ * Instead, each model changes global data, and the caller of surf_solve() must then
+ * pick from these action_failed and action_done sets.
+ *
+ * This was not clean, but OK as long as we didn't have to restart the processes when the resource comes back up.
+ * We worked by putting sentinel actions on every resource we are interested in,
+ * so that surf informs us if/when the corresponding resource fails.
+ *
+ * But this does not tell Simix when a resource comes back up, and this is where this pimple comes in.
+ * We have a set of resources that are currently down and for which Simix needs to know when they come back up.
+ * And the current function is called *at every simulation step* to sweep over that set, searching for a resource
+ * that was turned back up in the meantime. This is UGLY and slow.
+ *
+ * The proper solution would be to not rely on globals for the action_failed and action_done swags.
+ * They must be passed as parameters by the caller (the handling of these actions in Simix may lead you to
+ * think that these two sets can be merged, but their handling in SimDag suggests otherwise, unless the
+ * SimDag code can check by itself whether the action is done or failed -- this seems very doable, but it is yet
+ * more cleanup to do).
+ *
+ * Once surf_solve() is passed the set of actions that changed, you want to add a new set of resources that came
+ * back up as a parameter to this function. You also want to add a boolean field "restart_watched" to each resource, and
+ * make sure that whenever a resource with this field enabled comes back up, it's added to that set so that Simix
+ * sees it and reacts accordingly. This would remove the need for surf to call Simix.
+ *
+ */
 void surf_watched_hosts(void)
 {
   char *key;