destroy task when execution failed because of dead host

[simgrid.git] / TODO
diff --git a/TODO b/TODO
index 98359e5..3d4a9fe 100644
--- a/TODO
+++ b/TODO
@@ -1,205 +1,61 @@
+
  ###
-### Ongoing stuff
+### Urgent stuff:
  ###
  
+* Have a proper todo file
+
+
+
+
+
+
+
+
+                ************************************************
+                ***  This file is a TODO. It is thus kinda   ***
+                ***  outdated. You know the story, right?    ***
+                ************************************************
+
+
+
  
  ###
-### Planned
+### Ongoing stuff
  ###
  
-*
-* Infrastructure
-****************
+Document host module
  
-[autoconf]
-  * Check the gcc version on powerpc. We disabled -floop-optimize on powerpc,
-    but versions above 3.4.0 should be ok.
-  * check whether we have better than jmp_buf to implement exceptions, and
-    use it (may need to generate a public .h, as glib does)
+/* FIXME: better place? */
+int vasprintf  (char **ptr, const char *fmt, va_list ap);
+char *bprintf(const char*fmt, ...) _XBT_GNUC_PRINTF(1,2);
+
+###
+### Planned
+###
  
  *
  * XBT
  *****
  
-[doc]
-  * graphic showing:
-    (errors, logs ; dynars, dicts, hooks, pools; config, rrdb)
-
-[portability layer]
-  * Mallocators and/or memory pool so that we can cleanly kill an actor
-
  [errors/exception]
-  * Better split casual errors from programing errors.
-    The first ones should be repported to the user, the second should kill
+  * Better split casual errors from programming errors.
+    The first ones should be reported to the user, the second should kill
      the program (or, yet better, only the msg handler)
    * Allows the use of an error handler depending on the current module (ie,
-    the same philosophy than log4c using GSL's error functions)
+    the same philosophy as log4c using GSL's error functions)
  
  [logs]
    * Hijack message from a given category to another for a while (to mask
      initializations, and more)
    * Allow each actor to have its own setting
-  * a init/exit mecanism for logging appender
-  * Several appenders; fix the setting stuff to change the appender
    * more logging appenders (take those from Ralf in l2)
  
-[dict]
-  * speed up the cursors, for example using the contexts when available
-
  [modules]
-  * better formalisation of what modules are (amok deeply needs it)
-    configuration + init() + exit() + dependencies
-  * allow to load them at runtime
-    check in erlang how they upgrade them without downtime
+  * Add configuration and dependencies to our module definition
  
  [other modules]
    * we may need a round-robin database module, and a statistical one
-  * a hook module *may* help cleaning up some parts. Not sure yet.
    * Some of the datacontainer modules seem to overlap. Kill some of them?
-
-*
-* GRAS
-******
-
-[doc]
-  * add the token ring as official example
-  * implement the P2P protocols that macedon does. They constitute great
-    examples, too
-
-[transport]  
-  * Spawn threads handling the communication
-    - Data sending cannot be delegated if we want to be kept informed
-      (*easily*) of errors here.
-      - Actor execution flow shouldn't be interrupted
-      - It should be allowed to access (both in read and write access) 
-        any data available (ie, referenced) from the actor without 
-        requesting to check for a condition before.
-        (in other word, no mutex or assimilated)
-      - I know that enforcing those rules prevent the implementation of
-        really cleaver stuff. Keeping the stuff simple for the users is more
-        important to me than allowing them to do cleaver tricks. Black magic
-        should be done *within* gras to reach a good performance level.
-
-    - Data receiving can be delegated (and should)
-      The first step here is a "simple" mailbox mecanism, with a fifo of
-        messages protected by semaphore.
-      The rest is rather straightforward too.
-
-  * use poll(2) instead of select(2) when available. (first need to check
-    the advantage of doing so ;)
-
-    Another idea we spoke about was to simulate this feature with a bunch of
-    threads blocked in a read(1) on each incomming socket. The latency is
-    reduced by the cost of a syscall, but the more I think about it, the
-    less I find the idea adapted to our context.
-
-  * timeout the send/recv too (hard to do in RL)
-  * Adaptative timeout
-  * multiplex on incoming SOAP over HTTP (once datadesc can deal with it)
-
-  * The module syntax/API is too complex. 
-    - Everybody opens a server socket (or almost), and nobody open two of
-      them. This should be done automatically without user intervention.
-    - I'd like to offer the possibility to speak to someone, not to speak on
-      a socket. Users shouldn't care about such technical details. 
-    - the idea of host_cookie in NWS seem to match my needs, but we still
-      need a proper name ;)
-    - this would allow to exchange a "socket" between peer :)
-    - the creation needs to identify the peer actor within the process
-
-  * when a send failed because the socket was closed on the other side, 
-    try to reopen it seamlessly. Needs exceptions or another way to
-    differentiate between the several system_error.
-  * cache accepted sockets and close the old ones after a while. 
-    Depends on the previous item; difficult to achieve with firewalls
-
-[datadesc]
-  * Implement gras_datadesc_cpy to speedup things in the simulator
-    (and allow to have several "actors" within the same unix process).
-    For now, we mimick closely the RL even in SG. It was easier to do
-      since the datadesc layer is unchanged, but it is not needed and
-      hinders performance.
-    gras_datadesc_cpy needs to provide the size of the corresponding messages, so
-     that we can report it into the simulator.
-  * Add a XML wire protocol alongside to the binary one (for SOAP/HTTP)
-  * cbps:
-    - Error handling
-    - Regression tests
-  * Inter-arch conversions
-    - Port to ARM
-    - Convert in the same buffer when size increase
-    - Exchange (on net) structures in one shoot when possible.
-    - Port to really exotic platforms (Cray is not IEEE ;)
-  * datadesc_set_cste: give the value by default when receiving. 
-    - It's not transfered anymore, which is good for functions pointer.
-  * Parsing macro
-    - Cleanup the code (bison?)
-    - Factorize code in union/struct field adding
-    - Handle typedefs (needs love from DataDesc/)
-    - Handle unions with annotate
-    - Handle enum
-    - Handle long long and long double
-    - Forbid "char", allow "signed char" and "unsigned char", or user code won't be 
-      portable to ARM, at least.
-    - Handle struct/union/enum embeeded within another container 
-      (needs modifications in DataDesc, too)
-    - Allow sizes for multi-dimensional objects (such as matrices)
- 
-    - Check short a, b;
-    - Check short ***
-    - Check struct { struct { int a } b; } 
-
-  * gras_datadesc_import_nws?
-
-[Messaging]
-  * A proper RPC mecanism
-    - gras_rpctype_declare_v (name,ver, payload_request, payload_answer)
-      (or gras_msgtype_declare_rpc_v). 
-    - Attaching a cb works the same way.
-    - gras_msg_rpc(peer, &request, &answer)
-    - On the wire, a byte indicate the message type:
-      - 0: one-way message (what we have for now)
-      - 1: method call (answer expected; sessionID attached)
-      - 2: successful return (usual datatype attached, with sessionID)
-      - 3: error return (payload = exception)
-      - other message types are possible (forwarding request, group
-        communication)
-  * Message priority
-  * Message forwarding
-  * Group communication
-  * Message declarations in a tree manner (such as log channels)?
-  
-[GRASPE] (platform expender) 
-  * Tool to visualize/deploy and manage in RL
-  * pull method of source diffusion in graspe-slave
-
-[Actors] (parallelism in GRAS)
-  * An actor is a user process. 
-    It has a highly sequential control flow from its birth until its death. 
-    The timers won't stop the current execution to branch elsewhere, they
-    will be delayed until the actor is ready to listen. Likewise, no signal
-    delivery. The goal is to KISS for users.
-  * You can fork a new actor, even on remote hosts. 
-  * They are implemented as threads in RL, but this is still a distributed
-    memory *model*. If you want to share data with another actor, send it
-    using the message interface to explicit who's responsible of this data.
-  * data exchange between actors placed within the same UNIX process is  
-    *implemented* by memcopy, but that's an implementation detail.
-
-[Other, more general issues]
-  * watchdog in RL (ie, while (1) { fork; exec the child, wait in father })
-  * Allow [homogeneous] dico to be sent
-  * Make GRAS thread safe by mutexing what needs to be
-
-*
-* AMOK
-******
-
-[bandwidth]
-  * finish this module (still missing the saturate part)
-  * add a version guessing the appropriate datasizes automatically
-[other modules]
-  * provide a way to retrieve the host load as in NWS
-  * log control, management, dynamic token ring
-  * a way using SSH to ask a remote host to open a socket back on me
- 
+    - replace fifo with dynars
+    - replace set with SWAG