From: mquinson
Date: Tue, 16 May 2006 23:14:23 +0000 (+0000)
Subject: Those damn sparse messages for networking error conditions drive me nuts
X-Git-Tag: v3.3~3127
X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/commitdiff_plain/446e8038598b209dd9f9b6e673a0436dea320b61?hp=4af5e3cccdee300382df9aa824a8691f8cdbbda5

Those damn sparse messages for networking error conditions drive me nuts

git-svn-id: svn+ssh://scm.gforge.inria.fr/svn/simgrid/simgrid/trunk@2231 48e7efb5-ca39-0410-a469-dd3cf9ba447f
---

diff --git a/doc/FAQ.doc b/doc/FAQ.doc
index 0aac3fa350..577226445c 100644
--- a/doc/FAQ.doc
+++ b/doc/FAQ.doc
@@ -881,6 +881,47 @@ These are changes to FleXML itself, not SimGrid. But since we kinda
 hijacked the development of FleXML, I can grant you that any patches
 would be really welcome and quickly integrated.
 
+\subsection faq_gras_transport GRAS spits networking error messages
+
+GRAS, on real platforms, naturally uses regular sockets to communicate. They
+are deeply hidden in the GRAS abstraction, but when things go wrong, you may
+get some weird error messages. Here are some examples, with the probable
+reason:
+
+ - Transport endpoint is not connected: several processes try to open
+   a server socket on the same port number of the same machine. This is
+   naturally bad and each process should pick its own port number for this.\n
+   Maybe you just have some processes remaining from a previous experiment
+   on your machine.\n
+   Killing them may help, but again, if you kill -KILL them, you'll have to
+   wait for a while: they didn't close their sockets properly and the system
+   needs a while to notice that this port is free again.
+
+ - Socket closed by remote side: if the remote process is not
+   supposed to close the socket at this point, it may be dead.
+
+ - Connection reset by peer: I found this on the Internet about this
+   error. I think it's what's happening here, too:\n
+   This basically means that a network error occurred while the client was
+   receiving data from the server. But what is really happening is that the
+   server actually accepts the connection, processes the request, and sends
+   a reply to the client. However, when the server closes the socket, the
+   client believes that the connection has been terminated abnormally
+   because the socket implementation sends a TCP reset segment telling the
+   client to throw away the data and report an error.\n
+   Sometimes, this problem is caused by not properly closing the
+   input/output streams and the socket connection. Make sure you close the
+   input/output streams and socket connection properly. If everything is
+   closed properly, however, and the problem persists, you can work around
+   it by adding a one-second sleep before closing the streams and the
+   socket. This technique, however, is not reliable and may not work on all
+   systems.\n
+   Since GRAS sockets are closed properly (repeat after me: there is no bug
+   in GRAS), it is either that you are closing your sockets on server side
+   before the client gets a chance to read from them (use gras_os_sleep() to
+   delay the server), or the server died awfully before the client got the data.
+
+
 \subsection faq_deadlock There is a deadlock !!!
 
 Unfortunately, we cannot debug every code written in SimGrid. We