From 9d2bedb765159fa0135d334f703c85fb2384a859 Mon Sep 17 00:00:00 2001
From: David Laiymani
Date: Thu, 6 Jan 2011 17:16:10 +0100
Subject: [PATCH] First complete draft. To be proofread, of course. The
 bibliography is still missing.

---
 gpc2011.tex | 193 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 115 insertions(+), 78 deletions(-)

diff --git a/gpc2011.tex b/gpc2011.tex
index a29bb6f..eaf7101 100644
--- a/gpc2011.tex
+++ b/gpc2011.tex
@@ -73,7 +73,8 @@ Laboratoire d'Informatique de l'universit\'{e}
   we chose the architectural context of global (or volunteer)
   computing. For this, we used the XtremWeb-CH environment. Experiments
   were conducted on a real global computing testbed and show good speed-ups
-  and very acceptable platform overhead.
+  and a very acceptable platform overhead, making XtremWeb-CH a good
+  candidate for deploying parallel applications over a global computing
+  environment.
 \end{abstract}
 
@@ -226,89 +227,115 @@ differences observed at the borders are no longer
 relevant. Nonetheless, in order to preserve the performance of the
 parallel algorithm, it is important to carefully set the overlapping
 ratio $\alpha$. It must be large enough to avoid the border's errors, and
-as small as possible to limit the size increase of the data subsets.
+as small as possible to limit the size increase of the data subsets.
+% TODO: state which value of $\alpha$ was used in our tests.
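+
+To illustrate, here is a minimal sketch (in Python, with purely
+illustrative names; the actual Neurad code may differ) of how such an
+overlapped partitioning can be derived from a given ratio $\alpha$:
+\begin{verbatim}
+# Split a dataset into n contiguous parts, each extended by
+# alpha * part_size samples on both sides, so that the erroneous
+# borders of each part can be discarded after the learning.
+def partition_with_overlap(data, n_parts, alpha):
+    part_size = len(data) // n_parts
+    overlap = int(alpha * part_size)
+    parts = []
+    for i in range(n_parts):
+        start = max(0, i * part_size - overlap)
+        end = min(len(data), (i + 1) * part_size + overlap)
+        parts.append(data[start:end])
+    return parts
+
+# Example: 100 samples, 4 parts, alpha = 0.1 -> each interior part
+# keeps its 25 samples plus 2 extra samples on each side.
+parts = partition_with_overlap(list(range(100)), n_parts=4, alpha=0.1)
+\end{verbatim}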
 
 \section{The XtremWeb-CH environment}
 \input{xwch.tex}
 
-\section{}
+\section{The Neurad gridification}
 \label{sec:neurad_gridif}
 
-The Neurad application can be divided into three parts. The first one
-aims at dividing data representing dose distribution on an area. This
-area contains various parameters, like the density of the medium and
-its nature. Multiple ``views'' can be superposed in order to obtain a
-more accurate learning. The second part of the application is the
-learning itself. This is the most time consuming part and therefore
-this is the one which has been ported to XWCH. This part fits well
-with the model of the middleware -- all learning tasks execute in
+As previously explained, the Neurad application can be divided into
+three steps. The goal of the first step is to decompose the data
+representing the dose distribution over an area. This area is
+described by various parameters, such as the nature of the medium and
+its density. This step is outside the scope of this paper.
+%Multiple ``views'' can be
+%superposed in order to obtain a more accurate learning.
+
+The second step of the application, and the most time consuming one,
+is the learning itself. This is the step that has been parallelized
+using the XWCH environment. As explained in Section 2, the
+parallelization relies on a partitioning of the global
+dataset. Following this partitioning, all learning tasks execute in
 parallel independently with their own local data part, with no
-communication, following the fork-join model. As described on Figure
-\ref{fig:neurad_grid}, we first send the learning application and data
-to the middleware (more precisely on warehouses (DW)) and create the
-computation module. When a worker (W) is ready to compute, it requests
-a task to execute to the coordinator (Coord.). This latter assigns it
-a task. The worker retrieves the application and its assigned data,
-and can start the computation. At the end of the learning process, it
-sends the result, a weighted neural network which will be used in a
-dose distribution process, to a warehouse. The last step of the
-application is to retrieve these results and exploit them.
+communication, following the fork-join model. Clearly, this
+computation fits well with the model of the chosen middleware.
+
+The execution scheme is then the following (see Figure
+\ref{fig:neurad_grid}, and the sketch after this list):
+\begin{enumerate}
+\item we first send the learning application and its data to the
+  middleware (more precisely to the data warehouses (DW)) and create
+  the computation module;
+\item when a worker (W) is ready to compute, it requests a task from
+  the coordinator (Coord.);
+\item the coordinator assigns the worker a task; the worker then
+  retrieves the application and its assigned data and can start the
+  computation;
+\item at the end of the learning process, the worker sends its result
+  to a warehouse.
+\end{enumerate}
+
+The last step of the application is to retrieve these results (the
+weighted neural networks) and exploit them through a dose
+distribution process.
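+
+The following minimal sketch illustrates this fork-join scheme on a
+single machine; it is illustrative only (in the real deployment the
+XWCH coordinator assigns the tasks and the warehouses hold the data
+and the results) and reuses the partitioning sketch of Section 2:
+\begin{verbatim}
+from multiprocessing import Pool
+
+def learn(data_part):
+    # Stand-in for one learning task: it works on its own local data
+    # part, with no communication, and returns a "weighted network".
+    return {"size": len(data_part), "weights": sum(data_part)}
+
+if __name__ == "__main__":
+    dataset = list(range(1000))
+    parts = partition_with_overlap(dataset, n_parts=25, alpha=0.1)
+    with Pool() as pool:                   # fork: one task per part
+        networks = pool.map(learn, parts)  # join: gather the results
+    print(len(networks), "networks retrieved")
+\end{verbatim}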
 
 \begin{figure}[ht]
   \centering
-  \includegraphics[width=\linewidth]{neurad_gridif}
-  \caption{Neurad gridification}
+  \includegraphics[width=8cm]{figures/neurad_gridif}
+  \caption{The proposed Neurad gridification}
   \label{fig:neurad_grid}
 \end{figure}
 
 \section{Experimental results}
-
 \label{sec:neurad_xp}
 
-\subsubsection{Conditions}
-\label{sec:neurad_cond}
+The aim of this section is to describe and analyse the experimental
+results obtained with the parallel version of Neurad described above.
+Our goal was to run this application with real input data and on a
+real global computing testbed.
 
+\subsubsection{Experimental conditions}
+\label{sec:neurad_cond}
 
-The evaluation of the execution of the Neurad application on XWCH was
-composed as follows. The size of the input data is about 2.4Gb. This
-amount of data can be divided into 25 parts – otherwise, data noise
-appears and will disturb the learning. We have used 25 computers (XWCH
-workers) to execute this part of the application. This generates input
-data parts of about 15Mb (in a compressed format). The output data,
-which are retrieved after the process, are about 30Kb for each part. We
-used two distincts deployments of XWCH. In the first one, the XWCH
-coordinator and the warehouses were situated in Geneva, Switzerland
-while the workers were running in the same local cluster in Belfort,
-France. The second deployment is a local deployment where both
-coordinator, warehouses and workers were in the same local cluster.
-During the day these machines were used by students of the Computer
-Science Department of the IUT of Belfort.
-
-We have furthermore compared the execution of the Neurad application
-with and without the XWCH platform in order to measure the overhead
-induced by the use of the platform. By "without XWCH" we mean that the
-testbed consists only in workers deployed with their respective data by
-the use of shell scripts. No specific middleware was used and the
+The size of the input data is about 2.4~GB. In order to prevent data
+noise from appearing and disturbing the learning process, these data
+can be divided into at most 25 parts. This yields input data parts of
+about 15~MB each (in a compressed format). The output data, which are
+retrieved after the process, are about 30~KB for each part.
+Unfortunately, this limit on the data decomposition does not allow us
+to use more than 25 computers (XWCH workers). Nevertheless, we used
+two distinct deployments of XWCH:
+\begin{enumerate}
+
+\item In the first one, called ``distributed XWCH'' in the following,
+  the XWCH coordinator and the warehouses were located in Geneva,
+  Switzerland, while the workers were running in the same local
+  cluster in Belfort, France.
+
+\item The second deployment, called ``local XWCH'', is a local
+  deployment where the coordinator, the warehouses and the workers
+  were all in the same local cluster.
+
+\end{enumerate}
+For both deployments, these machines were used during the day by
+students of the Computer Science Department of the IUT of Belfort.
+
+In order to evaluate the overhead induced by the use of the platform,
+we also compared the execution of the Neurad application with and
+without the XWCH platform. In the latter case, the testbed consists
+only of workers deployed, with their respective data, by means of
+shell scripts. No specific middleware was used and the
 workers were in the same local cluster.
 
-Five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$, $0.50e^{-1}$, $0.25e^{-1}$ and $1e^{-2}$.
+Finally, five computation precisions were used: $10^{-1}$,
+$0.75\times 10^{-1}$, $0.5\times 10^{-1}$, $0.25\times 10^{-1}$ and
+$10^{-2}$.
 
 
 \subsubsection{Results}
 \label{sec:neurad_result}
 
-
-In these experiments, we measured the same steps on both kinds of
-executions. The steps consist of sending of local data and the
-executable, the learning process, and retrieving the result. Table
-\ref{tab:neurad_res} presents the execution times of the Neurad
-application on 25 machines with XWCH (local and distributed deployment)
-and without XWCH.
+Table \ref{tab:neurad_res} presents the execution times of the Neurad
+application on 25 machines, with XWCH (local and distributed
+deployments) and without XWCH. These results correspond to
+measurements of the same steps for both kinds of execution, i.e.
+sending the local data and the executable, the learning process, and
+retrieving the results. The results represent the average time of $x$
+executions.
 
 
 \begin{table}[h!]
 
@@ -342,36 +369,46 @@ and without XWCH.
 
 
 %\end{table}
 
 
-These experiments show that the overhead induced by the use of the XWCH
-platform is about $34\%$ in the distributed deployment and about $7\%$
-in the local deployment. For this last one, the overhead is very acceptable regarding to the benefits of the platform.
-
-Now, in the distributed deployment the overhead is also acceptable and can be explained by
-different factors. First, we point out that the conditions of executions
-are not really identical between with and without XWCH. For this last
-one, though the same steps were done, all transfer processes are inside
-a local cluster with a high bandwidth and a low latency. Whereas when
-using XWCH, all transfer processes (between datawarehouses, workers, and
-the coordinator) used a wide network area with a smaller bandwidth.
+As we can see, in the case of a local deployment the overhead induced
+by the use of the XWCH platform is about $7\%$, which is clearly low.
+For the distributed deployment, the overhead is about $34\%$.
+Regarding the benefits of the platform, this is still a very
+acceptable overhead, which can be explained by the following points.
 
-In addition, in executions without XWCH, all the machines started
-immediately the computation, whereas when using the XWCH platform, a
-latency is introduced by the fact that a task starts on a machine, only
-when this one requests a task.
-
-These experiments underline that deploying a local coordinator and one
-or more warehouses near a cluster of workers can enhance computations
-and platform performances. They also show a limited overhead due to the
-use of the platform.
-
-
-\end{document}
+First, we point out that the conditions of execution are not really
+identical between the contexts with and without XWCH. In the latter
+case, though the same steps were done, all transfers take place
+inside a local cluster, with a high bandwidth and a low latency,
+whereas with XWCH all transfers (between the data warehouses, the
+workers and the coordinator) go through a wide area network with a
+smaller bandwidth. In addition, in the executions without XWCH, all
+the machines started the computation immediately, whereas with the
+XWCH platform a latency is introduced by the fact that a computation
+starts on a machine only when this machine requests a task.
+This underlines that, unsurprisingly, deploying a local coordinator
+and one or more warehouses near a cluster of workers can enhance the
+computations and the platform performance.
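+
+For clarity, the overhead figures quoted above correspond to the
+relative extra time of a platform run with respect to the
+script-based run on the same input, as in the following sketch (the
+timings used here are hypothetical, not the measured values of Table
+\ref{tab:neurad_res}):
+\begin{verbatim}
+# Relative overhead (in %) of a platform run versus the baseline
+# script-based run.
+def overhead(t_platform, t_scripts):
+    return 100.0 * (t_platform - t_scripts) / t_scripts
+
+t_scripts = 100.0                  # hypothetical baseline time
+print(overhead(107.0, t_scripts))  # -> 7.0  (local XWCH)
+print(overhead(134.0, t_scripts))  # -> 34.0 (distributed XWCH)
+\end{verbatim}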
 
 \section{Conclusion and future works}
-
+In this paper, we have presented a gridification of a real medical
+application: the Neurad application. This radiotherapy application
+aims at optimizing the irradiated dose distribution within a
+patient. Based on a multi-layer neural network, this application
+presents a very time consuming step, i.e. the learning step. Due to
+the computing characteristics of this step, we chose to parallelize
+it using the XtremWeb-CH global computing environment. The
+experimental results obtained show good speed-ups and underline that
+the overhead induced by XWCH is very acceptable, making it a good
+candidate for deploying parallel applications over a global computing
+environment.
+
+Our future work includes testing the application on a larger scale
+testbed. This implies choosing an input data set that allows a finer
+decomposition. Unfortunately, this choice of input data is not
+trivial and relies on a large number of parameters.
+% TODO: ask Marc for more details here.
 
 \bibliographystyle{plain}
 \bibliography{biblio}
-- 
2.20.1