we choose the architectural context of global (or volunteer) computing.
For this, we used the XtremWeb-CH environment. Experiments were
conducted on a real global computing testbed and show good speed-ups
and a very acceptable platform overhead, making XtremWeb-CH a good
candidate for deploying parallel applications over a global computing
environment.
\end{abstract}
Nonetheless, in order to preserve the performance of the parallel
algorithm, it is important to carefully set the overlapping ratio
$\alpha$. It must be large enough to avoid border errors, and as
small as possible to limit the size increase of the data subsets.
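
To make this trade-off concrete, here is a rough, illustrative
estimate (an assumption on our side, not a measured result): if a
dataset of size $S$ is split into $N$ subsets and each subset is
extended by a ratio $\alpha$ of its size on each internal border, the
size of one subset grows from $S/N$ to roughly
\[
  \frac{S}{N}\,(1 + 2\alpha),
\]
so the total volume of data to transfer and process increases by a
factor of about $1 + 2\alpha$.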
\section{The XtremWeb-CH environment}
\input{xwch.tex}
\section{The Neurad gridification}
\label{sec:neurad_gridif}
As previously described, the Neurad application can be divided into
three steps. The goal of the first step is to decompose the data
representing the dose distribution on an area. This area contains
various parameters, such as the nature of the medium and its
density. This part is outside the scope of this paper.

The second step of the application, and the most time-consuming one,
is the learning itself. This is the step that has been parallelized
using the XWCH environment. As described in Section 2, the
parallelization relies on a partitioning of the global dataset.
Following this partitioning, all learning tasks execute independently
in parallel on their own local data part, with no communication,
following the fork-join model. Clearly, this computation fits well
with the model of the chosen middleware.
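
As an illustration of this fork-join pattern (independent of XWCH
itself), the following minimal Python sketch shows the shape of the
computation; \texttt{train\_network} is a hypothetical placeholder
for the Neurad learning on one data subset, not actual Neurad code:
\begin{verbatim}
from multiprocessing import Pool

def train_network(data_part):
    # Hypothetical stand-in for the Neurad learning on one subset;
    # each call is independent and requires no communication.
    ...

def fork_join_learning(data_parts):
    # Fork: one learning task per data subset, run in parallel.
    with Pool(processes=len(data_parts)) as pool:
        networks = pool.map(train_network, data_parts)
    # Join: gather the trained networks for later exploitation.
    return networks
\end{verbatim}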

The execution scheme is then the following (see Figure
\ref{fig:neurad_grid}; a worker-side sketch is given below):
\begin{enumerate}
\item we first send the learning application and its data to the
  middleware (more precisely, to the warehouses (DW)) and create the
  computation module;
\item when a worker (W) is ready to compute, it requests a task from
  the coordinator (Coord.);
\item the coordinator assigns the worker a task; the worker then
  retrieves the application and its assigned data and starts the
  computation;
\item at the end of the learning process, the worker sends the
  result to a warehouse.
\end{enumerate}
The last step of the application is to retrieve these results (the
weighted neural networks) and exploit them through a dose
distribution process.
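
To fix ideas, the worker side of steps 2 to 4 can be sketched as
follows in Python; the names (\texttt{request\_task}, \texttt{fetch},
\texttt{store}) are hypothetical and do not correspond to the actual
XWCH API:
\begin{verbatim}
def worker_loop(coordinator, warehouse):
    while True:
        task = coordinator.request_task()     # step 2: ask for work
        if task is None:
            break                             # no more learning tasks
        app = warehouse.fetch(task.app_id)    # step 3: retrieve the
        data = warehouse.fetch(task.data_id)  # application and its data
        network = app.learn(data)             # local, communication-free
        warehouse.store(task.result_id, network)  # step 4: send result
\end{verbatim}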
\begin{figure}[ht]
\centering
  \includegraphics[width=8cm]{figures/neurad_gridif}
  \caption{The proposed Neurad gridification}
\label{fig:neurad_grid}
\end{figure}
\section{Experimental results}
\label{sec:neurad_xp}
The aim of this section is to describe and analyse the experimental
results we have obtained with the parallel Neurad version previously
described. Our goal was to carry out this application with real input
data and on a real global computing testbed.

\subsubsection{Experimental conditions}
\label{sec:neurad_cond}
The size of the input data is about 2.4~GB. In order to prevent data
noise from appearing and disturbing the learning process, these data
can be divided into at most 25 parts. This generates input data parts
of about 15~MB each (in a compressed format). The output data, which
are retrieved after the process, are about 30~KB for each part.
Unfortunately, this data decomposition limit does not allow us to use
more than 25 computers (XWCH workers). Nevertheless, we used two
distinct deployments of XWCH:
\begin{enumerate}
\item In the first one, called ``distributed XWCH'' in the following,
  the XWCH coordinator and the warehouses were situated in Geneva,
  Switzerland, while the workers were running in the same local
  cluster in Belfort, France.
\item The second deployment, called ``local XWCH'', is a local
  deployment in which the coordinator, the warehouses, and the
  workers were all in the same local cluster.
\end{enumerate}
For both deployments, during the day these machines were used by
students of the Computer Science Department of the IUT of Belfort.

In order to evaluate the overhead induced by the use of the platform,
we have also compared the execution of the Neurad application with
and without the XWCH platform. In the latter case, the testbed
consists only of workers deployed with their respective data by means
of shell scripts. No specific middleware was used and the workers
were in the same local cluster.
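
As a rough sketch of this baseline (with hypothetical host names and
paths, and assuming password-less SSH access to the workers), the
deployment scripts behave like the following Python program:
\begin{verbatim}
import subprocess

# Hypothetical worker host names; the real cluster nodes differ.
WORKERS = ["node%02d.cluster.local" % i for i in range(1, 26)]

def deploy_and_run():
    for i, host in enumerate(WORKERS, start=1):
        part = "part%02d.tar.gz" % i
        # Copy the learning program and this worker's data part.
        subprocess.run(["scp", "neurad_learn", part, host + ":/tmp/"],
                       check=True)
        # Launch the learning in the background; results are
        # retrieved once every worker has finished.
        subprocess.run(["ssh", host,
                        "cd /tmp && nohup ./neurad_learn %s "
                        ">learn.log 2>&1 &" % part],
                       check=True)
\end{verbatim}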
Finally, five computation precisions were used: $10^{-1}$,
$0.75 \times 10^{-1}$, $0.50 \times 10^{-1}$, $0.25 \times 10^{-1}$,
and $10^{-2}$.
\subsubsection{Results}
\label{sec:neurad_result}
Table \ref{tab:neurad_res} presents the execution times of the Neurad
application on 25 machines with XWCH (local and distributed
deployments) and without XWCH. These results correspond to
measurements of the same steps in both kinds of execution, i.e.\ the
sending of the local data and the executable, the learning process,
and the retrieval of the results. The reported values are the average
times over $x$ executions.
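
For reference, the overhead figures discussed below are relative
overheads in the usual sense (our notation, not explicit in the
measurements):
\[
  \mathit{overhead} =
  \frac{T_{\mathrm{XWCH}} - T_{\mathrm{no\,XWCH}}}{T_{\mathrm{no\,XWCH}}},
\]
where $T$ denotes the measured average execution time.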
\begin{table}[h!]
  \centering
  % Execution time measurements elided in this version of the source.
  \caption{Execution times of the Neurad application on 25 machines,
    with XWCH (local and distributed deployments) and without XWCH}
  \label{tab:neurad_res}
\end{table}
As we can see, in the case of the local deployment the overhead
induced by the use of the XWCH platform is about $7\%$, which is
clearly low. For the distributed deployment, the overhead is about
$34\%$. Considering the benefits of the platform, this is still a
very acceptable overhead, and it can be explained by the following
points.
First, we point out that the conditions of execution are not really
identical in the with- and without-XWCH contexts. In the latter,
though the same steps were performed, all transfers take place inside
a local cluster with a high bandwidth and a low latency, whereas when
using XWCH, all transfers (between data warehouses, workers, and the
coordinator) use a wide area network with a smaller bandwidth. In
addition, in the executions without XWCH, all the machines started
the computation immediately, whereas when using the XWCH platform, a
latency is introduced by the fact that a computation starts on a
machine only when that machine requests a task.

These results underline that, unsurprisingly, deploying a local
coordinator and one or more warehouses near a cluster of workers
enhances computation and platform performance.
\section{Conclusion and future work}
In this paper, we have presented the gridification of a real medical
application, the Neurad application. This radiotherapy application
aims at optimizing the irradiated dose distribution within a
patient. Based on a multi-layer neural network, this application
contains a very time-consuming step, i.e.\ the learning step. Due to
the computing characteristics of this step, we chose to parallelize
it using the XtremWeb-CH global computing environment. The
experimental results obtained show good speed-ups and underline that
the overheads induced by XWCH are very acceptable, making it a good
candidate for deploying parallel applications over a global computing
environment.

Our future work includes testing the application on a larger-scale
testbed. This implies choosing an input data set allowing a finer
decomposition. Unfortunately, this choice of input data is not
trivial and relies on a large number of parameters.
\bibliographystyle{plain}
\bibliography{biblio}