found between the precision and the speed of calculation. Current
techniques, using analytic methods, models and databases, are rapid
but lack precision. Enhanced precision can be achieved by using
calculation codes based, for example, on Monte Carlo methods. The main
drawback of these methods is their computation times, which can
rapidly become huge. In \cite{NIMB2008}, the authors proposed a new approach, called
Neurad, using neural networks. This approach is based on the
collaboration of computation codes and multi-layer neural networks
used as universal approximators. It provides a fast and accurate
evaluation of the deposited radiation doses in any given environment
(possibly inhomogeneous) for given irradiation parameters. As the
learning step is often very time consuming, in \cite{AES2009} the
authors proposed a parallel algorithm that allows the learning domain
to be decomposed into
subdomains. The decomposition has the advantage of significantly
reducing the complexity of the target functions to approximate.
Now, as there exist several classes of distributed/parallel
architectures (supercomputers, clusters, global computing\dots{}), we
have to choose the one best suited to the parallel Neurad
application. The volunteer (or global) computing model seems to be an
interesting approach. Here, the computing power is obtained by
aggregating unused (or volunteer) public resources connected to the
Internet. In our case, we can imagine, for example, that a part of the
architecture will be composed of some of the hospital's computers. This
approach has the advantage of being clearly cheaper than a more
dedicated approach such as the use of supercomputers or
clusters. Furthermore, as we will see in the remainder of this paper, the
studied parallel algorithm fits this computation model well.
The aim of this paper is to propose and evaluate a gridification of
the Neurad application (more precisely, of the most time consuming
part, the learning step) using a volunteer computing approach. For
this, we focus on the XtremWeb-CH environment~\cite{xwch}. We chose
this environment because it addresses the centralized design of other
global computing environments such as XtremWeb~\cite{xtremweb} or
Seti~\cite{seti}. It tends towards a peer-to-peer approach by distributing
some components of the architecture. For instance, the computing nodes
are allowed to communicate directly. Experiments were conducted on a
real global computing testbed. The results are very encouraging: they
exhibit an interesting speed-up and show that the overhead induced by
the use of XtremWeb-CH is very acceptable.
The paper is organized as follows. In Section 2 we present the Neurad
application and particularly its most time consuming part, i.e. the
learning step. Section 3 describes the XtremWeb-CH environment and the
gridification of the application. Experimental results are presented and
discussed in Section 4. Finally, we conclude and give some perspectives in
Section 5.

\section{The Neurad application}

\emph{Neurad} is a multi-disciplinary project, involving medical physicists and computer scientists
whose goal is to enhance the treatment planning of cancerous tumors by external
radiotherapy. In our previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have
proposed an original approach to solving scientific problems whose accurate
modeling and/or analytical description are difficult. That method is based on
the collaboration of computational codes and neural networks used as universal
interpolators. Thanks to that method, the \emph{Neurad} software provides a fast
and accurate evaluation of radiation doses in any given environment (possibly
inhomogeneous) for given irradiation parameters. We have shown in a previous
work~\cite{AES2009} the interest of using a distributed algorithm for the neural
network learning. We use a classical RPROP\footnote{Resilient backpropagation}
algorithm with an HPU\footnote{High order processing units} topology to do the
training of our neural network.
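For completeness, we recall the classical RPROP scheme in its standard form
(the hyper-parameter values below are the usual defaults and are given only
as an illustration, not as a description of our exact implementation). Each
weight $w_{ij}$ owns an individual step size $\Delta_{ij}$, which is
multiplied by $\eta^{+}>1$ when the partial derivative
$\partial E/\partial w_{ij}$ of the error keeps its sign between two
successive iterations, multiplied by $\eta^{-}<1$ when the sign changes, and
left unchanged otherwise. The weight is then updated according to
\[
  w_{ij}^{(t+1)} \;=\; w_{ij}^{(t)} \;-\;
  \mathrm{sign}\!\left(\frac{\partial E^{(t)}}{\partial w_{ij}}\right)\,
  \Delta_{ij}^{(t)},
\]
with typical values $\eta^{-}=0.5$ and $\eta^{+}=1.2$.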
The second stage of the \emph{Neurad} project is the learning step, and this
is the most time consuming one. This step is performed offline, but it is
important to reduce the learning time in order to keep a workable
tool. Indeed, if the learning time is too long (for the moment, this time can
reach one week for a limited domain), this process cannot be relaunched at
every change of context, for instance. However, it is interesting to update the
knowledge of the neural network, by using the learning process, when the domain
evolves (evolution of the material used for the prosthesis, or evolution of the
beam size, shape or energy). The learning time is related to the volume of data,
which can be very large in a real medical context. Some work has been done to
reduce this learning time with the parallelization of the learning process by
using a partitioning method of the global dataset. The goal of this method is to
train many neural networks on sub-domains of the global dataset. After this
learning step, the resulting networks can be exploited together, each one on
its own sub-domain.
However, performing the learning on sub-domains constituting a partition of the
initial domain is not satisfying with respect to the quality of the results. This
comes from the fact that the accuracy of the approximation performed by a neural
network is not constant over the learned domain. Thus, it is necessary to use an
overlapping of the sub-domains. The overall principle is that each network is
trained on an enlarged sub-domain but exploited only on a domain smaller than
its training domain, so that the differences observed at the
borders are no longer relevant. Nonetheless, in order to preserve the
performance of the parallel algorithm, it is important to carefully set the
overlapping ratio $\alpha$. It must be large enough to avoid the border
errors, and as small as possible to limit the size increase of the data
subsets~\cite{AES2009}.
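To make this principle more concrete, the following sketch illustrates one
possible way of building such overlapping sub-domains along a single axis. It
is only an illustration of the decomposition principle, written in Python for
readability; it is not the actual Neurad partitioning code, and all names in
it are hypothetical.
\begin{verbatim}
# Illustrative sketch (not the actual Neurad code): split samples that
# are sorted along one spatial coordinate into n_parts sub-domains
# which overlap their neighbours by a ratio alpha on each side.
def overlapping_partition(samples, n_parts, alpha):
    n = len(samples)
    width = n / n_parts                 # nominal sub-domain size
    margin = int(alpha * width)         # extra samples on each border
    parts = []
    for k in range(n_parts):
        start = max(0, int(k * width) - margin)
        end = min(n, int((k + 1) * width) + margin)
        parts.append(samples[start:end])
    return parts

# Example: 1000 samples, 25 sub-domains, 10% overlap on each side.
parts = overlapping_partition(list(range(1000)), 25, 0.1)
print(len(parts), len(parts[0]), len(parts[1]))
\end{verbatim}
Each network is then trained on one of these enlarged parts and only exploited
on the central, non-overlapping region of its part.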
\section{The XtremWeb-CH environment}
%Multiple ``views'' can be
%superposed in order to obtain a more accurate learning.
The second step of the application, and the most time consuming one, is the
learning itself. This is the part that has been parallelized, using the XWCH
environment. As explained in Section~2, the parallelization relies on a
partitioning of the global dataset. Following this partitioning, all learning
tasks are independently executed in parallel with their own local data part,
with no communication, following the fork/join model. Clearly, this computation
fits well with the model of the chosen middleware.

\begin{figure}[ht]
  \centering
  \includegraphics[width=8cm]{figures/neurad_gridif}
  \caption{The proposed Neurad gridification}
  \label{fig:neurad_grid}
\end{figure}

The execution scheme is then the following (see Figure
\ref{fig:neurad_grid}):
\begin{enumerate}
\item We first send the learning application and its data to the
  middleware. More precisely, we send the application code to the data
  warehouses (DW) and create an ``application module'' on the
  coordinator (Coord.), which includes the references returned by this
  upload. The same process is then applied to the application data;
\item When a worker (W) is ready to compute, it requests a task to
  execute from the coordinator (Coord.);
\item The coordinator assigns a task to the worker. The latter
  retrieves the application and its assigned data by requesting them
  from the data warehouses (DW), using the references sent by the
  coordinator, and can then start the computation;
\item At the end of the learning process, the worker sends its result
  to a warehouse.
\end{enumerate}
The last step of the application is to retrieve these results (the
trained neural networks) and exploit them through a dose distribution
process. This last step is outside the scope of this paper.
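To summarize, the gridified learning follows a pure fork/join pattern. The
short Python sketch below illustrates this pattern with a simple local process
pool; it only shows the structure of the computation (independent tasks, no
communication, a final join) and is not the XtremWeb-CH deployment itself,
\texttt{train\_on\_subdomain} being a hypothetical placeholder for the actual
learning task.
\begin{verbatim}
# Fork/join sketch of the parallel learning step (illustration only:
# a local process pool stands in for the XWCH workers, and
# train_on_subdomain is a placeholder for the real learning task).
from concurrent.futures import ProcessPoolExecutor

def train_on_subdomain(data_part):
    # Placeholder: run the learning on one sub-domain and return the
    # trained network; here we simply return the size of the part.
    return len(data_part)

def fork_join_learning(data_parts):
    with ProcessPoolExecutor() as pool:
        # fork: one independent task per sub-domain, no communication
        futures = [pool.submit(train_on_subdomain, p) for p in data_parts]
        # join: collect each result once its task has finished
        return [f.result() for f in futures]

if __name__ == "__main__":
    toy_parts = [[0] * 10, [0] * 12, [0] * 9]
    print(fork_join_learning(toy_parts))
\end{verbatim}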
\section{Experimental results}
\label{sec:neurad_xp}
\subsubsection{Experimental conditions}
\label{sec:neurad_cond}
The size of the input data is about 2.4~GB. In order to prevent
noise from appearing and disturbing the learning process, these data can be
divided into, at most, 25 parts. This generates input data parts of
about 15~MB (in a compressed format). The output data, which are
retrieved after the process, are about 30~KB for each part. We used two
deployments of the XWCH platform:
\begin{enumerate}
\item The first deployment, called ``distributed XWCH'', is a
  distributed deployment in which the coordinator and the data
  warehouses were not located in the same cluster as the workers, the
  latter running on a local cluster
  in Belfort, France.
\item The second deployment, called ``local XWCH'' is a local
  deployment where the coordinator, the warehouses, and the workers
  were all in the same local cluster.
\end{enumerate}
For both deployments, the local cluster is a campus cluster and during
the day these machines were used by students of the Computer Science
Department of the IUT of Belfort. Unfortunately, the data
decomposition limitation does not allow us to use more than 25
computers (XWCH workers).
In order to evaluate the overhead induced by the use of the platform, we have
also compared the execution of the Neurad application with and without
the XWCH platform. For the latter case, we emphasize that the testbed
consists only of workers deployed with their respective data by the use
of shell scripts. No specific middleware was used and the workers were in the
same local cluster.
Finally, five computation precisions were used: $10^{-1}$, $0.75\times 10^{-1}$,
$0.5\times 10^{-1}$, $0.25\times 10^{-1}$, and $10^{-2}$.
Table \ref{tab:neurad_res} presents the execution times of the Neurad
application on 25 machines with XWCH (local and distributed
deployment) and without XWCH. These results correspond to the measures
of the same steps for both kinds of execution, i.e. the sending of the local
data and the executable, the learning process, and the retrieval of the
results. Each result is the average time of $5$ executions.
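In the following discussion, the overhead induced by the middleware is
understood as the relative increase of the execution time, i.e.
\[
  \mathrm{overhead} \;=\; \frac{T_{\mathrm{xwch}} - T_{\mathrm{base}}}{T_{\mathrm{base}}} \times 100\%,
\]
where $T_{\mathrm{xwch}}$ and $T_{\mathrm{base}}$ denote the average execution
times measured with and without the XWCH platform, respectively.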
As we can see, in the case of a local deployment the overhead induced
by the use of the XWCH platform is about $7\%$. It is clearly a low
overhead. Now, for the distributed deployment, the overhead is about
$34\%$. Regarding the benefits of the platform, it is a very
acceptable overhead which can be explained by the following points.
First, we point out that the execution conditions are not exactly
identical in the contexts with and without XWCH. In the latter case,
though the same steps were carried out, all transfer processes took place inside a
local cluster with a high bandwidth and a low latency, whereas when
using XWCH, all transfer processes (between data warehouses, workers,
and the coordinator) used a wide area network with a smaller
bandwidth. In addition, in the executions without XWCH, all the machines
started the computation immediately, whereas when using the XWCH
platform, a latency is introduced by the fact that a computation
starts on a machine only when that machine requests a task.
\section{Conclusion and future works}

In this paper, we have presented the gridification of \emph{Neurad}, a
medical application which
tries to optimize the irradiated dose distribution within a
patient. Based on a multi-layer neural network, this application
presents a very time consuming step, i.e. the learning step. Due to the
computing characteristics of this step, we have chosen to parallelize it
using the XtremWeb-CH volunteer computing environment. The obtained
experimental results show good speed-ups and underline that the overheads
induced by XWCH are very acceptable, making it a good candidate
for deploying parallel applications over a volunteer computing environment.
Our future works include the testing of the application on a
larger scale testbed. This implies the choice of a data input set
allowing a finer decomposition. Unfortunately, this choice of input
data is not trivial and relies on a large number of parameters.