\title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}
-\author{Nabil Abdennhader\inst{1} \and Mohamed Ben Belgacem{1} \and Raphaël Couturier\inst{2} \and
- David Laiymani\inst{2} \and Sébastien Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{2}}
+\author{Nabil Abdennadher\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
+ David Laiymani\inst{2} \and Sébastien Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}
\institute{
University of Applied Sciences Western Switzerland, hepia Geneva,
Switzerland \\
-\email{nabil.abdennadher@hesge.ch, mohamed.benbelgacem@unige.ch, markopekka.niinimaeki@hesge.ch}
+\email{nabil.abdennadher@hesge.ch,mohamed.benbelgacem@unige.ch,markopekka.niinimaeki@hesge.ch}
\and
Laboratoire d'Informatique de l'universit\'{e}
de Franche-Comt\'{e} \\
IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
-\email{raphael.couturier, david.laiymani, sebastien.miquee@univ-fcomte.fr}
+\email{\{raphael.couturier,david.laiymani,sebastien.miquee\}@univ-fcomte.fr}
\and
FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard , FRANCE\\
-\email{marc.sauget@femtost.fr}
+\email{marc.sauget@univ-fcomte.fr}
}
This paper presents the design and the evaluation of the
gridification of a radiotherapy dose computation application. Due to
the inherent characteristics of the application and its execution,
- we choose the architectural context of global (or volunteer) computing.
- For this, we used the XtremWeb-CH environement. Experiments were
- conducted on a real global computing testbed and show good speed-ups
- and very acceptable platform overhead.
+ we chose the architectural context of volunteer
+ computing. For this, we used the XtremWeb-CH
+ environment. Experiments were conducted on a real volunteer computing
+ testbed and show good speed-ups and a very acceptable platform
+ overhead, making XtremWeb-CH a good candidate for deploying
+ parallel applications over a volunteer computing environment.
\end{abstract}
found between the precision and the speed of calculation. Current
techniques, using analytic methods, models and databases, are rapid
but lack precision. Enhanced precision can be achieved by using
-calculation codes based, for example, on Monte Carlo methods. The main
-drawback of these methods is their computation times which can be
-rapidly be huge. In [] the authors proposed a novel approach, called
+calculation codes based, for example, on Monte Carlo methods. The main
+drawback of these methods is their computation time, which can
+rapidly become huge. In \cite{NIMB2008} the authors proposed a new approach, called
Neurad, using neural networks. This approach is based on the
collaboration of computation codes and multi-layer neural networks
used as universal approximators. It provides a fast and accurate
evaluation of radiation doses in any given environment for given
irradiation parameters. As the learning step is often very time
-consuming, in \cite{bcvsv08:ip} the authors proposed a parallel
-algorithm that enable to decompose the learning domain into
-subdomains. The decomposition has the advantage to significantly
-reduce the complexity of the target functions to approximate.
+consuming, in \cite{AES2009} the authors proposed a parallel
+algorithm that decomposes the learning domain into
+subdomains. This decomposition has the advantage of significantly
+reducing the complexity of the target functions to approximate.
Now, as there exist several classes of distributed/parallel
-architectures (supercomputers, clusters, global computing...) we have
-to choose the best suited one for the parallel Neurad application.
-The Global or Volunteer Computing model seems to be an interesting
-approach. Here, the computing power is obtained by aggregating unused
-(or volunteer) public resources connected to the Internet. For our
-case, we can imagine for example, that a part of the architecture will
-be composed of some of the different computers of the hospital. This
-approach present the advantage to be clearly cheaper than a more
-dedicated approach like the use of supercomputers or clusters.
+architectures (supercomputers, clusters, global computing, \dots{}), we
+have to choose the one best suited to the parallel Neurad
+application. The volunteer (or global) computing model seems to be an
+interesting approach. Here, the computing power is obtained by
+aggregating unused (or volunteer) public resources connected to the
+Internet. In our case, we can imagine, for example, that part of the
+architecture will be composed of some of the computers of
+the hospital. This approach has the advantage of being clearly
+cheaper than more dedicated approaches such as supercomputers
+or clusters. Furthermore, as we will see in the remainder of this paper, the
+studied parallel algorithm fits this computation model very well.
The aim of this paper is to propose and evaluate a gridification of
the Neurad application (more precisely, of the most time consuming
-part, the learning step) using a Global Computing approach. For this,
-we focus on the XtremWeb-CH environment []. We choose this environment
-because it tackles the centralized aspect of other global computing
-environments such as XtremWeb [] or Seti []. It tends to a
-peer-to-peer approach by distributing some components of the
-architecture. For instance, the computing nodes are allowed to
-directly communicate. Experiments were conducted on a real Global
-Computing testbed. The results are very encouraging. They exhibit an
-interesting speed-up and show that the overhead induced by the use of
-XtremWeb-CH is very acceptable.
+part, the learning step) using a volunteer computing approach. For
+this, we focus on the XtremWeb-CH environment~\cite{xwch}. We chose
+this environment because it addresses the centralized nature of other
+global computing environments such as XtremWeb~\cite{xtremweb} or
+Seti~\cite{seti}. It moves towards a peer-to-peer approach by distributing
+some components of the architecture. For instance, the computing nodes
+are allowed to communicate directly. Experiments were conducted on a
+real volunteer computing testbed. The results are very encouraging. They
+exhibit an interesting speed-up and show that the overhead induced by
+the use of XtremWeb-CH is very acceptable.
The paper is organized as follows. In Section 2 we present the Neurad
-application and particularly it most time consuming part i.e. the
+application and particularly its most time-consuming part, i.e.\ the
learning step. Section 3 details the XtremWeb-CH environment and
Section 4 exposes the gridification of the Neurad
application. Experimental results are presented in Section 5 and we
\label{f_neurad}
\end{figure}
-The \emph{Neurad}~\cite{Neurad} project presented in this paper takes
-place in a multi-disciplinary project, involving medical physicists
-and computer scientists whose goal is to enhance the treatment
-planning of cancerous tumors by external radiotherapy. In our
-previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have proposed an
-original approach to solve scientific problems whose accurate modeling
-and/or analytical description are difficult. That method is based on
-the collaboration of computational codes and neural networks used as
-universal interpolator. Thanks to that method, the \emph{Neurad}
-software provides a fast and accurate evaluation of radiation doses in
-any given environment (possibly inhomogeneous) for given irradiation
-parameters. We have shown in a previous work (\cite{AES2009}) the
-interest to use a distributed algorithm for the neural network
-learning. We use a classical RPROP algorithm with a HPU topology to do
-the training of our neural network.
-
-Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts
-are clearly independent: the initial data production, the learning
-process and the dose deposit evaluation. The first step, the data
-production, is outside the {\it{Neurad}} project. They are many
-solutions to obtain data about the radiotherapy treatments like the
-measure or the simulation. The only essential criterion is that the
-result must be obtain in a homogeneous environment.
+The \emph{Neurad}~\cite{Neurad} project presented in this paper is part of a
+multidisciplinary project, involving medical physicists and computer scientists,
+whose goal is to enhance the treatment planning of cancerous tumors by external
+radiotherapy. In our previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have
+proposed an original approach to solving scientific problems whose accurate
+modeling and/or analytical description are difficult. That method is based on
+the collaboration of computational codes and neural networks used as universal
+interpolators. Thanks to that method, the \emph{Neurad} software provides a fast
+and accurate evaluation of radiation doses in any given environment (possibly
+inhomogeneous) for given irradiation parameters. We have shown in a previous
+work~\cite{AES2009} the interest of using a distributed algorithm for the neural
+network learning. We use a classical RPROP\footnote{Resilient backpropagation.}
+algorithm with an HPU\footnote{High-order processing units.} topology to
+train our neural network.
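As an illustration of the update rule named above, the following is a minimal NumPy sketch of RPROP in its iRprop$^-$ variant. This variant is our choice, since the text above does not state which RPROP variant Neurad uses. Only the sign of each partial derivative is used, and each weight keeps its own adaptive step size. The constants \texttt{eta\_plus}, \texttt{eta\_minus}, \texttt{delta0}, \texttt{delta\_min} and \texttt{delta\_max} are common default values assumed for illustration. The toy quadratic objective is a hypothetical stand-in for the error of an HPU network.

```python
import numpy as np


def irprop_minus(grad_fn, w, n_iter=100, eta_plus=1.2, eta_minus=0.5,
                 delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Minimize a function with the iRprop- variant of RPROP (illustrative sketch).

    grad_fn(w) returns the gradient of the error with respect to w.
    Only gradient signs are used; every weight has its own step size.
    The default constants are assumed textbook values, not Neurad's.
    """
    w = np.asarray(w, dtype=float).copy()
    delta = np.full_like(w, delta0)      # per-weight step sizes
    prev_grad = np.zeros_like(w)
    for _ in range(n_iter):
        grad = grad_fn(w)
        sign_product = prev_grad * grad
        # Same sign as the previous step: enlarge the step (bounded by delta_max).
        delta = np.where(sign_product > 0,
                         np.minimum(delta * eta_plus, delta_max), delta)
        # Sign change: a minimum was overshot, so shrink the step (bounded by delta_min).
        delta = np.where(sign_product < 0,
                         np.maximum(delta * eta_minus, delta_min), delta)
        # iRprop-: after a sign change, skip this weight's update and forget its gradient.
        grad = np.where(sign_product < 0, 0.0, grad)
        w -= np.sign(grad) * delta
        prev_grad = grad
    return w
```

For instance, \texttt{irprop\_minus(lambda w: 2.0 * (w - np.array([3.0, -2.0])), np.zeros(2))} drives both weights towards $(3, -2)$. In a learning setting such as Neurad's, the gradient would instead come from the network error on one training sub-domain.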
+
+Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are clearly
+independent: the initial data production, the learning process and the dose
+deposit evaluation. The first step, the data production, is outside the scope of the
+{\it{Neurad}} project. There are many ways to obtain data about
+radiotherapy treatments, such as measurement or simulation. The only essential
+criterion is that the result must be obtained in a homogeneous environment.
% We have chosen to
% use only a Monte Carlo simulation because this kind of tool is the
% \label{f_tray}
% \end{figure}
-The secondary stage of the {\it{Neurad}} project is the learning step
-and this is the most time consuming step. This step is off-line but it
-is important to reduce the time used for the learning process to keep
-a workable tool. Indeed, if the learning time is too huge (for the
-moment, this time could reach one week for a limited domain), this
-process should not be launched at any time, but only when a major
-modification occurs in the environment, like a change of context for
-instance. However, it is interesting to update the knowledge of the
-neural network, by using the learning process, when the domain evolves
-(evolution in material used for the prosthesis or evolution on the
-beam (size, shape or energy)). The learning time is related to the
-volume of data who could be very important in a real medical context.
-A work has been done to reduce this learning time with the
-parallelization of the learning process by using a partitioning method
-of the global dataset. The goal of this method is to train many neural
-networks on sub-domains of the global dataset. After this training,
-the use of these neural networks all together allows to obtain a
+The second stage of the {\it{Neurad}} project is the learning step, and it
+is the most time-consuming one. This step is performed offline, but it is
+important to reduce the learning time in order to keep a workable
+tool. Indeed, if the learning time is too long (currently, it can
+reach one week for a limited domain), this process cannot be launched at any
+time, but only when a major modification occurs in the environment, like a
+change of context. However, it is worthwhile to update the
+knowledge of the neural network, by rerunning the learning process, when the domain
+evolves (changes in the material used for the prosthesis, or in the beam
+size, shape or energy). The learning time is related to the volume of data, which can be very large in a real medical context. Some work has been done to
+reduce this learning time by parallelizing the learning process, using
+a partitioning method of the global dataset. The goal of this method is to
+train many neural networks on sub-domains of the global dataset. After this
+training, using all these neural networks together yields a
response for the global domain of study.
\centering
\includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
\caption{Overlapping for a sub-network in a two-dimensional domain with ratio
- $\alpha$.}
+ $\alpha$}
\label{fig:overlap}
\end{figure}
-
-However, performing the learning on sub-domains constituting a
-partition of the initial domain is not satisfying according to the
-quality of the results. This comes from the fact that the accuracy of
-the approximation performed by a neural network is not constant over
-the learned domain. Thus, it is necessary to use an overlapping of
-the sub-domains. The overall principle is depicted in
-Figure~\ref{fig:overlap}. In this way, each sub-network has an
-exploitation domain smaller than its training domain and the
-differences observed at the borders are no longer relevant.
-Nonetheless, in order to preserve the performance of the parallel
-algorithm, it is important to carefully set the overlapping ratio
-$\alpha$. It must be large enough to avoid the border's errors, and
-as small as possible to limit the size increase of the data subsets.
+% I proofread this part but did not see the problem
+
+However, performing the learning on sub-domains constituting a partition of the
+initial domain may not give satisfactory results, depending on the required accuracy. This
+comes from the fact that the accuracy of the approximation performed by a neural
+network is not constant over the learned domain. Thus, it is necessary to
+overlap the sub-domains. The overall principle is depicted in
+Figure~\ref{fig:overlap}. In this way, each sub-network has an exploitation
+domain smaller than its training domain, and the differences observed at the
+borders are no longer relevant. Nonetheless, in order to preserve the
+performance of the parallel algorithm, it is important to carefully set the
+overlapping ratio $\alpha$. It must be both large enough to avoid errors at the
+borders and as small as possible to limit the size increase of the data
+subsets~\cite{AES2009}.
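The trade-off on $\alpha$ can be made concrete with a minimal one-dimensional sketch. It relies on an assumption of ours rather than a definition taken from \cite{AES2009}: each training interval is taken to be its exploitation interval extended on both sides by $\alpha$ times the sub-domain width, clipped to the global domain. The helper names \texttt{overlapping\_subdomains} and \texttt{data\_growth} are hypothetical.

```python
def overlapping_subdomains(lo, hi, n_parts, alpha):
    """Split [lo, hi] into exploitation intervals and enlarged training intervals.

    Hypothetical 1-D illustration: each training interval extends its
    exploitation interval by alpha * width on both sides, clipped to [lo, hi].
    """
    width = (hi - lo) / n_parts
    parts = []
    for i in range(n_parts):
        a = lo + i * width               # exploitation interval [a, b)
        b = a + width
        margin = alpha * width
        train = (max(lo, a - margin), min(hi, b + margin))
        parts.append({"exploit": (a, b), "train": train})
    return parts


def data_growth(parts, lo, hi):
    """Total training length divided by domain length: the data cost of overlapping."""
    total = sum(p["train"][1] - p["train"][0] for p in parts)
    return total / (hi - lo)
```

With four sub-domains of $[0, 1]$ and $\alpha = 0.25$, the training intervals cover $1.375$ times the domain, compared with exactly $1$ for $\alpha = 0$. Larger $\alpha$ reduces border effects at the price of larger data subsets. In two dimensions the same margins apply along both axes, so each interior subset grows in both directions.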
\section{The XtremWeb-CH environment}
\input{xwch.tex}
-\section{}
+\section{The Neurad gridification}
\label{sec:neurad_gridif}
-The Neurad application can be divided into three parts. The first one
-aims at dividing data representing dose distribution on an area. This
-area contains various parameters, like the density of the medium and
-its nature. Multiple ``views'' can be superposed in order to obtain a
-more accurate learning. The second part of the application is the
-learning itself. This is the most time consuming part and therefore
-this is the one which has been ported to XWCH. This part fits well
-with the model of the middleware -- all learning tasks execute in
-parallel independently with their own local data part, with no
-communication, following the fork-join model. As described on Figure
-\ref{fig:neurad_grid}, we first send the learning application and data
-to the middleware (more precisely on warehouses (DW)) and create the
-computation module. When a worker (W) is ready to compute, it requests
-a task to execute to the coordinator (Coord.). This latter assigns it
-a task. The worker retrieves the application and its assigned data,
-and can start the computation. At the end of the learning process, it
-sends the result, a weighted neural network which will be used in a
-dose distribution process, to a warehouse. The last step of the
-application is to retrieve these results and exploit them.
+As previously explained, the Neurad application can be divided into
+three steps. The goal of the first step is to decompose the data
+representing the dose distribution on an area. This area contains
+various parameters, like the nature of the medium and its
+density. This step is outside the scope of this paper.
+%Multiple ``views'' can be
+%superposed in order to obtain a more accurate learning.
+The second step of the application, and the most time-consuming one, is the learning
+itself. This is the step that has been parallelized, using the XWCH
+environment. As explained in Section~2, the parallelization relies on a
+partitioning of the global dataset. Following this partitioning, all learning
+tasks are executed independently and in parallel on their own local data part,
+with no communication, following the fork/join model. Clearly, this computation
+fits the model of the chosen middleware well.
\begin{figure}[ht]
\centering
- \includegraphics[width=\linewidth]{neurad_gridif}
- \caption{Neurad gridification}
+ \includegraphics[width=8cm]{figures/neurad_gridif}
+ \caption{The proposed Neurad gridification}
\label{fig:neurad_grid}
\end{figure}
-\section{Experimental results}
-\label{sec:neurad_xp}
+The execution scheme is then as follows (see Figure
+\ref{fig:neurad_grid}):
+\begin{enumerate}
+\item We first send the learning application and its data to the
+  middleware. The application is sent to the data warehouses (DW),
+  and an ``application module'' is then created on the coordinator
+  (Coord.), including the references returned by this upload. The
+  same process is then applied to the application data;
+\item When a worker (W) is ready to compute, it requests a task
+  from the coordinator;
+\item The coordinator assigns a task to the worker, which retrieves
+  the application and its assigned data by requesting them from the
+  DW with the references sent by the coordinator, and can then start
+  the computation;
+\item At the end of the learning process, the worker sends the result
+  to a warehouse.
+\end{enumerate}
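The pull-based fork/join scheme above can be summarized by the following single-process simulation, in which threads play the role of workers. It is only a sketch: the classes \texttt{Warehouse} and \texttt{Coordinator} and their methods are hypothetical stand-ins for the XWCH components, not the XWCH API. The ``learning'' function is a placeholder that merely sums the values of its data part.

```python
import queue
import threading


class Warehouse:
    """Hypothetical data warehouse (DW): stores blobs under references."""

    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def put(self, ref, blob):
        with self._lock:
            self._store[ref] = blob

    def get(self, ref):
        with self._lock:
            return self._store[ref]


class Coordinator:
    """Hypothetical coordinator: hands out task descriptions on request."""

    def __init__(self, tasks):
        self._tasks = queue.Queue()
        for task in tasks:
            self._tasks.put(task)

    def request_task(self):
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None                      # no work left: the worker stops


def worker(coord, dw):
    # Step 2: a worker pulls a task only when it is ready to compute.
    while (task := coord.request_task()) is not None:
        app = dw.get(task["app_ref"])        # Step 3: fetch application and data
        data = dw.get(task["data_ref"])
        result = app(data)                   # placeholder for the learning step
        dw.put(task["result_ref"], result)   # Step 4: store the result in a DW


def run(n_parts=25, n_workers=4):
    dw = Warehouse()
    # Step 1: upload the application and each data part, keeping references.
    dw.put("app", lambda part: sum(part))    # stand-in for training a sub-network
    tasks = []
    for i in range(n_parts):
        dw.put(f"data/{i}", list(range(i, i + 3)))
        tasks.append({"app_ref": "app", "data_ref": f"data/{i}",
                      "result_ref": f"result/{i}"})
    coord = Coordinator(tasks)
    threads = [threading.Thread(target=worker, args=(coord, dw))
               for _ in range(n_workers)]
    for t in threads:                        # fork
        t.start()
    for t in threads:                        # join
        t.join()
    # Last step: retrieve all results from the warehouse.
    return [dw.get(f"result/{i}") for i in range(n_parts)]
```

Because workers pull tasks instead of having them pushed, a machine starts computing only after it has asked for and received a task. This is the source of the start-up latency discussed in Section~\ref{sec:neurad_xp}.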
+
+The last step of the application is to retrieve these results (a set of
+weighted neural networks) and exploit them through a dose distribution
+process. This last step is outside the scope of this paper.
+
-\subsubsection{Conditions}
-\label{sec:neurad_cond}
+\section{Experimental results}
+\label{sec:neurad_xp}
-The evaluation of the execution of the Neurad application on XWCH was
-composed as follows. The size of the input data is about 2.4Gb. This
-amount of data can be divided into 25 parts – otherwise, data noise
-appears and will disturb the learning. We have used 25 computers (XWCH
-workers) to execute this part of the application. This generates input
-data parts of about 15Mb (in a compressed format). The output data,
-which are retrieved after the process, are about 30Kb for each part. We
-used two distincts deployments of XWCH. In the first one, the XWCH
-coordinator and the warehouses were situated in Geneva, Switzerland
-while the workers were running in the same local cluster in Belfort,
-France. The second deployment is a local deployment where both
-coordinator, warehouses and workers were in the same local cluster.
-During the day these machines were used by students of the Computer
-Science Department of the IUT of Belfort.
+The aim of this section is to describe and analyze the experimental
+results we have obtained with the parallel Neurad version previously
+described. Our goal was to run this application with real input
+data on a real volunteer computing testbed.
-We have furthermore compared the execution of the Neurad application
-with and without the XWCH platform in order to measure the overhead
-induced by the use of the platform. By "without XWCH" we mean that the
-testbed consists only in workers deployed with their respective data by
-the use of shell scripts. No specific middleware was used and the
-workers were in the same local cluster.
+\subsubsection{Experimental conditions}
+\label{sec:neurad_cond}
-Five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$, $0.50e^{-1}$, $0.25e^{-1}$ and $1e^{-2}$.
+The size of the input data is about 2.4\,Gb. In order to prevent noise
+from appearing and disturbing the learning process, these data can be
+divided into at most 25 parts. This generates input data parts of
+about 15\,Mb each (in a compressed format). The output data, which are
+retrieved after the process, are about 30\,Kb for each part. We used two
+distinct deployments of XWCH:
+\begin{enumerate}
+
+\item In the first one, called ``distributed XWCH'',
+ the XWCH coordinator and the warehouses were located in Geneva,
+ Switzerland, while the workers were running in the same local cluster
+ in Belfort, France.
+
+\item The second deployment, called ``local XWCH'', is a local
+ deployment where the coordinator, the warehouses and the workers
+ were all located in the same local cluster.
+
+\end{enumerate}
+For both deployments, the local cluster is a campus cluster whose
+machines were used during the day by students of the Computer Science
+Department of the IUT of Belfort. Unfortunately, the limit on data
+decomposition prevents us from using more than 25
+computers (XWCH workers).
+
+In order to evaluate the overhead induced by the use of the platform, we have
+also compared the execution of the Neurad application with and without
+the XWCH platform. We stress that, in the latter case, the
+testbed consists only of workers deployed with their respective data by means
+of shell scripts. No specific middleware was used and the workers were in the
+same local cluster.
+
+Finally, five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$,
+$0.50e^{-1}$, $0.25e^{-1}$, and $1e^{-2}$.
\subsubsection{Results}
\label{sec:neurad_result}
-In these experiments, we measured the same steps on both kinds of
-executions. The steps consist of sending of local data and the
-executable, the learning process, and retrieving the result. Table
-\ref{tab:neurad_res} presents the execution times of the Neurad
-application on 25 machines with XWCH (local and distributed deployment)
-and without XWCH.
+Table~\ref{tab:neurad_res} presents the execution times of the Neurad
+application on 25 machines with XWCH (local and distributed
+deployments) and without XWCH. For both kinds of execution, the same
+steps are measured, i.e.\ sending the local data and the executable,
+the learning process, and retrieving the results. Each value is the
+average time over $5$ executions.
\begin{table}[h!]
+ \renewcommand{\arraystretch}{1.7}
\centering
\begin{tabular}[h!]{|c|c|c|c|c|}
\hline
- Precision & 1 machine & Without XWCH & With XWCH & With local XWCH\\
+ ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With distributed XWCH~ & ~With
+ local XWCH~ \\
\hline
$1e^{-1}$ & 5190 & 558 & 759 & 629\\
$0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
$1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
\hline
\end{tabular}
+ \vspace{0.3cm}
\caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
\label{tab:neurad_res}
\end{table}
%\end{table}
-These experiments show that the overhead induced by the use of the XWCH
-platform is about $34\%$ in the distributed deployment and about $7\%$
-in the local deployment. For this last one, the overhead is very acceptable regarding to the benefits of the platform.
+As Table~\ref{tab:neurad_res} shows, with the local deployment the overhead
+induced by the use of the XWCH platform is about $7\%$, which is clearly
+low. With the distributed deployment, the overhead is about
+$34\%$. Given the benefits of the platform, this overhead remains
+acceptable, and it can be explained by the following points.
-Now, in the distributed deployment the overhead is also acceptable and can be explained by
-different factors. First, we point out that the conditions of executions
-are not really identical between with and without XWCH. For this last
-one, though the same steps were done, all transfer processes are inside
-a local cluster with a high bandwidth and a low latency. Whereas when
-using XWCH, all transfer processes (between datawarehouses, workers, and
-the coordinator) used a wide network area with a smaller bandwidth.
+First, the execution conditions are not identical with and without
+XWCH. Without XWCH, although the same steps are performed, all transfers
+take place inside a local cluster with high bandwidth and low latency,
+whereas with the distributed XWCH deployment all transfers (between the
+data warehouses, the workers and the coordinator) go through a wide area
+network with lower bandwidth. Second, in the executions without XWCH, all
+the machines start the computation immediately, whereas with the XWCH
+platform a latency is introduced, because a computation starts on a
+machine only when that machine requests a task.
-In addition, in executions without XWCH, all the machines started
-immediately the computation, whereas when using the XWCH platform, a
-latency is introduced by the fact that a task starts on a machine, only
-when this one requests a task.
+This shows, unsurprisingly, that deploying a local
+coordinator and one or more warehouses near a cluster of workers can
+improve both computation and platform performance.
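Speed-up and overhead are not formally defined above. The following sketch applies the usual definitions, which are assumptions on our part: speed-up $T_1/T_p$, and overhead $T_{\text{with}}/T_{\text{without}} - 1$ relative to the execution without XWCH. These definitions are applied to the rows of Table~\ref{tab:neurad_res} listed here. The sketch reports per-precision values only. It is not meant to reproduce the aggregated $7\%$ and $34\%$ figures quoted in the text, whose averaging over the five precisions is not detailed.

```python
# Execution times in seconds, copied from the rows of Table tab:neurad_res listed above.
TIMES = {
    "1e-1":    {"one": 5190,  "without": 558,  "distributed": 759,  "local": 629},
    "0.75e-1": {"one": 6307,  "without": 792,  "distributed": 1298, "local": 801},
    "1e-2":    {"one": 11030, "without": 1035, "distributed": 1447, "local": 1108},
}
N_WORKERS = 25  # number of machines used in the parallel runs


def speedup(t_seq, t_par):
    """Assumed definition: sequential time divided by parallel time."""
    return t_seq / t_par


def overhead(t_with, t_without):
    """Assumed definition: relative extra time due to the middleware (0.07 means +7%)."""
    return t_with / t_without - 1.0


for prec, t in TIMES.items():
    s = speedup(t["one"], t["without"])
    print(f"{prec:>8}: speed-up {s:5.2f} (efficiency {s / N_WORKERS:.2f}), "
          f"overhead local {overhead(t['local'], t['without']):+.1%}, "
          f"distributed {overhead(t['distributed'], t['without']):+.1%}")
```

Measuring the overhead against the run without XWCH on the same 25 machines isolates the cost of the middleware from the cost of parallelization itself.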
-These experiments underline that deploying a local coordinator and one
-or more warehouses near a cluster of workers can enhance computations
-and platform performances. They also show a limited overhead due to the
-use of the platform.
-
-
-\end{document}
\section{Conclusion and future works}
+In this paper, we have presented a gridification of a real medical
+application, the Neurad application. This radiotherapy application
+aims at optimizing the irradiation dose distribution within a
+patient. Based on multi-layer neural networks, this application
+includes a very time-consuming step: the learning step. Due to the
+computing characteristics of this step, we have chosen to parallelize it
+using the XtremWeb-CH volunteer computing environment. The
+experimental results show good speed-ups and indicate that the overhead
+induced by XWCH is very acceptable, making it a good candidate
+for deploying parallel applications over a volunteer computing environment.
+
+Our future work includes testing the application on a
+larger-scale testbed. This requires choosing an input data set
+that allows a finer decomposition. Unfortunately, this choice of input
+data is not trivial and depends on a large number of parameters.
+
+We are also planning to test XWCH with parallel applications where
+communication between workers occurs during the execution. In this
+context, the asynchronous iteration model~\cite{bcl08} may be
+an interesting direction.
+
+%(ask Marc for more details here).
+% If you want to discuss the set of parameters that can be used to characterize the irradiation conditions,
+% you can mention:
+% - characteristics of the irradiation beam (beam size (from a few mm to more than 40 cm), energy, SSD (source surface distance)),
+% - characteristics of the material: density
+
\bibliographystyle{plain}