X-Git-Url: http://info.iut-bm.univ-fcomte.fr/pub/gitweb/gpc2011.git/blobdiff_plain/327afc9fede0d1000ae5402bb80deaa057adeb7c..386389681aed0efc9e9a396a4b09e2ab55a063f4:/gpc2011.tex

diff --git a/gpc2011.tex b/gpc2011.tex
index 2f091fb..f8cd63b 100644
--- a/gpc2011.tex
+++ b/gpc2011.tex
@@ -45,168 +45,389 @@
 \title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}

-\author{Nabil Abdennhader\inst{1} \and Raphaël Couturier\inst{1} \and David \and
-  Julien Henriet\inst{2} \and Laiymani\inst{1} \and Sébastien Miquée\inst{1}
-  \and Marc Sauget\inst{2}}
-\institute{Laboratoire d'Informatique de l'universit\'{e}
+\author{Nabil Abdennhader\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
+  David Laiymani\inst{2} \and Sébastien Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}
+
+\institute{
+University of Applied Sciences Western Switzerland, hepia Geneva,
+Switzerland \\
+\email{nabil.abdennadher@hesge.ch,mohamed.benbelgacem@unige.ch,markopekka.niinimaeki@hesge.ch}
+\and
+Laboratoire d'Informatique de l'universit\'{e}
   de Franche-Comt\'{e} \\
   IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
-\email{raphael.couturier, david.laiymani, sebastien.miquee@univ-fcomte.fr}
+\email{\{raphael.couturier,david.laiymani,sebastien.miquee\}@univ-fcomte.fr}
 \and
   FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard , FRANCE\\
+\email{marc.sauget@univ-fcomte.fr}
 }

-%\email{\texttt{[laiymani]@lifc.univ-fcomte.fr}}}

 \maketitle

 \begin{abstract}
-
+  This paper presents the design and the evaluation of the
+  gridification of a radiotherapy dose computation application. Due to
+  the inherent characteristics of the application and its execution,
+  we chose the architectural context of volunteer computing. For this,
+  we used the XtremWeb-CH environment. Experiments were conducted on a
+  real volunteer computing testbed and show good speed-ups and a very
+  acceptable platform overhead, making XtremWeb-CH a good candidate
+  for deploying parallel applications over a volunteer computing
+  environment.
 \end{abstract}
+

 %-------------INTRODUCTION--------------------
 \section{Introduction}

-The use of distributed architectures for solving large scientific problems seems
-to become mandatory in a lot of cases. For example, in the domain of
-radiotherapy dose computation the problem is crucial. The main goal of external
-beam radiotherapy is the treatment of tumours while minimizing exposure to
-healthy tissue. Dosimetric planning has to be carried out in order to optimize
-the dose distribution within the patient is necessary. Thus, for determining the
-most accurate dose distribution during treatment planning, a compromise must be
-found between the precision and the speed of calculation. Current techniques,
-using analytic methods, models and databases, are rapid but lack
-precision. Enhanced precision can be achieved by using calculation codes based,
-for example, on Monte Carlo methods. In [] the authors proposed a novel approach
-based on the use of neural networks. The approach is based on the collaboration
-of computation codes and multi-layer neural networks used as universal
-approximators. It provides a fast and accurate evaluation of radiation doses in
-any given environment for given irradiation parameters. As the learning step is
-often very time consumming, in \cite{bcvsv08:ip} the authors proposed a parallel
-algorithm that enable to decompose the learning domain into subdomains. The
-decomposition has the advantage to significantly reduce the complexity of the
-target functions to approximate.
-
-Now, as there exist several classes of distributed/parallel architectures
-(supercomputers, clusters, global computing...) we have to choose the best
-suited one for the parallel Neurad application. The Global or Volunteer
-computing model seems to be an interesting approach. Here, the computing power
-is obtained by agregating unused (or volunteer) public resources connected to
-the Internet. For our case, we can imagine for example, that a part of the
-architecture will be composed of some of the different computers of the
-hospital. This approach present the advantage to be clearly cheaper than a more
-dedicated approach like the use of supercomputer or clusters.
-
-The aim of this paper is to propose and evaluate a gridification of the Neurad
-application (more precisely, of the most time consuming part, the learning step)
-using a Global computing approach. For this, we focus on the XtremWeb-CH
-environnement []. We choose this environnent because it tackles the centralized
-aspect of other global computing environments such as XTremWeb [] or Seti []. It
-tends to a peer-to-peer approach by distributing some components of the
-architecture. For instance, the computing nodes are allowed to directly
-communicate. Experimentations were conducted on a real Global Computing
-testbed. The results are very encouraging. They exhibit an interesting speed-up
-and show that the overhead induced by the use of XTremWeb-CH is very acceptable.
-
-The paper is organized as follows. In section 2 we present the Neurad
-application and particularly it most time consuming part i.e. the learning
-step. Section 3 details the XtremWeb-CH environnement while in section 4 we
-expose the gridification of the Neurad application. Experimental results are
-presented in section 5 and we end in section 6 by some concluding remarks and
-perspectives.
+The use of distributed architectures for solving large scientific
+problems seems to become mandatory in many cases. For example, in
+the domain of radiotherapy dose computation, the need for computing
+power is crucial. The main goal of external beam radiotherapy is the
+treatment of tumors while minimizing exposure to healthy tissue.
+Dosimetric planning has to be carried out in order to optimize the
+dose distribution within the patient. Thus, to determine the most
+accurate dose distribution during treatment planning, a compromise
+must be found between the precision and the speed of calculation.
+Current techniques, using analytic methods, models and databases, are
+rapid but lack precision. Enhanced precision can be achieved by using
+calculation codes based, for example, on Monte Carlo methods. The main
+drawback of these methods is their computation time, which can quickly
+become very large. In \cite{NIMB2008} the authors proposed a novel
+approach, called Neurad, using neural networks. This approach is based
+on the collaboration of computation codes and multi-layer neural
+networks used as universal approximators. It provides a fast and
+accurate evaluation of radiation doses in any given environment for
+given irradiation parameters. As the learning step is often very time
+consuming, in \cite{AES2009} the authors proposed a parallel algorithm
+that makes it possible to decompose the learning domain into
+subdomains. The decomposition has the advantage of significantly
+reducing the complexity of the target functions to approximate.
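+
+To give an intuition of this idea, the following minimal sketch (a
+simplified, one-dimensional illustration, not the code actually used
+in Neurad; the names \texttt{split\_dataset} and
+\texttt{train\_one\_network} are purely illustrative) splits a dataset
+into sub-domains and trains one independent network per sub-domain:
+
+\begin{verbatim}
+# Illustrative sketch only: split a dataset along one input dimension
+# and train one network per sub-domain, independently of the others.
+def split_dataset(dataset, n_parts, key=lambda sample: sample[0]):
+    ordered = sorted(dataset, key=key)
+    size = len(ordered) // n_parts
+    return [ordered[i * size:(i + 1) * size if i < n_parts - 1
+                    else len(ordered)]
+            for i in range(n_parts)]
+
+def parallel_learning(dataset, n_parts, train_one_network):
+    # Each call is independent and could run on a separate
+    # (volunteer) machine; the trained networks are gathered
+    # afterwards and used together on the global domain.
+    return [train_one_network(part)
+            for part in split_dataset(dataset, n_parts)]
+\end{verbatim}
+
+Here \texttt{train\_one\_network} stands for any learning procedure
+applied to one sub-domain; the point is only that the calls are
+mutually independent.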
+
+Now, as there exist several classes of distributed/parallel
+architectures (supercomputers, clusters, global computing, etc.), we
+have to choose the one best suited to the parallel Neurad application.
+The volunteer (or global) computing model seems to be an interesting
+approach. Here, the computing power is obtained by aggregating unused
+(or volunteer) public resources connected to the Internet. In our
+case, we can imagine, for example, that a part of the architecture
+will be composed of some of the hospital's computers. This approach
+has the advantage of being clearly cheaper than a more dedicated
+approach such as the use of supercomputers or clusters. Furthermore,
+as we will see in the remainder of this paper, the studied parallel
+algorithm fits this computation model well.
+
+The aim of this paper is to propose and evaluate a gridification of
+the Neurad application (more precisely, of its most time consuming
+part, the learning step) using a volunteer computing approach. For
+this, we focus on the XtremWeb-CH environment\cite{}. We chose this
+environment because it tackles the centralized aspect of other global
+computing environments such as XtremWeb\cite{} or Seti\cite{}. It
+tends towards a peer-to-peer approach by distributing some components
+of the architecture. For instance, the computing nodes are allowed to
+communicate directly. Experiments were conducted on a real global
+computing testbed. The results are very encouraging. They exhibit an
+interesting speed-up and show that the overhead induced by the use of
+XtremWeb-CH is very acceptable.
+
+The paper is organized as follows. In Section 2 we present the Neurad
+application and particularly its most time consuming part, i.e. the
+learning step. Section 3 details the XtremWeb-CH environment and
+Section 4 presents the gridification of the Neurad
+application. Experimental results are presented in Section 5 and we
+end in Section 6 with some concluding remarks and perspectives.

 \section{The Neurad application}

 \begin{figure}[http]
   \centering
   \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
-  \caption{The Neurad projects}
+  \caption{The Neurad project}
   \label{f_neurad}
 \end{figure}

 The \emph{Neurad}~\cite{Neurad} project presented in this paper takes place in a
-multi-disciplinary project , involving medical physicists and computer
-scientists whose goal is to enhance the treatment planning of cancerous tumors
-by external radiotherapy. In our previous
-works~\cite{RADIO09,ICANN10,NIMB2008}, we have proposed an original approach to
-solve scientific problems whose accurate modeling and/or analytical description
-are difficult. That method is based on the collaboration of computational codes
-and neural networks used as universal interpolator. Thanks to that method, the
-\emph{Neurad} software provides a fast and accurate evaluation of radiation
-doses in any given environment (possibly inhomogeneous) for given irradiation
-parameters. We have shown in a previous work (\cite{AES2009}) the interest to
-use a distributed algorithm for the neural network learning. We use a classical
-RPROP algorithm with a HPU topology to do the training of our neural network.
-
-The Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are
-clearly independant : the initial data production, the learning process and the
-dose deposit evaluation. The first step, the data production, is outside the
-{\it{Neurad}} project. They are many solutions to obtains data about the
-radiotherapy treatments like the measure or the simulation. The only essential
-criterion is that the result must be obtain in a homogeneous environment. We
-have chosen to use only a Monte Carlo simulation because this tools are the
-references in the radiotherapy domains. The advantages to use data obtain with a
-Monte Carlo simulator are the following : accuracy, profusing, quantify error
-and regularity of measure point. But, they are too disagreement and the most
-important is the statistical noise forcing a data post treatment. The
-Figure~\ref{f_tray} present the general behavior of a dose deposit in water.
-
-
-\begin{figure}[http]
-  \centering
-  \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
-  \caption{Dose deposit by a photon beam of 24 mm of width in water (Normalized value). }
-  \label{f_tray}
-\end{figure}
-
-The secondary stage of the {\it{Neurad}} project is about the learning step and
-it is the most time consuming step. This step is off-line but is it important to
-reduce the time used for the learning process to keep a workable tools. Indeed,
-if the learning time is too important (for the moment, this time could reach one
-week for a limited works domain), the use of this process could be be limited
-only at a major modification of the use context. However, it is interesting to
-do an update to the learning process when the bound of the learning domain
+multi-disciplinary project, involving medical physicists and computer scientists
+whose goal is to enhance the treatment planning of cancerous tumors by external
+radiotherapy. In our previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have
+proposed an original approach to solve scientific problems whose accurate
+modeling and/or analytical description are difficult. That method is based on
+the collaboration of computational codes and neural networks used as universal
+interpolators. Thanks to that method, the \emph{Neurad} software provides a fast
+and accurate evaluation of radiation doses in any given environment (possibly
+inhomogeneous) for given irradiation parameters. We have shown in a previous
+work (\cite{AES2009}) the interest of using a distributed algorithm for the
+neural network learning. We use a classical RPROP~\footnote{Resilient
+backpropagation} algorithm with a HPU~\footnote{High order processing units}
+topology to do the training of our neural network.
+
+Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are clearly
+independent: the initial data production, the learning process and the dose
+deposit evaluation. The first step, the data production, is outside the
+{\it{Neurad}} project. There are many ways to obtain data about radiotherapy
+treatments, such as measurements or simulations. The only essential criterion is
+that the result must be obtained in a homogeneous environment.
+
+% We have chosen to
+% use only a Monte Carlo simulation because this kind of tool is the
+% reference in the radiotherapy domains. The advantages to use data
+% obtained with a Monte Carlo simulator are the following: accuracy,
+% profusion, quantified error and regularity of measure points. But,
+% there exist also some disagreements and the most important is the
+% statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
+% presents the general behavior of a dose deposit in water.
+
+
+% \begin{figure}[http]
+%   \centering
+%   \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
+%   \caption{Dose deposit by a photon beam of 24 mm of width in water (normalized value).}
+%   \label{f_tray}
+% \end{figure}
+
+The second stage of the {\it{Neurad}} project is the learning step, which is
+the most time consuming one. This step is performed off-line, but it is
+important to reduce the time used for the learning process in order to keep a
+workable tool. Indeed, if the learning time is too long (for the moment, this
+time can reach one week for a limited domain), this process should not be
+launched at any time, but only when a major modification occurs in the
+environment, like a change of context for instance. However, it is interesting
+to update the knowledge of the neural network, by using the learning process,
+when the domain
 evolves (evolution in material used for the prosthesis or evolution on the beam
-(size, shape or energy)). The learning time is linked with the volume of data
-who could be very important in real medical context. We have work to reduce
-this learning time with a parallel method of the learning process using a
-partitioning method of the global dataset. The goal of this method is to train
-many neural networks on sub-domain of the global dataset. After this training,
-the use of this neural networks together allows to obtain a response for the
-global domain of study.
+(size, shape or energy)). The learning time is related to the volume of data,
+which can be very large in a real medical context. Work has been done to reduce
+this learning time by parallelizing the learning process, using a partitioning
+of the global dataset. The goal of this method is to train many neural networks
+on sub-domains of the global dataset. After this training, using these neural
+networks together makes it possible to obtain a response for the global domain
+of study.

 \begin{figure}[h]
   \centering
   \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
   \caption{Overlapping for a sub-network in a two-dimensional domain with ratio
-    $\alpha$.}
+    $\alpha$}
   \label{fig:overlap}
 \end{figure}
-
-However, performing the learnings on sub-domains constituting a partition of the
+% I have re-read this but did not see the problem
+
+However, performing the learning on sub-domains constituting a partition of the
 initial domain is not satisfying according to the quality of the results. This
 comes from the fact that the accuracy of the approximation performed by a neural
-network is not constant over the learned domain. Thus, it is necessary to use
-an overlapping of the sub-domains. The overall principle is depicted in
-Figure~\ref{fig:overlap}. In this way, each sub-network has an exploitation
+network is not constant over the learned domain. Thus, it is necessary to use an
+overlapping of the sub-domains. The overall principle is depicted in
+Figure~\ref{fig:overlap}. In this way, each sub-network has an exploitation
 domain smaller than its training domain and the differences observed at the
 borders are no longer relevant. Nonetheless, in order to preserve the
-performances of the parallel algorithm, it is important to carefully set the
-overlapping ratio $\alpha$. It must be large enough to avoid the border's
-errors, and as small as possible to limit the size increase of the data subsets.
-
-
+performance of the parallel algorithm, it is important to carefully set the
+overlapping ratio $\alpha$.
+It must be large enough to avoid border errors, and as small as possible to
+limit the size increase of the data subsets~\cite{AES2009}.
+% (What about our tests?)
+% This parameter has already been studied in a previous paper, so it was
+% chosen to be fixed for these tests.

 \section{The XtremWeb-CH environment}
-\section{Neurad gridification with XTremweb-ch}
+\input{xwch.tex}
+
+\section{The Neurad gridification}
+
+\label{sec:neurad_gridif}
+
+
+As previously explained, the Neurad application can be divided into
+three steps. The goal of the first step is to decompose the data
+representing the dose distribution over an area. This area is
+characterized by various parameters, such as the nature of the medium
+and its density. This part is outside the scope of this paper.
+%Multiple ``views'' can be
+%superposed in order to obtain a more accurate learning.
+
+The second step of the application, and the most time consuming one,
+is the learning itself. This is the step which has been parallelized,
+using the XWCH environment. As explained in Section 2, the
+parallelization relies on a partitioning of the global dataset.
+Following this partitioning, all learning tasks are executed in
+parallel and independently, each with its own local part of the data
+and with no communication, following the fork/join model. Clearly,
+this computation fits well with the model of the chosen middleware.
+
+The execution scheme is then the following (see Figure
+\ref{fig:neurad_grid}):
+\begin{enumerate}
+\item We first send the learning application and its data to the
+  middleware (more precisely to the warehouses (DW)) and create the
+  computation module;
+\item When a worker (W) is ready to compute, it requests a task to
+  execute from the coordinator (Coord.);
+\item The coordinator assigns a task to the worker. The latter retrieves the
+application and its assigned data and can then start the computation;
+\item At the end of the learning process, the worker sends the result to a warehouse.
+\end{enumerate}
+
+The last step of the application is to retrieve these results (the
+weighted neural networks) and exploit them through a dose distribution
+process.
+
+
+\begin{figure}[ht]
+  \centering
+  \includegraphics[width=8cm]{figures/neurad_gridif}
+  \caption{The proposed Neurad gridification}
+  \label{fig:neurad_grid}
+\end{figure}
+
 \section{Experimental results}
+\label{sec:neurad_xp}
+
+The aim of this section is to describe and analyze the experimental
+results obtained with the parallel Neurad version previously
+described. Our goal was to run this application with real input
+data and on a real volunteer computing testbed.
+
+\subsection{Experimental conditions}
+\label{sec:neurad_cond}
+
+The size of the input data is about 2.4~GB. In order to prevent noise
+from appearing and disturbing the learning process, these data can be
+divided into, at most, 25 parts. This generates input data parts of
+about 15~MB (in a compressed format). The output data, which are
+retrieved after the process, are about 30~KB for each part. We used two
+distinct deployments of XWCH:
+\begin{enumerate}
+
+\item In the first one, called ``distributed XWCH'',
+  the XWCH coordinator and the warehouses were located in Geneva,
+  Switzerland, while the workers were running in the same local cluster
+  in Belfort, France.
+
+\item The second deployment, called ``local XWCH'', is a local
+  deployment where the coordinator, the warehouses and the workers were
+  all in the same local cluster.
+
+\end{enumerate}
+For both deployments, the local cluster is a campus cluster, and during
+the day these machines were used by students of the Computer Science
+Department of the IUT of Belfort. Unfortunately, the data
+decomposition limitation does not allow us to use more than 25
+computers (XWCH workers).
+
+In order to evaluate the overhead induced by the use of the platform,
+we have furthermore compared the execution of the Neurad application
+with and without the XWCH platform. In the latter case, we mean that the
+testbed consists only of workers deployed with their respective data
+by means of shell scripts. No specific middleware was used and the
+workers were in the same local cluster.
+
+Finally, five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$,
+$0.50e^{-1}$, $0.25e^{-1}$, and $1e^{-2}$.
+
+
+\subsection{Results}
+\label{sec:neurad_result}
+
+
+Table \ref{tab:neurad_res} presents the execution times of the Neurad
+application on 25 machines with XWCH (local and distributed
+deployments) and without XWCH. These results correspond to measurements
+of the same steps for both kinds of execution, i.e. sending the local
+data and the executable, the learning process, and retrieving the
+results. The results represent the average time over $5$ executions.
+
+
+\begin{table}[h!]
+  \renewcommand{\arraystretch}{1.7}
+  \centering
+  \begin{tabular}[h!]{|c|c|c|c|c|}
+    \hline
+    ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With XWCH~ & ~With
+    local XWCH~ \\
+    \hline
+    $1e^{-1}$ & 5190 & 558 & 759 & 629\\
+    $0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
+    $0.50e^{-1}$ & 7487 & 792 & 1010 & 844 \\
+    $0.25e^{-1}$ & 7787 & 791 & 1000 & 852\\
+    $1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
+    \hline
+  \end{tabular}
+  \vspace{0.3cm}
+\caption{Execution times in seconds of the Neurad application, with and without using the XWCH platform}
+  \label{tab:neurad_res}
+\end{table}
+
+%\begin{table}[ht]
+%  \centering
+%  \begin{tabular}[h]{|c|c|c|}
+%    \hline
+%    Precision & Without XWCH & With XWCH \\
+%    \hline
+%    $1e^{-1}$ & $558$s & $759$s\\
+%    \hline
+%  \end{tabular}
+%  \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
+%  \label{tab:neurad_res}
+%\end{table}
+
+
+As we can see, in the case of a local deployment the overhead induced
+by the use of the XWCH platform is about $7\%$, which is clearly low.
+For the distributed deployment, the overhead is about $34\%$.
+Considering the benefits of the platform, this is a very acceptable
+overhead, which can be explained by the following points.
+
+First, we point out that the execution conditions are not really
+identical with and without XWCH. In the latter case, although the same
+steps were performed, all transfers take place inside a local cluster
+with a high bandwidth and a low latency, whereas when using XWCH all
+transfers (between data warehouses, workers, and the coordinator) go
+through a wide area network with a smaller bandwidth. In addition, in
+the executions without XWCH all the machines started the computation
+immediately, whereas with the XWCH platform a latency is introduced by
+the fact that a computation starts on a machine only when that machine
+requests a task.
+
+This underlines that, unsurprisingly, deploying a local
+coordinator and one or more warehouses near a cluster of workers can
+enhance computation and platform performance.
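+
+As a simple illustration of how these figures can be read (assuming
+that the overhead is computed as the relative difference between the
+execution times with and without XWCH), for the precision $1e^{-2}$
+the local deployment gives an overhead of $(1108-1035)/1035 \approx
+7\%$, while the corresponding speed-up with respect to a single
+machine is about $11030/1108 \approx 10$ on 25 workers.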
+
+
+
 \section{Conclusion and future works}
+In this paper, we have presented a gridification of a real medical
+application, the Neurad application. This radiotherapy application
+aims to optimize the irradiated dose distribution within a
+patient. Based on a multi-layer neural network, this application
+involves a very time consuming step, i.e. the learning step. Due to the
+computing characteristics of this step, we chose to parallelize it
+using the XtremWeb-CH volunteer computing environment. The
+experimental results obtained show good speed-ups and underline that
+the overheads induced by XWCH are very acceptable, making it a good
+candidate for deploying parallel applications over a volunteer
+computing environment.
+
+Our future work includes testing the application on a larger scale
+testbed. This implies the choice of an input data set allowing a finer
+decomposition. Unfortunately, this choice of input data is not trivial
+and relies on a large number of parameters.
+
+We are also planning to test XWCH with parallel applications where
+communication between workers occurs during the execution. In this
+way, the use of the asynchronous iteration model \cite{bcl08} may be
+an interesting perspective.
+
+% (Ask Marc for more details here.)
+% If you want to mention the set of parameters that can be used to characterize the irradiation conditions,
+% you can mention:
+% - characteristics of the irradiation beam (beam size (from a few mm to more than 40 cm), energy, SSD (source surface distance)),
+% - characteristics of the matter: density
+
 \bibliographystyle{plain}