+\label{sec:neurad_xp}
+
+The aim of this section is to describe and analyze the experimental
+results obtained with the parallel Neurad version described above.
+Our goal was to run this application with real input data on a real
+global computing testbed.
+
+\subsubsection{Experimental conditions}
+\label{sec:neurad_cond}
+
+The size of the input data is about 2.4~GB. To prevent data noise
+from appearing and disturbing the learning process, these data can be
+divided into at most 25 parts. This yields input data parts of about
+15~MB each in compressed format (roughly $2.4\,\mathrm{GB}/25 \approx
+96$~MB of raw data per part). The output data retrieved after the
+process amount to about 30~KB per part. Unfortunately, this
+decomposition limit does not allow us to use more than 25 computers
+(XWCH workers). Nevertheless, we used two distinct deployments of
+XWCH:
+\begin{enumerate}
+
+\item In the first one, called ``distributed XWCH'' in the following,
+  the XWCH coordinator and the warehouses were located in Geneva,
+  Switzerland, while the workers were running in the same local
+  cluster in Belfort, France.
+
+\item The second deployment, called ``local XWCH'', is a local
+  deployment where the coordinator, the warehouses, and the workers
+  were all in the same local cluster.
+
+\end{enumerate}
+For both deployments, the machines were used during the day by
+students of the Computer Science Department of the IUT of Belfort.
+
+In order to evaluate the overhead induced by the use of the platform,
+we furthermore compared the execution of the Neurad application with
+and without the XWCH platform. In the latter case, the testbed
+consists only of workers deployed, together with their respective
+data, by means of shell scripts; no specific middleware was used, and
+the workers were in the same local cluster.
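+
+As an illustration only, the following Python sketch shows what such a
+middleware-free, script-based deployment could look like. The host
+names, file names, and the \texttt{neurad} executable name are
+hypothetical assumptions, not the actual scripts used in the
+experiments.
+
+\begin{verbatim}
+#!/usr/bin/env python3
+"""Hypothetical sketch of the script-based baseline: send each of the
+25 data parts to a worker, run the learning step, fetch the result.
+Host names, paths and the executable name are assumptions."""
+import subprocess
+
+# Hypothetical worker nodes of the local cluster, one per data part.
+WORKERS = [f"node{i:02d}" for i in range(1, 26)]
+
+def deploy_and_run(worker: str, part: int) -> None:
+    archive = f"part{part:02d}.tar.gz"  # ~15 MB compressed input part
+    # 1. Send the compressed data part and the executable.
+    subprocess.run(["scp", archive, "neurad", f"{worker}:/tmp/"],
+                   check=True)
+    # 2. Unpack and run the learning process on the worker.
+    subprocess.run(["ssh", worker,
+                    f"cd /tmp && tar xzf {archive} && ./neurad {part}"],
+                   check=True)
+    # 3. Retrieve the ~30 KB output produced for this part.
+    subprocess.run(["scp", f"{worker}:/tmp/result{part:02d}.dat", "."],
+                   check=True)
+
+if __name__ == "__main__":
+    # One data part per worker, 25 in total.
+    for i, worker in enumerate(WORKERS, start=1):
+        deploy_and_run(worker, i)
+\end{verbatim}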
+
+Finally, five computation precisions were used: $1\times10^{-1}$,
+$0.75\times10^{-1}$, $0.50\times10^{-1}$, $0.25\times10^{-1}$, and
+$1\times10^{-2}$.
+
+
+\subsubsection{Results}
+\label{sec:neurad_result}
+
+Table~\ref{tab:neurad_res} presents the execution times of the Neurad
+application on 25 machines with XWCH (local and distributed
+deployments) and without XWCH. These times cover the same steps in
+all cases, i.e. sending the local data and the executable, running
+the learning process, and retrieving the results. Each result is the
+average over $?? \times ??$ executions.
+
+
+\begin{table}[h!]
+ \renewcommand{\arraystretch}{1.7}
+ \centering
+  \begin{tabular}[h!]{|c|c|c|c|c|}
+    \hline
+    ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With distributed
+    XWCH~ & ~With local XWCH~ \\
+    \hline
+    $1\times10^{-1}$    & 5190  & 558  & 759  & 629  \\
+    $0.75\times10^{-1}$ & 6307  & 792  & 1298 & 801  \\
+    $0.50\times10^{-1}$ & 7487  & 792  & 1010 & 844  \\
+    $0.25\times10^{-1}$ & 7787  & 791  & 1000 & 852  \\
+    $1\times10^{-2}$    & 11030 & 1035 & 1447 & 1108 \\
+    \hline
+  \end{tabular}
+ \vspace{0.3cm}
+\caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
+ \label{tab:neurad_res}
+\end{table}
+
+
+
+As can be seen, in the case of a local deployment the overhead
+induced by the use of the XWCH platform is about $7\%$, which is
+clearly low. For the distributed deployment, the overhead is about
+$34\%$. Considering the benefits the platform brings, this is a very
+acceptable overhead, which can be explained by the following points.
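+
+As an illustration of how such figures are obtained, taking the
+$1\times10^{-2}$ row of Table~\ref{tab:neurad_res}, the relative
+overhead of each deployment with respect to the execution without
+XWCH is
+\[
+\frac{1108 - 1035}{1035} \approx 7.1\%
+\qquad\mbox{and}\qquad
+\frac{1447 - 1035}{1035} \approx 39.8\%,
+\]
+with the per-row values varying from one precision to another.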
+
+First, we point out that the execution conditions are not truly
+identical in the contexts with and without XWCH. In the latter case,
+although the same steps were performed, all transfers take place
+inside a local cluster with high bandwidth and low latency, whereas
+with XWCH all transfers (between the warehouses, the workers, and the
+coordinator) cross a wide area network with lower bandwidth. In
+addition, in the executions without XWCH all the machines start
+computing immediately, whereas with the XWCH platform a delay is
+introduced by the fact that a computation starts on a machine only
+when that machine requests a task.
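+
+A coarse way to read these measurements, given here as an informal
+decomposition rather than a fitted model, is
+\[
+T_{\mathrm{XWCH}} \approx T_{\mathrm{compute}} + T_{\mathrm{transfer}}
++ T_{\mathrm{wait}},
+\]
+where $T_{\mathrm{transfer}}$ covers the exchanges between the
+warehouses, the workers, and the coordinator, and $T_{\mathrm{wait}}$
+is the delay before a machine requests its task; only the last two
+terms grow in the distributed deployment.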
+
+This underlines that, unsurprisingly, deploying a local coordinator
+and one or more warehouses close to a cluster of workers can enhance
+both the computations and the performance of the platform.