gpc2011.tex

   1 \documentclass{llncs}
   2 %\usepackage{latex8}
   3 %\usepackage{times}
   4 %\documentclass[a4paper,11pt]{article}
   5 %\usepackage{fullpage}
   6 \usepackage[T1]{fontenc}
   7 \usepackage[utf8]{inputenc}
   8 \usepackage{graphicx,subfigure,graphics}
   9 \usepackage{epsfig}
  10 %\usepackage[usenames]{color}
  11 %\usepackage{latexsym,stmaryrd}
  12 %\usepackage{amsfonts,amssymb}
  13 \usepackage{verbatim,theorem,moreverb}
  14 %\usepackage{float,floatflt}
  15 \usepackage{boxedminipage}
  16 \usepackage{url}
  17 %\usepackage{psfig}
  18 \usepackage{amsmath}
  19 \usepackage{amsfonts}
  20 \usepackage{amssymb}
  21 \usepackage{algorithm}
  22 \usepackage{algorithmic}
  23 %\usepackage{floatfig}
  24 %\usepackage{picins}
  25
  26
  27
  28 \def\sfixme#1{\fbox{\textbf{FIXME: }#1}}
  29
  30 \newcommand{\fixme}[1]{%
  31   \begin{center}
  32     \begin{boxedminipage}{.8\linewidth}
  33       \textsl{{\bf #1}}
  34     \end{boxedminipage}
  35   \end{center}
  36 }
  37 \newcommand{\FIXME}[1]{\marginpar[\null\hspace{2cm} FIXME]{FIXME} \fixme{#1}}
  38
  39 %\psfigurepath{.:fig:IMAGES}
  40 \graphicspath{{.}{fig/}{IMAGES/}}
  41
  42 %\initfloatingfigs
  43
  44 \begin{document}
  45
  46 \title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}
  47
  48
  49 \author{Nabil Abdennhader\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
  50   David Laiymani\inst{2} \and Sébastien  Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}
  51
  52 \institute{
  53 University of Applied Sciences Western Switzerland, hepia Geneva,
  54 Switzerland \\
  55 \email{nabil.abdennadher@hesge.ch,mohamed.benbelgacem@unige.ch,markopekka.niinimaeki@hesge.ch}
  56 \and
  57 Laboratoire d'Informatique de l'universit\'{e}
  58   de Franche-Comt\'{e} \\
  59   IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
  60 \email{\{raphael.couturier,david.laiymani,sebastien.miquee\}@univ-fcomte.fr}
  61 \and
  62  FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard , FRANCE\\
  63 \email{marc.sauget@univ-fcomte.fr}
  64 }
  65
  66
  67 \maketitle
  68
  69 \begin{abstract}
  70   This paper presents the design and the evaluation of the
  71   gridification of a radiotherapy dose computation application. Due to
  72   the inherent characteristics of the application and its execution,
  73   we choose the architectural context of global (or volunteer)
  74   computing.  For this, we used the XtremWeb-CH
  75   environment. Experiments were conducted on a real global computing
  76   testbed and show good speed-ups and very acceptable platform
  77   overhead letting XtremWeb-CH be a good candidate for deploying
  78   parallel applications over a global computing environment.
  79 \end{abstract}
  80
  81
  82 %-------------INTRODUCTION--------------------
  83 \section{Introduction}
  84
  85 The use of distributed architectures for solving large scientific
  86 problems seems to become mandatory in a lot of cases. For example, in
  87 the domain of radiotherapy dose computation the problem is
  88 crucial. The main goal of external beam radiotherapy is the treatment
  89 of tumors while minimizing exposure to healthy tissue. Dosimetric
  90 planning has to be carried out in order to optimize the dose
  91 distribution within the patient. Thus, to determine the most accurate
  92 dose distribution during treatment planning, a compromise must be
  93 found between the precision and the speed of calculation. Current
  94 techniques, using analytic methods, models and databases, are rapid
  95 but lack precision. Enhanced precision can be achieved by using
  96 calculation codes based, for example, on Monte Carlo methods. The main
  97 drawback of these methods is their computation times which can be
  98 rapidly huge. In \cite{NIMB2008} the authors proposed a novel approach, called
  99 Neurad, using neural networks. This approach is based on the
 100 collaboration of computation codes and multi-layer neural networks
 101 used as universal approximators. It provides a fast and accurate
 102 evaluation of radiation doses in any given environment for given
 103 irradiation parameters. As the learning step is often very time
 104 consuming, in \cite{AES2009} the authors proposed a parallel
 105 algorithm that enables to decompose the learning domain into
 106 subdomains. The decomposition has the advantage to significantly
 107 reduce the complexity of the target functions to approximate.
 108
 109 Now, as there exist several classes of distributed/parallel
 110 architectures (supercomputers, clusters, global computing...)  we have
 111 to choose the best suited one for the parallel Neurad application.
 112 The Global or Volunteer Computing model seems to be an interesting
 113 approach. Here, the computing power is obtained by aggregating unused
 114 (or volunteer) public resources connected to the Internet. For our
 115 case, we can imagine for example, that a part of the architecture will
 116 be composed of some of the different computers of the hospital. This
 117 approach presents the advantage to be clearly cheaper than a more
 118 dedicated approach like the use of supercomputers or clusters.
 119
 120 The aim of this paper is to propose and evaluate a gridification of
 121 the Neurad application (more precisely, of the most time consuming
 122 part, the learning step) using a Global Computing approach. For this,
 123 we focus on the XtremWeb-CH environment\cite{}. We choose this environment
 124 because it tackles the centralized aspect of other global computing
 125 environments such as XtremWeb\cite{} or Seti\cite{}. It tends to a
 126 peer-to-peer approach by distributing some components of the
 127 architecture. For instance, the computing nodes are allowed to
 128 directly communicate. Experiments were conducted on a real Global
 129 Computing testbed. The results are very encouraging. They exhibit an
 130 interesting speed-up and show that the overhead induced by the use of
 131 XtremWeb-CH is very acceptable.
 132
 133 The paper is organized as follows. In Section 2 we present the Neurad
 134 application and particularly its most time consuming part, i.e. the
 135 learning step. Section 3 details the XtremWeb-CH environment and
 136 Section 4 exposes the gridification of the Neurad
 137 application. Experimental results are presented in Section 5 and we
 138 end in Section 6 by some concluding remarks and perspectives.
 139
 140 \section{The Neurad application}
 141
 142 \begin{figure}[http]
 143   \centering
 144   \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
 145   \caption{The Neurad project}
 146   \label{f_neurad}
 147 \end{figure}
 148
 149 The \emph{Neurad}~\cite{Neurad} project presented in this paper takes place in a
 150 multi-disciplinary project, involving medical physicists and computer scientists
 151 whose goal is to enhance the  treatment planning of cancerous tumors by external
 152 radiotherapy.   In our  previous works~\cite{RADIO09,ICANN10,NIMB2008},  we have
 153 proposed  an  original approach  to  solve  scientific  problems whose  accurate
 154 modeling and/or  analytical description are  difficult. That method is  based on
 155 the collaboration of  computational codes and neural networks  used as universal
 156 interpolator. Thanks to that method,  the \emph{Neurad} software provides a fast
 157 and accurate  evaluation of radiation  doses in any given  environment (possibly
 158 inhomogeneous) for  given irradiation  parameters. We have  shown in  a previous
 159 work (\cite{AES2009}) the interest to use a distributed algorithm for the neural
 160 network learning. We  use a classical RPROP~\footnote{Resilient backpropagation}
 161 algorithm with a  HPU~\footnote{High order processing units} topology  to do the
 162 training of our neural network.
 163
 164 Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are clearly
 165 independent:  the initial  data production,  the learning  process and  the dose
 166 deposit  evaluation. The  first step,  the data  production, is  outside  of the
 167 {\it{Neurad}}  project.  They  are  many  solutions to  obtain  data  about  the
 168 radiotherapy treatments like  the measure or the simulation.  The only essential
 169 criterion is that the result must be obtained in an homogeneous environment.
 170
 171 % We have chosen to
 172 % use only a Monte Carlo simulation because this kind of tool is the
 173 % reference in the radiotherapy domains. The advantages to use data
 174 % obtained with a Monte Carlo simulator are the following: accuracy,
 175 % profusion, quantified error and regularity of measure points. But,
 176 % there exist also some disagreements and the most important is the
 177 % statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
 178 % presents the general behavior of a dose deposit in water.
 179
 180
 181 % \begin{figure}[http]
 182 %   \centering
 183 %   \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
 184 %   \caption{Dose deposit by a photon beam  of 24 mm of width in water (normalized value).}
 185 %   \label{f_tray}
 186 % \end{figure}
 187
 188 The secondary stage  of the {\it{Neurad}} project is the  learning step and this
 189 is  the most time  consuming step.  This step  is performed  off-line but  it is
 190 important to  reduce the time used for  the learning process to  keep a workable
 191 tool. Indeed, if the learning time is  too huge (for the moment, this time could
 192 reach one week for a limited domain), this process should not be launched at any
 193 time,  but only  when a  major modification  occurs in  the environment,  like a
 194 change  of  context for  instance.  However, it  is  interesting  to update  the
 195 knowledge of the neural network, by  using the learning process, when the domain
 196 evolves (evolution in material used for  the prosthesis or evolution on the beam
 197 (size, shape or energy)). The learning time is related to the volume of data who
 198 could be  very important in  a real  medical context.  A  work has been  done to
 199 reduce this  learning time with the  parallelization of the  learning process by
 200 using a partitioning method of the global dataset. The goal of this method is to
 201 train  many neural networks  on sub-domains  of the  global dataset.  After this
 202 training,  the use  of these  neural networks  all together  allows to  obtain a
 203 response for the global domain of study.
 204
 205
 206 \begin{figure}[h]
 207   \centering
 208   \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
 209   \caption{Overlapping for a sub-network  in a two-dimensional domain with ratio
 210     $\alpha$}
 211   \label{fig:overlap}
 212 \end{figure}
 213
 214 % j'ai relu mais pas vu le probleme
 215
 216 However, performing the learning on  sub-domains constituting a partition of the
 217 initial domain is  not satisfying according to the quality  of the results. This
 218 comes from the fact that the accuracy of the approximation performed by a neural
 219 network is not constant over the learned domain. Thus, it is necessary to use an
 220 overlapping  of   the  sub-domains.  The   overall  principle  is   depicted  in
 221 Figure~\ref{fig:overlap}.  In this  way,  each sub-network  has an  exploitation
 222 domain  smaller than its  training domain  and the  differences observed  at the
 223 borders  are  no  longer  relevant.   Nonetheless,  in  order  to  preserve  the
 224 performance  of the parallel  algorithm, it  is important  to carefully  set the
 225 overlapping  ratio $\alpha$.  It  must be  large  enough to  avoid the  border's
 226 errors,  and as  small  as  possible to  limit  the size  increase  of the  data
 227 subsets~\cite{AES2009}.
 228
 229 %(Qu'en est-il pour nos test ?).
 230 % Ce paramètre a deja été etudié dans un précédent papier, il a donc choisi d'être fixe
 231 % pour ces tests-ci.
 232
 233
 234 \section{The XtremWeb-CH environment}
 235 \input{xwch.tex}
 236
 237 \section{The Neurad gridification}
 238
 239 \label{sec:neurad_gridif}
 240
 241
 242 As previously exposed, the Neurad application can be divided into
 243 three steps.  The goal of the first step is to decompose the data
 244 representing the dose distribution on an area. This area contains
 245 various parameters, like the nature of the medium and its
 246 density. This part is out of the scope of this paper.
 247 %Multiple ``views'' can be
 248 %superposed in order to obtain a more accurate learning.
 249
 250 The second step of the application, and the most time consuming, is
 251 the learning itself. This is the one which has been parallelized,
 252 using the XWCH environment. As exposed in the section 2, the
 253 parallelization relies on a partitionning of the global
 254 dataset. Following this partitionning all learning tasks are executed
 255 in parallel independently with their own local data part, with no
 256 communication, following the fork/join model. Clearly, this
 257 computation fits well with the model of the chosen middleware.
 258
 259 The execution scheme is then the following (see Figure
 260 \ref{fig:neurad_grid}):
 261 \begin{enumerate}
 262 \item We first send the learning application and its data to the
 263   middleware (more precisely on warehouses (DW)) and create the
 264   computation module;
 265 \item When a worker (W) is ready to compute, it requests a task to
 266   execute to the coordinator (Coord.);
 267 \item The coordinator assigns the worker a task. This last one retrieves the
 268 application and its assigned data and so can start the computation.
 269 \item At the end of the learning process, the worker sends the result to a warehouse.
 270 \end{enumerate}
 271
 272 The last step of the application is to retrieve these results (some
 273 weighted neural networks) and exploit them through a dose distribution
 274 process. This latter step is out of the scope of this paper.
 275
 276
 277 \begin{figure}[ht]
 278   \centering
 279   \includegraphics[width=8cm]{figures/neurad_gridif}
 280   \caption{The proposed Neurad gridification}
 281   \label{fig:neurad_grid}
 282 \end{figure}
 283
 284 \section{Experimental results}
 285 \label{sec:neurad_xp}
 286
 287 The aim of this section is to describe and analyze the experimental
 288 results we have obtained with the parallel Neurad version previously
 289 described. Our goal was to carry out this application with real input
 290 data and on a real global computing testbed.
 291
 292 \subsubsection{Experimental conditions}
 293 \label{sec:neurad_cond}
 294
 295 The size of the input data is about 2.4Gb. In order to avoid that data
 296 noise appears and disturbs the learning process, these data can be
 297 divided into, at most, 25 parts. This generates input data parts of
 298 about 15Mb (in a compressed format). The output data, which are
 299 retrieved after the process, are about 30Kb for each
 300 part. Unfortunately, the data decomposition limitation does not allow
 301 us to use more than 25 computers (XWCH workers). Nevertheless, we used two
 302 distinct deployments of XWCH:
 303 \begin{enumerate}
 304
 305 \item In the first one, called ``distributed XWCH'' in the following,
 306   the XWCH coordinator and the warehouses were located in Geneva,
 307   Switzerland while the workers were running in the same local cluster
 308   in Belfort, France.
 309
 310 \item The second deployment, called ``local XWCH'' is a local
 311   deployment where both coordinator, warehouses and workers were in
 312   the same local cluster.
 313
 314 \end{enumerate}
 315 For both deployments, during the day these machines were used by
 316 students of the Computer Science Department of the IUT of Belfort.
 317
 318 In order to evaluate the overhead induced by the use of the platform
 319 we have furthermore compared the execution of the Neurad application
 320 with and without the XWCH platform. For the latter case, we mean that the
 321 testbed consists only in workers deployed with their respective data
 322 by the use of shell scripts. No specific middleware was used and the
 323 workers were in the same local cluster.
 324
 325 Finally, five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$,
 326 $0.50e^{-1}$, $0.25e^{-1}$, and $1e^{-2}$.
 327
 328
 329 \subsubsection{Results}
 330 \label{sec:neurad_result}
 331
 332 Table \ref{tab:neurad_res} presents the execution times of the Neurad
 333 application on 25 machines with XWCH (local and distributed
 334 deployment) and without XWCH. These results correspond to the measures
 335 of the same steps for both kinds of execution, i.e. sending of local
 336 data and the executable, the learning process, and retrieving the
 337 results. Results represent the average time of $?? x ??$ executions.
 338
 339
 340 \begin{table}[h!]
 341   \renewcommand{\arraystretch}{1.7}
 342   \centering
 343   \begin{tabular}[h!]{|c|c|c|c|c|}
 344     \hline
 345     ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With XWCH~ & ~With
 346     local XWCH~ \\
 347     \hline
 348      $1e^{-1}$ & 5190 & 558 & 759 & 629\\
 349     $0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
 350     $0.50e^{-1}$ & 7487 & 792 & 1010 & 844 \\
 351     $0.25e^{-1}$ & 7787 & 791 & 1000 & 852\\
 352     $1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
 353     \hline
 354   \end{tabular}
 355   \vspace{0.3cm}
 356 \caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
 357   \label{tab:neurad_res}
 358 \end{table}
 359
 360 %\begin{table}[ht]
 361 %  \centering
 362 %  \begin{tabular}[h]{|c|c|c|}
 363 %    \hline
 364 %    Precision & Without XWCH & With XWCH \\
 365 %    \hline
 366 %    $1e^{-1}$ & $558$s & $759$s\\
 367 %    \hline
 368 %  \end{tabular}
 369 %  \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
 370 %  \label{tab:neurad_res}
 371 %\end{table}
 372
 373
 374 As we can see, in the case of a local deployment the overhead induced
 375 by the use of the XWCH platform is about $7\%$. It is clearly a low
 376 overhead. Now, for the distributed deployment, the overhead is about
 377 $34\%$. Regarding to the benefits of the platform, it is a very
 378 acceptable overhead which can be explained by the following points.
 379
 380 First, we point out that the conditions of executions are not really
 381 identical between with and without XWCH contexts. For this last one,
 382 though the same steps were done, all transfer processes are inside a
 383 local cluster with a high bandwidth and a low latency. Whereas when
 384 using XWCH, all transfer processes (between datawarehouses, workers,
 385 and the coordinator) used a wide network area with a smaller
 386 bandwidth. In addition, in executions without XWCH, all the machines
 387 started immediately the computation, whereas when using the XWCH
 388 platform, a latency is introduced by the fact that a computation
 389 starts on a machine, only when this one requests a task.
 390
 391 This underlines that, unsurprisingly, deploying a local
 392 coordinator and one or more warehouses near a cluster of workers can
 393 enhance computations and platform performances.
 394
 395
 396 \section{Conclusion and future works}
 397
 398 In this paper, we have presented a gridification of a real medical
 399 application, the Neurad application. This radiotherapy application
 400 tries to optimize the irradiated dose distribution within a
 401 patient. Based on a multi-layer neural network, this application
 402 presents a very time consuming step, i.e. the learning step. Due to the
 403 computing characteristics of this step, we choose to parallelize it
 404 using the XtremWeb-CH global computing environment. Obtained
 405 experimental results show good speed-ups and underline that overheads
 406 induced by XWCH are very acceptable, letting it be a good candidate
 407 for deploying parallel applications over a global computing environment.
 408
 409 Our future works include the testing of the application on a more
 410 large scale testbed. This implies, the choice of a data input set
 411 allowing a finer decomposition. Unfortunately, this choice of input
 412 data is not trivial and relies on a large number of parameters
 413
 414 %(demander ici des précisions à Marc).
 415 % Si tu veux parler de l'ensembles des paramètres que l'on peut utiliser pour caractériser les conditions d'irradiations
 416 % tu peux parler :
 417 % - caracteristiques du faisceaux d'irradiation (beam size (de quelques mm à plus de 40 cm),  energy, SSD (source surface distance),
 418 % - caractéritiques de  la matière : density
 419
 420
 421
 422 \bibliographystyle{plain}
 423 \bibliography{biblio}
 424
 425
 426
 427 \end{document}