gpc2011.tex

   1 \documentclass{llncs}
   2 %\usepackage{latex8}
   3 %\usepackage{times}
   4 %\documentclass[a4paper,11pt]{article}
   5 %\usepackage{fullpage}
   6 \usepackage[T1]{fontenc}
   7 \usepackage[utf8]{inputenc}
   8 \usepackage{graphicx,subfigure,graphics}
   9 \usepackage{epsfig}
  10 %\usepackage[usenames]{color}
  11 %\usepackage{latexsym,stmaryrd}
  12 %\usepackage{amsfonts,amssymb}
  13 \usepackage{verbatim,theorem,moreverb}
  14 %\usepackage{float,floatflt}
  15 \usepackage{boxedminipage}
  16 \usepackage{url}
  17 %\usepackage{psfig}
  18 \usepackage{amsmath}
  19 \usepackage{amsfonts}
  20 \usepackage{amssymb}
  21 \usepackage{algorithm}
  22 \usepackage{algorithmic}
  23 %\usepackage{floatfig}
  24 %\usepackage{picins}
  25
  26
  27
  28 \def\sfixme#1{\fbox{\textbf{FIXME: }#1}}
  29
  30 \newcommand{\fixme}[1]{%
  31   \begin{center}
  32     \begin{boxedminipage}{.8\linewidth}
  33       \textsl{{\bf #1}}
  34     \end{boxedminipage}
  35   \end{center}
  36 }
  37 \newcommand{\FIXME}[1]{\marginpar[\null\hspace{2cm} FIXME]{FIXME} \fixme{#1}}
  38
  39 %\psfigurepath{.:fig:IMAGES}
  40 \graphicspath{{.}{fig/}{IMAGES/}}
  41
  42 %\initfloatingfigs
  43
  44 \begin{document}
  45
  46 \title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}
  47
  48
  49 \author{Nabil Abdennadher\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
  50   David Laiymani\inst{2} \and Sébastien  Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}
  51
  52 \institute{
  53 University of Applied Sciences Western Switzerland, hepia Geneva,
  54 Switzerland \\
  55 \email{nabil.abdennadher@hesge.ch,mohamed.benbelgacem@unige.ch,markopekka.niinimaeki@hesge.ch}
  56 \and
  57 Laboratoire d'Informatique de l'universit\'{e}
  58   de Franche-Comt\'{e} \\
  59   IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
  60 \email{\{raphael.couturier,david.laiymani,sebastien.miquee\}@univ-fcomte.fr}
  61 \and
  62  FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard , FRANCE\\
  63 \email{marc.sauget@univ-fcomte.fr}
  64 }
  65
  66
  67 \maketitle
  68
  69 \begin{abstract}
  70   This paper presents the design and the evaluation of the
  71   gridification of a radiotherapy dose computation application. Due to
  72   the inherent characteristics of the application and its execution,
  73   we choose the architectural context of  volunteer
  74   computing.  For this, we used the XtremWeb-CH
  75   environment. Experiments were conducted on a real volunteer computing
  76   testbed and show good speed-ups and very acceptable platform
  77   overhead, letting XtremWeb-CH be a good candidate for deploying
  78   parallel applications over a volunteer computing environment.
  79 \end{abstract}
  80
  81
  82 %-------------INTRODUCTION--------------------
  83 \section{Introduction}
  84
  85 The use of distributed architectures for solving large scientific
  86 problems seems to become mandatory in a lot of cases. For example, in
  87 the domain of radiotherapy dose computation the problem is
  88 crucial. The main goal of external beam radiotherapy is the treatment
  89 of tumors while minimizing exposure to healthy tissue. Dosimetric
  90 planning has to be carried out in order to optimize the dose
  91 distribution within the patient. Thus, to determine the most accurate
  92 dose distribution during treatment planning, a compromise must be
  93 found between the precision and the speed of calculation. Current
  94 techniques, using analytic methods, models and databases, are rapid
  95 but lack precision. Enhanced precision can be achieved by using
  96 calculation codes based, for example, on the Monte Carlo methods. The main
  97 drawback of these methods is their computation times which can
  98 rapidly become huge. In \cite{NIMB2008} the authors proposed a new approach, called
  99 Neurad, using neural networks. This approach is based on the
 100 collaboration of computation codes and multi-layer neural networks
 101 used as universal approximators. It provides a fast and accurate
 102 evaluation of radiation doses in any given environment for given
 103 irradiation parameters. As the learning step is often very time
 104 consuming, in \cite{AES2009} the authors proposed a parallel
 105 algorithm that enables to decompose the learning domain into
 106 subdomains. The decomposition has the advantage of significantly
 107 reducing the complexity of the target functions to approximate.
 108
 109 Now, as there exist several classes of distributed/parallel
 110 architectures (supercomputers, clusters, global computing\dots{}) we
 111 have to choose the best suited one for the parallel Neurad
 112 application. The volunteer (or global) computing model seems to be an
 113 interesting approach. Here, the computing power is obtained by
 114 aggregating unused (or volunteer) public resources connected to the
 115 Internet. In our case, we can imagine, for example, that a part of the
 116 architecture will be composed of some of the different computers of
 117 the hospital. This approach presents the advantage of being clearly
 118 cheaper than a more dedicated approach like the use of supercomputers
 119 or clusters. Furthermore and as we will see in the remainder, the
 120 studied parallel algorithm corresponds very well to this computation model.
 121
 122 The aim of this paper is to propose and evaluate a gridification of
 123 the Neurad application (more precisely, of the most time consuming
 124 part, the learning step) using a volunteer computing approach. For
 125 this, we focus on the XtremWeb-CH environment\cite{xwch}. We chose
 126 this environment because it tackles the centralized aspect of other
 127 global computing environments such as XtremWeb\cite{xtremweb} or
 128 Seti\cite{seti}. It tends to a peer-to-peer approach by distributing
 129 some components of the architecture. For instance, the computing nodes
 130 are allowed to directly communicate. Experiments were conducted on a
 131 real global computing testbed. The results are very encouraging. They
 132 exhibit an interesting speed-up and show that the overhead induced by
 133 the use of XtremWeb-CH is very acceptable.
 134
 135 The paper is organized as follows. In Section 2 we present the Neurad
 136 application and particularly its most time consuming part, i.e. the
 137 learning step. Section 3 details the XtremWeb-CH environment and
 138 Section 4 exposes the gridification of the Neurad
 139 application. Experimental results are presented in Section 5 and we
 140 end in Section 6 by some concluding remarks and perspectives.
 141
 142 \section{The Neurad application}
 143
 144 \begin{figure}[http]
 145   \centering
 146   \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
 147   \caption{The Neurad project}
 148   \label{f_neurad}
 149 \end{figure}
 150
 151 The \emph{Neurad}~\cite{Neurad} project presented in this paper takes place in a
 152 multi-disciplinary project, involving medical physicists and computer scientists
 153 whose goal is to enhance the  treatment planning of cancerous tumors by external
 154 radiotherapy.   In our  previous works~\cite{RADIO09,ICANN10,NIMB2008},  we have
 155 proposed  an  original approach  to  solving  scientific  problems whose  accurate
 156 modeling and/or  analytical description are  difficult. That method is  based on
 157 the collaboration of  computational codes and neural networks  used as universal
 158 interpolator. Thanks to that method,  the \emph{Neurad} software provides a fast
 159 and accurate  evaluation of radiation  doses in any given  environment (possibly
 160 inhomogeneous) for  given irradiation  parameters. We have  shown in  a previous
 161 work (\cite{AES2009}) the interest of using a distributed algorithm for the neural
 162 network learning. We  use a classical RPROP~\footnote{Resilient backpropagation}
 163 algorithm with a  HPU~\footnote{High order processing units} topology  to do the
 164 training of our neural network.
 165
 166 Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are clearly
 167 independent:  the initial  data production,  the learning  process and  the dose
 168 deposit  evaluation. The  first step,  the data  production, is  outside  of the
 169 {\it{Neurad}}  project.  They  are  many  solutions to  obtain  data  about  the
 170 radiotherapy treatments like  the measure or the simulation.  The only essential
 171 criterion is that the result must be obtained in an homogeneous environment.
 172
 173 % We have chosen to
 174 % use only a Monte Carlo simulation because this kind of tool is the
 175 % reference in the radiotherapy domains. The advantages to use data
 176 % obtained with a Monte Carlo simulator are the following: accuracy,
 177 % profusion, quantified error and regularity of measure points. But,
 178 % there exist also some disagreements and the most important is the
 179 % statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
 180 % presents the general behavior of a dose deposit in water.
 181
 182
 183 % \begin{figure}[http]
 184 %   \centering
 185 %   \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
 186 %   \caption{Dose deposit by a photon beam  of 24 mm of width in water (normalized value).}
 187 %   \label{f_tray}
 188 % \end{figure}
 189
 190 The secondary stage  of the {\it{Neurad}} project is the  learning step and this
 191 is  the most time  consuming step.  This step  is performed  offline but  it is
 192 important to  reduce the time used for  the learning process to  keep a workable
 193 tool. Indeed, if the learning time is  too huge (for the moment, this time could
 194 reach one week for a limited domain), this process should not be launched at any
 195 time,  but only  when a  major modification  occurs in  the environment,  like a
 196 change  of  context for  instance.  However, it  is  interesting  to update  the
 197 knowledge of the neural network, by  using the learning process, when the domain
 198 evolves (evolution in material used for  the prosthesis or evolution on the beam
 199 (size, shape or energy)). The learning time is related to the volume of data which could be  very important in  a real  medical context.  Some work has been  done to
 200 reduce this  learning time with the  parallelization of the  learning process by
 201 using a partitioning method of the global dataset. The goal of this method is to
 202 train  many neural networks  on sub-domains  of the  global dataset.  After this
 203 training,  the use  of these  neural networks  all together  allows to  obtain a
 204 response for the global domain of study.
 205
 206
 207 \begin{figure}[h]
 208   \centering
 209   \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
 210   \caption{Overlapping for a sub-network  in a two-dimensional domain with ratio
 211     $\alpha$}
 212   \label{fig:overlap}
 213 \end{figure}
 214
 215 % j'ai relu mais pas vu le probleme
 216
 217 However, performing the learning on  sub-domains constituting a partition of the
 218 initial domain may  not be satisfying depending on  the chosen quality  of the results. This
 219 comes from the fact that the accuracy of the approximation performed by a neural
 220 network is not constant over the learned domain. Thus, it is necessary to use an
 221 overlapping  of   the  sub-domains.  The   overall  principle  is   depicted  in
 222 Figure~\ref{fig:overlap}.  In this  way,  each sub-network  has an  exploitation
 223 domain  smaller than its  training domain  and the  differences observed  at the
 224 borders  are  no  longer  relevant.   Nonetheless,  in  order  to  preserve  the
 225 performance  of the parallel  algorithm, it  is important  to carefully  set the
 226 overlapping  ratio $\alpha$.  It  must both be  large  enough to  avoid the  border's
 227 errors,  and as  small  as  possible to  limit  the size  increase  of the  data
 228 subsets~\cite{AES2009}.
 229
 230
 231
 232 \section{The XtremWeb-CH environment}
 233 \input{xwch.tex}
 234
 235 \section{The Neurad gridification}
 236
 237 \label{sec:neurad_gridif}
 238
 239
 240 As previously exposed, the Neurad application can be divided into
 241 three steps.  The goal of the first step is to decompose the data
 242 representing the dose distribution on an area. This area contains
 243 various parameters, like the nature of the medium and its
 244 density. This part is out of the scope of this paper.
 245 %Multiple ``views'' can be
 246 %superposed in order to obtain a more accurate learning.
 247
 248 The second step of the application, and the most time consuming, is the learning
 249 in  itself.  This  is the  one  which  has  been  parallelized, using  the  XWCH
 250 environment.  As  exposed  in  section   2,  the  parallelization  relies  on  a
 251 partitioning  of the global  dataset. Following  this partitioning  all learning
 252 tasks are  independently executed  in parallel with  their own local  data part,
 253 with no communication, following  the fork/join model. Clearly, this computation
 254 fits well with the model of the chosen middleware.
 255
 256 \begin{figure}[ht]
 257   \centering
 258   \includegraphics[width=8cm]{figures/neurad_gridif}
 259   \caption{The proposed Neurad gridification}
 260   \label{fig:neurad_grid}
 261 \end{figure}
 262
 263
 264 The execution scheme is then the following (see Figure
 265 \ref{fig:neurad_grid}):
 266 \begin{enumerate}
 267 \item We first send the learning application and its data to the
 268   middleware. In a first time, we send the application to data
 269   warehouses (DW), and the create an "application module" on the
 270   coordinator (Coord.) including references retrieved from the
 271   previous sending operation. In a second time, we apply the same
 272   process to application data.
 273 \item When a worker (W) is ready to compute, it requests a task to
 274   execute to the coordinator (Coord.);
 275 \item The coordinator assigns the worker a task. This last one
 276   retrieves the application and its assigned data, by requesting them
 277   to DW with references sent by the coordinator, and so can start the
 278   computation;
 279 \item At the end of the learning process, the worker sends the result
 280   to a warehouse.
 281 \end{enumerate}
 282
 283 The last step of the application is to retrieve these results (some
 284 weighted neural networks) and exploit them through a dose distribution
 285 process. This last step is out of the scope of this paper.
 286
 287
 288
 289 \section{Experimental results}
 290 \label{sec:neurad_xp}
 291
 292 The aim of this section is to describe and analyze the experimental
 293 results we have obtained with the parallel Neurad version previously
 294 described. Our goal was to carry out this application with real input
 295 data and on a real volunteer computing testbed.
 296
 297 \subsubsection{Experimental conditions}
 298 \label{sec:neurad_cond}
 299
 300 The size of the input data is about 2.4Gb. In order to avoid
 301 noises to appear and disturb the learning process, these data can be
 302 divided into, at most, 25 parts. This generates input data parts of
 303 about 15Mb (in a compressed format). The output data, which are
 304 retrieved after the process, are about 30Kb for each part. We used two
 305 distinct deployments of XWCH:
 306 \begin{enumerate}
 307
 308 \item In the first one, called ``distributed XWCH'',
 309   the XWCH coordinator and the warehouses were located in Geneva,
 310   Switzerland while the workers were running in the same local cluster
 311   in Belfort, France.
 312
 313 \item The second deployment, called ``local XWCH'' is a local
 314   deployment where  coordinator, warehouses and workers were, in
 315   the same local cluster, at the same time.
 316
 317 \end{enumerate}
 318 For both deployments, the local cluster is a campus cluster and during
 319 the day these machines were used by students of the Computer Science
 320 Department of the IUT of Belfort.  Unfortunately, the data
 321 decomposition limitation does not allow us to use more than 25
 322 computers (XWCH workers).
 323
 324 In order  to evaluate the overhead  induced by the  use of the platform  we have
 325 furthermore compared  the execution of  the Neurad application with  and without
 326 the XWCH platform. For  the latter case, we want to insist  on the fact that the
 327 testbed consists only in workers deployed  with their respective data by the use
 328 of shell  scripts. No specific middleware was  used and the workers  were in the
 329 same local cluster.
 330
 331 Finally, five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$,
 332 $0.50e^{-1}$, $0.25e^{-1}$, and $1e^{-2}$.
 333
 334
 335 \subsubsection{Results}
 336 \label{sec:neurad_result}
 337
 338
 339 Table \ref{tab:neurad_res} presents the execution times of the Neurad
 340 application on 25 machines with XWCH (local and distributed
 341 deployment) and without XWCH. These results correspond to the measures
 342 of the same steps for both kinds of execution, i.e. the sending of local
 343 data and the executable, the learning process, and retrieving the
 344 results. Results represent the average time of $5$ executions.
 345
 346
 347 \begin{table}[h!]
 348   \renewcommand{\arraystretch}{1.7}
 349   \centering
 350   \begin{tabular}[h!]{|c|c|c|c|c|}
 351     \hline
 352     ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With XWCH~ & ~With
 353     local XWCH~ \\
 354     \hline
 355      $1e^{-1}$ & 5190 & 558 & 759 & 629\\
 356     $0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
 357     $0.50e^{-1}$ & 7487 & 792 & 1010 & 844 \\
 358     $0.25e^{-1}$ & 7787 & 791 & 1000 & 852\\
 359     $1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
 360     \hline
 361   \end{tabular}
 362   \vspace{0.3cm}
 363 \caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
 364   \label{tab:neurad_res}
 365 \end{table}
 366
 367 %\begin{table}[ht]
 368 %  \centering
 369 %  \begin{tabular}[h]{|c|c|c|}
 370 %    \hline
 371 %    Precision & Without XWCH & With XWCH \\
 372 %    \hline
 373 %    $1e^{-1}$ & $558$s & $759$s\\
 374 %    \hline
 375 %  \end{tabular}
 376 %  \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
 377 %  \label{tab:neurad_res}
 378 %\end{table}
 379
 380
 381 As we can see, in the case of a local deployment the overhead induced
 382 by the use of the XWCH platform is about $7\%$. It is clearly a low
 383 overhead. Now, for the distributed deployment, the overhead is about
 384 $34\%$. Regarding  the benefits of the platform, it is a very
 385 acceptable overhead which can be explained by the following points.
 386
 387 First, we point out that the conditions of executions are not really
 388 identical between, with and without, XWCH contexts. For this last one,
 389 though the same steps were achieved out, all transfer processes are inside a
 390 local cluster with a high bandwidth and a low latency. Whereas when
 391 using XWCH, all transfer processes (between datawarehouses, workers,
 392 and the coordinator) used a wide network area with a smaller
 393 bandwidth. In addition, in the executions without XWCH, all the machines
 394 started immediately the computation, whereas when using the XWCH
 395 platform, a latency is introduced by the fact that a computation
 396 starts on a machine, only when this one requests a task.
 397
 398 This underlines that, unsurprisingly, deploying a local
 399 coordinator and one or more warehouses near a cluster of workers can
 400 enhance computations and platform performances.
 401
 402
 403
 404
 405 \section{Conclusion and future works}
 406
 407 In this paper, we have presented a gridification of a real medical
 408 application, the Neurad application. This radiotherapy application
 409 tries to optimize the irradiated dose distribution within a
 410 patient. Based on a multi-layer neural network, this application
 411 presents a very time consuming step, i.e. the learning step. Due to the
 412 computing characteristics of this step, we have chosen to parallelize it
 413 using the XtremWeb-CH volunteer computing environment. Obtained
 414 experimental results show good speed-ups and underline that overheads
 415 induced by XWCH are very acceptable, letting it be a good candidate
 416 for deploying parallel applications over a volunteer computing environment.
 417
 418 Our future works include the testing of the application on a
 419 larger scale testbed. This implies, the choice of a data input set
 420 allowing a finer decomposition. Unfortunately, this choice of input
 421 data is not trivial and relies on a large number of parameters.
 422
 423 We are also planning to test XWCH with parallel applications where
 424 communication between workers occurs during the execution. In this
 425 way, the use of the asynchronous iteration model \cite{bcl08} may be
 426 an interesting perspective.
 427
 428 %(demander ici des précisions à Marc).
 429 % Si tu veux parler de l'ensembles des paramètres que l'on peut utiliser pour caractériser les conditions d'irradiations
 430 % tu peux parler :
 431 % - caracteristiques du faisceaux d'irradiation (beam size (de quelques mm à plus de 40 cm),  energy, SSD (source surface distance),
 432 % - caractéritiques de  la matière : density
 433
 434
 435
 436 \bibliographystyle{plain}
 437 \bibliography{biblio}
 438
 439
 440
 441 \end{document}