gpc2011.tex

   1 \documentclass{llncs}
   2 %\usepackage{latex8}
   3 %\usepackage{times}
   4 %\documentclass[a4paper,11pt]{article}
   5 %\usepackage{fullpage}
   6 \usepackage[T1]{fontenc}
   7 \usepackage[utf8]{inputenc}
   8 \usepackage{graphicx,subfigure,graphics}
   9 \usepackage{epsfig}
  10 %\usepackage[usenames]{color}
  11 %\usepackage{latexsym,stmaryrd}
  12 %\usepackage{amsfonts,amssymb}
  13 \usepackage{verbatim,theorem,moreverb}
  14 %\usepackage{float,floatflt}
  15 \usepackage{boxedminipage}
  16 \usepackage{url}
  17 %\usepackage{psfig}
  18 \usepackage{amsmath}
  19 \usepackage{amsfonts}
  20 \usepackage{amssymb}
  21 \usepackage{algorithm}
  22 \usepackage{algorithmic}
  23 %\usepackage{floatfig}
  24 %\usepackage{picins}
  25
  26
  27
  28 \def\sfixme#1{\fbox{\textbf{FIXME: }#1}}
  29
  30 \newcommand{\fixme}[1]{%
  31   \begin{center}
  32     \begin{boxedminipage}{.8\linewidth}
  33       \textsl{{\bf #1}}
  34     \end{boxedminipage}
  35   \end{center}
  36 }
  37 \newcommand{\FIXME}[1]{\marginpar[\null\hspace{2cm} FIXME]{FIXME} \fixme{#1}}
  38
  39 %\psfigurepath{.:fig:IMAGES}
  40 \graphicspath{{.}{fig/}{IMAGES/}}
  41
  42 %\initfloatingfigs
  43
  44 \begin{document}
  45
  46 \title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}
  47
  48
  49 \author{Nabil Abdennhader\inst{1} \and Mohamed Ben Belgacem{1} \and Raphaël Couturier\inst{2} \and
  50   David Laiymani\inst{2} \and Sébastien  Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{2}}
  51
  52 \institute{
  53 University of Applied Sciences Western Switzerland, hepia Geneva,
  54 Switzerland \\
  55 \email{nabil.abdennadher@hesge.ch, mohamed.benbelgacem@unige.ch, markopekka.niinimaeki@hesge.ch}
  56 \and
  57 Laboratoire d'Informatique de l'universit\'{e}
  58   de Franche-Comt\'{e} \\
  59   IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
  60 \email{raphael.couturier, david.laiymani, sebastien.miquee@univ-fcomte.fr}
  61 \and
  62  FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard , FRANCE\\
  63 \email{marc.sauget@femtost.fr}
  64 }
  65
  66
  67 \maketitle
  68
  69 \begin{abstract}
  70   This paper presents the design and the evaluation of the
  71   gridification of a radiotherapy dose computation application. Due to
  72   the inherent characteristics of the application and its execution,
  73   we choose the architectural context of global (or volunteer) computing.
  74   For this, we used the XtremWeb-CH environement. Experiments were
  75   conducted on a real global computing testbed and show good speed-ups
  76   and very acceptable platform overhead letting XtremWeb-CH a good candidate
  77 for deploying parallel applications over a global computing environment.
  78 \end{abstract}
  79
  80
  81 %-------------INTRODUCTION--------------------
  82 \section{Introduction}
  83
  84 The use of distributed architectures for solving large scientific
  85 problems seems to become mandatory in a lot of cases. For example, in
  86 the domain of radiotherapy dose computation the problem is
  87 crucial. The main goal of external beam radiotherapy is the treatment
  88 of tumors while minimizing exposure to healthy tissue. Dosimetric
  89 planning has to be carried out in order to optimize the dose
  90 distribution within the patient. Thus, to determine the most accurate
  91 dose distribution during treatment planning, a compromise must be
  92 found between the precision and the speed of calculation. Current
  93 techniques, using analytic methods, models and databases, are rapid
  94 but lack precision. Enhanced precision can be achieved by using
  95 calculation codes based, for example, on Monte Carlo methods. The main
  96 drawback of these methods is their computation times which can be
  97 rapidly be huge. In [] the authors proposed a novel approach, called
  98 Neurad, using neural networks. This approach is based on the
  99 collaboration of computation codes and multi-layer neural networks
 100 used as universal approximators. It provides a fast and accurate
 101 evaluation of radiation doses in any given environment for given
 102 irradiation parameters. As the learning step is often very time
 103 consuming, in \cite{bcvsv08:ip} the authors proposed a parallel
 104 algorithm that enable to decompose the learning domain into
 105 subdomains. The decomposition has the advantage to significantly
 106 reduce the complexity of the target functions to approximate.
 107
 108 Now, as there exist several classes of distributed/parallel
 109 architectures (supercomputers, clusters, global computing...)  we have
 110 to choose the best suited one for the parallel Neurad application.
 111 The Global or Volunteer Computing model seems to be an interesting
 112 approach. Here, the computing power is obtained by aggregating unused
 113 (or volunteer) public resources connected to the Internet. For our
 114 case, we can imagine for example, that a part of the architecture will
 115 be composed of some of the different computers of the hospital. This
 116 approach present the advantage to be clearly cheaper than a more
 117 dedicated approach like the use of supercomputers or clusters.
 118
 119 The aim of this paper is to propose and evaluate a gridification of
 120 the Neurad application (more precisely, of the most time consuming
 121 part, the learning step) using a Global Computing approach. For this,
 122 we focus on the XtremWeb-CH environment []. We choose this environment
 123 because it tackles the centralized aspect of other global computing
 124 environments such as XtremWeb [] or Seti []. It tends to a
 125 peer-to-peer approach by distributing some components of the
 126 architecture. For instance, the computing nodes are allowed to
 127 directly communicate. Experiments were conducted on a real Global
 128 Computing testbed. The results are very encouraging. They exhibit an
 129 interesting speed-up and show that the overhead induced by the use of
 130 XtremWeb-CH is very acceptable.
 131
 132 The paper is organized as follows. In Section 2 we present the Neurad
 133 application and particularly it most time consuming part i.e. the
 134 learning step. Section 3 details the XtremWeb-CH environment and
 135 Section 4 exposes the gridification of the Neurad
 136 application. Experimental results are presented in Section 5 and we
 137 end in Section 6 by some concluding remarks and perspectives.
 138
 139 \section{The Neurad application}
 140
 141 \begin{figure}[http]
 142   \centering
 143   \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
 144   \caption{The Neurad project}
 145   \label{f_neurad}
 146 \end{figure}
 147
 148 The \emph{Neurad}~\cite{Neurad} project presented in this paper takes
 149 place in a multi-disciplinary project, involving medical physicists
 150 and computer scientists whose goal is to enhance the treatment
 151 planning of cancerous tumors by external radiotherapy. In our
 152 previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have proposed an
 153 original approach to solve scientific problems whose accurate modeling
 154 and/or analytical description are difficult. That method is based on
 155 the collaboration of computational codes and neural networks used as
 156 universal interpolator. Thanks to that method, the \emph{Neurad}
 157 software provides a fast and accurate evaluation of radiation doses in
 158 any given environment (possibly inhomogeneous) for given irradiation
 159 parameters. We have shown in a previous work (\cite{AES2009}) the
 160 interest to use a distributed algorithm for the neural network
 161 learning. We use a classical RPROP algorithm with a HPU topology to do
 162 the training of our neural network.
 163
 164 Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts
 165 are clearly independent: the initial data production, the learning
 166 process and the dose deposit evaluation. The first step, the data
 167 production, is outside the {\it{Neurad}} project. They are many
 168 solutions to obtain data about the radiotherapy treatments like the
 169 measure or the simulation. The only essential criterion is that the
 170 result must be obtain in a homogeneous environment.
 171
 172 % We have chosen to
 173 % use only a Monte Carlo simulation because this kind of tool is the
 174 % reference in the radiotherapy domains. The advantages to use data
 175 % obtained with a Monte Carlo simulator are the following: accuracy,
 176 % profusion, quantified error and regularity of measure points. But,
 177 % there exist also some disagreements and the most important is the
 178 % statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
 179 % presents the general behavior of a dose deposit in water.
 180
 181
 182 % \begin{figure}[http]
 183 %   \centering
 184 %   \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
 185 %   \caption{Dose deposit by a photon beam  of 24 mm of width in water (normalized value).}
 186 %   \label{f_tray}
 187 % \end{figure}
 188
 189 The secondary stage of the {\it{Neurad}} project is the learning step
 190 and this is the most time consuming step. This step is off-line but it
 191 is important to reduce the time used for the learning process to keep
 192 a workable tool. Indeed, if the learning time is too huge (for the
 193 moment, this time could reach one week for a limited domain), this
 194 process should not be launched at any time, but only when a major
 195 modification occurs in the environment, like a change of context for
 196 instance. However, it is interesting to update the knowledge of the
 197 neural network, by using the learning process, when the domain evolves
 198 (evolution in material used for the prosthesis or evolution on the
 199 beam (size, shape or energy)). The learning time is related to the
 200 volume of data who could be very important in a real medical context.
 201 A work has been done to reduce this learning time with the
 202 parallelization of the learning process by using a partitioning method
 203 of the global dataset. The goal of this method is to train many neural
 204 networks on sub-domains of the global dataset. After this training,
 205 the use of these neural networks all together allows to obtain a
 206 response for the global domain of study.
 207
 208
 209 \begin{figure}[h]
 210   \centering
 211   \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
 212   \caption{Overlapping for a sub-network  in a two-dimensional domain with ratio
 213     $\alpha$.}
 214   \label{fig:overlap}
 215 \end{figure}
 216
 217
 218 However, performing the learning on sub-domains constituting a
 219 partition of the initial domain is not satisfying according to the
 220 quality of the results. This comes from the fact that the accuracy of
 221 the approximation performed by a neural network is not constant over
 222 the learned domain. Thus, it is necessary to use an overlapping of
 223 the sub-domains. The overall principle is depicted in
 224 Figure~\ref{fig:overlap}. In this way, each sub-network has an
 225 exploitation domain smaller than its training domain and the
 226 differences observed at the borders are no longer relevant.
 227 Nonetheless, in order to preserve the performance of the parallel
 228 algorithm, it is important to carefully set the overlapping ratio
 229 $\alpha$. It must be large enough to avoid the border's errors, and
 230 as small as possible to limit the size increase of the data subsets
 231 (Qu'en est-il pour nos test ?).
 232
 233
 234
 235 \section{The XtremWeb-CH environment}
 236 \input{xwch.tex}
 237
 238 \section{The Neurad gridification}
 239
 240 \label{sec:neurad_gridif}
 241
 242
 243 As previously exposed, the Neurad application can be divided into
 244 three steps.  The goal of the first step is to decompose the data
 245 representing the dose distribution on an area. This area contains
 246 various parameters, like the nature of the medium and its
 247 density. This part is out of the scope of this paper.
 248 %Multiple ``views'' can be
 249 %superposed in order to obtain a more accurate learning.
 250
 251 The second step of the application, and the most time consuming, is
 252 the learning itself. This is the one which has been parallelized,
 253 using the XWCH environment. As exposed in the section 2, the
 254 parallelization relies on a partitionning of the global
 255 dataset. Following this partitionning all learning tasks execute in
 256 parallel independently with their own local data part, with no
 257 communication, following the fork-join model. Clearly, this
 258 computation fits well with the model of the chosen middleware.
 259
 260 The execution scheme is then the following (see Figure
 261 \ref{fig:neurad_grid}):
 262 \begin{enumerate}
 263 \item we first send the learning application and its data to the
 264   middleware (more precisely on warehouses (DW)) and create the
 265   computation module,
 266 \item when a worker (W) is ready to compute, it requests a task to
 267   execute to the coordinator (Coord.),
 268 \item The coordinator assigns the worker a task. This last one retrieves the
 269 application and its assigned data and so can start the computation.
 270 \item At the end of the learning process, the worker sends the result,, to a warehouse.
 271 \end{enumerate}
 272
 273 The last step of the application is to retrieve these results (some
 274 weighted neural networks) and exploit them through a dose distribution
 275 process.
 276
 277
 278 \begin{figure}[ht]
 279   \centering
 280   \includegraphics[width=8cm]{figures/neurad_gridif}
 281   \caption{The proposed Neurad gridification}
 282   \label{fig:neurad_grid}
 283 \end{figure}
 284
 285 \section{Experimental results}
 286 \label{sec:neurad_xp}
 287
 288 The aim of this section is to describe and analyse the experimental
 289 results we have obtained with the parallel Neurad version previously
 290 described. Our goal was to carry out this application with real input
 291 data and on a real global computing testbed.
 292
 293 \subsubsection{Experimental conditions}
 294 \label{sec:neurad_cond}
 295
 296 The size of the input data is about 2.4Gb. In order to avoid that data
 297 noise appears and disturb the learning process, these data can be
 298 divided into 25 part, at most. This generates input data parts of
 299 about 15Mb (in a compressed format). The output data, which are
 300 retrieved after the process, are about 30Kb for each
 301 part. Unfortunately, the data decomposition limitation does not allow
 302 us to use more than 25 computers (XWCH workers). Nevertheless, we used two
 303 distincts deployments of XWCH:
 304 \begin{enumerate}
 305
 306 \item In the first one, called ``ditributed XWCH'' in the following,
 307   the XWCH coordinator and the warehouses were situated in Geneva,
 308   Switzerland while the workers were running in the same local cluster
 309   in Belfort, France.
 310
 311 \item The second deployment, called ``local XWCH'' is a local
 312   deployment where both coordinator, warehouses and workers were in
 313   the same local cluster.
 314
 315 \end{enumerate}
 316 For the both deployments, during the day these machines were used by
 317 students of the Computer Science Department of the IUT of Belfort.
 318
 319 In order to evaluate the overhead induced by the use of the platform
 320 we have furthermore compared the execution of the Neurad application
 321 with and without the XWCH platform. For the latter case, we mean that the
 322 testbed consists only in workers deployed with their respective data
 323 by the use of shell scripts. No specific middleware was used and the
 324 workers were in the same local cluster.
 325
 326 Finally, five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$,
 327 $0.50e^{-1}$, $0.25e^{-1}$ and $1e^{-2}$.
 328
 329
 330 \subsubsection{Results}
 331 \label{sec:neurad_result}
 332
 333 Table \ref{tab:neurad_res} presents the execution times of the Neurad
 334 application on 25 machines with XWCH (local and distributed
 335 deployment) and without XWCH. These results correspond to the measure
 336 of the same step for both kind of execution i.e. sending of local data and the
 337 executable, the learning process, and retrieving the results. The
 338 results represent the average time of $x$ executions.
 339
 340
 341 \begin{table}[h!]
 342   \centering
 343   \begin{tabular}[h!]{|c|c|c|c|c|}
 344     \hline
 345     Precision & 1 machine & Without XWCH & With XWCH & With local XWCH\\
 346     \hline
 347      $1e^{-1}$ & 5190 & 558 & 759 & 629\\
 348     $0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
 349     $0.50e^{-1}$ & 7487 & 792 & 1010 & 844 \\
 350     $0.25e^{-1}$ & 7787 & 791 & 1000 & 852\\
 351     $1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
 352     \hline
 353   \end{tabular}
 354 \caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
 355   \label{tab:neurad_res}
 356 \end{table}
 357
 358 %\begin{table}[ht]
 359 %  \centering
 360 %  \begin{tabular}[h]{|c|c|c|}
 361 %    \hline
 362 %    Precision & Without XWCH & With XWCH \\
 363 %    \hline
 364 %    $1e^{-1}$ & $558$s & $759$s\\
 365 %    \hline
 366 %  \end{tabular}
 367 %  \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
 368 %  \label{tab:neurad_res}
 369 %\end{table}
 370
 371
 372 As we can see, in the case of a local deployment the overhead induced
 373 by the use of the XWCH platform is about $7\%$. It is clearly a low
 374 overhead. Now, for the distributed deployment, the overhead is about
 375 $34\%$. Regarding to the benefits of the platform, it is a very
 376 acceptable overhead which can be explained by the following points.
 377
 378 First, we point out that the conditions of executions are not really
 379 identical between with and without XWCH contexts. For this last one,
 380 though the same steps were done, all transfer processes are inside a
 381 local cluster with a high bandwidth and a low latency. Whereas when
 382 using XWCH, all transfer processes (between datawarehouses, workers,
 383 and the coordinator) used a wide network area with a smaller
 384 bandwidth.  In addition, in executions without XWCH, all the machines
 385 started immediately the computation, whereas when using the XWCH
 386 platform, a latency is introduced by the fact that a computation
 387 starts on a machine, only when this one requests a task.
 388
 389 This underline that, unsurprisingly, deploying a local
 390 coordinator and one or more warehouses near a cluster of workers can
 391 enhance computations and platform performances.
 392
 393
 394 \section{Conclusion and future works}
 395
 396 In this paper, we have presented a gridification of a real medical
 397 application, the Neurad application. This radiotherapy application
 398 tries to optimize the irradiated dose distribution within a
 399 patient. Based on a multi-layer neural network, this applications
 400 present a very time consuming step i.e. the learning step. Due to the
 401 computing characteristics of this step, we choose to parallelize it
 402 using the XtremWeb-CH global computing environment. Obtained
 403 experimental results show good speed-ups and underline that overheads
 404 induced by XWCH are very acceptable, letting it be a good candidate
 405 for deploying parallel applications over a global computing environment.
 406
 407 Our future works, include the testing of the application on a more
 408 large scale testbed. This implies, the choice of a data input set
 409 allowing a finer decomposition. Unfortunately, this choice of input
 410 data is not trivial and relies on a large number of parameters
 411 (demander ici des précisions à Marc).
 412
 413 \bibliographystyle{plain}
 414 \bibliography{biblio}
 415
 416
 417
 418 \end{document}