\documentclass{llncs}
%\usepackage{latex8}
%\usepackage{times}
%\documentclass[a4paper,11pt]{article}
%\usepackage{fullpage}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{graphicx,subfigure,graphics}
\usepackage{epsfig}
%\usepackage[usenames]{color}
%\usepackage{latexsym,stmaryrd}
%\usepackage{amsfonts,amssymb}
\usepackage{verbatim,theorem,moreverb}
%\usepackage{float,floatflt}
\usepackage{boxedminipage}
\usepackage{url}
%\usepackage{psfig}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{algorithm}
\usepackage{algorithmic}
%\usepackage{floatfig}
%\usepackage{picins}

\def\sfixme#1{\fbox{\textbf{FIXME: }#1}}

\newcommand{\fixme}[1]{%
  \begin{center}
    \begin{boxedminipage}{.8\linewidth}
      \textsl{{\bf #1}}
    \end{boxedminipage}
  \end{center}
}
\newcommand{\FIXME}[1]{\marginpar[\null\hspace{2cm} FIXME]{FIXME} \fixme{#1}}

%\psfigurepath{.:fig:IMAGES}
\graphicspath{{.}{fig/}{IMAGES/}}

%\initfloatingfigs

\begin{document}

\title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}

\author{Nabil Abdennadher\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
  David Laiymani\inst{2} \and Sébastien  Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}

\institute{
University of Applied Sciences Western Switzerland, hepia Geneva,
Switzerland \\
\email{nabil.abdennadher@hesge.ch, mohamed.benbelgacem@unige.ch, markopekka.niinimaeki@hesge.ch}
\and
Laboratoire d'Informatique de l'universit\'{e}
  de Franche-Comt\'{e} \\
  IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
\email{raphael.couturier, david.laiymani, sebastien.miquee@univ-fcomte.fr}
\and
 FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard, France\\
\email{marc.sauget@femtost.fr}
}

\maketitle

\begin{abstract}
  This paper presents the design and evaluation of the
  gridification of a radiotherapy dose computation application. Given
  the inherent characteristics of the application and its execution,
  we chose the architectural context of global (or volunteer) computing,
  using the XtremWeb-CH environment. Experiments
  conducted on a real global computing testbed show good speed-ups
  and an acceptable platform overhead.
\end{abstract}

%-------------INTRODUCTION--------------------
\section{Introduction}

The use of distributed architectures for solving large scientific
problems appears to be becoming mandatory in many cases. Radiotherapy
dose computation is one domain where this need is acute. The main goal
of external beam radiotherapy is to treat tumors while minimizing the
exposure of healthy tissue. Dosimetric planning has to be carried out
in order to optimize the dose distribution within the patient. To
determine the most accurate dose distribution during treatment
planning, a compromise must be found between the precision and the
speed of calculation. Current techniques, which use analytic methods,
models and databases, are fast but lack precision. Enhanced precision
can be achieved with calculation codes based, for example, on Monte
Carlo methods. The main drawback of these methods is their computation
time, which can rapidly become huge. In [] the authors proposed a
novel approach, called Neurad, using neural networks. This approach is
based on the collaboration of computation codes and multi-layer neural
networks used as universal approximators. It provides a fast and
accurate evaluation of radiation doses in any given environment for
given irradiation parameters. As the learning step is often very time
consuming, in \cite{bcvsv08:ip} the authors proposed a parallel
algorithm that decomposes the learning domain into subdomains. This
decomposition has the advantage of significantly reducing the
complexity of the target functions to approximate.

Since several classes of distributed/parallel architectures exist
(supercomputers, clusters, global computing...), we have to choose the
one best suited to the parallel Neurad application. The Global or
Volunteer Computing model seems to be an interesting approach. Here,
the computing power is obtained by aggregating unused (or volunteer)
public resources connected to the Internet. In our case, we can
imagine, for example, that part of the architecture will be composed
of some of the computers of the hospital. This approach has the
advantage of being clearly cheaper than a more dedicated approach such
as the use of supercomputers or clusters.

The aim of this paper is to propose and evaluate a gridification of
the Neurad application (more precisely, of its most time-consuming
part, the learning step) using a Global Computing approach. For this,
we focus on the XtremWeb-CH environment []. We chose this environment
because it tackles the centralized aspect of other global computing
environments such as XtremWeb [] or Seti []. It moves towards a
peer-to-peer approach by distributing some components of the
architecture. For instance, the computing nodes are allowed to
communicate directly. Experiments were conducted on a real Global
Computing testbed. The results are encouraging: they exhibit an
interesting speed-up and show that the overhead induced by the use of
XtremWeb-CH is acceptable.

The paper is organized as follows. In Section 2 we present the Neurad
application and particularly its most time-consuming part, i.e. the
learning step. Section 3 details the XtremWeb-CH environment and
Section 4 describes the gridification of the Neurad
application. Experimental results are presented in Section 5, and we
end in Section 6 with some concluding remarks and perspectives.

\section{The Neurad application}

\begin{figure}[http]
  \centering
  \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
  \caption{The Neurad project}
  \label{f_neurad}
\end{figure}

The \emph{Neurad}~\cite{Neurad} project presented in this paper is
part of a multi-disciplinary effort, involving medical physicists
and computer scientists, whose goal is to enhance the treatment
planning of cancerous tumors by external radiotherapy. In our
previous works~\cite{RADIO09,ICANN10,NIMB2008}, we proposed an
original approach to solve scientific problems whose accurate modeling
and/or analytical description are difficult. That method is based on
the collaboration of computational codes and neural networks used as
universal interpolators. Thanks to that method, the \emph{Neurad}
software provides a fast and accurate evaluation of radiation doses in
any given environment (possibly inhomogeneous) for given irradiation
parameters. In a previous work~\cite{AES2009}, we showed the benefit
of using a distributed algorithm for the neural network
learning. We use a classical RPROP algorithm with an HPU topology to
train our neural network.

Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts
are clearly independent: the initial data production, the learning
process and the dose deposit evaluation. The first step, the data
production, is outside the {\it{Neurad}} project. There are many
ways to obtain data about radiotherapy treatments, such as
measurement or simulation. The only essential criterion is that the
result must be obtained in a homogeneous environment.

% We have chosen to
% use only a Monte Carlo simulation because this kind of tool is the
% reference in the radiotherapy domains. The advantages to use data
% obtained with a Monte Carlo simulator are the following: accuracy,
% profusion, quantified error and regularity of measure points. But,
% there exist also some disagreements and the most important is the
% statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
% presents the general behavior of a dose deposit in water.


% \begin{figure}[http]
%   \centering
%   \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
%   \caption{Dose deposit by a photon beam  of 24 mm of width in water (normalized value).}
%   \label{f_tray}
% \end{figure}

The second stage of the {\it{Neurad}} project is the learning step,
which is the most time-consuming one. This step is performed off-line,
but it is important to reduce the time used for the learning process
to keep the tool workable. Indeed, if the learning time is too long
(for the moment, it can reach one week for a limited domain), this
process should not be launched at any time, but only when a major
modification occurs in the environment, such as a change of
context. However, it is useful to update the knowledge of the
neural network, by re-running the learning process, when the domain
evolves (evolution of the material used for the prosthesis or of the
beam size, shape or energy). The learning time is related to the
volume of data, which can be very large in a real medical context.
Work has been done to reduce this learning time by parallelizing the
learning process with a partitioning method applied to the global
dataset. The goal of this method is to train many neural
networks on sub-domains of the global dataset. After this training,
using these neural networks together yields a response for the
global domain of study.


\begin{figure}[h]
  \centering
  \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
  \caption{Overlapping for a sub-network  in a two-dimensional domain with ratio
    $\alpha$.}
  \label{fig:overlap}
\end{figure}


However, performing the learning on sub-domains that form a strict
partition of the initial domain does not give results of satisfactory
quality. This comes from the fact that the accuracy of
the approximation performed by a neural network is not constant over
the learned domain. Thus, it is necessary to make the sub-domains
overlap. The overall principle is depicted in
Figure~\ref{fig:overlap}. In this way, each sub-network has an
exploitation domain smaller than its training domain, and the
differences observed at the borders are no longer relevant.
Nonetheless, in order to preserve the performance of the parallel
algorithm, it is important to set the overlapping ratio $\alpha$
carefully. It must be large enough to avoid border errors, and
as small as possible to limit the size increase of the data subsets.
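
As a rough illustration of this trade-off, consider one axis of a
sub-domain whose exploitation interval is $[a,b]$, of width $w=b-a$.
The symbols $a$, $b$ and $w$ are introduced here only for this sketch,
and we assume, for illustration, that the overlap adds a margin
$\alpha w$ on each side (the exact definition of $\alpha$ may differ).
The training interval then becomes
\[
  [\,a-\alpha w,\; b+\alpha w\,], \qquad \text{of width } (1+2\alpha)\,w .
\]
For an interior sub-domain of a two-dimensional domain with uniformly
distributed data points, the training subset grows by a factor of about
\[
  (1+2\alpha)^{2},
\]
so that, under these assumptions, $\alpha=0.1$ already gives
$(1.2)^{2}=1.44$, i.e. roughly $44\,\%$ more points per sub-network.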


\section{The XtremWeb-CH environment}
\input{xwch.tex}

\section{Gridification of the Neurad application}

\label{sec:neurad_gridif}


The Neurad application can be divided into three parts. The first one
splits the data representing the dose distribution over an area. This
area is described by various parameters, such as the density of the
medium and its nature. Multiple ``views'' can be superposed in order
to obtain a more accurate learning. The second part of the application
is the learning itself. This is the most time-consuming part and is
therefore the one that has been ported to XtremWeb-CH (XWCH). This
part fits the model of the middleware well: all learning tasks execute
in parallel and independently, each on its own local data part, with no
communication, following the fork-join model. As shown in
Figure~\ref{fig:neurad_grid}, we first send the learning application
and data to the middleware (more precisely, to the warehouses (DW)) and
create the computation module. When a worker (W) is ready to compute,
it requests a task from the coordinator (Coord.), which assigns it
one. The worker retrieves the application and its assigned data,
and can start the computation. At the end of the learning process, it
sends the result, a weighted neural network which will be used in the
dose distribution process, to a warehouse. The last step of the
application is to retrieve these results and exploit them.
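
The worker side of this workflow can be sketched as the pseudocode
below. It only illustrates the steps described above and is not the
XWCH API: the names \textsc{RequestTask}, \textsc{Fetch},
\textsc{Learn} and \textsc{Store} are hypothetical.

\begin{algorithm}[ht]
\caption{Sketch of a worker processing one learning task (hypothetical names)}
\begin{algorithmic}
\REQUIRE coordinator $C$, warehouses $DW$
\STATE $t \leftarrow \textsc{RequestTask}(C)$ \COMMENT{the worker pulls a task when it is ready}
\STATE $(\mathit{app}, \mathit{data}) \leftarrow \textsc{Fetch}(DW, t)$ \COMMENT{learning executable and local data part}
\STATE $\mathit{net} \leftarrow \textsc{Learn}(\mathit{app}, \mathit{data})$ \COMMENT{no communication with other tasks}
\STATE $\textsc{Store}(DW, t, \mathit{net})$ \COMMENT{result: a weighted neural network}
\end{algorithmic}
\end{algorithm}

Because each task only reads its own data part and writes its own
result, the join step reduces to collecting the stored networks from
the warehouses.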


\begin{figure}[ht]
  \centering
  \includegraphics[width=\linewidth]{neurad_gridif}
  \caption{Neurad gridification}
  \label{fig:neurad_grid}
\end{figure}

\section{Experimental results}

\label{sec:neurad_xp}

\subsubsection{Conditions}
\label{sec:neurad_cond}


The evaluation of the execution of the Neurad application on XWCH was
set up as follows. The size of the input data is about 2.4\,Gb. This
amount of data can be divided into 25 parts; otherwise, data noise
appears and disturbs the learning. We used 25 computers (XWCH
workers) to execute this part of the application. This yields input
data parts of about 15\,Mb each (in a compressed format). The output
data, retrieved after the process, are about 30\,Kb for each part. We
used two distinct deployments of XWCH. In the first one, the XWCH
coordinator and the warehouses were located in Geneva, Switzerland,
while the workers were running in the same local cluster in Belfort,
France. The second deployment is a local one, where the
coordinator, warehouses and workers were all in the same local cluster.
During the day these machines were used by students of the Computer
Science Department of the IUT of Belfort.
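
To give an order of magnitude of the data actually transferred, and
assuming that all 25 parts have the sizes quoted above and that the
units are used consistently (with $1$\,Gb taken as $1000$\,Mb), the
totals are approximately
\begin{align*}
  \text{compressed input:} \quad & 25 \times 15\,\text{Mb} = 375\,\text{Mb},\\
  \text{output:} \quad & 25 \times 30\,\text{Kb} = 750\,\text{Kb},\\
  \text{compression ratio:} \quad & 2400\,\text{Mb} \,/\, 375\,\text{Mb} = 6.4 .
\end{align*}
Under these assumptions, retrieving the results is negligible compared
with sending the input data.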

We furthermore compared the execution of the Neurad application
with and without the XWCH platform in order to measure the overhead
induced by the use of the platform. By ``without XWCH'' we mean that the
testbed consists only of workers, deployed with their respective data by
means of shell scripts. No specific middleware was used and the
workers were in the same local cluster.

Five computation precisions were used: $10^{-1}$, $0.75\times 10^{-1}$, $0.50\times 10^{-1}$, $0.25\times 10^{-1}$ and $10^{-2}$.


\subsubsection{Results}
\label{sec:neurad_result}


In these experiments, we measured the same steps for both kinds of
execution: sending the local data and the executable, running the
learning process, and retrieving the result.
Table~\ref{tab:neurad_res} presents the execution times of the Neurad
application on 25 machines with XWCH (local and distributed deployment)
and without XWCH, together with the time on a single machine.


\begin{table}[h!]
  \centering
  \begin{tabular}{|c|c|c|c|c|}
    \hline
    Precision & 1 machine & Without XWCH & With XWCH (distributed) & With XWCH (local)\\
    \hline
    $10^{-1}$ & 5190 & 558 & 759 & 629 \\
    $0.75\times 10^{-1}$ & 6307 & 792 & 1298 & 801 \\
    $0.50\times 10^{-1}$ & 7487 & 792 & 1010 & 844 \\
    $0.25\times 10^{-1}$ & 7787 & 791 & 1000 & 852 \\
    $10^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
    \hline
  \end{tabular}
  \caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
  \label{tab:neurad_res}
\end{table}

%\begin{table}[ht]
%  \centering
%  \begin{tabular}[h]{|c|c|c|}
%    \hline
%    Precision & Without XWCH & With XWCH \\
%    \hline
%    $1e^{-1}$ & $558$s & $759$s\\
%    \hline
%  \end{tabular}
%  \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
%  \label{tab:neurad_res}
%\end{table}


These experiments show that the overhead induced by the use of the XWCH
platform is about $34\%$ in the distributed deployment and about $7\%$
in the local deployment. For the latter, the overhead is very acceptable with regard to the benefits of the platform.
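
For reference, such quantities can be recomputed row by row from
Table~\ref{tab:neurad_res}. Writing $T_{1}$ for the single-machine
time, $T_{\mathrm{ref}}$ for the time without XWCH and
$T_{\mathrm{xwch}}$ for a time with XWCH (symbols introduced here only
for this illustration), the usual definitions of relative overhead and
speed-up are
\[
  O = \frac{T_{\mathrm{xwch}} - T_{\mathrm{ref}}}{T_{\mathrm{ref}}},
  \qquad
  S = \frac{T_{1}}{T_{\mathrm{xwch}}} .
\]
For the last row of the table, for example, this gives
\[
  O_{\mathrm{local}} = \frac{1108-1035}{1035} \approx 7.1\,\%,
  \qquad
  O_{\mathrm{distributed}} = \frac{1447-1035}{1035} \approx 39.8\,\%,
\]
with speed-ups of $11030/1108 \approx 9.95$ (local) and
$11030/1447 \approx 7.62$ (distributed). Values vary from row to row,
so a single row need not match the overall figures quoted above.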

In the distributed deployment the overhead is also acceptable and can be explained by
several factors. First, the execution conditions with and without XWCH
are not really identical. Without XWCH, although the same steps were
performed, all transfers take place inside
a local cluster with high bandwidth and low latency. With XWCH, by
contrast, all transfers (between data warehouses, workers, and
the coordinator) go over a wide area network with lower bandwidth.

In addition, in executions without XWCH, all the machines started
the computation immediately, whereas with the XWCH platform a
latency is introduced by the fact that a task starts on a machine only
when that machine requests a task.

These experiments underline that deploying a local coordinator and one
or more warehouses near a cluster of workers can improve computation
and platform performance. They also show that the overhead due to the
use of the platform is limited.


\section{Conclusion and future works}

\bibliographystyle{plain}
\bibliography{biblio}

\end{document}