4 %\documentclass[a4paper,11pt]{article}
6 \usepackage[T1]{fontenc}
7 \usepackage[utf8]{inputenc}
8 \usepackage{graphicx,subfigure,graphics}
10 %\usepackage[usenames]{color}
11 %\usepackage{latexsym,stmaryrd}
12 %\usepackage{amsfonts,amssymb}
13 \usepackage{verbatim,theorem,moreverb}
14 %\usepackage{float,floatflt}
15 \usepackage{boxedminipage}
21 \usepackage{algorithm}
22 \usepackage{algorithmic}
23 %\usepackage{floatfig}
28 \def\sfixme#1{\fbox{\textbf{FIXME: }#1}}
30 \newcommand{\fixme}[1]{%
32 \begin{boxedminipage}{.8\linewidth}
37 \newcommand{\FIXME}[1]{\marginpar[\null\hspace{2cm} FIXME]{FIXME} \fixme{#1}}
39 %\psfigurepath{.:fig:IMAGES}
40 \graphicspath{{.}{fig/}{IMAGES/}}
46 \title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}
49 \author{Nabil Abdennhader\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
50 David Laiymani\inst{2} \and Sébastien Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}
53 University of Applied Sciences Western Switzerland, hepia Geneva,
55 \email{nabil.abdennadher@hesge.ch,mohamed.benbelgacem@unige.ch,markopekka.niinimaeki@hesge.ch}
57 Laboratoire d'Informatique de l'universit\'{e}
58 de Franche-Comt\'{e} \\
59 IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
60 \email{\{raphael.couturier,david.laiymani,sebastien.miquee\}@univ-fcomte.fr}
62 FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard , FRANCE\\
63 \email{marc.sauget@univ-fcomte.fr}
70 This paper presents the design and the evaluation of the
71 gridification of a radiotherapy dose computation application. Due to
72 the inherent characteristics of the application and its execution,
73 we choose the architectural context of volunteer
74 computing. For this, we used the XtremWeb-CH
75 environment. Experiments were conducted on a real volunteer computing
76 testbed and show good speed-ups and very acceptable platform
77 overhead, letting XtremWeb-CH be a good candidate for deploying
78 parallel applications over a volunteer computing environment.
82 %-------------INTRODUCTION--------------------
83 \section{Introduction}
85 The use of distributed architectures for solving large scientific
86 problems seems to become mandatory in a lot of cases. For example, in
87 the domain of radiotherapy dose computation the problem is
88 crucial. The main goal of external beam radiotherapy is the treatment
89 of tumors while minimizing exposure to healthy tissue. Dosimetric
90 planning has to be carried out in order to optimize the dose
91 distribution within the patient. Thus, to determine the most accurate
92 dose distribution during treatment planning, a compromise must be
93 found between the precision and the speed of calculation. Current
94 techniques, using analytic methods, models and databases, are rapid
95 but lack precision. Enhanced precision can be achieved by using
96 calculation codes based, for example, on the Monte Carlo methods. The main
97 drawback of these methods is their computation times which can
98 rapidly become huge. In \cite{NIMB2008} the authors proposed a new approach, called
99 Neurad, using neural networks. This approach is based on the
100 collaboration of computation codes and multi-layer neural networks
101 used as universal approximators. It provides a fast and accurate
102 evaluation of radiation doses in any given environment for given
103 irradiation parameters. As the learning step is often very time
104 consuming, in \cite{AES2009} the authors proposed a parallel
105 algorithm that enables to decompose the learning domain into
106 subdomains. The decomposition has the advantage of significantly
107 reducing the complexity of the target functions to approximate.
109 Now, as there exist several classes of distributed/parallel
110 architectures (supercomputers, clusters, global computing\dots{}) we
111 have to choose the best suited one for the parallel Neurad
112 application. The volunteer (or global) computing model seems to be an
113 interesting approach. Here, the computing power is obtained by
114 aggregating unused (or volunteer) public resources connected to the
115 Internet. In our case, we can imagine, for example, that a part of the
116 architecture will be composed of some of the different computers of
117 the hospital. This approach presents the advantage of being clearly
118 cheaper than a more dedicated approach like the use of supercomputers
119 or clusters. Furthermore and as we will see in the remainder, the
120 studied parallel algorithm corresponds very well to this computation model.
122 The aim of this paper is to propose and evaluate a gridification of
123 the Neurad application (more precisely, of the most time consuming
124 part, the learning step) using a volunteer computing approach. For
125 this, we focus on the XtremWeb-CH environment\cite{xwch}. We chose
126 this environment because it tackles the centralized aspect of other
127 global computing environments such as XtremWeb\cite{xtremweb} or
128 Seti\cite{seti}. It tends to a peer-to-peer approach by distributing
129 some components of the architecture. For instance, the computing nodes
130 are allowed to directly communicate. Experiments were conducted on a
131 real global computing testbed. The results are very encouraging. They
132 exhibit an interesting speed-up and show that the overhead induced by
133 the use of XtremWeb-CH is very acceptable.
135 The paper is organized as follows. In Section 2 we present the Neurad
136 application and particularly its most time consuming part, i.e. the
137 learning step. Section 3 details the XtremWeb-CH environment and
138 Section 4 exposes the gridification of the Neurad
139 application. Experimental results are presented in Section 5 and we
140 end in Section 6 by some concluding remarks and perspectives.
142 \section{The Neurad application}
146 \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
147 \caption{The Neurad project}
151 The \emph{Neurad}~\cite{Neurad} project presented in this paper takes place in a
152 multi-disciplinary project, involving medical physicists and computer scientists
153 whose goal is to enhance the treatment planning of cancerous tumors by external
154 radiotherapy. In our previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have
155 proposed an original approach to solving scientific problems whose accurate
156 modeling and/or analytical description are difficult. That method is based on
157 the collaboration of computational codes and neural networks used as universal
158 interpolator. Thanks to that method, the \emph{Neurad} software provides a fast
159 and accurate evaluation of radiation doses in any given environment (possibly
160 inhomogeneous) for given irradiation parameters. We have shown in a previous
161 work (\cite{AES2009}) the interest of using a distributed algorithm for the neural
162 network learning. We use a classical RPROP~\footnote{Resilient backpropagation}
163 algorithm with a HPU~\footnote{High order processing units} topology to do the
164 training of our neural network.
166 Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are clearly
167 independent: the initial data production, the learning process and the dose
168 deposit evaluation. The first step, the data production, is outside of the
169 {\it{Neurad}} project. They are many solutions to obtain data about the
170 radiotherapy treatments like the measure or the simulation. The only essential
171 criterion is that the result must be obtained in an homogeneous environment.
174 % use only a Monte Carlo simulation because this kind of tool is the
175 % reference in the radiotherapy domains. The advantages to use data
176 % obtained with a Monte Carlo simulator are the following: accuracy,
177 % profusion, quantified error and regularity of measure points. But,
178 % there exist also some disagreements and the most important is the
179 % statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
180 % presents the general behavior of a dose deposit in water.
183 % \begin{figure}[http]
185 % \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
186 % \caption{Dose deposit by a photon beam of 24 mm of width in water (normalized value).}
190 The secondary stage of the {\it{Neurad}} project is the learning step and this
191 is the most time consuming step. This step is performed offline but it is
192 important to reduce the time used for the learning process to keep a workable
193 tool. Indeed, if the learning time is too huge (for the moment, this time could
194 reach one week for a limited domain), this process should not be launched at any
195 time, but only when a major modification occurs in the environment, like a
196 change of context for instance. However, it is interesting to update the
197 knowledge of the neural network, by using the learning process, when the domain
198 evolves (evolution in material used for the prosthesis or evolution on the beam
199 (size, shape or energy)). The learning time is related to the volume of data which could be very important in a real medical context. Some work has been done to
200 reduce this learning time with the parallelization of the learning process by
201 using a partitioning method of the global dataset. The goal of this method is to
202 train many neural networks on sub-domains of the global dataset. After this
203 training, the use of these neural networks all together allows to obtain a
204 response for the global domain of study.
209 \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
210 \caption{Overlapping for a sub-network in a two-dimensional domain with ratio
215 % j'ai relu mais pas vu le probleme
217 However, performing the learning on sub-domains constituting a partition of the
218 initial domain may not be satisfying depending on the chosen quality of the results. This
219 comes from the fact that the accuracy of the approximation performed by a neural
220 network is not constant over the learned domain. Thus, it is necessary to use an
221 overlapping of the sub-domains. The overall principle is depicted in
222 Figure~\ref{fig:overlap}. In this way, each sub-network has an exploitation
223 domain smaller than its training domain and the differences observed at the
224 borders are no longer relevant. Nonetheless, in order to preserve the
225 performance of the parallel algorithm, it is important to carefully set the
226 overlapping ratio $\alpha$. It must both be large enough to avoid the border's
227 errors, and as small as possible to limit the size increase of the data
228 subsets~\cite{AES2009}.
232 \section{The XtremWeb-CH environment}
235 \section{The Neurad gridification}
237 \label{sec:neurad_gridif}
240 As previously exposed, the Neurad application can be divided into
241 three steps. The goal of the first step is to decompose the data
242 representing the dose distribution on an area. This area contains
243 various parameters, like the nature of the medium and its
244 density. This part is out of the scope of this paper.
245 %Multiple ``views'' can be
246 %superposed in order to obtain a more accurate learning.
248 The second step of the application, and the most time consuming, is the learning
249 in itself. This is the one which has been parallelized, using the XWCH
250 environment. As exposed in section 2, the parallelization relies on a
251 partitioning of the global dataset. Following this partitioning all learning
252 tasks are independently executed in parallel with their own local data part,
253 with no communication, following the fork/join model. Clearly, this computation
254 fits well with the model of the chosen middleware.
258 \includegraphics[width=8cm]{figures/neurad_gridif}
259 \caption{The proposed Neurad gridification}
260 \label{fig:neurad_grid}
264 The execution scheme is then the following (see Figure
265 \ref{fig:neurad_grid}):
267 \item We first send the learning application and its data to the
268 middleware. In a first time, we send the application to data
269 warehouses (DW), and the create an "application module" on the
270 coordinator (Coord.) including references retrieved from the
271 previous sending operation. In a second time, we apply the same
272 process to application data.
273 \item When a worker (W) is ready to compute, it requests a task to
274 execute to the coordinator (Coord.);
275 \item The coordinator assigns the worker a task. This last one
276 retrieves the application and its assigned data, by requesting them
277 to DW with references sent by the coordinator, and so can start the
279 \item At the end of the learning process, the worker sends the result
283 The last step of the application is to retrieve these results (some
284 weighted neural networks) and exploit them through a dose distribution
285 process. This last step is out of the scope of this paper.
289 \section{Experimental results}
290 \label{sec:neurad_xp}
292 The aim of this section is to describe and analyze the experimental
293 results we have obtained with the parallel Neurad version previously
294 described. Our goal was to carry out this application with real input
295 data and on a real volunteer computing testbed.
297 \subsubsection{Experimental conditions}
298 \label{sec:neurad_cond}
300 The size of the input data is about 2.4Gb. In order to avoid
301 noises to appear and disturb the learning process, these data can be
302 divided into, at most, 25 parts. This generates input data parts of
303 about 15Mb (in a compressed format). The output data, which are
304 retrieved after the process, are about 30Kb for each part. We used two
305 distinct deployments of XWCH:
308 \item In the first one, called ``distributed XWCH'',
309 the XWCH coordinator and the warehouses were located in Geneva,
310 Switzerland while the workers were running in the same local cluster
313 \item The second deployment, called ``local XWCH'' is a local
314 deployment where coordinator, warehouses and workers were, in
315 the same local cluster, at the same time.
318 For both deployments, the local cluster is a campus cluster and during
319 the day these machines were used by students of the Computer Science
320 Department of the IUT of Belfort. Unfortunately, the data
321 decomposition limitation does not allow us to use more than 25
322 computers (XWCH workers).
324 In order to evaluate the overhead induced by the use of the platform we have
325 furthermore compared the execution of the Neurad application with and without
326 the XWCH platform. For the latter case, we want to insist on the fact that the
327 testbed consists only in workers deployed with their respective data by the use
328 of shell scripts. No specific middleware was used and the workers were in the
331 Finally, five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$,
332 $0.50e^{-1}$, $0.25e^{-1}$, and $1e^{-2}$.
335 \subsubsection{Results}
336 \label{sec:neurad_result}
339 Table \ref{tab:neurad_res} presents the execution times of the Neurad
340 application on 25 machines with XWCH (local and distributed
341 deployment) and without XWCH. These results correspond to the measures
342 of the same steps for both kinds of execution, i.e. the sending of local
343 data and the executable, the learning process, and retrieving the
344 results. Results represent the average time of $5$ executions.
348 \renewcommand{\arraystretch}{1.7}
350 \begin{tabular}[h!]{|c|c|c|c|c|}
352 ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With XWCH~ & ~With
355 $1e^{-1}$ & 5190 & 558 & 759 & 629\\
356 $0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
357 $0.50e^{-1}$ & 7487 & 792 & 1010 & 844 \\
358 $0.25e^{-1}$ & 7787 & 791 & 1000 & 852\\
359 $1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
363 \caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
364 \label{tab:neurad_res}
369 % \begin{tabular}[h]{|c|c|c|}
371 % Precision & Without XWCH & With XWCH \\
373 % $1e^{-1}$ & $558$s & $759$s\\
376 % \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
377 % \label{tab:neurad_res}
381 As we can see, in the case of a local deployment the overhead induced
382 by the use of the XWCH platform is about $7\%$. It is clearly a low
383 overhead. Now, for the distributed deployment, the overhead is about
384 $34\%$. Regarding the benefits of the platform, it is a very
385 acceptable overhead which can be explained by the following points.
387 First, we point out that the conditions of executions are not really
388 identical between, with and without, XWCH contexts. For this last one,
389 though the same steps were achieved out, all transfer processes are inside a
390 local cluster with a high bandwidth and a low latency. Whereas when
391 using XWCH, all transfer processes (between datawarehouses, workers,
392 and the coordinator) used a wide network area with a smaller
393 bandwidth. In addition, in the executions without XWCH, all the machines
394 started immediately the computation, whereas when using the XWCH
395 platform, a latency is introduced by the fact that a computation
396 starts on a machine, only when this one requests a task.
398 This underlines that, unsurprisingly, deploying a local
399 coordinator and one or more warehouses near a cluster of workers can
400 enhance computations and platform performances.
405 \section{Conclusion and future works}
407 In this paper, we have presented a gridification of a real medical
408 application, the Neurad application. This radiotherapy application
409 tries to optimize the irradiated dose distribution within a
410 patient. Based on a multi-layer neural network, this application
411 presents a very time consuming step, i.e. the learning step. Due to the
412 computing characteristics of this step, we have chosen to parallelize it
413 using the XtremWeb-CH volunteer computing environment. Obtained
414 experimental results show good speed-ups and underline that overheads
415 induced by XWCH are very acceptable, letting it be a good candidate
416 for deploying parallel applications over a volunteer computing environment.
418 Our future works include the testing of the application on a
419 larger scale testbed. This implies, the choice of a data input set
420 allowing a finer decomposition. Unfortunately, this choice of input
421 data is not trivial and relies on a large number of parameters.
423 We are also planning to test XWCH with parallel applications where
424 communication between workers occurs during the execution. In this
425 way, the use of the asynchronous iteration model \cite{bcl08} may be
426 an interesting perspective.
428 %(demander ici des précisions à Marc).
429 % Si tu veux parler de l'ensembles des paramètres que l'on peut utiliser pour caractériser les conditions d'irradiations
431 % - caracteristiques du faisceaux d'irradiation (beam size (de quelques mm à plus de 40 cm), energy, SSD (source surface distance),
432 % - caractéritiques de la matière : density
436 \bibliographystyle{plain}
437 \bibliography{biblio}