From 9d2bedb765159fa0135d334f703c85fb2384a859 Mon Sep 17 00:00:00 2001
From: David Laiymani
Date: Thu, 6 Jan 2011 17:16:10 +0100
Subject: [PATCH] First complete draft. To be proofread, of course. The
 bibliography is still missing.

---
 gpc2011.tex | 193 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 115 insertions(+), 78 deletions(-)

diff --git a/gpc2011.tex b/gpc2011.tex
index a29bb6f..eaf7101 100644
--- a/gpc2011.tex
+++ b/gpc2011.tex
@@ -73,7 +73,8 @@ Laboratoire d'Informatique de l'universit\'{e}
   we chose the architectural context of global (or volunteer)
   computing. For this, we used the XtremWeb-CH environment. Experiments
   were conducted on a real global computing testbed and show good speed-ups
-  and very acceptable platform overhead.
+  and a very acceptable platform overhead, making XtremWeb-CH a good
+  candidate for deploying parallel applications over a global computing
+  environment.
 \end{abstract}
 
@@ -226,89 +227,115 @@ differences observed at the borders are no longer
 relevant. Nonetheless, in order to preserve the performance of the
 parallel algorithm, it is important to carefully set the overlapping
 ratio $\alpha$. It must be large enough to avoid the border's errors, and
-as small as possible to limit the size increase of the data subsets.
+as small as possible to limit the size increase of the data subsets.
+% TODO: state which value of $\alpha$ was used in our tests.
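+
+To illustrate, here is a minimal sketch (in Python, with purely
+illustrative names; the actual Neurad code may differ) of how such an
+overlapped partitioning can be derived from a given ratio $\alpha$:
+\begin{verbatim}
+# Split a dataset into n contiguous parts, each extended by
+# alpha * part_size samples on both sides, so that the erroneous
+# borders of each part can be discarded after the learning.
+def partition_with_overlap(data, n_parts, alpha):
+    part_size = len(data) // n_parts
+    overlap = int(alpha * part_size)
+    parts = []
+    for i in range(n_parts):
+        start = max(0, i * part_size - overlap)
+        end = min(len(data), (i + 1) * part_size + overlap)
+        parts.append(data[start:end])
+    return parts
+
+# Example: 100 samples, 4 parts, alpha = 0.1 -> each interior part
+# keeps its 25 samples plus 2 extra samples on each side.
+parts = partition_with_overlap(list(range(100)), n_parts=4, alpha=0.1)
+\end{verbatim}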
 
 \section{The XtremWeb-CH environment}
 \input{xwch.tex}
 
-\section{}
+\section{The Neurad gridification}
 \label{sec:neurad_gridif}
 
-The Neurad application can be divided into three parts. The first one
-aims at dividing data representing dose distribution on an area. This
-area contains various parameters, like the density of the medium and
-its nature. Multiple ``views'' can be superposed in order to obtain a
-more accurate learning. The second part of the application is the
-learning itself. This is the most time consuming part and therefore
-this is the one which has been ported to XWCH. This part fits well
-with the model of the middleware -- all learning tasks execute in
+As previously explained, the Neurad application can be divided into
+three steps. The goal of the first step is to decompose the data
+representing the dose distribution over an area. This area is
+described by various parameters, such as the nature of the medium and
+its density. This step is outside the scope of this paper.
+%Multiple ``views'' can be
+%superposed in order to obtain a more accurate learning.
+
+The second step of the application, and the most time consuming one,
+is the learning itself. This is the step that has been parallelized
+using the XWCH environment. As explained in Section 2, the
+parallelization relies on a partitioning of the global
+dataset. Following this partitioning, all learning tasks execute in
 parallel independently with their own local data part, with no
-communication, following the fork-join model. As described on Figure
-\ref{fig:neurad_grid}, we first send the learning application and data
-to the middleware (more precisely on warehouses (DW)) and create the
-computation module. When a worker (W) is ready to compute, it requests
-a task to execute to the coordinator (Coord.). This latter assigns it
-a task. The worker retrieves the application and its assigned data,
-and can start the computation. At the end of the learning process, it
-sends the result, a weighted neural network which will be used in a
-dose distribution process, to a warehouse. The last step of the
-application is to retrieve these results and exploit them.
+communication, following the fork-join model. Clearly, this
+computation fits well with the model of the chosen middleware.
+
+The execution scheme is then the following (see Figure
+\ref{fig:neurad_grid}, and the sketch after this list):
+\begin{enumerate}
+\item we first send the learning application and its data to the
+  middleware (more precisely to the data warehouses (DW)) and create
+  the computation module;
+\item when a worker (W) is ready to compute, it requests a task from
+  the coordinator (Coord.);
+\item the coordinator assigns the worker a task; the worker then
+  retrieves the application and its assigned data and can start the
+  computation;
+\item at the end of the learning process, the worker sends its result
+  to a warehouse.
+\end{enumerate}
+
+The last step of the application is to retrieve these results (the
+weighted neural networks) and exploit them through a dose
+distribution process.
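+
+The following minimal sketch illustrates this fork-join scheme on a
+single machine; it is illustrative only (in the real deployment the
+XWCH coordinator assigns the tasks and the warehouses hold the data
+and the results) and reuses the partitioning sketch of Section 2:
+\begin{verbatim}
+from multiprocessing import Pool
+
+def learn(data_part):
+    # Stand-in for one learning task: it works on its own local data
+    # part, with no communication, and returns a "weighted network".
+    return {"size": len(data_part), "weights": sum(data_part)}
+
+if __name__ == "__main__":
+    dataset = list(range(1000))
+    parts = partition_with_overlap(dataset, n_parts=25, alpha=0.1)
+    with Pool() as pool:                   # fork: one task per part
+        networks = pool.map(learn, parts)  # join: gather the results
+    print(len(networks), "networks retrieved")
+\end{verbatim}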
 
 \begin{figure}[ht]
   \centering
-  \includegraphics[width=\linewidth]{neurad_gridif}
-  \caption{Neurad gridification}
+  \includegraphics[width=8cm]{figures/neurad_gridif}
+  \caption{The proposed Neurad gridification}
   \label{fig:neurad_grid}
 \end{figure}
 
 \section{Experimental results}
-
 \label{sec:neurad_xp}
 
-\subsubsection{Conditions}
-\label{sec:neurad_cond}
+The aim of this section is to describe and analyse the experimental
+results obtained with the parallel version of Neurad described above.
+Our goal was to run this application with real input data and on a
+real global computing testbed.
 
+\subsubsection{Experimental conditions}
+\label{sec:neurad_cond}
 
-The evaluation of the execution of the Neurad application on XWCH was
-composed as follows. The size of the input data is about 2.4Gb. This
-amount of data can be divided into 25 parts – otherwise, data noise
-appears and will disturb the learning. We have used 25 computers (XWCH
-workers) to execute this part of the application. This generates input
-data parts of about 15Mb (in a compressed format). The output data,
-which are retrieved after the process, are about 30Kb for each part. We
-used two distincts deployments of XWCH. In the first one, the XWCH
-coordinator and the warehouses were situated in Geneva, Switzerland
-while the workers were running in the same local cluster in Belfort,
-France. The second deployment is a local deployment where both
-coordinator, warehouses and workers were in the same local cluster.
-During the day these machines were used by students of the Computer
-Science Department of the IUT of Belfort.
-
-We have furthermore compared the execution of the Neurad application
-with and without the XWCH platform in order to measure the overhead
-induced by the use of the platform. By "without XWCH" we mean that the
-testbed consists only in workers deployed with their respective data by
-the use of shell scripts. No specific middleware was used and the
+The size of the input data is about 2.4~GB. In order to prevent data
+noise from appearing and disturbing the learning process, these data
+can be divided into at most 25 parts. This yields input data parts of
+about 15~MB each (in a compressed format). The output data, which are
+retrieved after the process, are about 30~KB for each part.
+Unfortunately, this limit on the data decomposition does not allow us
+to use more than 25 computers (XWCH workers). Nevertheless, we used
+two distinct deployments of XWCH:
+\begin{enumerate}
+
+\item In the first one, called ``distributed XWCH'' in the following,
+  the XWCH coordinator and the warehouses were located in Geneva,
+  Switzerland, while the workers were running in the same local
+  cluster in Belfort, France.
+
+\item The second deployment, called ``local XWCH'', is a local
+  deployment where the coordinator, the warehouses and the workers
+  were all in the same local cluster.
+
+\end{enumerate}
+For both deployments, these machines were used during the day by
+students of the Computer Science Department of the IUT of Belfort.
+
+In order to evaluate the overhead induced by the use of the platform,
+we also compared the execution of the Neurad application with and
+without the XWCH platform. In the latter case, the testbed consists
+only of workers deployed, with their respective data, by means of
+shell scripts. No specific middleware was used and the
 workers were in the same local cluster.
 
-Five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$, $0.50e^{-1}$, $0.25e^{-1}$ and $1e^{-2}$.
+Finally, five computation precisions were used: $10^{-1}$,
+$0.75\times 10^{-1}$, $0.5\times 10^{-1}$, $0.25\times 10^{-1}$ and
+$10^{-2}$.
 
 
 \subsubsection{Results}
 \label{sec:neurad_result}
 
-
-In these experiments, we measured the same steps on both kinds of
-executions. The steps consist of sending of local data and the
-executable, the learning process, and retrieving the result. Table
-\ref{tab:neurad_res} presents the execution times of the Neurad
-application on 25 machines with XWCH (local and distributed deployment)
-and without XWCH.
+Table \ref{tab:neurad_res} presents the execution times of the Neurad
+application on 25 machines, with XWCH (local and distributed
+deployments) and without XWCH. These results correspond to
+measurements of the same steps for both kinds of execution, i.e.
+sending the local data and the executable, the learning process, and
+retrieving the results. The results represent the average time of $x$
+executions.
 
 
 \begin{table}[h!]
 
@@ -342,36 +369,46 @@ and without XWCH.
 
 
 %\end{table}
 
 
-These experiments show that the overhead induced by the use of the XWCH
-platform is about $34\%$ in the distributed deployment and about $7\%$
-in the local deployment. For this last one, the overhead is very acceptable regarding to the benefits of the platform.
-
-Now, in the distributed deployment the overhead is also acceptable and can be explained by
-different factors. First, we point out that the conditions of executions
-are not really identical between with and without XWCH. For this last
-one, though the same steps were done, all transfer processes are inside
-a local cluster with a high bandwidth and a low latency. Whereas when
-using XWCH, all transfer processes (between datawarehouses, workers, and
-the coordinator) used a wide network area with a smaller bandwidth.
+As we can see, in the case of a local deployment the overhead induced
+by the use of the XWCH platform is about $7\%$, which is clearly low.
+For the distributed deployment, the overhead is about $34\%$.
+Regarding the benefits of the platform, this is still a very
+acceptable overhead, which can be explained by the following points.
 
-In addition, in executions without XWCH, all the machines started
-immediately the computation, whereas when using the XWCH platform, a
-latency is introduced by the fact that a task starts on a machine, only
-when this one requests a task.
-
-These experiments underline that deploying a local coordinator and one
-or more warehouses near a cluster of workers can enhance computations
-and platform performances. They also show a limited overhead due to the
-use of the platform.
-
-
-\end{document}
+First, we point out that the conditions of execution are not really
+identical between the contexts with and without XWCH. In the latter
+case, though the same steps were done, all transfers take place
+inside a local cluster, with a high bandwidth and a low latency,
+whereas with XWCH all transfers (between the data warehouses, the
+workers and the coordinator) go through a wide area network with a
+smaller bandwidth. In addition, in the executions without XWCH, all
+the machines started the computation immediately, whereas with the
+XWCH platform a latency is introduced by the fact that a computation
+starts on a machine only when this machine requests a task.
+This underlines that, unsurprisingly, deploying a local coordinator
+and one or more warehouses near a cluster of workers can enhance the
+computations and the platform performance.
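+
+For clarity, the overhead figures quoted above correspond to the
+relative extra time of a platform run with respect to the
+script-based run on the same input, as in the following sketch (the
+timings used here are hypothetical, not the measured values of Table
+\ref{tab:neurad_res}):
+\begin{verbatim}
+# Relative overhead (in %) of a platform run versus the baseline
+# script-based run.
+def overhead(t_platform, t_scripts):
+    return 100.0 * (t_platform - t_scripts) / t_scripts
+
+t_scripts = 100.0                  # hypothetical baseline time
+print(overhead(107.0, t_scripts))  # -> 7.0  (local XWCH)
+print(overhead(134.0, t_scripts))  # -> 34.0 (distributed XWCH)
+\end{verbatim}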
 
 \section{Conclusion and future works}
-
+In this paper, we have presented a gridification of a real medical
+application: the Neurad application. This radiotherapy application
+aims at optimizing the irradiated dose distribution within a
+patient. Based on a multi-layer neural network, this application
+presents a very time consuming step, i.e. the learning step. Due to
+the computing characteristics of this step, we chose to parallelize
+it using the XtremWeb-CH global computing environment. The
+experimental results obtained show good speed-ups and underline that
+the overhead induced by XWCH is very acceptable, making it a good
+candidate for deploying parallel applications over a global computing
+environment.
+
+Our future work includes testing the application on a larger scale
+testbed. This implies choosing an input data set that allows a finer
+decomposition. Unfortunately, this choice of input data is not
+trivial and relies on a large number of parameters.
+% TODO: ask Marc for more details here.
 
 \bibliographystyle{plain}
 \bibliography{biblio}
-- 
2.20.1