<img align=center src="simgrid_logo.png" alt="SimGrid"><br>

\section overview Overview

SimGrid is a toolkit that provides core functionalities for the simulation
of distributed applications in heterogeneous distributed environments.
The specific goal of the project is to facilitate research in the area of
distributed and parallel application scheduling on distributed computing
platforms ranging from simple networks of workstations to Computational
Grids.

\section people People

The authors of SimGrid are:

\author Henri Casanova <casanova@cs.ucsd.edu>
\author Arnaud Legrand <arnaud.legrand@imag.fr>
\author Martin Quinson <martin.quinson@tuxfamily.org>

\section intro Available Software

The SimGrid toolkit is composed of several modules:

\li XBT (eXtensive Bundle of Tools) is a portable library providing many
convenient data structures (vectors, hash tables, heaps,
contexts, ...). Most other SimGrid modules rely on it.
\li SURF provides the core functionalities to simulate a virtual
platform. It is very low-level and is not intended to be used as
such but rather to serve as a basis for higher-level simulators
(like MSG, GRAS, SMPI, ...). It relies on a fast max-min linear
solver.
\li MSG is a simulator built using the previous modules. It aims at
being realistic and is application-oriented. It is the software layer
of choice for building simulations with multiple scheduling agents.
\li GRAS (<em>not functional yet</em>) is an ongoing project to emulate
virtual platforms through SURF. As a consequence, code developed using the GRAS
framework can run in the real world as well as in the
simulator. The resulting code is very portable and highly interoperable while
remaining very efficient. Even if you do not plan to run your code for real,
you may want to switch to GRAS if you intend to use MSG in a very intensive way
(e.g. for simulating a peer-to-peer environment).
\li SMPI (<em>not functional yet</em>) is an ongoing project to enable MPI code
to run on top of a virtual platform through SURF. It follows the same principles as
GRAS but is specific to MPI applications.

The following figure depicts the relations between these modules.

<img align=center src="simgrid_modules.jpg" alt="SimGrid"><br>
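
To give an idea of what max-min sharing means in the description of SURF
in the module list, here is a minimal sketch in C of the classic progressive-filling
algorithm for max-min fairness. It is only an illustration under stated
assumptions, not SURF's actual solver: the function name
<tt>maxmin_share</tt>, its parameters and the flat <tt>uses</tt> matrix are
hypothetical and are not part of the SimGrid API.

```c
#include <assert.h>
#include <stdlib.h>

/* Max-min fair sharing by progressive filling (illustrative sketch only;
 * hypothetical names, NOT SURF's solver or any SimGrid function).
 *
 * capacity[l]           capacity of link l
 * uses[l * n_flows + f] non-zero when flow f crosses link l
 * rate[f]               output: rate allocated to flow f
 *
 * Returns 0 on success and -1 if memory allocation fails. Flows that
 * cross no link are unconstrained; their rate is left at -1. */
int maxmin_share(int n_links, int n_flows,
                 const double *capacity, const int *uses, double *rate)
{
  double *remaining;
  char *fixed;
  int l, f, n_fixed = 0;

  assert(n_links > 0 && n_flows > 0);
  remaining = malloc((size_t) n_links * sizeof *remaining);
  fixed = calloc((size_t) n_flows, sizeof *fixed);
  if (remaining == NULL || fixed == NULL) {
    free(remaining);
    free(fixed);
    return -1;
  }

  for (l = 0; l < n_links; l++)
    remaining[l] = capacity[l];
  for (f = 0; f < n_flows; f++)
    rate[f] = -1.0;

  while (n_fixed < n_flows) {
    int bottleneck = -1;
    double best_share = 0.0;

    /* The bottleneck is the link offering the smallest equal share
     * to the flows that cross it and are not fixed yet. */
    for (l = 0; l < n_links; l++) {
      int active = 0;
      double share;

      for (f = 0; f < n_flows; f++)
        if (uses[l * n_flows + f] && !fixed[f])
          active++;
      if (active == 0)
        continue;
      share = remaining[l] / active;
      if (bottleneck < 0 || share < best_share) {
        bottleneck = l;
        best_share = share;
      }
    }
    if (bottleneck < 0)
      break;                    /* the remaining flows cross no link */

    /* Fix every unfixed flow of the bottleneck at that share and
     * charge it to all the links that the flow crosses. */
    for (f = 0; f < n_flows; f++) {
      if (uses[bottleneck * n_flows + f] && !fixed[f]) {
        rate[f] = best_share;
        fixed[f] = 1;
        n_fixed++;
        for (l = 0; l < n_links; l++)
          if (uses[l * n_flows + f])
            remaining[l] -= best_share;
      }
    }
  }

  free(remaining);
  free(fixed);
  return 0;
}
```

For instance, with two links of capacity 10 and 4, a first flow crossing only
the first link, a second flow crossing both links and a third flow crossing
only the second link, this sketch allocates rates of 8, 2 and 2 respectively:
the second link is the bottleneck and is split evenly, and the first flow
takes what remains on the first link.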

The section \ref publications contains links to papers that provide
additional details on the project as well as validation and

The software can be downloaded from <a href="http://gcl.ucsd.edu/SimGrid/dl/">here</a>.

\section install Installation

\li <tt>./configure</tt>
\li <tt>make all install</tt>

If you are not familiar with compiling C files under UNIX and using
libraries, you will find more information in Section \ref

\section documentation API Documentation

The API of each module is described in \ref SimGrid_API.

See \ref SimGrid_examples for an introduction to using these modules.

\section users_contributers Users / Contributors

\subsection contributers Contributors

\li Loris Marchal: wrote the new algorithm for simulating TCP
\li Julien Lerouge: wrote an XML parser for ENV descriptions and
helped with the general design during a 4-month period (March-June 2002)
\li Clément Menier and Marc Perache: wrote a first prototype of
the MSG interface during a project at ENS-Lyon (January 2002).
\li Dmitrii Zagorodnov: wrote some parts of the first version

\subsection mailinglist User Mailing List

We have a <a href=https://listes.ens-lyon.fr/wws/info/simgrid2-users>mailing list for
SimGrid users</a>.<p>

\section publications Publications

\subsection simulation About simulation

\li <b>Scheduling Distributed Applications: the
SimGrid Simulation Framework</b>\n
by <em>Henri Casanova and Arnaud Legrand and Loris Marchal</em>\n
Proceedings of the third IEEE International Symposium
on Cluster Computing and the Grid (CCGrid'03)\n
Since the advent of distributed computer systems, an active field
of research has been the investigation of scheduling strategies
for parallel applications. The common approach is to employ
scheduling heuristics that approximate an optimal
schedule. Unfortunately, it is often impossible to obtain
analytical results to compare the efficacy of these heuristics.
One possibility is to conduct large numbers of back-to-back
experiments on real platforms. While this is possible on
tightly-coupled platforms, it is infeasible on modern distributed
platforms (i.e. Grids) as it is labor-intensive and does not
enable repeatable results. The solution is to resort to
simulations. Simulations not only enable repeatable results but
also make it possible to explore wide ranges of platform and
application scenarios.\n
In this paper we present the SimGrid framework which enables the
simulation of distributed applications in distributed computing
environments for the specific purpose of developing and evaluating
scheduling algorithms. This paper focuses on SimGrid v2, which
greatly improves on the first version of the software with more
realistic network models and topologies. SimGrid v2 also enables
the simulation of distributed scheduling agents, which has become
critical for current scheduling research in large-scale platforms.
After describing and validating these features, we present a case
study by which we demonstrate the usefulness of SimGrid for
conducting scheduling research.
\li <b>A Network Model for Simulation of Grid Application</b>\n
by <em>Henri Casanova and Loris Marchal</em>\n
In this work we investigate network models that can
potentially be employed in the simulation of scheduling algorithms for
distributed computing applications. We seek to develop a model of TCP
communication which is both high-level and realistic. Previous research
works show that accurate and global modeling of wide-area networks, such
as the Internet, faces a number of challenging issues. However, some
global models of fairness and bandwidth-sharing exist, and can be linked
with the behavior of TCP. Using both previous results and simulation (with
NS), we attempt to understand the macroscopic behavior of
TCP communications. We then propose a global model of the network for the
Grid platform. We perform partial validation of this model in
simulation. The model leads to an algorithm for computing
bandwidth-sharing. This algorithm can then be implemented as part of Grid
application simulations. We provide such an implementation for the
SimGrid simulation toolkit.\n
ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/RR/RR2002/RR2002-40.ps.gz
\li <b>MetaSimGrid : Towards realistic scheduling simulation of
distributed applications</b>\n
by <em>Arnaud Legrand and Julien Lerouge</em>\n
Most scheduling problems are already hard on homogeneous
platforms; they become quite intractable in a heterogeneous
framework such as a metacomputing grid. In the best cases, a
guaranteed heuristic can be found, but most of the time, it is
not possible. Real experiments or simulations are often
involved to test or to compare heuristics. However, on a
distributed heterogeneous platform, such experiments are
technically difficult to carry out, because of the genuine
instability of the platform. It is almost impossible to
guarantee that a platform which is not dedicated to the
experiment will remain exactly the same between two tests,
thereby forbidding any meaningful comparison. Simulations are
then used to replace real experiments, so as to ensure the
reproducibility of measured data. A key issue is the
possibility to run the simulations against a realistic
environment. The main idea of trace-based simulation is to
record the platform parameters today, and to simulate the
algorithms tomorrow, against the recorded data: even though it
is not the current load of the platform, it is realistic,
because it represents a fair summary of what happened
previously. A good example of a trace-based simulation tool is
SimGrid, a toolkit providing a set of core abstractions and
functionalities that can be used to easily build simulators for
specific application domains and/or computing environment
topologies. Nevertheless, SimGrid lacks a number of convenient
features to craft simulations of a distributed application
where scheduling decisions are not taken by a single
process. Furthermore, modeling a complex platform by hand is
tedious for a few hosts and is almost impossible for a real
grid. This report is a survey on simulation for scheduling
evaluation purposes and presents MetaSimGrid, a simulator built
ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/RR/RR2002/RR2002-28.ps.gz
\li <b>SimGrid: A Toolkit for the Simulation of Application
Scheduling</b>\n
by <em>Henri Casanova</em>\n
Advances in hardware and software technologies have made it
possible to deploy parallel applications over increasingly large
sets of distributed resources. Consequently, the study of
scheduling algorithms for such applications has been an active area
of research. Given the nature of most scheduling problems one must
resort to simulation to effectively evaluate and compare their
efficacy over a wide range of scenarios. It has thus become
necessary to simulate those algorithms for increasingly complex
distributed, dynamic, heterogeneous environments. In this paper we
present SimGrid, a simulation toolkit for the study of scheduling
algorithms for distributed applications. This paper gives the main
concepts and models behind SimGrid, describes its API and
highlights current implementation issues. We also give some
experimental results and describe work that builds on SimGrid's
http://grail.sdsc.edu/papers/simgrid_ccgrid01.ps.gz

\subsection research Papers using SimGrid results

\li <b>Optimal algorithms for scheduling divisible workloads on
heterogeneous systems</b>\n
by <em>Olivier Beaumont and Arnaud Legrand and Yves Robert</em>\n
In this paper, we discuss several algorithms for scheduling
divisible loads on heterogeneous systems. Our main contributions
are (i) new optimality results for single-round algorithms and (ii)
the design of an asymptotically optimal multi-round algorithm. This
multi-round algorithm automatically performs resource selection, a
difficult task that was previously left to the user. Because it is
periodic, it is simpler to implement, and more robust to changes in
the speeds of processors or communication links. On the theoretical
side, to the best of our knowledge, this is the first published
result assessing the absolute performance of a multi-round
algorithm. On the practical side, extensive simulations reveal
that our multi-round algorithm outperforms existing solutions on a
large variety of platforms, especially when the
communication-to-computation ratio is not very high (the difficult
case).\n
ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/RR/RR2002/RR2002-36.ps.gz
\li <b>On-line Parallel Tomography</b>\n
by <em>Shava Smallen</em>\n
Master's Thesis, UCSD, May 2001
\li <b>Applying Scheduling and Tuning to On-line Parallel Tomography</b>\n
by <em>Shava Smallen, Henri Casanova, Francine Berman</em>\n
in Proceedings of Supercomputing 2001
\li <b>Heuristics for Scheduling Parameter Sweep applications in
Grid environments</b>\n
by <em>Henri Casanova, Arnaud Legrand, Dmitrii Zagorodnov and
Francine Berman</em>\n
in Proceedings of the 9th Heterogeneous Computing Workshop
(HCW'2000), pp. 349-363.