Ensuring efficient communication between software components
Using computing grids, more and more scientists and engineers will be able
to perform complex numerical simulations involving specific codes for several
distinct phenomena that are coupled together. Think about simulating the behavior
of a satellite in space, which must simultaneously take into account kinematic,
thermal, mechanical and optical aspects. Such simulations will have to be
based on parallel technology in order to meet performance requirements and
on distributed technology to meet the demand for computing resources. This assumes
that the parallel and distributed aspects are sufficiently compatible.
The PARIS team (Programming parallel
and distributed systems for large scale numerical simulation) is working in
this direction and is striving to develop a high performance software component
model adapted to grids. The PARIS team is starting from the CORBA component
model and extending it to develop a software component model that integrates
parallel codes. This new system is called GridCCM (Grid CORBA Component Model).
It must in particular guarantee efficient communication between parallel software
components, which was not the case with earlier models. PARIS has therefore
designed a communication management framework called PadicoTM.
This framework can be used to make parallel and distributed services work without
conflicts and without having to modify the applications. PadicoTM has been on
display at Supercomputing 2002. It is intended for code coupling applications
based on the parallel CORBA objects concept. Even though CORBA is generally
regarded as rather slow, PadicoTM allows communication at up to 240 MB/s with
latency times of 20 µs. This level of performance is comparable to that
of MPI (Message-Passing Interface, a message-passing standard widely used
in parallel programming). Still in the context of code coupling, PARIS is
also studying the problem of data globalization. The problem is to store the
data needed or produced by the simulation in a distributed fashion over the
grid.
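To give a feel for what the throughput and latency figures quoted above for PadicoTM imply, here is a minimal sketch of the usual first-order cost model for message passing (transfer time = latency + size / bandwidth). The model itself, the function names and the reading of 1 MB as 10^6 bytes are assumptions made for illustration; they are not taken from PadicoTM.

```python
# First-order "latency + size / bandwidth" model of a point-to-point transfer.
# Defaults use the figures quoted in the text: 20 microseconds, 240 MB/s.
# Assumption: 1 MB = 10**6 bytes.

def transfer_time(size_bytes: float,
                  latency_s: float = 20e-6,
                  bandwidth_bytes_per_s: float = 240e6) -> float:
    """Estimated time in seconds to deliver one message of size_bytes."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_bandwidth(size_bytes: float,
                        latency_s: float = 20e-6,
                        bandwidth_bytes_per_s: float = 240e6) -> float:
    """Throughput in bytes/s actually achieved for a message of size_bytes."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def half_performance_size(latency_s: float = 20e-6,
                          bandwidth_bytes_per_s: float = 240e6) -> float:
    """Message size in bytes at which half of the peak bandwidth is reached."""
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    for size in (1_000, 4_800, 100_000, 10_000_000):
        print(size, "bytes:", round(effective_bandwidth(size) / 1e6, 1), "MB/s")
```

Under this model, messages of about 4,800 bytes already reach half of the peak rate, so only very small exchanges between coupled codes are dominated by latency.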
Guaranteeing rapid access to data, coupling computing codes
In a grid, one of the bottlenecks can be the speed of access to data files,
whose size will certainly be huge. Part of the work of the APACHE
team (Parallel algorithmics, programming and load sharing) addresses this aspect.
APACHE has been interested in parallel architecture programming for several
years. The team is now developing methods that parallelize, and thus accelerate,
data access
through PC clusters. It is also designing methods to couple together computing
codes executed on different nodes of the grid in such a way that the various
tasks carried out by these codes follow one another optimally. APACHE has developed
its programming techniques using a cluster of 200 i-Vectra PCs supplied by Hewlett-Packard.
This cluster ranked 385th in the TOP500 list of the most powerful machines worldwide
in June 2001. On these subjects, APACHE collaborates closely with companies
such as Bull, HP, Mandrake and CS. The team is validating its techniques
on concrete applications such as modeling the crossing of cellular membranes
by proteins, in collaboration with chemists, biologists and mathematicians.
Optimally distributing tasks and data for computation
One of the problems tackled by computer scientists is the design of appropriate algorithms for computing on a grid. The various tasks in a given computation must be ordered and the data must be placed in such a way that the computation executes optimally given the hardware configuration and the current state of the grid. This question is one of the concerns of the ReMaP team (Regularity and Massive Parallelism). The team is looking for heuristic methods for task scheduling and data placement, which are validated via simulation. The simulation is done using the SIMGRID simulator designed and developed by the University of California at San Diego, with the participation of ReMaP.
Project ReMaP is also developing software layers that let a client of the grid export certain parts of a computation to other grid nodes or servers. Such software layers must for example choose the most appropriate server at any given instant, i.e., the least loaded, the fastest, or the best suited to the task in question. They must also choose the data that must be supplied to the servers. ReMaP relies on CORBA (Common Object Request Broker Architecture, a software system that is now a standard for linking applications implemented on heterogeneous platforms) to develop its toolbox called DIET (Distributed Interactive Engineering Toolbox). DIET will be demonstrated at Supercomputing 2002 on an Ethernet network connecting 6 laptops. The toolbox has been developed with support from the RNTL and is being tested on various applications (digital terrain models, simulation of electronic circuits) in the framework of project ASP of the GRID concerted initiative.
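The server-selection step described above (picking the least loaded, fastest or best suited server) can be illustrated with a small greedy heuristic. This is only a sketch under stated assumptions: the `Server` record, its fields, the service names and the completion-time estimate are all hypothetical, and nothing here reflects how DIET actually chooses a server.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Server:
    """Hypothetical description of a grid server, as seen by the client side."""
    name: str
    speed: float          # work units processed per second
    queued_work: float    # work units already waiting on this server
    services: frozenset   # names of the computations the server can run


def estimated_completion(server: Server, task_work: float) -> float:
    """Seconds until the task would finish if submitted to this server now."""
    return (server.queued_work + task_work) / server.speed


def choose_server(servers: Iterable[Server], service: str,
                  task_work: float) -> Optional[Server]:
    """Greedy choice: among servers offering the service, take the one with
    the smallest estimated completion time. Returns None if none is suitable."""
    candidates = [s for s in servers if service in s.services]
    if not candidates:
        return None
    return min(candidates, key=lambda s: estimated_completion(s, task_work))


if __name__ == "__main__":
    pool = [
        Server("slow-idle", speed=1.0, queued_work=0.0, services=frozenset({"dgemm"})),
        Server("fast-busy", speed=4.0, queued_work=10.0, services=frozenset({"dgemm"})),
        Server("other", speed=10.0, queued_work=0.0, services=frozenset({"fft"})),
    ]
    print(choose_server(pool, "dgemm", task_work=2.0).name)
    print(choose_server(pool, "dgemm", task_work=8.0).name)
```

A single score combining load and speed keeps the sketch short. A real scheduler would also have to account for the cost of moving the input data to the chosen server, which the text lists as a separate decision.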
A Java program library for distributed parallel computing
Part of the activities of project OASIS
(Active Objects, Semantics, Internet and Security) has similar objectives
and is concerned with programming tools for distributed applications, whether
on a local intranet, on a workstation cluster or on Internet grids.
The team is developing in particular a program library entirely written in
Java for distributed parallel computing in the framework of the ObjectWeb
consortium founded by France Télécom R&D, Bull and INRIA.
This library, called ProActive, can be used to perform mobile computations,
that is to say computations initiated on one machine of the grid and continued
on another machine. It also includes security tools such as data exchange
encryption and user authentication. ProActive has many more attractive features:
dynamic and transparent code loading, online documentation, ease of installation
and use, visualization and graphic control of program execution. ProActive
will be demonstrated at Supercomputing 2002. OASIS is validating its ProActive
development on electromagnetism computations (aircraft radar image computations).
The ProActive library is available on the Internet under the LGPL license. It
has already been downloaded by numerous academic and industry users.
The OASIS team very recently succeeded in executing an application that solves
the 3D Maxwell equations of electromagnetism on a 64-processor cluster, showing
a practically optimal speedup. The application was developed in collaboration
with project CAIMAN and entirely
written in ProActive Java. Executing the same application in peer-to-peer mode on an intranet,
using desktop machines and the standard INRIA production network, made it possible
to solve a 150-cubed mesh, that is to say over 100 million facets, on
252 processors.
Adapting communication protocols to heterogeneous infrastructures and very high speed
Network connections, a central element of the future grids, are capable today
of delivering considerable throughputs. The American network TeraGrid that
links the National Science Foundation computing centers reaches 40 gigabits
per second and the European network GEANT that connects the main European
capitals reaches 10 gigabits per second. It is however crucial that such capabilities
be used to their maximum. This entails the design of communication protocols
and performance measurement and prediction tools that are adapted to very
high speeds and heterogeneous infrastructure. This is one of the main tasks
of the RESO
team (Protocols and Software Optimized for Heterogeneous High Speed Networks).
The IP and TCP protocols used on the Internet are already old and are adapted
neither to very high speeds nor to the grid concept. RESO is studying the possible
evolution of these protocols. One of the developments proposed by the team
consists in introducing differentiated services, that is to say modulating
the degree of priority of information packets, together with optimized transport protocols.
The idea is to allow the very heavy flows attached to a grid to take advantage
of the periods when regular Internet traffic is low, in order to fully exploit
the capabilities offered.
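One way to see why classic TCP struggles on links of 10 or 40 gigabits per second is the bandwidth-delay product: the amount of data that must be in flight to keep the link full. The sketch below is an illustration only. The 100 ms round-trip time is a hypothetical value for a long-distance path, not a figure from the text, and the 65,535-byte window is the maximum allowed by the 16-bit TCP window field when the window scaling option is not used.

```python
# Bandwidth-delay product and window-limited throughput of one TCP connection.

ASSUMED_RTT_S = 0.1           # hypothetical long-distance round-trip time (100 ms)
CLASSIC_TCP_WINDOW = 65_535   # bytes; 16-bit window field, no window scaling


def bandwidth_delay_product(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return link_bits_per_s * rtt_s / 8


def window_limited_throughput(window_bytes: float, rtt_s: float) -> float:
    """Upper bound in bits/s for a connection that can send at most
    window_bytes per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    for name, rate in (("10 Gb/s link", 10e9), ("40 Gb/s link", 40e9)):
        bdp = bandwidth_delay_product(rate, ASSUMED_RTT_S)
        print(f"{name}: {bdp / 1e6:.0f} MB in flight needed")
    cap = window_limited_throughput(CLASSIC_TCP_WINDOW, ASSUMED_RTT_S)
    print(f"64 KB window: at most {cap / 1e6:.2f} Mb/s per connection")
```

Under these assumptions a single unscaled connection is capped at a few megabits per second, while filling a 10 Gb/s path would require on the order of a hundred megabytes in flight.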
RESO is carrying out this work in collaboration with national projects (the
VTHD network of the RNRT, the e-Toile platform of the RNTL) and the international
projects DataGrid and DataTAG. DataGrid is a European project that aims at
designing a platform and software for a data grid at the service of particle
physics, Earth monitoring and biology. It relies on European networks such
as GEANT or national networks like RENATER. The objective of the DataTAG project (Data TransAtlantic
Grid) is to interconnect European and American grids via very high
speed links.
I-Cluster 2: a shared experimental platform
INRIA teams carry out their research activities using experimental platforms.
I-Cluster 2, currently being installed in the INRIA Rhône-Alpes research
unit, provides the institute with its most powerful supercomputer yet. Its
architecture is based on dual-processor Itanium 2 nodes communicating through a
Myrinet network. A total of 104 dual-processor nodes at 900 MHz, with 312 GB of RAM, are
arranged as 10 racks of 10 nodes and 1 rack of 4 nodes with additional disk
storage. I-Cluster 2 is connected to the VTHD network and runs Linux
(RedHat Advanced Server). The first Linpack experiments at INRIA (Aug. 2003)
reached a performance of 560 GFlop/s.
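The figures above can be put in perspective with a short back-of-the-envelope calculation. The node count, clock rate, memory size and Linpack result come from the text. The value of 4 floating-point operations per cycle per processor is an assumption (two fused multiply-add units per Itanium 2), so the resulting peak and efficiency are estimates rather than published numbers.

```python
# Rough peak performance and Linpack efficiency for the cluster described above.

NODES = 10 * 10 + 1 * 4       # 10 racks of 10 nodes plus 1 rack of 4 nodes
PROCESSORS_PER_NODE = 2       # dual-processor nodes
CLOCK_HZ = 900e6              # 900 MHz
FLOPS_PER_CYCLE = 4           # ASSUMPTION: 2 fused multiply-add units per processor
MEASURED_GFLOPS = 560.0       # Linpack result quoted in the text
TOTAL_RAM_GB = 312            # total memory quoted in the text


def peak_gflops() -> float:
    """Theoretical peak of the whole machine in GFlop/s."""
    return NODES * PROCESSORS_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9


def linpack_efficiency() -> float:
    """Fraction of the theoretical peak reached by the Linpack run."""
    return MEASURED_GFLOPS / peak_gflops()


def ram_per_node_gb() -> float:
    """Average memory per node in gigabytes."""
    return TOTAL_RAM_GB / NODES


if __name__ == "__main__":
    print(f"nodes: {NODES}, processors: {NODES * PROCESSORS_PER_NODE}")
    print(f"estimated peak: {peak_gflops():.1f} GFlop/s")
    print(f"Linpack efficiency: {linpack_efficiency():.1%}")
    print(f"memory per node: {ram_per_node_gb():.0f} GB")
```

With that assumption the measured 560 GFlop/s corresponds to roughly three quarters of the estimated peak, and the memory works out to 3 GB per node.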
I-Cluster 2 is part of a scientific program financed by the French ministry
of Research and Education, the Rhône-Alpes region, INRIA, Ecole Normale
Supérieure de Lyon, the Institut National Polytechnique of Grenoble
and Joseph Fourier University.
Other INRIA teams are carrying out research that concerns data or computing
grids in one way or another. For example, the ARES
team (Architecture for Service Networks), which is working on problems related
to service deployment on radio network infrastructures, is developing a platform
called DARTS
(Deployment and Administration of Resources, Processing and Services). In
the context of grids, this platform can be used for administering and instrumenting
the computing resources available at different points of the grid. It also
supplies services to simplify the interaction between the various administered
components (asynchronous messaging, naming, dynamic application loading).
The platform also offers facilities for application administration (deployment,
porting).
Researchers in project SARDES (Constructing
Software Infrastructures for Large Scale, Heterogeneous, Distributed Systems)
are studying the architecture and design of distributed software infrastructures
for global information processing environments, by systematically using reflection
and component building techniques (a reflective system can be defined as a
system offering an explicit, operable and causally connected representation
of itself).
Project CARAVEL (Information Mediation
Systems) is concerned with the problem of integrating information in networks
that contain heterogeneous, autonomous information sources, as is the case
for grids. The aim is to offer a uniform mode of access to a set of information
sources through an integrated view, to facilitate the construction and maintenance
of coherent data warehouses and to offer modes of navigation in an information
network that are adapted to different categories of users.
Finally, project ScAlApplix
(High Performance Schemes and Algorithms for Complex Scientific Applications)
brings together several scientific skills for a multidisciplinary study of
high performance computing and its applications to complex scientific computations
(simulations of chemical reactions, of unsteady fluid flows, of host-parasite systems
and so on) that require massive computing power of the order of
a teraflops and very large volumes of data of the order of a terabyte. In
addition to modeling and simulation techniques and high performance algorithms,
project ScAlApplix is also working on the visualization and steering of distributed
numerical simulations using a virtual reality code.