Summary document
Tasks that require huge computations and process colossal quantities of data are now numerous and diverse. Examples include weather and climate prediction, computing the aerodynamic behavior of a new aircraft model, deciphering the genome of a living organism, and detecting the elementary particles produced by an accelerator, to name but a few. These tasks are also becoming increasingly ambitious, and thus more and more demanding in terms of computing power, data flow and memory capacity. How can computer infrastructure meet these continuously growing needs?
Globalization and dematerialization of computer resources
The performance of the hardware and software available in each computing
center or to each individual user is rising very sharply. This trend is not,
however, sufficient to meet the many challenges that face science, technology
and industry. Computing power doubles every 18 months or so on average,
whereas storage capacity doubles every 12 months and the performance
of network connections doubles every 9 months. The performance of computers
thus improves less rapidly than that of the networks connecting them, which
makes pooling distant machines increasingly attractive. As a result, a potentially
revolutionary concept has been developing for six to seven years. The idea is to link geographically
distant equipment together, especially via the Internet, to constitute a network
that combines the computing power, storage capabilities and so forth of all
its members. Each of these members will thus be able to use the sum of available
resources in terms of computing power, memory, software and data, put in by
all the other members of the network. This is the basic idea behind computing
grids. It means that computer resources are simultaneously globalized and
dematerialized.
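The gap between these growth rates can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming the doubling periods quoted above hold steadily over time; the names `DOUBLING_MONTHS` and `growth_factor` are illustrative and do not come from the original text.

```python
# Doubling periods quoted in the text, in months (assumed constant over time).
DOUBLING_MONTHS = {"computing": 18, "storage": 12, "network": 9}


def growth_factor(doubling_months: float, years: float) -> float:
    """Factor by which a quantity grows in `years` if it doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


# Compare the three resources over a few time horizons.
for years in (3, 6):
    for name, months in DOUBLING_MONTHS.items():
        print(f"{years} years, {name}: x{growth_factor(months, years):g}")
```

Under these assumptions, three years bring a factor of 4 in computing power, 8 in storage capacity and 16 in network performance. After six years the factors are 16, 64 and 256, so the network's advantage over the processor has itself grown from 4 to 16.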
Why the term grid? Because, in a way, the concept consists in distributing
computer resources in the same way that electricity is distributed to homes. Electricity
is supplied to consumers without them having to think about where and by whom
it was produced. So, by analogy with power grids, researchers started talking
about computing grids.
Overcoming technical and sociological obstacles
The idea of connecting and sharing distributed computer resources was already there in the 1960s. However, it is only recently that technological advances have made these prospects relatively concrete. Computing grids come up against many difficulties. First of all, there are technical difficulties, since the point is to make distant devices that differ in how they work and in performance communicate and work together. It is also necessary to write software that efficiently manages and distributes the network's pooled resources, and to devise programming tools adapted to the distributed and parallel character of the tasks entrusted to the grid, among other things. There are also sociological, and even economic and political, difficulties, since setting up a grid assumes that separate entities (public institutions, private enterprises and individuals) can be convinced to put their own resources at the disposal of a collective entity. This aspect has technical repercussions. For example, each node of the grid may hold data or software that are deemed confidential and must not be communicated to anyone else. Hence the need to guarantee the security of exchanges within the grid through appropriate techniques.
True computing grids, which make their different nodes cooperate transparently,
easily and on an equal basis, currently exist only as prototypes and are still
far from being used in production. Quite a few experimental initiatives and
research projects have already been launched. In the United States, the Globus system
is linking American supercomputers together to obtain a parallel, virtual
hypercomputer that should make it possible for each center to submit jobs
using the combined power of all other centers. Grid infrastructures are in
the process of being built by several scientific communities. Such is the
case for particle physics in Europe in the framework of the European Data
Grid project, which must prepare to store and analyze the many petabytes of
data (1 petabyte = 10^15 bytes, the equivalent of some 100 billion book pages)
that will be produced every year by the CERN LHC starting in 2007.
In addition, networks devoted to so-called global or peer-to-peer computing
on popular problems have appeared. One of these networks is the SETI@home
initiative, which looks for possible signs of extraterrestrial civilizations
among the signals received by the Arecibo radio telescope in Puerto Rico. The
project is currently distributing the signal analysis work to over half a
million volunteers who are willing to let their personal computer work for
SETI@home when it is otherwise idle. Other similar examples, in which rather simple
but extremely large computations are distributed over a large number of volunteer
personal computers, include the Great Internet Mersenne Prime Search, which looks
for very large prime numbers, and the Decrypthon in France, which gathered approximately
75,000 Internet users for a few months, up to May 2002, to compare the sequences
of the 500,000 proteins known in living organisms.
Such distributed networks do not constitute grids strictly speaking. Indeed,
the computations are dispatched by a central authority and participating computers
merely carry them out, without being able to use the network
for their own needs. They nonetheless illustrate the large computing power
and the benefits that can be expected from grids. Performance of several dozen
teraflops (1 teraflop = 10^12 floating point operations per second) was obtained
in these initiatives, something that was not imaginable just ten years ago.
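The centrally dispatched model described above, in which a central authority hands out independent work units and the participants simply compute and return results, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads on one machine stand in for volunteer computers, the function names `lucas_lehmer` and `dispatch` are hypothetical, and nothing here reproduces the actual protocol of SETI@home, the Great Internet Mersenne Prime Search or the Decrypthon. The work unit chosen is the Lucas-Lehmer test, a classical primality test for Mersenne numbers.

```python
from concurrent.futures import ThreadPoolExecutor


def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime. Valid for odd prime exponents p."""
    m = (1 << p) - 1          # the Mersenne number 2**p - 1
    s = 4
    for _ in range(p - 2):    # p - 2 squaring steps modulo m
        s = (s * s - 2) % m
    return s == 0


def dispatch(exponents, volunteers: int = 4) -> dict:
    """Central authority: hand each exponent to a worker and collect the verdicts.

    Threads are a stand-in for volunteer machines (an assumption of this sketch);
    the workers only execute the units they are given.
    """
    exponents = list(exponents)
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        verdicts = pool.map(lucas_lehmer, exponents)  # results come back in input order
        return dict(zip(exponents, verdicts))


# Usage: test the odd prime exponents up to 31.
results = dispatch([3, 5, 7, 11, 13, 17, 19, 23, 29, 31])
print([p for p, is_prime in results.items() if is_prime])
```

The asymmetry that separates this model from a grid is visible in the code: only `dispatch` decides what is computed, and a worker has no way to submit work of its own to the pool.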
Designing and standardizing grid infrastructure requires considerable effort from the worldwide computing community. In France, an important part of computing grid research is done at INRIA. At least five of the Institute's teams (projects APACHE, OASIS, PARIS, ReMap and RESO) are deeply involved, in collaboration with various academic or industry partners. Many other INRIA teams are doing work that is more or less closely related to grids, within the framework of Institute projects, of the GRID concerted incentive action (Globalization of computer resources and data) launched in 2001 by the Ministry of Research, of the National Network of Research in Telecommunications (RNRT, set up in 1997) or of the National Network for Research and Innovation in Software Technology (RNTL, set up in 1999).