Several INRIA research projects are developing languages to improve the design
and processing of Web documents, not only in terms of structure, but also in
terms of meaning, in order to create what is known as the semantic Web. Such
work is carried out in the framework of the W3C international consortium.
When the Web got started, the problem of page representation rapidly arose.
A standard on the subject, SGML, had been published by the ISO (International
Organization for Standardization) in 1986. This was a first step toward a structured
approach to documents, a logical, explicit organization into chapters, sections,
subsections, and so on. Such an organization makes it possible to process the
document's contents, for example to search by keyword, and to navigate it more
easily.
Researchers of the WAM team (formerly known as OPERA) of INRIA Rhône-Alpes
had been working on such logical document representations for about ten years.
They therefore naturally became involved in the work on the Web page representation
format. To experiment with such formats, they had developed a software
prototype. “When the W3C (World Wide Web Consortium, in charge of developing
Web standards) was set up in 1994,” recalls Vincent Quint, Head of project
WAM, “it immediately took an interest in our software. The W3C needed such
a tool to validate, experiment with and demonstrate new Web technology, especially
from the point of view of the users who either produce or make use of documents.”
Still up-to-date software
Two team members thus joined the W3C technical team (see box “INRIA, a
major W3C player”) to pursue the development of the software, Amaya. Amaya
progressively integrated the new languages created by the W3C. As time
went by, it became an operational authoring tool that not only makes it possible
to use the most recent Web technologies, but also to produce complex Web pages
containing text, graphics and mathematical expressions, that conform to W3C
specifications. A brand new version of this Web editor was released at the end
of 2004.
INRIA also contributed extensively to developing some of the new formalisms, such
as MathML, which is the W3C standard to represent mathematics in XML documents.
XML is the new Web page representation format, the successor of HTML. MathML
makes it possible for teachers, students, researchers and engineers to put math
on their Web pages and to exchange it by email or from one application to another.
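As a minimal sketch of what such markup looks like, the following Python snippet
builds the presentation MathML for the expression x squared plus one using only
the standard library. The element names (math, mrow, msup, mi, mn, mo) and the
namespace are those of the MathML specification; the function name is our own
and purely illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the MathML specification.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def build_x_squared_plus_one():
    """Return a MathML <math> element for the expression x^2 + 1."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    row = ET.SubElement(math, "mrow")          # horizontal group of terms
    power = ET.SubElement(row, "msup")         # superscript: base, then exponent
    ET.SubElement(power, "mi").text = "x"      # mi = identifier
    ET.SubElement(power, "mn").text = "2"      # mn = number
    ET.SubElement(row, "mo").text = "+"        # mo = operator
    ET.SubElement(row, "mn").text = "1"
    return math


markup = ET.tostring(build_x_squared_plus_one(), encoding="unicode")
print(markup)
```

The resulting string can be embedded in an XML document or sent from one
program to another, which is the kind of exchange described above.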
MathML is the result of a working group created by the W3C in 1997. Researchers
from project CAFE (formerly SAFIR of Sophia Antipolis) participated in it from
the start. In the 1990s, they had already developed OpenMath, an SGML-based
standard in this field, in collaboration with other research institutions.
In fact, MathML makes it possible to use OpenMath to describe mathematical
objects that are more complex than those natively represented in MathML.
The first MathML recommendation was issued in 1999. The second version dates
from 2003. A number of documents already use MathML, including
the patents of the US Patent and Trademark Office. Large scientific
publishing houses such as Elsevier and Springer, as well as online education
publishers, are expressing interest and will use it as soon as they start encoding
their documents in XML.
The multimedia puzzle
Another concern rapidly assumed considerable importance on the Web—the
development of multimedia, either static as with images and text, or dynamic
with sound and video. Such documents had to remain usable in environments as
heterogeneous as a computer, a cell phone and a TV set. Several
W3C working groups were set up to devise solutions. Nabil Layaïda, a project
WAM researcher, has been working for several years in one of these groups, called
“Synchronized multimedia”. The group was created at the beginning
of 1997 with the goal of adapting multimedia documents to the Web, defining
temporal relations between the different information elements in a document
and planning how sound, images and video will fit together within the space
of the screen as well as in time. This working group is developing a language called
SMIL, to which INRIA researchers contributed greatly, especially through concepts
developed by Nabil Layaïda during his doctoral thesis. The first SMIL version
was standardized in June 1998, the second one in August 2001. Version 2.1 has
been complete since May 2005. The format is already widely used, for example by
RealPlayer and by MMS, the multimedia messages adapted to cell phones that
succeed SMS messages.
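The temporal relations mentioned above can be illustrated with a small sketch.
SMIL provides containers such as seq (children play one after another) and par
(children play at the same time). The Python classes below only mimic that idea
in order to compute a timeline; the class names, the schedule function and the
media file names are hypothetical and are not part of SMIL or of any INRIA tool.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Media:
    name: str
    dur: float  # duration in seconds


@dataclass
class Seq:
    """Children are played one after another."""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Par:
    """Children all start together."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Media, Seq, Par]


def duration(node):
    """Total duration: a sum for Seq, the longest child for Par."""
    if isinstance(node, Media):
        return node.dur
    durations = [duration(child) for child in node.children]
    if isinstance(node, Seq):
        return sum(durations)
    return max(durations, default=0.0)


def schedule(node, start=0.0, out=None):
    """Return a dict mapping each media name to its (begin, end) times."""
    out = {} if out is None else out
    if isinstance(node, Media):
        out[node.name] = (start, start + node.dur)
    elif isinstance(node, Seq):
        t = start
        for child in node.children:
            schedule(child, t, out)
            t += duration(child)
    else:  # Par: every child begins when the container begins
        for child in node.children:
            schedule(child, start, out)
    return out


# Hypothetical presentation: a title, then video and voice together, then credits.
presentation = Seq([
    Media("title.png", 3.0),
    Par([Media("talk.mp4", 20.0), Media("voice.mp3", 18.0)]),
    Media("credits.png", 5.0),
])
timeline = schedule(presentation)
for name, (begin, end) in timeline.items():
    print(f"{name}: {begin:.0f}s to {end:.0f}s")
```

The sketch covers only timing; the spatial layout on the screen, which SMIL
also describes, is left out.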
Nabil Layaïda also coordinated the development of software implementing
SMIL. One of these Web tools called Limsee has been available since the summer
of 2004. It makes it possible to create adaptable multimedia presentations in
the SMIL format. Another one called PocketSMIL is dedicated to PDAs and portable
devices.
In the same spirit, the W3C created another working group called “Device
independence”, the goal of which is to make sure that the Web remains independent
of the devices used to access it. A doctoral candidate of project WAM, Tayeb
Lemlouma, participated in this group until 2004. The research concerns, for
example, the transformation and adaptation of a multimedia document including
video, sound and text to a cell phone. The solutions consist in replacing the
video with still images, or in restructuring the documents to display them sequentially.
When computers start reasoning...
Nonetheless, beyond such needs for document structuring and data processing,
we still must face the barrage of information coming from the Web. One of the
solutions prepared by the W3C since the end of the 1990s aims to make document
contents more intelligible, to give meaning to the information stored on Web
pages in HTML. This is what is called the semantic Web. XML standardization
is a first step: it defines the syntax of document and data structures. To
access the meaning, semantic Web languages then make it possible to organize
the concepts used to describe Web resources hierarchically, into ontologies.
Ontologies are logical structures that capture a certain number of logical relations
between the concepts. Such languages make it possible, for example, to deduce from
the fact that “you need a ticket to take a train” and the fact that
“the TGV is a train” that “you need a ticket to take
the TGV”. Ontologies organize the description of concepts, such as “ticket”
or “train”. From then on, information search can be carried out
intelligently by the computer itself, automatically and without user intervention,
simultaneously on several sites and independently of data format. For example,
a computer can plan a trip for a given destination involving planes, trains
and hotels. In fact, the semantic Web makes it possible for computers to reason
and to associate neighboring concepts for a given request, things that
are impossible with today's search engines. The answers will be more precise
and more relevant, and the information retrieved will be correct.
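The train example above can be sketched in a few lines of Python. This is only
an illustration of the kind of deduction involved, assuming a simple "is a"
hierarchy in which a concept inherits the requirements of the more general
concepts above it; the dictionaries and function names are hypothetical, and
real semantic Web languages are considerably richer.

```python
# "The TGV is a train": each concept points to its more general concept.
IS_A = {"TGV": "train"}

# "You need a ticket to take a train": requirements attached to a concept.
REQUIRES = {"train": {"ticket"}}


def generalizations(concept):
    """Yield the concept itself, then each more general concept in turn."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


def requirements(concept):
    """Collect what is required by the concept and by all its generalizations."""
    needed = set()
    for c in generalizations(concept):
        needed |= REQUIRES.get(c, set())
    return needed


# Nothing states directly that the TGV requires a ticket; it is deduced.
print(requirements("TGV"))
```

The fact about the TGV is never stored: it follows from the two stated facts,
which is what distinguishes reasoning from plain keyword search.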
The first semantic Web language, RDF (Resource Description Framework),
was standardized by the W3C in 1999 and revised in 2004, followed by RDF Schema, standardized
in 2004. RDF allows simple semantic descriptions, and RDF Schema supplies a
basic vocabulary to describe the meaning of the concepts used. Two INRIA projects
are particularly involved in semantic Web work, EXMO in Grenoble and ACACIA
at Sophia Antipolis.
From 2001 onward, EXMO researchers, who had been working on the design of knowledge
representation languages, naturally contributed to the “WebOnt”
W3C working group dedicated to developing a third semantic language that is more
expressive than RDF Schema. This language is called OWL and was standardized
in February 2004. It is the first language capable of defining ontologies. Several
software packages using OWL are under development; operational systems are being
produced by Hewlett Packard and the Universities of Manchester and Karlsruhe.
From the first applications to tomorrow's search engines
These languages, especially RDF and RDF Schema, are beginning to be used. The
main application worldwide is called FOAF (Friend Of A Friend). It was created
to connect people and to create acquaintance networks and partnerships. Everyone describes
his or her profile (name, email, interests, profession, etc.) and the computer
does the rest. The semantic Web is also of interest to companies for managing their
knowledge.
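To make the idea concrete, here is a minimal sketch of how such profiles can be
held as RDF-style statements, each one a (subject, predicate, object) triple,
and how "the computer does the rest" by following the links. The property names
foaf:name and foaf:knows come from the FOAF vocabulary; the people, the
in-memory set and the helper functions are hypothetical, and no RDF library is
used.

```python
# Each statement is a (subject, predicate, object) triple.
TRIPLES = {
    ("alice", "foaf:name", "Alice"),
    ("alice", "foaf:knows", "bob"),
    ("bob", "foaf:name", "Bob"),
    ("bob", "foaf:knows", "carol"),
    ("carol", "foaf:name", "Carol"),
}


def match(s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def friends_of_friends(person):
    """People known by the person's acquaintances, excluding direct ones."""
    direct = {o for _, _, o in match(s=person, p="foaf:knows")}
    indirect = set()
    for friend in direct:
        indirect |= {o for _, _, o in match(s=friend, p="foaf:knows")}
    return indirect - direct - {person}


print(friends_of_friends("alice"))
```

Pattern matching of this kind over triples is also, in simplified form, what a
query language for RDF has to provide.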
To ensure the development of RDF and OWL, the W3C launched a working group to define
the “Semantic Web Best Practices” in 2004. The goal of the group
is to define the languages, to offer methodological elements, to answer questions
about usage and to provide teaching material. Fabien Gandon of project ACACIA
participates in the working group. Project ACACIA is interested in methods and
tools for knowledge management. Since its goal is to ensure interoperability
between different solutions, its work also falls within the context of the semantic
Web. A platform called CORESE, under development since 1999, makes it possible
to design servers dedicated to the semantic Web. These servers are based on
a search engine that exploits descriptions of the semantic contents of documents.
CORESE implements a translator to read and produce RDF descriptions by interpreting
them in the conceptual graph formalism, a method to represent knowledge and
reasoning that benefits from twenty years of research. The CORESE platform is
available on the Web.
Finally, for the semantic Web to really be operational, especially for search
engines, query languages for RDF and OWL must also be designed, for example
to exploit two different ontologies simultaneously and make sure they
are interoperable. A W3C working group called “RDF Data Access Group”
is devoted to the problem. Researchers from project EXMO belong to this group.
It is in this spirit that Olivier Corby of project ACACIA is evaluating the
performance of the query language he designed for CORESE.
INRIA, a major W3C player
INRIA has been one of the pillars of the W3C (World Wide Web Consortium), an
international consortium that ensures the development and promotion of Web standards.
The institute was the first European host site, from 1995 to 2002, along with
MIT (Massachusetts Institute of Technology) for the American continent and
Keio University in Japan for Asia. Since 2003, ERCIM (European Research Consortium
for Informatics and Mathematics) has taken over from INRIA in Europe.
During these 8 years, 20 people from INRIA participated in the W3C technical
team (which counted some 60 members in all). Jean-François Abramatic
moreover presided over the consortium for 4 years until 2001. Vincent Quint
was responsible for one of the four W3C technical fields, that is, the format
of documents used on the Web and user interfaces. "This was above all a
guiding role," he explained. "This involves keeping an open mind to needs, coordinating
efforts, suggesting the creation of working groups and ensuring the participation
of researchers and industrial companies."
Currently, 6 or 7 INRIA researchers participate in working groups. Vincent
Quint, director of research at INRIA, has been co-chair of the W3C "Technical
Architecture Group" (TAG) since February 1, 2005.