-----------------------
Everything you have always wanted to know about indicators
-----------------------

Indicators are becoming an unavoidable part of research at every level, from the organisation of research structures to their strategic orientations and funding systems. It is not difficult to understand why. However, some indicators, such as the Journal Impact Factor (JIF), have taken on such disproportionate importance that their real significance needs to be questioned more than ever.

Following a review of the literature on this subject and tests of their own, members of an INRIA evaluation committee have highlighted the weaknesses of these numerical data, which tend to be used rather blindly, without cross-checking. The committee has therefore issued recommendations on their use. The essential points of its findings are summarised below as questions and answers.

1. What is an indicator?
2. Are the sources always reliable?
3. How should the citations be counted?
4. Do indicators really provide the required information?
5. Has there been any collateral damage?
6. Which precautions should be taken?
7. The full analysis document on the indicators

1. What is an indicator?

Most bibliometric indicators are built from citation analysis, i.e. from the final part of a scientific article, which lists the references the author used to write it. Gathering citations is a colossal task, carried out using article databases. First, a query is run to find the articles mentioned in the reference lists, which establishes a citation base. That base is then used to calculate a number of indicators intended to reflect the impact of the articles published by a researcher, a laboratory or an institution, and even the quality of the published work. Any citation-based indicator therefore implicitly assumes that a citation is always positive (that the cited work merits praise), which is clearly not always the case in the scientific literature.
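
As an illustration of the mechanism just described, here is a minimal Python sketch of building a citation base from reference lists. It assumes a hypothetical toy database in which every article has an ID and each reference has already been matched to an ID (in practice, that matching step is where many errors arise). Like the indicators discussed here, it simply counts every citation as positive.

```python
from collections import Counter

# Hypothetical toy database: article ID -> IDs found in its reference list.
# "X9" stands for a cited work that the database does not index.
REFERENCES = {
    "A1": ["A2", "A3"],
    "A2": ["A3"],
    "A3": [],
    "A4": ["A1", "A3", "X9"],
}


def build_citation_base(refs: dict[str, list[str]]) -> Counter:
    """Count how many times each indexed article is cited by other indexed articles."""
    indexed = set(refs)
    counts = Counter({article: 0 for article in indexed})
    for citing, cited_list in refs.items():
        for cited in cited_list:
            # Citations to works outside the database are silently dropped,
            # and every citation counts as +1 whatever its tone.
            if cited in indexed and cited != citing:
                counts[cited] += 1
    return counts


citation_base = build_citation_base(REFERENCES)
print(citation_base["A3"])  # 3
```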

2. Are the sources always reliable?

The main suppliers studied in the evaluation committee's report are Thomson ISI, a commercial company and the longest-established supplier (1960), which publishes its Journal Citation Reports (JCR) each year based on the Web of Science (WoS) (the paper version is called the Science Citation Index); Scopus, launched by Elsevier in 2004; and some of the largest free sources, such as CiteSeer, which specialises in computer science, Google Scholar and Citebase.

The most striking finding, unsurprising given the magnitude of the task, is how unevenly the bases cover the various research fields, despite the massive number of articles they take into account:

ISI-WoS covers the "hard" sciences (80%) better than the life sciences, and basic science better than applied science. However, it analyses only 8,700 journals, whereas in 1999 the total number of scientific journals published was estimated at 100,000, 25,000 of them in the medical field alone. Publication types that matter in some fields, such as conference proceedings and articles in open-access journals, open archives or personal webpages, are poorly covered if at all, and the same is true of books and book chapters. The journals covered are mainly Anglo-Saxon: in 2005, 98.74% of the articles analysed were in English, 0.234% in French, 0.205% in Chinese and 0.09% in Japanese.
The other bases show more or less the same coverage weaknesses. Scopus analyses 15,000 titles, including 12,850 journals and 125 book series, but with better coverage of engineering and a broader geographic distribution (60% originating from outside the United States). CiteSeer contains 1,200 computer science journals and conference proceedings. Google Scholar's coverage varies widely from one field to another, and little information is provided about its sources. Citebase is distinctive in that it takes into account both citations and the number of downloads of articles from some open archives.

The coverage issue matters because a base may omit a publication that is in fact central to a specialist field. As a consequence, computer science journals ranked among the best by the specialised base CiteSeer receive very low rankings in their field in the WoS: the journal CiteSeer ranks first is ranked 26th, and the one it ranks fourth is ranked 122nd. Moreover, these rankings differ markedly from those produced by experts in the field, as the scientists on the evaluation committee showed for one of their own specialities, robotics.

3. How should the citations be counted?

Counting the citations to attribute to an author, a journal or an institution raises many technical problems. An article may have several authors, and citations may be credited either to the single editor of a book or to each author who contributed to an article. Homonyms can increase or decrease the number of articles attributed to a person, for a reason as banal as having a common name. Journals are hard to identify because of variations in abbreviations, not to mention the different names an article may use for what look like several institutions but are in fact one and the same: for example, the four INRIA research centres tested appeared under up to nine different names in the WoS (INRIA, Loria, INRIA Rennes - Bretagne Atlantique, etc.).
For free sources, the problem is magnified by automatic data processing: first names sharing the same initial, accented letters, the order of surname and first name, etc. are all further sources of error in citation attribution.
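
The attribution problems above are usually tackled by normalising names before counting. The sketch below is hypothetical: the alias table and the "surname + initial" key are illustrative choices, not the actual method of the WoS or of any other base. It also shows why such automatic processing creates errors of its own.

```python
import unicodedata

# Hypothetical alias table: raw affiliation strings -> one canonical institution.
AFFILIATION_ALIASES = {
    "inria": "INRIA",
    "loria": "INRIA",
    "inria rennes - bretagne atlantique": "INRIA",
}


def strip_accents(text: str) -> str:
    """Drop diacritics so that 'Hélène' and 'Helene' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def author_key(name: str) -> str:
    """Map 'Surname, First' or 'First Surname' to a 'surname initial' key."""
    name = strip_accents(name).lower().strip()
    if not name:
        return ""
    if "," in name:
        surname, first = (part.strip() for part in name.split(",", 1))
    else:
        *first_names, surname = name.split()
        first = " ".join(first_names)
    return f"{surname} {first[:1]}".strip()


def canonical_affiliation(raw: str) -> str:
    """Return the canonical institution name, or the raw string if unknown."""
    return AFFILIATION_ALIASES.get(strip_accents(raw).lower().strip(), raw.strip())


print(author_key("Dupont, Hélène"))    # dupont h
print(author_key("Helene Dupont"))     # dupont h
print(author_key("Henri Dupont"))      # dupont h (a different person!)
print(canonical_affiliation("LORIA"))  # INRIA
```

The third call shows the flip side: reducing first names to an initial merges distinct people, which is one of the error sources listed above.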

The inevitably limited coverage of citation bases and the indexing difficulties become obvious when the same queries are run in different bases. The number of citations found often varies massively: for 25 well-respected computer science researchers, for example, Scopus returns 35% more citations than the WoS, and Google Scholar over 160% more. The citations available are also a poor reflection of the citations an article actually receives: examining his own citations in the Web of Science, Nisonger found that this base contained 28.8% of all citations of his publications, 42.2% of citations of his journal articles, 20.3% of citations appearing in non-American media and 2.3% of citations of his articles written in a language other than English. These percentages vary to different degrees depending on the research field.

The inconsistencies in these sources clearly cast doubt on the value of the indicators based on them, whatever their theoretical merits. Moreover, it is legitimate to question the point of publishing an indicator to three decimal places when it has probably missed 20% of the relevant journal citations: that uncertainty affects the indicator from the first decimal onwards.
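
A quick worked example of the precision point, under stated assumptions: the indicator is proportional to a citation count, the base has missed up to 20% of the relevant citations, and the published value 1.234 is hypothetical.

```python
def plausible_range(observed: float, max_missing_share: float) -> tuple[float, float]:
    """Bounds on the 'true' value of a citation-proportional indicator when up
    to max_missing_share of the citations are absent from the database."""
    # If a share s of the citations is missing: observed = (1 - s) * true.
    return observed, observed / (1.0 - max_missing_share)


low, high = plausible_range(1.234, 0.20)
print(f"{low:.1f} to {high:.1f}")  # 1.2 to 1.5
```

The first decimal already moves from 2 to 5, so the second and third decimals of the published value carry no real information.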

4. Do indicators really provide the required information?

Evaluating an article's scientific quality is a delicate problem. A simple approach is to link its quality to that of the medium in which it was published.
In this way, evaluating the medium (a scientific journal in most cases) takes the place of an individual evaluation, which of course simplifies the task enormously since there are far fewer media than articles. But is this relevant?

The Journal Impact Factor (JIF), invented by the ISI at the beginning of the 1960s to rank journals, is still the most frequently used indicator today. A journal's JIF for year n is the number of citations received during year n by the articles the journal published in years n-1 and n-2, divided by the total number of articles it published in those two years.
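
The definition translates directly into code. The minimal sketch below uses a hypothetical journal and figures, and takes pre-aggregated counts rather than querying any real database.

```python
def journal_impact_factor(citations_in_year_n: dict[int, int],
                          articles_published: dict[int, int],
                          n: int) -> float:
    """JIF for year n.

    citations_in_year_n[y]: citations made during year n to the journal's
        articles published in year y.
    articles_published[y]: number of articles the journal published in year y.
    """
    window = (n - 1, n - 2)  # the two-year window; older citations are ignored
    cited = sum(citations_in_year_n.get(y, 0) for y in window)
    published = sum(articles_published.get(y, 0) for y in window)
    if published == 0:
        raise ValueError("no articles published in years n-1 and n-2")
    return cited / published


# Hypothetical journal, year 2007: 150 citations to its 2006 papers, 210 to its
# 2005 papers and 500 to its 2003 papers; 100 papers in 2006 and 80 in 2005.
jif_2007 = journal_impact_factor({2006: 150, 2005: 210, 2003: 500},
                                 {2006: 100, 2005: 80}, 2007)
print(jif_2007)  # 2.0
```

The 500 citations to 2003 papers have no effect on the result, which is the two-year bias discussed in the next paragraph.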

The first bias the report reveals is that journal rankings obtained in this way cannot be used to compare different research fields. The two-year limit favours rapidly evolving fields, in which work more than two years old is already obsolete. As a result, molecular biology publications score higher than mathematics publications: in 2005, for example, the average JIFs of 140 mathematics and genetics journals differed by a factor of 10. Of the 84 references in Andrew Wiles's article on Fermat's Last Theorem, only four were to publications less than two years old. Citation practices also vary massively from one field to another: in 2000, for example, the average citation rate was 11 in pharmacology and 28 in genetics. Other factors can influence the JIF arbitrarily, such as the size of the community, the type of material published and the degree of specialisation of the journal (the more general the journal, the higher its JIF).
Other indices have been proposed: the immediacy index, calculated over a single year, and the citation half-life, which indicates how long research in a given field remains current (the number of years j such that 50% of the citations made in year n refer to publications predating year n-j and 50% to later ones). These indices are not really independent of the JIF, since publications with a short half-life automatically have a high JIF; it would be more accurate to say that they complement it.
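
Both complementary indices can be sketched the same way. This is a hypothetical simplification: the half-life is counted in whole years using the definition given above, and the citation figures are invented.

```python
def immediacy_index(citations_in_year_n: dict[int, int],
                    articles_published: dict[int, int], n: int) -> float:
    """Citations made in year n to articles published in year n, per such article."""
    return citations_in_year_n.get(n, 0) / articles_published[n]


def cited_half_life(citations_in_year_n: dict[int, int], n: int) -> int:
    """Smallest whole number of years j such that at least half of the citations
    made in year n go to items published in years n-j to n."""
    total = sum(citations_in_year_n.values())
    if total == 0:
        raise ValueError("no citations in year n")
    cumulative = 0
    for j in range(n - min(citations_in_year_n) + 1):
        cumulative += citations_in_year_n.get(n - j, 0)
        if 2 * cumulative >= total:
            return j
    raise ValueError("citation years must not be later than n")


# Hypothetical citations made in 2007, keyed by publication year of the cited item.
slow_field = {2007: 10, 2006: 40, 2005: 30, 2004: 20, 2000: 100}
fast_field = {2007: 30, 2006: 90, 2005: 60, 2004: 20}

print(cited_half_life(slow_field, 2007), cited_half_life(fast_field, 2007))  # 3 1
print(immediacy_index(fast_field, {2007: 50}, 2007))  # 0.6
```

The fast-moving field concentrates its citations on the last two years, which is also exactly what the JIF window rewards.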

Furthermore, the report notes that it is difficult to use the JIF to assess laboratory performance.

For example, the CWTS report from Leiden University shows very little correlation between the indicators and the peer assessment of 42 Dutch computer science laboratories carried out by the Review Committee for Computer Science of Quality Assurance Netherlands Universities (QANU).
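
The text does not say which correlation measure the CWTS report used. Purely as an illustration, the sketch below compares a peer ranking with an indicator ranking using Spearman's rank correlation (no-ties formula); the five laboratories and their scores are hypothetical.

```python
def ranks(values: list[float]) -> list[int]:
    """Rank positions, 1 = highest value (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for five laboratories.
peer_scores = [4.5, 4.0, 3.5, 3.0, 2.5]
indicator = [1.1, 2.3, 0.9, 1.8, 1.5]
print(round(spearman_rho(peer_scores, indicator), 2))  # -0.1
```

A value near 0 means the indicator ranking says almost nothing about the peer ranking; a value of 1 would mean identical rankings.
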
Nor is the JIF an appropriate way to assess the authors who publish in a journal, since even in journals with a high JIF the citations come from at most 15% of the articles published; the JIF therefore does not really measure the quality of a specific article or author. There are many examples of articles published in low-JIF journals that have in fact made a major contribution to contemporary science and, conversely, of poor-quality or purely one-sided articles published in high-JIF journals.
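
Why a journal-level average says little about a given article can be seen on a hypothetical, deliberately skewed set of citation counts: the JIF is essentially an average, while a typical article sits near the median.

```python
from statistics import mean, median

# Hypothetical citation counts for 20 articles of one journal (JIF window).
citations = [45, 30, 12, 2, 1, 1, 1, 1] + [0] * 12

average = mean(citations)    # what a JIF-style ratio reports
typical = median(citations)  # what the middle article actually receives
top_15_percent = sorted(citations, reverse=True)[: len(citations) * 15 // 100]
share = sum(top_15_percent) / sum(citations)

print(average, typical)
print(f"{share:.0%}")  # 94%
```

Here the top 15% of articles (3 of 20) collect about 94% of the citations and most articles receive none, yet every one of them would be credited with the same journal average.
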
Consequently, there is a current trend (often criticised by professional bibliometricians) toward indicators that supposedly evaluate the scientific quality of an individual's work more accurately. The best known is the H index proposed by J.E. Hirsch to assess, for example, the value of a researcher and inform decisions about his or her recruitment. An author's H index is the largest number h such that h of his or her articles have each been cited at least h times. Its advantage is that it can easily be obtained from the WoS using the "times cited" count. However, like the JIF, the H index varies with the discipline (it tends to be higher in biology than in physics, for example) and is just as hard to establish reliably, since it is calculated from the same bases (homonym problems, etc.) and gives different results depending on the base used. Furthermore, some studies have noted that the H index is correlated with age, that it can increase significantly even when the researcher has been inactive for a long period, that it is exaggerated for researchers who have published works, and that it does not bring out particularly important contributions made by a given author.
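
A minimal sketch of the H index definition, computed from a list of per-article citation counts. The counts are hypothetical; obtaining clean counts is the hard part discussed above.

```python
def h_index(citation_counts: list[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([1000, 2, 1]))      # 2
```

The second call illustrates one of the criticisms above: a single landmark paper with 1,000 citations barely registers.
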
A number of variants have been proposed to correct these shortcomings, such as the a-index (the average number of citations of the articles counted in the H index) and the g-index (the largest number g such that the author's g most-cited articles together have at least g² citations; a g-index of 10 indicates that the author's 10 most-cited papers have received at least 100 citations in total). However, until their reliability has been seriously studied, it is not yet advisable to trust them.
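
The two variants can be sketched in the same style. Assumptions: the g-index is capped at the number of articles actually published (some definitions allow padding with uncited articles), and the citation counts are hypothetical.

```python
def h_index(counts: list[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(counts, reverse=True)
    # In a descending list, "cites >= rank" holds only on a prefix, so count it.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def a_index(counts: list[int]) -> float:
    """Average number of citations of the articles counted in the H index."""
    h = h_index(counts)
    return sum(sorted(counts, reverse=True)[:h]) / h if h else 0.0


def g_index(counts: list[int]) -> int:
    """Largest g such that the g most-cited articles total at least g**2 citations."""
    g, running_total = 0, 0
    for rank, cites in enumerate(sorted(counts, reverse=True), start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


papers = [10, 8, 5, 4, 3]
print(h_index(papers), a_index(papers), g_index(papers))  # 4 6.75 5
```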

Other indicators, such as the crown indicator or the Top 5%, have been proposed to correct the biases of the main indicators, particularly by taking the characteristics of specific fields into account. Nevertheless, they are calculated from the same citation bases and therefore suffer from the same source imprecision. A further problem lies in defining scientific fields: to improve the indicator's reliability, it is essential to decide the level of specialisation at which fields are delimited.
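
To show how field normalisation works, here is a simplified sketch of a crown-type indicator (computed as a ratio of actual to expected citations, in the spirit of the CWTS CPP/FCSm ratio) and of a Top 5% share. The field averages and top-5% thresholds are hypothetical; in reality they come from the same citation bases, which is the limitation noted above.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    citations: int
    field: str


# Hypothetical reference values per field (illustrative numbers only).
FIELD_MEAN = {"mathematics": 2.0, "genetics": 20.0}
FIELD_TOP5_THRESHOLD = {"mathematics": 8, "genetics": 70}


def crown_indicator(papers: list[Paper]) -> float:
    """Actual citations divided by the citations expected from each paper's
    field average (1.0 = field average)."""
    actual = sum(p.citations for p in papers)
    expected = sum(FIELD_MEAN[p.field] for p in papers)
    return actual / expected


def top5_share(papers: list[Paper]) -> float:
    """Share of papers at or above their field's top-5% citation threshold."""
    hits = sum(1 for p in papers if p.citations >= FIELD_TOP5_THRESHOLD[p.field])
    return hits / len(papers)


maths_group = [Paper(4, "mathematics"), Paper(9, "mathematics"), Paper(1, "mathematics")]
genetics_group = [Paper(25, "genetics"), Paper(30, "genetics"), Paper(5, "genetics")]

print(round(crown_indicator(maths_group), 2), crown_indicator(genetics_group))  # 2.33 1.0
print(round(top5_share(maths_group), 2), top5_share(genetics_group))  # 0.33 0.0
```

In raw citations the genetics group looks far stronger (60 against 14), but relative to its field average the mathematics group comes out ahead. The outcome depends entirely on how "mathematics" and "genetics" are delimited, which is the field-definition problem just described.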

A systematic study of four internationally renowned INRIA researchers shows that the biases and shortcomings observed in the indicators are not exceptions but the rule, at least in computer science in its broadest sense.

5. Has there been any collateral damage?

Intensive, or even exclusive, use of indicators could push research actors, both scientists and editors, to try to increase their scores. Increasingly, journals ask their authors to include references to articles from the same journal. Researchers publish their work piecemeal to increase their number of publications, and can adopt a strategy of group self-citation to significantly increase the H index of the group's members.
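
A hypothetical simulation of the group self-citation strategy: three colleagues each cite every article of one member once. The helper `with_group_self_citations` and all figures are illustrative.

```python
def h_index(counts: list[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def with_group_self_citations(counts: list[int], members: int,
                              cites_per_member: int) -> list[int]:
    """Hypothetical scenario: each other group member cites every article of
    the author cites_per_member times in their own papers."""
    return [c + members * cites_per_member for c in counts]


author = [6, 5, 4, 3, 3, 2, 2, 1]
boosted = with_group_self_citations(author, members=3, cites_per_member=1)
print(h_index(author), h_index(boosted))  # 3 5
```

Three extra citations per article are enough to move the H index from 3 to 5 without any new scientific contribution.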

Besides the fact that the values used for indicators can be significantly manipulated in this way, the growing use of these indicators to assess researchers has damaging consequences for science and innovation. Given the biases in their calculation, excessive reliance on indicators may push young researchers to seek quick results at the expense of longer-term research, and slow down innovation by penalising the formation of small communities in emerging fields.

6. Which precautions should be taken?

The Evaluation Committee concludes its report with recommendations on how to use these indicators. Its key recommendation is to resist the temptation to automate evaluation processes. Other recommendations include using several indicators, having lists of journals reviewed by experts able to identify the important journals and conferences, and using indicators only as a supplement to other assessments, to detect trends. These general recommendations are accompanied by proposals specific to INRIA, particularly concerning the wide variety of names and addresses under which it appears, which calls for a standardization process to avoid artificially lowering the value of indicators concerning the Institute. In this respect, the Committee points out that the OST has launched the normadresses programme, which aims to submit to the ISI a coherent list of French laboratories and to improve the handling of multiple affiliations and co-signatures.

More information:

Consult the analysis document of the Evaluation Committee on the bibliometric indicators (PDF).

© INRIA - Updated 02/25/2008 - webmaster@inria.fr