Framing the Excellent Research

Tuesday, October 2nd, 2012 at 11:55 am Written by:

The UK universities are currently tearing themselves apart with the Research Excellence Framework (REF) in a dog-eat-dog competition for positional prestige and percentage share of the fixed-sum cash grant. The process involves submitting the names of selected academics and their four choice papers to panels of professors for the purpose of ranking excellence and impact. The impact of a paper was to be objectively measured by counting the number of other papers that cited the chosen paper.

Now, no wise person would have embarked on the mission of designing such a process without immediate, first-line reference to the well-known Goodhart’s Law:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes [eg distributing money].

The law has two consequences: (1) The outcome will, over a brief time interval, cease to measure what it is intended to measure, and (2) the process of collapse can result in a great deal of collateral damage.

What sort of damage? Distortions of the scientific process. Gratuitous and unnecessary citations traded between publications. Many minimal, scrappy papers rather than fewer complete and well-rounded ones when it is the numbers that count. The submission of papers to inappropriate journals of higher ranking instead of to the specialist publications where they belong.

All of this activity drains people’s precious time and degrades the quality of the output.

And beyond this there is the more corrosive moral damage, the internal institutional jockeying for positions among the researchers who are selected by their superiors to have their names put forward. Those not in the list can be discarded, sidelined, harassed and generally disposed of as though their presence is of no material consequence to the outcome score.

Luckily, the REF2014 website has all the background paperwork, consultations and pilot studies necessary to piece the story together, and to see to what extent the professorial class — the smartest guys in the land, on whom we depend to chart the course of the human race — considered the basic parameters of the human condition when they designed a process that applied to their own institutions. God help us.

Citation counting

The first consultation was published in November 2007, proposing to use the count of incoming citations to an academic paper as a proxy measure for its excellence and impact. The sum of the impact of all the papers produced by the employees of an institution would equal the impact of the institution.
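
Spelled out, the arithmetic of the proposal is as blunt as it sounds. Here is a toy sketch with made-up papers and citation counts (nothing in it comes from the consultation itself):

```python
# Invented citation counts per submitted paper, grouped by institution.
submissions = {
    "University A": {"paper_1": 12, "paper_2": 3, "paper_3": 0},
    "University B": {"paper_4": 40, "paper_5": 1},
}

# The proposed "impact" of an institution is simply the sum over its papers.
impact = {inst: sum(cites.values()) for inst, cites in submissions.items()}
print(impact)  # {'University A': 15, 'University B': 41}
```

One number per institution, ready to be converted into a percentage share of the money, and exactly the kind of target that Goodhart’s Law warns will stop measuring anything once the money depends on it.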

Henk F. Moed of the Centre for Science and Technology Studies at Leiden University (now a Senior Scientific Advisor at the closed-access publishing outfit Elsevier) was paid to assess the potential citation databases that could be used, and Evidence Ltd., a business spin-out from the University of Leeds now owned by Thomson Reuters, was paid to produce a bunch of sciencey-looking charts that make it appear like they know what they are doing.

If the definition of a capitalist is someone who will sell you the rope to hang them with, then the definition of a modern academic is someone who will sell you a report that justifies their hanging.

But back to the proposal for counting citations of a paper as a measure of its quality:


Yes, that is what it looks like: the policy paper proposing the principle of counting citations from other papers as a measure of excellence and quality refers to a “number of studies” and “numerous studies” without a single whiff of a reference.

It therefore scores zero for excellence.

At least we have an account of the immediate application of Goodhart’s Law by some unspecified “experts”:

Sounds obvious, doesn’t it? It is, after all, common practice for all published authors. “If you liked this book, here are six other ones by the same writer.” But even here there was a problem, according to a response to the consultation:

It was time to bring in the experts to paper over the flaws. The REF process would rely on panels of experts informed by impact-related measures, such as citation counting.

It was time to do the pilot program.

Bibliographic references

The plan immediately broke down against the creaky reality of the university IT systems.

In contradiction of the importance the REF designers had attached to what they assumed was the ultimate “output” of a university, the institutions do not in reality keep careful registers of the papers published by their employees. In fact, they barely keep a workable record of whom they have employed in the past.

Having gathered some sort of record of the papers produced by their staff, the universities sent these lists to Evidence Ltd., which subcontracted to Symplectic, which in turn returned tables of other papers citing those papers to each university for verification.

There is also an interesting report from Evidence Ltd on the other side of the fence, which basically said that the data coming from the academic institutions was in such bad shape that they were forced to take a policy decision on whether (a) to accept whatever came and attempt to clean it centrally, or (b) to send it back and tell the institutions to do it properly, knowing that they were already doing the best they could.

An endless round of headaches occurred, including many caused by the most popular and brain-damaged software in the world:

Feedback on the practicalities

The Brighton-based “global” outfit called Technopolis produced its second report offering the following three models. (They also offered zero information in a “global” context, like: how is this problem addressed in other countries, because maybe we could save a lot of time by learning from them?)

Here’s where the choice gets made between the option of surveying an academic institution across its total output, and the option of surveying a small self-selected sample of its output.

Besides the practical challenges of creating citation data on the total output (given the shambolic state of university IT systems), the following distortions were recognized.

Very little downside was seen by anyone for the limited selected papers option — the one that was eventually chosen.

Perhaps if a sociology or anthropology department had ever been given a grant to research the gameplay and divisive office politics that result from this type of exercise, they would have been aware of it. Not likely. The academic community shows absolutely no interest in directing its powers of rational scientific inquiry towards its own institutional operations. No prior research was ever mentioned. No comment was ever raised about the surprising gap in the knowledge.

Why did they need to go through this whole exercise to come to this conclusion, when a couple of hours’ inspection of an average university IT system would have shown up the holes? This should have rung alarm bells. How could it be any different? The circumstances in which good, effective, integrated software gets deployed ought to be well known and well studied. And those circumstances, as with data about the law, are not present here. The programmers who could easily find themselves interested are systematically frustrated by institutional indifference, and by the tacit privatization and inaccessibility of the raw data. Why do universities give so much petrol-headed support to engineering students building yet another model car each year, yet provide negative encouragement for what would be a highly productive Software Engineering project within their own four walls?

The September 2009 report drew everything together in painstaking detail in Annex B:

* Elsevier provided the Scopus data in a set of GNU PGP encrypted zip files, with each zip file containing the data for 10,000 articles, with each article’s data stored in a separate XML file. As part of the data transfer process Elsevier also provided the XML Schema files that the article XML files used.

* From the XML Schema files a relational database structure was built that followed the structure of the XML Schema and allowed for the XML files to be loaded with minimal processing. Once the database structure was created a C# program was written which decrypted and unzipped the files provided by Elsevier and loaded each XML file to the database. The loading process took 10 working days to load the 11.8 million articles and the final database is 366GB in size.

* In performing the loading, we assigned each record (corresponding to a single output, for example an article in a journal) a unique identifier, to which we refer as the ‘keyid’. The keyid is not a part of the Scopus data, as supplied, but provides a key which we can use when manipulating the data. It serves an equivalent role to the ‘UT’ identifier used in Web of Science.
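
That load-and-key step amounts to something like the following. A minimal Python sketch under my own assumptions (the report describes a C# program against a schema-derived relational database; the file layout, XML element names and table here are invented for illustration):

```python
# Sketch: walk the unpacked article XML files, pull out a couple of fields,
# and store each record under a surrogate key ("keyid") of our own invention,
# because the supplied data offers no single identifier we can rely on.
import glob
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect("scopus_pilot.db")
db.execute("CREATE TABLE IF NOT EXISTS outputs "
           "(keyid INTEGER PRIMARY KEY, doi TEXT, title TEXT, raw_xml TEXT)")

keyid = 0
for path in sorted(glob.glob("decrypted/batch_*/article_*.xml")):  # zips assumed already decrypted and unpacked
    root = ET.parse(path).getroot()
    doi = root.findtext(".//doi")      # element names are made up for the sketch
    title = root.findtext(".//title")
    keyid += 1
    db.execute("INSERT INTO outputs VALUES (?, ?, ?, ?)",
               (keyid, doi, title, ET.tostring(root, encoding="unicode")))
db.commit()
```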

As a rule, it is never a good sign when you are generating a brand new unique identifier for a dataset where there ought to already be one. But it got worse.

* At each stage of the above procedure, we monitored the number of ‘one to many’ matches that were present, because these were indicative of unduly lenient match keys. The match keys described below are the result of the monitoring of this.

* We initially matched items using their Digital Object Identifier (DOI). In principle, this provides a unique match to a single article. In cases where the DOI existed, this was generally the case, though we found some examples of items in Scopus sharing the same DOI. This may be legitimate (for example, some journals assign the same DOI to all of their ‘letters’ page, rather than assigning a DOI to each letter on it), in other cases it appeared to be due to data error in Scopus or from the pilot HEIs. We have since found a few instances where, although the match on DOI was unique, it has linked the output as supplied to an incorrect output in the database.

* Unfortunately the Symplectic outputs table did not include journal ISSNs. We made use of the Thompson ‘UT’ included with the records supplied by Evidence to add ISSNs to these records, using the raw Web of Science data that we hold. We then used these augmented records to match to Scopus, using the ISSN, volume and first page of the item.

* In the next step, we ‘cleaned’ the journal titles by removing all non-letter characters from them. Scopus captures both the full source title, and an abbreviated source title. We used either of these, the volume and start page as our next match key. We removed non letter characters from the titles to allow cases like ‘Phys. Rev. E.’ and ‘Phys Rev E’ to match. The matching in this step was case insensitive.

* Finally, we matched on publication year, and the first sixty characters of the cleaned item title (i.e. we stripped out everything besides letters). Although this may seem a rather ‘loose’ match key, recall that, at every stage of this procedure we kept only matches that linked to a unique record in Scopus. The matching in this step was case insensitive.
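
Strung together, that is a cascade of ever looser match keys, each stage applied only to the outputs left unmatched by the stricter stage before it, with a match counting only if it lands on exactly one record in Scopus. A rough Python paraphrase of Annex B (field names invented, not the pilot’s actual code, and glossing over the ISSN back-filling from Web of Science):

```python
import re

def clean(text):
    """Keep letters only and lowercase, so 'Phys. Rev. E.' matches 'Phys Rev E'."""
    return re.sub(r"[^A-Za-z]", "", text or "").lower()

# One key-builder per stage, strictest first.
MATCH_KEYS = [
    lambda r: r.get("doi"),                                                     # 1. DOI
    lambda r: (r.get("issn"), r.get("volume"), r.get("first_page")),            # 2. ISSN + volume + first page
    lambda r: (clean(r.get("journal")), r.get("volume"), r.get("first_page")),  # 3. cleaned journal title + volume + first page
    lambda r: (r.get("year"), clean(r.get("title"))[:60]),                      # 4. year + first 60 letters of title
]

def match_outputs(submitted, scopus):
    """Return {submitted_id: scopus_keyid}, keeping only matches unique in Scopus."""
    matched = {}
    for make_key in MATCH_KEYS:
        # Index the Scopus side by this key, discarding 'one to many' keys.
        index = {}
        for item in scopus:
            key = make_key(item)
            parts = key if isinstance(key, tuple) else (key,)
            if key and None not in parts:
                index.setdefault(key, []).append(item["keyid"])
        unique = {k: v[0] for k, v in index.items() if len(v) == 1}
        for rec in submitted:
            if rec["id"] in matched:
                continue                       # already matched at a stricter stage
            key = make_key(rec)
            if key in unique:
                matched[rec["id"]] = unique[key]
    return matched
```

Four stages of fuzzy matching, each one a place for silent mismatches to creep in, just to join a university’s own list of its papers to the citation database it was to be judged by.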

In Annex I there is a note about similar trials in Australia for their Excellence in Research framework made by randomly sampling academic papers, rather than trying to rate them all or allowing the institution’s hierarchy to select them. This is the first sign in three years that our esteemed policy-makers have deigned to consider hard-won knowledge from outside of Great Britain.

A somewhat delayed discriminatory analysis paper published in February 2011 really got lost in the long stats grass:

You can bet there is a lot more going on than what is described by beta25.age.age.malejklm.

Final consultation

Having debugged the gross impracticalities of their initial citation fantasies and established the statistical validity of all their metrics, the designers published the second consultation in order to fine-tune the process and prepare for action.

On the whole, there were very few quibbles.

This train was definitely gaining momentum.

At about this time, someone finally decided that we should maybe see how other countries were doing it, and commissioned the RAND Corporation for a review of international practice.

This was done on the cheap. From among all the hundreds of case studies from around the world they could have looked at, they chose to report on: (1) the already defunct Research Quality Framework from Australia, (2) a crappy on-line multiple-choice product they once sold to the Arthritis Research Council, (3) another defunct program, this time from the US, called ExpectMore.gov, developed by the Bush Administration, and (4) an impractically complex system from the Netherlands still in its early days.

And now for the Impact Assessment and sign-off:

Don’t you just love how important it is to print on both sides of the paper?

What’s the alternative?

Here’s the thing. Why is there nowhere in the university system a Professor of Science Publication and University Administration who would just know, without wasting three years on credulous consultants, that the idea wasn’t going to fly? The entire research establishment is utterly geared around this decades-old official dissemination process, which may or may not be seriously defective and has absolutely not been updated to take account of the internet, so you would think it was a legitimate area of study.

For example, in the field of medical research, where the directly published results of the science are capable of killing or curing people tomorrow, and which exists under a system of government-granted monopolies and corporate manufacturers who have proven time and time again that they do not care whether you live or die when they can turn a profit, we learn that tracking the existence of unpublished papers is vital.

If that’s what we need to do, and at the moment universities can’t even track the papers by their own staff on which their income depends, maybe we need to look at the problem critically, scientifically, and not contract the job out to some two-bit consultant outfit who doesn’t care if the problem gets solved.

And anyway, what is this trying to achieve, beyond the allocation of some block grants on top of research grants based on retrospective performance? Why not put the money out in proportion to the total number of full-time active researchers, and nothing else? Surely the point of hiring academics is that they are people who want to do worthwhile research and have been given the opportunity of a career in which they can do so. No one sets out to waste their precious life producing research that cannot possibly ever be of importance if they can avoid it. It’s a matter of luck and of the sort of judgement that cannot always be communicated to a journalist in a thirty-second pitch before the crisis has hit. You cannot measure the future. How could we ever have known how much more important to our survival the discovery of the Ozone Hole would be than finding a cure for cancer?

Here’s another idea:

Only count publications that are open access and which can be linked to from a stable URL and whose bibliographies are proper hyperlinks to other publications, rather than in these out-of-date citation reference formats.

If that were the rule, then academics would change their behaviours in a positive way, and we wouldn’t have to pretend that we were measuring something while conveniently ignoring the effects of their manipulation. Academic staff would be free to continue to provide confidential work to industry or sell their output to private corporations for inclusion in their commercial publications, but there is no reason that this work should be remunerated through the public finances.
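
Unlike the citation-matching gymnastics above, such a rule is mechanically checkable. A toy sketch under my own assumptions (the record format and function names are invented, and no real REF service is implied): an output counts only if it sits at a resolvable URL and every entry in its bibliography is itself a working hyperlink.

```python
import urllib.request

def resolves(url, timeout=10):
    """True if the URL answers an HTTP HEAD request with a non-error status."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

def counts_for_assessment(output):
    """output is a dict like {"url": ..., "bibliography": [...]} -- an invented format."""
    return (resolves(output["url"])
            and len(output["bibliography"]) > 0
            and all(ref.startswith("http") and resolves(ref)
                    for ref in output["bibliography"]))
```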

The REF is intended to reward and incentivize the behaviour known as Research Excellence. There is zero evidence that disbursing money to these institutions in proportion to this factor will have any connection to producing more of it, even if the REF were able to measure the factor without any collateral damage caused by the process of measurement and the effects of Goodhart’s law. For example, there’s that new Sport Centre the Vice Chancellor wants to build, which he can now do because five years ago one of the maths professors happened to prove a really good theorem.

And, finally, when proposing a software project to track scientific output from universities and its in-bound links, go round all the universities and find out how they do it, pick the best one, and roll it out under an open source license so we can all make it good. Whatever you do, don’t do whatever it was that resulted in the unbelievably shite Joint Electronic Submissions (Je-S) System. How they put up with it without going on strike is beyond me.
