Tuesday, 8 December 2009

Climate change storm

Whatever one's views on climate change, it has been impossible to avoid the media excitement about the leaked emails from UEA this week. It would appear that a number of people are calling for the offending research data to be made publicly available (there has been some disagreement on the amount of information/data that were already freely available) so that the whole process and scientific findings could be publicly scrutinised and the truth be out.

This story has coincided with a presentation I gave at the e-Science All Hands meeting this week. My colleague, Luis Martinez, and I ran one of the Birds of a Feather sessions, which was on the topic 'Research data preservation and the role of libraries' (more to come on that in a future post). Data sharing (and collaboration) was one of the main themes of the conference.

There is much talk in various circles about open data: making research data freely available so that it can be 'read' along with publications and the like, and so it can be re-purposed in all sorts of creative ways. This is the sentiment of the quotation 'the coolest thing to do with your data will be thought of by someone else.'

Well, I'm all in favour of that. It seems like a good idea, especially with all the possibilities for combining data in creative ways and using them to make new discoveries. BUT... it has to be balanced with restricted or even no access to data. There are good reasons why some data should be well protected. This might be for personal privacy of the subjects, patents pending, the opportunities for the data creators to do more work with their own data and so on. And then there's the grey area in between, when some data might be available to some people for some of the time.

Where should the line be drawn between making data available and not doing so? Who should be the arbiter in cases of disagreement? It is often these sorts of policy and organisational issues that cause major difficulties for everyone involved. And the problem is that not addressing knotty topics that need discussion, and ignoring them because they're too difficult, often results in inaction.

As a first step towards doing something sensible for data, it would be a good idea to set up a simple internal data registry (the Aussies are already doing this on a national scale) so at least we know what is out there. It could act as a sort of data audit so that all the different areas of storage, data descriptions, preservation, etc. could be thought about in more detail. Quite how this is achieved so that a minimal amount of effort is required on everyone's part needs thinking about carefully. Also, some sort of balance should be struck even at this level between making details openly available and keeping them restricted. Even a simple list of data 'sets' can cause problems.

Fortunately, Prof Paul Jeffreys is leading a couple of projects (EIDCSR and SUDIMIH) that are looking into many of the issues, as is Dr David Shotton with his ADMIRAL project. These projects are doing some detailed investigations. As far as I'm aware, we don't have any shallow but broad investigations going on at the moment (someone please correct me if I'm wrong here). Meanwhile, we in the libraries are developing a robust storage system that is designed to store, preserve and deliver any data it contains (the remit for research data is as yet to be defined in liaison with other data stores across the University).

Creating a registry of data would at least be one of the first steps towards actually 'doing something' about Oxford's data. It could point to any data sets that are available. But who is going to run it, pay for it and so on...? This blog is supposed to encourage discussion. Maybe someone out there at Oxford has got some ideas.
SallyR