When Tim Berners-Lee first came up with the concept of the World Wide Web in the form that we know it today, he was imagining a facility where one web item can be linked to others. This is achieved using hyperlinks: you can, say, click on a linked section of text or other content and be taken immediately to another location or page.
In fact, Sir Tim's concept of the Web, and of how things should link to each other, goes much further than the commonly accepted view and takes linking to a higher dimension. He has recently written about this in a paper, “Putting government data online”, which, although concerned with official data, describes nicely what we are doing here at Oxford for both ORA and the BRII (Building the Research Information Infrastructure) project. It is a topic that is highly relevant for creators of scholarly outputs.
TBL's underlying principle is that of Linked Data. Linked Data is described on the LinkedData.org website as follows: “The Web enables us to link related documents. Similarly it enables us to link related data. The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.” The data we’re talking about could be data produced as an output of research, or the data describing that research (as we are doing for the BRII project, which enables linking between people, projects, publications, funding agencies, departments, research groups and so on). It can be anything: a geographical location, statistics, a concept such as a project, and so on.
The principles of Linked Data are that:
1. it can be accessed by others who have an interest in that data (although only the data you want to make available is made accessible)
2. “Linked Data can be combined (mashed-up) with any other piece of Linked Data…No advance planning is required to integrate these data sources as long as they both use Linked Data standards.” (Berners-Lee, Putting government data online, 2009). It is all about joining together data that wasn’t previously connected, without having to go through the pain of trying to make separate systems interoperate. TBL cites the following example: “if the Department of Transport publishes road accident data, a cycling site selects the cycle accident subset, and can publish it as a map adding cycle routes and hills, and cycle shops. An agency publishes data about the amount of money given to different towns, another maps it against the per capita income levels in those towns. And so on in uncountable permutation.”
3. “Scalable: It's easy to add more Linked Data to what's already there, even when the terms and definitions that are used change over time.” (Berners-Lee, Putting government data online, 2009).
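To make the "no advance planning" point a little more concrete, here is a minimal sketch in Python of the towns example. It assumes each statement is a simple (subject, predicate, object) triple and that both publishers happen to use the same identifiers for the towns. All the example.org URIs, property names and figures are invented for illustration; they do not come from any real dataset.

```python
# Hypothetical namespaces: nothing here refers to a real dataset.
TOWN = "http://example.org/town/"
VOCAB = "http://example.org/vocab/"

# One agency publishes funding per town (made-up figures).
funding_data = {
    (TOWN + "a", VOCAB + "fundingReceived", 120000),
    (TOWN + "b", VOCAB + "fundingReceived", 80000),
}

# Another publishes per capita income for the same towns (made-up figures).
income_data = {
    (TOWN + "a", VOCAB + "perCapitaIncome", 27000),
    (TOWN + "b", VOCAB + "perCapitaIncome", 25000),
}

# Because both sources identify towns with the same URIs,
# combining them is just a set union of their statements.
combined = funding_data | income_data


def describe(graph, subject):
    """Collect everything the graph says about one subject."""
    return {p: o for s, p, o in graph if s == subject}


town_a = describe(combined, TOWN + "a")
print(town_a)
```

The only design choice that matters here is the shared identifier: neither publisher needed to know about the other, and adding a third source later would be another union.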
The clever thing is that the data is understandable by machines, which is why you can do interesting things with it by joining it up with other data. The data can be described using terms from a growing set of vocabularies which are increasingly widely used (see what we're using in our work here - work in progress so a growing collection). One major example is the Library of Congress Subject Headings, which have recently been made available in this way. More vocabularies, and Linked Data versions of existing specialist subject vocabularies, are being developed.
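As a rough illustration of describing data with shared vocabulary terms, the sketch below describes a publication using properties from two widely used vocabularies, Dublin Core Terms and FOAF, and writes the statements out as N-Triples, one of the simpler Linked Data serialisations. The paper and author URIs, the title and the name are hypothetical, and the URI check is a deliberate simplification rather than a proper RDF implementation.

```python
DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core Terms namespace
FOAF = "http://xmlns.com/foaf/0.1/"     # FOAF namespace

paper = "http://example.org/paper/1"    # hypothetical identifier
author = "http://example.org/person/1"  # hypothetical identifier

triples = [
    (paper, DCTERMS + "title", "An example working paper"),
    (paper, DCTERMS + "creator", author),
    (author, FOAF + "name", "A. Researcher"),
]


def is_uri(value):
    # Simplification: treat anything that looks like a web address as a URI.
    return value.startswith(("http://", "https://"))


def to_ntriples(statements):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in statements:
        if is_uri(o):
            obj = "<" + o + ">"
        else:
            # Escape backslashes and double quotes inside literals.
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


output = to_ntriples(triples)
print(output)
```

Because the property names are URIs from published vocabularies, any other system that knows Dublin Core or FOAF can interpret these statements without a private agreement with us.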
Producing your data as Linked Data is likely to become much more common as the Semantic Web gains in popularity and, over time, becomes the norm. It is already being used by the BBC and by the Guardian newspaper on its Guardian Open Platform. One part of this is the Guardian Data Store, “a collection of important and high quality data sets curated by Guardian journalists. You can find useful data here, download it, and integrate it with other internet applications.” (Guardian, 10 March 2009)
Are many Oxford researchers producing their data as Linked Data yet?
SallyR
