There was a real buzz for our 8th Data Science Show and Tell. It was more crowded than ever - many people came for the first time. There were lots of great questions, with a networking session afterwards. Look out for future events and why not join us next time!
The British Library (BL) invites the public to “explore the world’s knowledge”. As Neil Wilson (the BL's Head of Collection Metadata) explained, increasingly that means providing linked open data. The BL is a copyright library, so it gets a copy of all UK publications. It has 150 million items: 14 million books, plus journals, papers, audio and even stamps!
The BL has a legal duty to maintain a list of all UK publications (called the British National Bibliography) - and that’s where the linked data story begins. The list covers all publications (and re-publications) since 1950. The BL used to license bibliographic data to make money, but now it’s linked open data (with a Creative Commons CC0 licence).
Linked data means that users (e.g. other libraries and app developers) can be clearer about basic questions (e.g. “who is the author?”). 123 people in the US have the same name as me, and there are 29 places called London. So, to say that David Wilks in London wrote a blog could be ambiguous! Linked data uniquely identifies authors (by International Standard Name Identifier), places (with GeoNames), subjects (with Dewey.info) and organisations (with the Virtual International Authority File and other open standards).
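To make the ambiguity point concrete, here is a minimal sketch in Python. The `Author` class and the `example.org` identifiers are made up for illustration; they are not real ISNI or GeoNames records. The point is only that a name can match several people, while an identifier matches at most one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    uri: str        # hypothetical unique identifier (stands in for an ISNI)
    name: str       # human-readable name, which may not be unique
    place_uri: str  # hypothetical place identifier (stands in for GeoNames)


# Two made-up records sharing a name but living in different Londons.
AUTHORS = [
    Author("https://example.org/author/1", "David Wilks",
           "https://example.org/place/london-england"),
    Author("https://example.org/author/2", "David Wilks",
           "https://example.org/place/london-ontario"),
]


def find_by_name(name: str) -> list[Author]:
    """Return every author with this name (possibly more than one)."""
    return [a for a in AUTHORS if a.name == name]


def find_by_uri(uri: str) -> Optional[Author]:
    """Return the single author with this identifier, or None."""
    return next((a for a in AUTHORS if a.uri == uri), None)
```

Searching by name returns both records, so "David Wilks in London" stays ambiguous; searching by identifier returns exactly one author and one place.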
A publication is no longer primarily a thing (e.g. a physical book or newspaper in the library) but can be an event (e.g. an eBook update). Many ePublishers give the British Library regular data feeds, which the BL enriches and makes available to a wider group of users through a linked data store with a SPARQL endpoint at bnb.data.bl.uk.
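For readers who haven't met SPARQL, the sketch below builds a query asking for the titles of works by one author, then encodes it as a GET request in the way the SPARQL 1.1 Protocol describes (a `query` parameter). Several things here are assumptions rather than facts about the BL's service: the use of Dublin Core `dct:creator` and `dct:title` properties, the author URI, and the endpoint address, which is a placeholder. The code only builds strings and does not contact any server.

```python
from urllib.parse import urlencode


def build_author_query(author_uri: str, limit: int = 10) -> str:
    """Build a SPARQL query for titles of works by one author.

    Assumes (unverified) that works are linked to authors with
    dct:creator and to titles with dct:title.
    """
    # Reject characters that cannot appear inside a SPARQL <IRI>.
    if any(ch in author_uri for ch in '<>"{}|^`\\') or any(
        ch.isspace() for ch in author_uri
    ):
        raise ValueError("author_uri is not a safe IRI")
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?work ?title WHERE {\n"
        f"  ?work dct:creator <{author_uri}> ;\n"
        "        dct:title ?title .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder values: swap in a real endpoint and author identifier.
query = build_author_query("https://example.org/author/1", limit=5)
url = build_request_url("https://example.org/sparql", query)
```

Because the author is identified by a URI rather than a name, the query cannot accidentally pick up works by someone else who shares that name.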
… needs shared standards…
The British Library may have whetted your appetite to write an app about publications, but the other linked data we heard about was just as exciting. Dan Appelquist from the GDS Registers team spoke about open standards. Dan co-chairs the World Wide Web Consortium Technical Architecture Group with Sir Tim Berners-Lee. He also works with the Open Data Institute startup programme.
Open standards make computer systems from different organisations work together. The most famous example is the web, which wouldn’t work without some consistency in how browsers and websites use its main language (HTML). This also applies to government. A good example is the open documents standard. If we had no standard way to share documents, we might not be able to edit each other’s work. But if we all bought the same word processor, then we would end up locked into an expensive supplier. So, we specify the standard but not the supplier - giving consistency and value for money.
Dan co-ordinates open standards in an open way. Discussion about possible new open standards is public (see an example here). People ask and answer questions about any new standard and then reach a consensus (like on Wikipedia discussion pages).
Registers are “authoritative lists you can trust”. The government has lots, but they aren’t always consistent, and that’s what GDS is trying to improve. As Dan explained, the first GDS register is a list of countries produced with the Foreign Office. This may sound easy but government departments actually hold many inconsistent country lists, and in some cases whether a particular territory is a country can be very controversial (I won’t provoke a diplomatic incident by discussing examples on an official blog!!).
But how do you make a register reliable over time? Many people who apply to come to the UK were born in countries (e.g. Czechoslovakia, USSR) which no longer exist. A register needs a reliable amendment history. To do this, the team are using Merkle trees (a data structure also used in Bitcoin) so that the register’s history can be transparently verified.
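The idea behind a Merkle tree can be sketched in a few lines of Python. This is a simplified illustration, not the Registers team's implementation: the entry format, the one-byte prefixes and the rule for odd-sized levels are all assumptions made for the example. Each entry is hashed, pairs of hashes are hashed together, and so on up to a single root hash that summarises the whole history.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[str]) -> str:
    """Return a hex root hash summarising an ordered list of entries."""
    if not entries:
        return _sha256(b"").hex()
    # Leaves and inner nodes get different prefixes so that a leaf
    # can never be mistaken for an inner node.
    level = [_sha256(b"\x00" + e.encode("utf-8")) for e in entries]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(_sha256(b"\x01" + level[i] + level[i + 1]))
            else:
                # Odd node out: carry it up unchanged (a simplification).
                next_level.append(level[i])
        level = next_level
    return level[0].hex()


# Made-up register history, for illustration only.
register_history = ["add:Czechoslovakia", "add:USSR", "end:Czechoslovakia"]
root_before = merkle_root(register_history)

# Quietly rewriting an old entry changes the root...
tampered = ["add:Czechoslovakia", "add:Soviet Union", "end:Czechoslovakia"]
root_tampered = merkle_root(tampered)

# ...and so does appending a new one, but the old root stays on record.
root_after = merkle_root(register_history + ["end:USSR"])
```

Anyone who noted the earlier root can recompute it from the published entries. If a past entry had been silently altered, the recomputed root would no longer match, which is what makes the amendment history verifiable.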
… and users!
Zeid Hadi (Delivery Manager for data.parliament) shared the great value that linked data can add to help citizens engage with Parliament. The record of what happens in Parliament (called Hansard) was traditionally published in an unfriendly format (unpleasant to read for both humans and computers!). Linked data is very powerful when applied to the parliamentary record. It allows you to link the MP making a speech to what expenses they claimed, how often they attend Parliament, their interests and how they have voted. The information is under an Open Parliament Licence: there is an API platform and many datasets can be freely downloaded.
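As a rough illustration of why a shared identifier matters here, the sketch below joins two small datasets on a member ID. The records, field names and IDs are invented placeholders and do not reflect the real data.parliament schema or API.

```python
# Invented placeholder records: not real parliamentary data.
speeches = [
    {"member_id": "member-1", "topic": "Example debate A"},
    {"member_id": "member-2", "topic": "Example debate B"},
    {"member_id": "member-1", "topic": "Example debate C"},
]
votes = [
    {"member_id": "member-1", "division": "Example division 1", "vote": "aye"},
    {"member_id": "member-2", "division": "Example division 1", "vote": "no"},
]


def member_profile(member_id: str) -> dict:
    """Collect everything that shares this member's identifier."""
    return {
        "member_id": member_id,
        "speeches": [s["topic"] for s in speeches if s["member_id"] == member_id],
        "votes": {v["division"]: v["vote"] for v in votes if v["member_id"] == member_id},
    }


profile = member_profile("member-1")
```

Because both datasets use the same identifier for the same MP, the join needs no guesswork about spellings, titles or constituencies.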
Finally, Steve Redmond from the Ministry of Defence (MoD) presented a text mining tool for learning lessons from written reports of past military engagements. The text in the tool links to the UK Defence Taxonomy so that terms are used consistently, following the principles of linked data and “canonical” (i.e. standardised) lists. Much of the detail was necessarily confidential, but the MoD loves abbreviations and has a huge number of terms for different pieces of equipment and operations all over the world, so data linking helps even experienced staff.