One of the classes I’m taking this semester seems likely to push me outside my “comfort zone” of book history studies and “traditional” library skills like cataloguing. It’s called “Digital Curation”, and it focuses on the active, ongoing management of digital artifacts throughout their lifecycle, particularly on maintaining and adding value to a trusted body of digital information for current and future use. I initially signed up for it because I thought it would complement my class on Digitisation, a practical course that teaches both technical digitisation skills and digitisation project management (including grant-writing, something I am really looking forward to!). After working through this week’s readings and assignments on “big data” management, though, I’m also excited about the class for its emphasis on data collection, analysis and visualisation.
As part of the class, we have to write a blog post each week on a topic related to that week’s readings. This week I got to talk about one of my favourite websites, the Guardian’s Datablog, so I thought I would repost the text of it here:
In August 2011, London experienced some of the worst rioting it had seen in a generation. Following the shooting of a young man in north London, the rioting spread across the city and to other areas of England over the course of four days. For more on the events and timeline of the riots, the BBC maintains this excellent website with archived BBC coverage.
In the wake of the riots, the Guardian newspaper’s online editors undertook a massive collection of data generated by the rioters and the victims of their activities, and began to analyse it in cooperation with the London School of Economics. They started by sending teams of online journalists to the London courts to document the public information on each rioter who was arrested and charged with an offence: name, occupation, home address, crime committed. They also began collecting data publicly available through social media sites such as Twitter and Facebook, publishing their findings as part of their (then) new Guardian Datablog and, eventually, as its own collection of articles and analysis: Reading the Riots.
Through deep analysis of the data they collected, the researchers at the London School of Economics and the editors and journalists at the Guardian were eventually able to draw broad conclusions about the social and economic factors that contributed to the widespread and violent nature of the London riots. The development of Reading the Riots also contributed to the Guardian’s realisation that large data sets can make good journalism, leading to the establishment of the Guardian’s Data Journalism site and the creation of the Guardian Datastore, a web-scale repository of journalistic data collected or curated by the Guardian. Much like using Twitter to track the cholera epidemic in Haiti, data journalism tracks trends in data generated by social media to create analysis and visualisations related to major world events, such as the Olympics or public elections.
But is the Guardian’s capitalisation on data collection and analysis as forward-thinking as all that? It certainly sets them apart from other traditional newspaper companies in Britain, such as the Times, but does it presage a sea change in journalism as we know it? Is data analysis “real” journalism, and do the articles produced by the Guardian’s Datablog deserve to be on the front page of the website?