The video gives a flavour of what happened when journalists, bloggers, software developers and artists came together to work on interesting and novel ways of exploring and using public data. You can read a roundup of the day at this link.
Video produced by The Hatch on behalf of Open Labs:
We’ve always intended that ScraperWiki itself should be open source – philosophically it doesn’t make sense for a collaborative code wiki for working with public datasets not to be.
More practically, letting anyone see our developer tools means you can also access our bug tracker, so you know what we’re working on and what issues and features we’re thinking about.
We’re proud to announce that ScraperWiki is now open source under the Affero GPL.
So have at it – clone the Mercurial checkout into Gitorious, get the core Django application running and let us know which bugs and features matter, or don’t matter, to you.
(Photo credit: “Source de la cascade de pista” from Flickr by JG65)
On Friday, ScraperWiki developer Anna Powell-Smith whizzed participants round ScraperWiki in 86 slides, explaining data scraping and storing and how it could be used in media investigations. View her presentation at this link. [For more information about ScraperWiki for journalists, see this post]
Paul Bradshaw, founder of Online Journalism Blog and Help Me Investigate, added his thoughts on where data sits in journalism. He flagged up the new government’s plans for ‘Big Society’: while the concept needs data, it might lead to less open information and more private finance initiatives (PFIs), for example.
Fired up by the intros, participants quickly brainstormed on the flip charts and groups formed speedily. Before long it was heads down around laptops in the breakout rooms. We had plenty of pens to give out.
Aptly, given that one of our sponsors was NHS Local, three groups looked at health.
‘Smear’ Campaign: Anna Powell-Smith and Nicola Hughes (pictured, below) tackled data around cervical cancer screening. They examined statistics on West Midlands hospital performance, looking at false negatives to see how different hospitals compared.
(Image: Mike Cummins)
The Health Map UK: Paul Bradshaw, Mark Bentley, Martin Moore, Clare White, Carl Plant and Andrew Mackenzie decided to put GPs on the map (some of the team pictured, right).
The result: a map of all 8,000 GP surgeries around the UK – the first of its kind, they said. They then began layering this with additional data around health indicators. They looked at which areas saw the highest and lowest numbers of appointments.
Birmingham At Your Leisure: Philip John, Daniel Bentley, Edward Saxton, Andrew Brightwell and James Shuttleworth set about scraping details of leisure centres, mapping, adding population density and health information (pictured below). Andrew has blogged some more detail at this link.
Other groups looked at education and politics.
Follow the Money: Ben Griffiths (below, left) scraped data to build a database of donations to political parties, using Electoral Commission information. He then designed an online search facility. The eventual result would be a web page for each donor, showing their donation history.
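For the curious, here is a minimal sketch of the "one page per donor" idea: group donation records by donor and sort each donor's history by date. The field names and sample values are made up for illustration; they are not the Electoral Commission's actual format, and this is not Ben's code.

```python
from collections import defaultdict

# Hypothetical sample records; real Electoral Commission data has its own fields.
donations = [
    {"donor": "Donor A", "party": "Party X", "amount": 5000, "date": "2010-01-15"},
    {"donor": "Donor B", "party": "Party Y", "amount": 1200, "date": "2010-02-01"},
    {"donor": "Donor A", "party": "Party X", "amount": 2500, "date": "2009-11-30"},
]


def donation_history(records):
    """Group donations by donor, with each donor's list sorted by date."""
    history = defaultdict(list)
    for record in records:
        history[record["donor"]].append(record)
    for donor_records in history.values():
        # ISO-style dates (YYYY-MM-DD) sort correctly as plain strings.
        donor_records.sort(key=lambda r: r["date"])
    return dict(history)


history = donation_history(donations)
for donor, records in sorted(history.items()):
    total = sum(r["amount"] for r in records)
    print(f"{donor}: {len(records)} donation(s), total £{total}")
```

Each entry in `history` holds what a single donor page would need: that donor's donations in date order, plus whatever totals you care to compute from them.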
Building Schools for the Future Map: The final project, by Anna Blackaby, Alex Tucker, Andy Mabbett, Stuart Harrison, Michael Grimes and Amy Mcleod, looked at Building Schools for the Future data – focusing on the West Midlands, where £41,190,089 had been wasted, they said. Using a variety of data sources, they mapped which schools had lost funding and where. A graphic showing the breakdown in the West Midlands region showed that Conservative-controlled schools fared better than Labour’s under the changes.
The winners
Our judges, Charlotte Crossley, director, Core Marketing; Annette King, innovation manager, Digital Birmingham; and Simon Jenner, head of incubation at E4F, decided to award one first prize and two runners-up.
First went to Ben Griffiths for his ‘Follow the Money’ political donations project – the judges particularly liked the way he presented his database through searchable web pages, with commenting facilities. You can find an early version of his database here: http://bit.ly/9GzbGy.
Joint second went to the Smear Campaign – another small team, of two. But a bigger group was rewarded as well: the Building Schools for the Future Map – and it was the final graphic, showing the difference between Labour and Tory controlled schools (pictured above), that clinched it for them.
Finally, the really exciting prize – the ScraperWiki builder’s mug – went to James Shuttleworth for the best coded scraper.
(Image: Mike Cummins)
Congratulations to all! The final presentations were live streamed via UStream. Video at this link…
Following up on the ideas
Paul Bradshaw tweeted a good idea, asking participants who use the social bookmarking site Delicious: “People at #hhhbrum – can you tag anything you use today at Delicious w/ #hhhbrum + #data if it’s a dataset?”
We’re trying to collect together as much material as possible from these Hack Days, so this will be something we encourage for our future events around the UK.
Participants, and anyone who has followed up on their activities, please keep in touch with your progress [via judith at scraperwiki.com] and add any relevant links in the comments.
“Just got home from #hhhbrum. Thanks to all @Scraperwiki for putting on such a great event. Learnt a lot and inspired to do more with data,” @djbentley on Twitter.
“Damn fab day at #hhhbrum; lots of great folks and enthusiasm for using data. All day battle linking schools by name to education.data.gov.uk,” @ajtucker on Twitter.
“Great day at hacks and hackers #hhhbrum. Lots of interesting discussions with lots of interesting people. Thanks to the @ScraperWiki peeps,” @Digihode on Twitter.
ScraperWiki is on tour! We currently have plans for more events in Glasgow, Cardiff, Manchester, Leeds, Belfast, Dublin and London. If you would like to be involved or are interested in sponsoring an event, please get in touch via judith at scraperwiki.com.
We had a fantastic turnout, with a mix of programmers and journalists from a variety of backgrounds. We stole a good number from the Liverpool Post & Echo newsroom, who came armed with brilliant ideas for local data mashing.
Teams – both large and small – formed quickly, according to specialism and interests. Then, it was down to the hacking…
We had crime…
Alison Gow, Frank Swain, Sam Sutton, Luke Traynor and Maria Breslin worked on the Life and Alleged Crimes of Pancake Taylor. This visualisation project took the story of one local man’s brush with the law. Using maps and timelines, the eventual result was a web page dedicated to this notorious Liverpool gangster’s (alleged) activities.
Crime prevention…
Julian Todd, Jo Kelly and Joni Alexander took data from the Merseyside Police website, in order to show when a policeman or woman is removed from the listing of officers covering an area, or added. This project could be rolled out in any local area, using similar data. Read more on Ed’s blog here.
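The core of this idea is comparing two snapshots of the same listing. Here is a minimal sketch, assuming each scrape yields a plain list of officer names for one area; the names and the `roster_changes` helper are invented for illustration and are not the team's actual code.

```python
def roster_changes(previous, current):
    """Return (added, removed) names between two snapshots of a listing."""
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


# Hypothetical snapshots of one area's officer listing on two different days.
monday = ["PC Adams", "PC Brown", "PCSO Clarke"]
friday = ["PC Adams", "PCSO Clarke", "PC Davies"]

added, removed = roster_changes(monday, friday)
print("Added:", added)
print("Removed:", removed)
```

Run on a schedule, with each day's scrape stored alongside its date, the same comparison would work for any force that publishes a similar listing.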
Court case alerts…
Adrian McEwen, Donovan Hide, John O’Shea and Andy Freeney worked on ‘The Gavel’ featuring Judge Duino (Do-eee-no), with the aim of making legal process data tangible.
They started to think about new and interesting ways that this data might be interpreted publicly and built an electronically controlled ‘gavel’ which could be triggered in response to different aspects of the data.
John O’Shea said: “I think that this project might be thought of as a very early prototype for a truly public and transparent interface with ‘law’.”
David Bartlett, Mike Nolan, Neil Morrin, Ben Turner, Dan Kay, Martin Dunschen, Tori Hywel-Davies, Paul Freeman, Dan Owen and Kevin Matthews scraped local datasets on health, education and transport for a series of Merseyside maps. The aim was to create a map packed with local information, e.g. schools, GP surgeries and train stations. They managed to scrape information from Liverpool PCT for GPs, the National Rail website for stations, and the department of education’s site for schools.
They found that using Google Earth was the only way to get it all on one map. For the project to really work and become useful with more information added, a new map interface would be needed to allow users to select what information they wanted displayed, says team member David Bartlett. The team’s presentation can be viewed here.
Business…
The ‘Business Light’ by Mark Thomas, Francis Irving, Aidan McGuire, Ben Schofield, Alistair Houghton, Laurence Rowe and Tom Mortimer-Jones was a dashboard for watching business activity in Merseyside – allowing users to make informed business decisions through a traffic light ranking system. They prototyped it, checked what data they could get (employment levels, insolvencies, contracts etc.), and worked out what the website would do. It also involved visualisations and screen scraping.
Libraries…
In ‘Library Data: What’s the Story?’ (originally: ‘why aren’t libraries more like Amazon?’) Ben Webb, Anna Powell-Smith and Mandy Phillips followed up a story on closed data in libraries. UK libraries generally have proprietary catalogue systems without public APIs. As a result, libraries have to pay for access to their own data, and users can’t share records easily. They found some sample open RDF data from one library provider, and built a prototype for an open UK-wide catalogue search. Find the presentation at this link.
Sport…
Jamie Bowman, Francine Higham, John McKerrell, Neil Macdonald and Francis Fish tackled the Other World Cup.
This was the World Cup’s alternative story. A visualisation showed stats that the media weren’t focusing on: the number of people displaced; and the chance of England winning, for example.
Meanwhile, Adrian McEwen’s lovely Bubblino machine blew bubbles every time the hashtag #hhhliv was mentioned on Twitter.
The winners of the day, as judged by Jane Clare, executive editor of Trinity Mirror’s Merseyside Weeklies, lawyer Steve Kuncewicz, and Lindsay Sharples, director of LJMU Open Labs:
First: The Business Light
Second: Why aren’t libraries more like Amazon?
We’d like to say a big thank you to our sponsors for hosting, feeding and rewarding our hard working participants; and congratulations to all involved in the day. Thank you to all the hacks and hackers who supplied information for this blog post.
“I’m still fascinated by #scraperwiki and #hhhliv. I should investigate more,” @defnetmedia on Twitter.
“Great day at #hhhliv trying to visually represent costs of #Worldcup. Trying to to take this further as lots more info emerges in future months,” @fransa on Twitter.
“Good day #hhhliv. Learned a lot from some very smart people,” @ed_walker86 on Twitter.
“What impressed me most about the event was the total commitment of all of those present to be involved in the process and deliver a fresh idea,” John O’Shea, artist.
*Locations may be added or removed, depending on interest. If you would like to talk to us about getting involved in these events, as a partner or sponsor, please contact judith [at] scraperwiki.com.
For non-programmers, a first look at ScraperWiki’s code could be a bit scary, but we want journalists and researchers to make use of the site, so we’ve set up a variety of initiatives to help them get started.
Firstly, we’re setting up a number of Hacks and Hackers Days around the UK, with Liverpool as our first stop outside London. You can follow this blog or visit our Eventbrite page to find out more details.
Secondly, our programmers are teaching ScraperWiki workshops and classes around the UK.
Anna Powell-Smith took ScraperWiki to the Midlands, and taught Paul Bradshaw’s MA students at Birmingham City University the basics. Paul has written up some notes at this link.
Julian Todd ran a ‘Scraping 101’ session at the Centre for Investigative Journalism summer school last weekend. He ran through the basics of ScraperWiki and showed how he was using it to map and track offshore oil wells in the UK.
The presentations were concluded by Francis Irving, developer for ScraperWiki, who outlined how they can help journalists transform confusing data into a newsworthy story. He showed two examples of datasets the company can ‘scrape’ data from, producing more accessible tables or even visualisations such as maps, saving journalists’ time.
(Some more general points from the session can be read here)
Armed with their laptops and Wi-Fi, journalists and developers will be put into teams of around four to develop their ideas, with the aim of finishing final projects that can be published and shared publicly. Each team will then present their project to the whole group.
As previously announced, we will be running an event in Liverpool on July 16; more on that here.
We’re happy to announce our next Hacks Meet Hackers event, to take place in Liverpool on Friday July 16, 2010 from 9.30am to 8pm at the Arts and Design Academy.
Can’t get to Liverpool? Don’t worry – we’ve got more UK hack days in the pipeline: get in touch to find out more about attending or sponsoring one.
So what’s this hack day all about? It’s a practical event at which web developers and designers will pair up with journalists and bloggers to produce a number of projects and stories based on public data.
Who’s it for? We hope to attract hacks and hackers from all different types of backgrounds: people from big media organisations, as well as individual online publishers and freelancers.
What will you get out of it? The aim is to show journalists how to use programming and design techniques to create online news stories and features; and vice versa, to show programmers how to find, develop, and polish stories and features.
How much? NOTHING! It’s free, thanks to our sponsors.
What should participants bring? We would encourage people to come along with ideas for local datasets that are of interest. We will also share a list of suggested datasets at the introduction on the morning of the event, but flexibility is key.
But what exactly will happen on the day itself? Armed with their laptops and Wi-Fi, journalists and developers will be put into teams of around four to develop ideas, with the aim of finishing final projects that can be published and shared publicly. Each team will then present their project to the whole group. Overall winners will receive a prize at the end of the day. Food and drink will be provided during the day!
Last week saw big steps forward in public data: on Monday, Prime Minister David Cameron wrote to all government departments, setting out a timetable for the release of a swathe of official datasets.
A big step forward – but a new dataset over at ScraperWiki reveals there’s still a very long way to go. Developer Anna Powell-Smith has built a scraper for the Information Asset Register (IAR).
The IAR is a register of unpublished datasets held by government departments – and it has more than 2,100 entries. The database shows which department holds the information, and should include a short description of what’s in there.
The data shows how far there is still to go for open information: for one, David Cameron’s release last week covers fewer than ten datasets – important ones, beyond a doubt, but only a scratch on the surface.
But this is just a small part of the problem, as anyone looking at the full data in Powell-Smith’s scrape can see: even in this register of government data, quality is low.
More than half of the records in the IAR are missing details – often details as basic as a description of the record’s contents. Some departments have submitted hundreds of datasets, while others appear to have merely carried out a cursory search and listed a handful. Some didn’t even bother to do that.
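A check like this is easy to reproduce once the register is in tabular form. The sketch below counts records whose description is empty or missing; the field names and sample rows are assumptions for illustration, not the actual columns of the IAR scrape.

```python
# Hypothetical rows in the shape a scrape of the register might produce.
records = [
    {"department": "Dept A", "title": "Dataset 1", "description": "Annual returns"},
    {"department": "Dept A", "title": "Dataset 2", "description": ""},
    {"department": "Dept B", "title": "Dataset 3", "description": None},
    {"department": "Dept B", "title": "Dataset 4", "description": "   "},
]


def is_missing(value):
    """Treat None, empty strings and whitespace-only strings as missing."""
    return value is None or not value.strip()


missing = [r for r in records if is_missing(r["description"])]
share = len(missing) / len(records)
print(f"{len(missing)} of {len(records)} records ({share:.0%}) lack a description")
```

The same test could be applied to any other field, or grouped by department to see who has filled in the register most thoroughly.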
A first step for the government’s new Transparency Board should doubtless be to update the register and bring it up to scratch.
Cameron warned that the data would initially be patchy. Given the poor state of even this simple document, it seems he wasn’t kidding. The culture of government might be changing, but developers and journalists alike will need to keep on the pressure, if data good enough to be of use to anyone is going to come out.
We have been a little silent about the fact that ScraperWiki was presented with the Guardian Media – Innovation in Technology Award at the industry’s most prestigious awards bash, the MEGAS, a few weeks ago. We were not expecting the award, as the judges had seen ScraperWiki in its alpha state, quite a while before Christmas last year – so it was a lovely surprise when our name was called out. The site has come a long way since then but, needless to say, we have much more to do. The event was very swish, with copious amounts of alcohol flowing in the comfortable surroundings of the Paramount Club at the top of the CentrePoint building in the West End of London – although not a good place for vertigo types drinking strong cocktails!
…and the award was presented by the very energetic and enthusiastic Guardian tech journalist Aleks Krotoski – photograph compliments of Sym Roe.
Another 4iP-supported company, Ideonic, was also presented with an award on the night in the category of Applications and Gadgets. Their entry was for MirrorMe, an innovative application that aims to promote healthy lifestyles in the young adult population. Users take any photo of themselves, enter an overview of their bad habits and are then presented with an image of what their face would look like in the future. Great idea!
We have also just been told that we are shortlisted for the Regional Business Award 2010 – Liverpool John Moores University Knowledge Business of the Year, which is sponsored by the Liverpool Echo, and separately for the Digital Business Big Chip Award for Technology, which will be announced in Manchester in June 2010. It’s all good.