
…in data we trust….

We’re in Washington DC, the nation’s capital and US HQ! The city is bathed in spring sunlight, the blossoms are out and there’s a bit of a buzz about the town. The ScraperWiki truck is getting ready to park at The Washington Post on Friday and Saturday for our 3rd major US Journalism Data Camp (hashtag: #jdcdc).

It’s an election year, so we can be forgiven for feeling a little smug. Our raison d’être is to dig up data, so where better to make it happen than at The Washington Post, a newspaper that inspired a generation of investigative journalists, inscribed the word ‘Watergate’ as a formal entry in the Oxford English Dictionary, and made ‘deep throat’ a double entendre!

Health, transport, education, security: they’re all ripe for data liberation. We’ve detected interest in “Super” PACs and lobbying data, so let’s hope we see a major focus on these at the event. Jack Gillum (@jackgillum), one of AP’s senior investigative reporters, is keen to drill into Independent Expenditure (aka Election Advertisements) and Campaign Finance Disclosure data. Our own Julian Todd (@goatchurch) has commenced work on liberating lobbying data in New York.

The guys here at The Washington Post have a wish list for liberation and it’s by no means exhaustive:

We’re thrilled by the fact that we signed up so many data scientists and media professionals. The coders will be freeing and/or learning to scrape data and everyone else will be facilitated into teams to hypothesize, gather, analyze, create and present stories and applications based on data. The outcomes will be presented on Saturday at 04:00p and we have a bunch of prizes to give away for the most inspired ideas. We also have some special ScraperWiki prizes for technical contributions.

What’s happening on Friday 30th March?

08:30a We will open registration and serve tea, coffee and biscuits

09:30a Kick-off and a short plenary. We’ll hear from Vernon Loeb (@VernonLoeb) about what it’s like to work as a data digger at the Post, and Chuck Lewis (@crelewis) from AU will talk about AU’s partnership with the capital’s flagship publication. Our own Francis Irving (@frabcus) will say hello and talk ‘data’. Julian Todd (@goatchurch) and Thomas Levine (@thomaslevine) will explain why scraping is an important technique for getting data and show some examples. Tom Lee (@tjl) from Sunlight Foundation will make a callout for help with their GASP project and put some context around votesmart closing their doors.

10:15a The Data Derby and Data Liberators will meet and pore over data ideas. We will review the lifecycle of a data-driven story and familiarise people with the ScraperWiki Data Derby route map. We will set out some ideas and facilitate people into teams, with each team picking a magnet as its map route icon. The coders who have signed up for the morning ‘Learn to Scrape’ with Python class will be directed to the tutorial room for the three-hour session. Anyone signed up for the afternoon tutorial will join the data derby/liberators for some fun.

Data Derby Route Map

12:45p Lightning Talk: Greg Franczyk from The Washington Post will talk about the evolving role of data in the media industry: specifically, how it is gathered and stored, its changing relationship with news, and how it’s presented to consumers.
Callout: Mjumbe Poe (@mjumbewu), Code for America Fellow, would like to share the story of scraping council data for Councilmatic and to get people interested in tackling the agendas.

01:00p Light lunch

01:30p Projects continue…

02:15p ‘Learn to Scrape’ Python afternoon tutorial commences (three hours)

05:30p Reception (Beer and Pizza).

******************Special NOTE********

Learn to Scrape
The two three-hour tutorials on Friday morning and afternoon will be run by our chief data scientist Julian Todd (@goatchurch) and data advocate Thomas Levine (@thomaslevine), aided and abetted by Michelle Koeth (@michellekoeth), Code for America Fellow. They will cover things like identifying good targets for web scraping and navigating the complexity of different types of web pages. Attendees will create their own scrapers. The objective will be to get the data into a structured format and join it with data from another source. If time allows, we will also try to encourage people to do further analysis.

*************************
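For anyone wondering what "get the data into a structured format, and join it with data from another source" can look like in practice, here is a minimal Python sketch. It is not the tutorial material: the HTML snippet, the committee names and the STATES lookup are all made up for illustration, and it uses only Python's standard library rather than any ScraperWiki-specific API.

```python
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page a scraper might have fetched.
SAMPLE_HTML = """
<table>
  <tr><th>committee</th><th>amount</th></tr>
  <tr><td>Example PAC A</td><td>1500</td></tr>
  <tr><td>Example PAC B</td><td>250</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Turn the first row into field names and the rest into records."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def join_on(records, lookup, key, new_field):
    """Add new_field to each record by looking up record[key] in lookup."""
    return [dict(r, **{new_field: lookup.get(r[key])}) for r in records]


# Hypothetical second source, keyed on the same committee name.
STATES = {"Example PAC A": "NY", "Example PAC B": "DC"}

records = scrape_table(SAMPLE_HTML)
joined = join_on(records, STATES, "committee", "state")
```

Real pages are rarely this tidy, which is why the tutorials spend time on choosing good targets and coping with different page types. The join here is a plain dictionary lookup on a shared key; a record with no match simply gets None for the new field.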

What’s happening on Saturday 31st March?

09:30a Welcome plus tea, coffee and biscuits

09:45a Throughout the morning we will follow the Data Derby route map – please study the picture above.

12:45p Lightning Talk: Jack Gillum (@jackgillum), AP Investigative Journalist, and Michelle Minkoff (@michelleminkoff), Interactive Producer, will talk about how “Super” PACs and big money have dominated this election cycle. They will tell us there is little to fear: a mountain of data is available on who’s backing presidential candidates, and it can help journalists make sense of this year’s big-time fundraisers. They’ll also talk about Federal Election Commission filings and show how they can be parsed for good storytelling.
Callout: Jan Schaffer (@janjlab) from J-Lab wants to invite ideas from our participants on how to better organize and collect their data, which includes one of the largest databases of U.S. community news sites and a significant database of grant-funded media projects.

01:00p Light lunch

02:00p Project teams will finalize the details of their data stories in preparation for the presentation.

03:00p Heading towards the finishing line…

04:00p Presentations and Prizes.

The American University School of Communication has been amazingly supportive. Sharon Metcalf (Director of Partnerships and Programs) is an absolute gem, as are her colleagues Lynne Perri (@Lynneperri), Professor of Journalism, and Chuck Lewis (@crelewis), Professor of Journalism and Executive Editor of the Investigative Reporting Workshop, who were instrumental in getting the event off the ground. We have also been overwhelmed by the support from Vernon Loeb (@VernonLoeb), the Local Editor at The Washington Post, who together with Greg Franczyk has set us up in their swish conference center. A huge ‘thank you’ to Jane Lockhart and her operations team for helping us with logistics. And last but by no means least, a big round of applause to Associated Press for helping to fund our refreshments, Sunlight Foundation for our beer and pizza, and J-Lab for sponsoring the prizes. Hip Hip Hooray!

Eugene Meyer (Foyer – The Washington Post)