Archive | Uncategorized

What has Europe ever done for us?

Hmm… 2.8 million euros, a ‘history recorder’, and the opportunity to have a full-on working relationship with VU Amsterdam Uni, Lexis Nexis, plus some brilliant bods in Trento and San Sebastián (Happy Christmas!). It’s official! We have become a European FP7 partner with the VU Amsterdam University, Faculty of Arts (Prof Piek Vossen), Lexis […]

The Humble CSV!

It must be rocket science! The CSV (comma-separated values) file has been in use for 45 years, from before men walked on the moon, and it still remains the cheapest and most reliable way to move data from one computer system to another. While hardware and software standards have moved forwards with the technology […]

Hacking the National Health Service

In the age of easy-to-use consumer software – from Facebook to the iPhone – health workers find the software they get at work increasingly frustrating. Talk to some! You’ll find stories of doctors crossing hospitals to reboot computers to get a vital piece of data. Stories of individuals keeping patient records on Excel […]

On-line directory tree webscraping

As you surf around the internet, particularly in the old days, you may have seen bare directory-listing web pages in two forms. One is generated by an Apache SVN server, and the other is the plain directory view generated for UserDir on Apache. In both cases you have a very primitive page that allows […]
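As a rough illustration of what scraping such a listing can involve, here is a minimal sketch that pulls the links out of a directory-listing page and splits them into subdirectories and files. The sample HTML, the `ListingParser` class and the `split_listing` function are all made up for this example; real Apache or SVN listings vary, so the filtering rules here are assumptions, not a description of the method in the post.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_listing(html):
    """Return (subdirectories, files) linked from a listing page."""
    parser = ListingParser()
    parser.feed(html)
    dirs, files = [], []
    for href in parser.hrefs:
        # Assumption: sort links start with "?", parent links with "/" or "../",
        # and anything containing "://" points off the listing.
        if href.startswith(("?", "/", "../")) or "://" in href:
            continue
        # Assumption: a trailing slash marks a subdirectory.
        (dirs if href.endswith("/") else files).append(href)
    return dirs, files


# Hypothetical listing page, loosely modelled on an Apache index.
SAMPLE = """<html><body><h1>Index of /~user</h1>
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="data/">data/</a>
<a href="notes.txt">notes.txt</a>
</body></html>"""

dirs, files = split_listing(SAMPLE)
```

Walking a whole tree would then mean fetching each entry in `dirs` and repeating the same step, with some limit on depth so a misbehaving server cannot send the scraper in circles.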

Do all “analysts” use Excel?

We were wondering how common spreadsheets are as a platform for data analysis. It’s not something I’ve really thought about in a while; I find it way easier to clean numbers with real programming languages. But we suspected that virtually everyone else used spreadsheets, and specifically Excel, so we did a couple of things […]

Mapping deaths in the Italian prison system.

This is a guest post by Jacopo Ottaviani, Italian freelance journalist and developer. The story it tells was published in the Italian newspaper Il Fatto Quotidiano. Currently in Italy many prisoners die every month in jail. According to an independent dossier by the Italian non-profit association Ristretti Orizzonti (lit. Narrow Horizons), almost one thousand deaths were registered in the […]

Microfinance Data Scraping

I went to DataKind’s New York DataDive last November and met the Microfinance Information Exchange (MIX), a group that ‘delivers data services, analysis, research and business information on the institutions that provide financial services to the world’s poor’. They wanted to see whether web-scraping could save them from manually gathering data. So fellow divers and I showed MIX the utility […]

5 yr old goes ‘potty’ at Devon and Somerset Fire Service (Emergencies and Data Driven Stories)

It’s 9:54am in Torquay on a Wednesday morning: One appliance from Torquays fire station was mobilised to reports of a child with a potty seat stuck on its head. On arrival an undistressed two year old female was discovered with a toilet seat stuck on her head. Crews used vaseline and the finger kit to remove the […]

Handling exceptions in scrapers

When requesting and parsing data from a source with unknown properties and random behavior (in other words, scraping), I expect all kinds of bizarrities to occur. Managing exceptions is particularly helpful in such cases. Here are some ways that an exception might be raised. [][0] #The list has no zeroth element, so this raises an […]
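To make the idea concrete, here is a minimal sketch of catching exceptions so that one bad value does not stop a whole scraping run. The function names `first_or_none` and `parse_rows` and the sample data are invented for this example; they are not taken from the post.

```python
def first_or_none(items):
    """Return the first element, or None if the list is empty."""
    try:
        return items[0]
    except IndexError:
        # Indexing an empty list, as in [][0], raises IndexError.
        return None


def parse_rows(raw_rows):
    """Convert raw strings to ints, collecting failures instead of stopping."""
    parsed, failures = [], []
    for raw in raw_rows:
        try:
            parsed.append(int(raw.strip()))
        except ValueError as error:
            # Keep the bad value and the reason so it can be inspected later.
            failures.append((raw, str(error)))
    return parsed, failures


# Hypothetical scraped cells: two clean numbers and two unusable values.
parsed, failures = parse_rows(["12", " 7 ", "n/a", ""])
```

Catching only the specific exception you expect (`IndexError`, `ValueError`) rather than a bare `except` means genuinely unexpected errors still surface, which tends to matter when the source behaves unpredictably.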