Archive | Uncategorized

Two jobs at ScraperWiki to ponder over the bank holiday

It’s busy busy at ScraperWiki at the moment. We’re growing our award-winning data hub (where people scrape Twitter). We’ve lots of interesting consultancy, for clients like the Cabinet Office (GDS), United Nations (OCHA), and Autotrader. So we’re hiring two people. 1) A Digital Marketer. This is an unusual opportunity to market a marketing data product! […]

What has Europe ever done for us?

Hmm… 2.8 million euros, a ‘history recorder’, and the opportunity to have a full-on working relationship with VU Amsterdam Uni, Lexis Nexis, plus some brilliant bods in Trento and San Sebastián (Happy Christmas!) It’s official! We have become a European FP7 partner with the VU Amsterdam University, Faculty of Arts (Prof Piek Vossen), Lexis […]

The Humble CSV!

It must be rocket science! The CSV (comma-separated values) file has been in use for 45 years, since before men walked on the moon, and it remains the cheapest and most reliable way to move data from one computer system to another. While hardware and software standards have moved forwards with the technology […]

Hacking the National Health Service

In the age of easy-to-use consumer software – from Facebook to the iPhone – health workers find the software they get at work increasingly frustrating. Talk to some! You’ll find stories of doctors crossing hospitals to reboot computers to get a vital piece of data. Stories of individuals keeping patient records on Excel […]

On-line directory tree webscraping

As you surf around the internet — particularly in the old days — you may have seen web-pages like this: or this: The former image is generated by Apache SVN server, and the latter is the plain directory view generated for UserDir on Apache. In both cases you have a very primitive page that allows […]

Do all “analysts” use Excel?

We were wondering how common spreadsheets are as a platform for data analysis. It’s not something I’ve really thought about in a while; I find it way easier to clean numbers with real programming languages. But we suspected that virtually everyone else used spreadsheets, and specifically Excel, so we did a couple of things […]

Mapping deaths in the Italian prison system.

This is a guest post by Jacopo Ottaviani, Italian freelance journalist and developer. The story it tells was published in the Italian newspaper Il Fatto Quotidiano. In Italy, many prisoners currently die in jail every month. According to an independent dossier by the Italian non-profit association Ristretti Orizzonti (lit. Narrow Horizons), almost one thousand deaths were registered in the […]

Microfinance Data Scraping

I went to Datakind’s New York Datadive last November and met the Microfinance Information Exchange (MIX), a group that ‘delivers data services, analysis, research and business information on the institutions that provide financial services to the world’s poor’. They wanted to see whether web-scraping could save them from manually gathering data. So fellow divers and I showed MIX the utility […]