Author Archives: Julian Todd

About Julian Todd

Co-creator of scraperwiki.com, publicwhip.org.uk, and electionleaflets.org. Also writes software for running machine tools and drawing cave maps.

My time at the Autocloud

The global CADCAM behemoth known as Autodesk hoovers up another small company every two weeks, a process unlikely to diminish following a $750 million bond issue last month. (Well, what else are they going to do with that money?) It … Continue reading

Posted in business, Scrapers | 2 Comments

On-line directory tree webscraping

As you surf around the internet — particularly in the old days — you may have seen web-pages like this: or this: The former image is generated by Apache SVN server, and the latter is the plain directory view generated … Continue reading

Posted in Scrapers | 1 Comment

Three hundred thousand tonnes of gold

On 2 July 2012, the US Government debt to the penny was quoted at $15,888,741,858,820.66. So I wrote this scraper to read the daily US government debt for every day back to 1996. Unfortunately such a large number overflows the … Continue reading

Posted in developer | 3 Comments

PDF table extraction of paginated table

The Isle of Man aircraft registry (in PDF form) has long been a target of mine waiting for the appropriate PDF parsing technology. The scraper is here. Setting aside the GetPDF() function, which deals with copying out each new pdf … Continue reading

Posted in developer | 3 Comments

5 yr old goes ‘potty’ at Devon and Somerset Fire Service (Emergencies and Data Driven Stories)

It’s 9:54am in Torquay on a Wednesday morning: One appliance from Torquay’s fire station was mobilised to reports of a child with a potty seat stuck on its head. On arrival an undistressed two year old female was discovered with a toilet … Continue reading

Posted in opendata, Scrapers | Leave a comment

Fine set of graphs at the Office for National Statistics

It’s difficult to keep up. I’ve just noticed a set of interesting interactive graphs over at the Office for National Statistics (UK). If the world is about people, then the most fundamental dataset of all must be: Where are the … Continue reading

Posted in opendata | Leave a comment

The Data Hob

Keeping with the baking metaphor, a hob is a projection or shelf at the back or side of a fireplace used for keeping food warm. The central part of a wheel into which the spokes are inserted looks kind of … Continue reading

Posted in developer | 1 Comment

The UN peacekeeping mission contributions mostly baked

Many of the most promising webscraping projects are abandoned when they are half done. The author often doesn’t know it. “What do you want? I’ve fully scraped the data,” they say. But it’s not good enough. You have to show … Continue reading

Posted in developer, journalism | 3 Comments

Big fat aspx pages for thin data

My work is more with the practice of webscraping, and less in the high-faluting business plans and product-market-fit leaning agility. At the end of the day, someone must have done some actual webscraping — and the harder it is the … Continue reading

Posted in developer | 2 Comments

Journalism Data Camp NY potential data sets

Here is a review of some of the datasets that have been submitted for the Columbia Journalism Data Camp this Friday. This list is only for backup in case not enough ideas show up with people on the day (never … Continue reading

Posted in developer, events, journalism | 2 Comments