Author Archives: Julian Todd
My time at the Autocloud
The global CADCAM behemoth known as Autodesk hoovers up another small company every two weeks, a process unlikely to diminish following a $750 million bond issue last month. (Well, what else are they going to do with that money?) It … Continue reading
Posted in business, Scrapers
2 Comments
On-line directory tree webscraping
As you surf around the internet (particularly in the old days) you may have seen web pages like this: [image] or this: [image]. The former image is generated by Apache SVN server, and the latter is the plain directory view generated … Continue reading
Three hundred thousand tonnes of gold
On 2 July 2012, the US Government debt to the penny was quoted at $15,888,741,858,820.66. So I wrote this scraper to read the daily US government debt for every day back to 1996. Unfortunately such a large number overflows the … Continue reading
PDF table extraction of paginated table
The Isle of Man aircraft registry (in PDF form) has long been a target of mine, waiting for the appropriate PDF parsing technology. The scraper is here. Setting aside the GetPDF() function, which deals with copying out each new PDF … Continue reading
Posted in developer
3 Comments
5 yr old goes ‘potty’ at Devon and Somerset Fire Service (Emergencies and Data Driven Stories)
It’s 9:54am in Torquay on a Wednesday morning: One appliance from Torquays fire station was mobilised to reports of a child with a potty seat stuck on its head. On arrival an undistressed two year old female was discovered with a toilet … Continue reading
Posted in opendata, Scrapers
Tagged data, html, javascript, open data, scraper, scrapers, views
Fine set of graphs at the Office for National Statistics
It’s difficult to keep up. I’ve just noticed a set of interesting interactive graphs over at the Office for National Statistics (UK). If the world is about people, then the most fundamental dataset of all must be: Where are the … Continue reading
The Data Hob
Keeping with the baking metaphor, a hob is a projection or shelf at the back or side of a fireplace used for keeping food warm. The central part of a wheel into which the spokes are inserted looks kind of … Continue reading
The UN peacekeeping mission contributions mostly baked
Many of the most promising webscraping projects are abandoned when they are half done. The author often doesn’t know it. “What do you want? I’ve fully scraped the data,” they say. But it’s not good enough. You have to show … Continue reading
Posted in developer, journalism
3 Comments
Big fat aspx pages for thin data
My work is more about the practice of webscraping and less about highfalutin business plans and product-market-fit-leaning agility. At the end of the day, someone must have done some actual webscraping, and the harder it is the … Continue reading
Posted in developer
2 Comments
Journalism Data Camp NY potential data sets
Here is a review of some of the datasets that have been submitted for the Columbia Journalism Data Camp this Friday. This list is only a backup in case people do not bring enough ideas on the day (never … Continue reading
Posted in developer, events, journalism
2 Comments