Category Archives: Scrapers

About the scrapers our users have made

Tools of the trade

With the experience of a whole week of ScraperWiki, I am starting to appreciate the core tools of the professional Data Scientist. In the past I’ve written scrapers in Matlab, C# and Python. However, the house language for scraping at … Continue reading

Posted in Scrapers | 1 Comment

WordPress Titles: scraping with search url

I’ve blogged for a few years now, and I’ve used several tools along the way. zachbeauvais.com began as a Drupal site, until I worked out that it was a bit of an overkill and switched to WordPress. Recently, I’ve been toying with the … Continue reading

Posted in Scrapers | 1 Comment

My time at the Autocloud

The global CADCAM behemoth known as Autodesk hoovers up another small company every two weeks — a process unlikely to diminish following a $750million bond issue last month. (Well, what else are they going to do with that money?) It … Continue reading

Posted in business, Scrapers | 2 Comments

Scraping the Royal Society membership list

To a data scientist any data is fair game. Through my interest in the history of science I came across the membership records of the Royal Society from 1660 to 2007, which are available as a single PDF file. I’ve … Continue reading

Posted in Scrapers | 2 Comments

On-line directory tree webscraping

As you surf around the internet — particularly in the old days — you may have seen web-pages like this: or this: The former image is generated by Apache SVN server, and the latter is the plain directory view generated … Continue reading

Posted in Scrapers | 1 Comment

Twitter Scraper Python Library

I wanted to save the tweets from Transparency Camp. This prompted me to turn Anna’s basic Twitter scraper into a library. Here’s how you use it. Import it. (It only works on ScraperWiki, unfortunately.) from scraperwiki import swimport search = … Continue reading

Posted in events, Scrapers | 3 Comments

Middle Names in the United States over Time

I was wondering what proportion of people have middle names, so I asked the Census. Recently you requested personal assistance from our on-line support center. Below is a summary of your request and our response. We will assume your issue … Continue reading

Posted in opendata, research, Scrapers | 1 Comment

Microfinance Data Scraping

I went to DataKind’s New York DataDive last November and met the Microfinance Information Exchange (MIX), a group that ‘delivers data services, analysis, research and business information on the institutions that provide financial services to the world’s poor’. They wanted to see whether web-scraping could … Continue reading

Posted in opendata, research, Scrapers | Leave a comment

5 yr old goes ‘potty’ at Devon and Somerset Fire Service (Emergencies and Data Driven Stories)

It’s 9:54am in Torquay on a Wednesday morning: One appliance from Torquays fire station was mobilised to reports of a child with a potty seat stuck on its head. On arrival an undistressed two year old female was discovered with a toilet … Continue reading

Posted in opendata, Scrapers | Leave a comment

Handling exceptions in scrapers

When requesting and parsing data from a source with unknown properties and random behavior (in other words, scraping), I expect all kinds of bizarrities to occur. Managing exceptions is particularly helpful in such cases. Here are some ways that an … Continue reading

Posted in Scrapers | 2 Comments