Category Archives: developer
How to scrape and parse Wikipedia
Today’s exercise is to create a list of the longest and deepest caves in the UK from Wikipedia. Wikipedia pages for geographical structures often contain Infoboxes (the panel on the right-hand side of the page). The first job was … Continue reading
ScraperWiki scrapers: now 53% more useful!
It’s Christmas come early at ScraperWiki HQ as we deliver, like elves popping boxes under the data-digging Christmas tree, a bunch of great new improvements to the ScraperWiki site. We’ve been working on these for a while, so it’s great to … Continue reading
How to get along with an ASP webpage
Fingal County Council of Ireland recently published a number of sets of Open Data, in nice clean CSV, XML and KML formats. Unfortunately, the one set of Open Data that was difficult to obtain was the list of sets of … Continue reading
Posted in developer, Scrapers
Tagged ASP, Fingal County Council, Ireland, scraperwiki, scraping
Job advert: Lead programmer
Oil wells, marathon results, planning applications… ScraperWiki is a Silicon Valley style startup, in the North West of England, in Liverpool. We’re changing the world of open data, and how data science is done together on the Internet. We’re looking for a … Continue reading
Lots of new libraries
We’ve had lots of requests recently for new 3rd party libraries to be accessible from within ScraperWiki. For those of you who don’t know, yes, we take requests for installing libraries! Just send us word on the feedback form and … Continue reading
Tweeting the drilling
A very long time ago I discovered the easiest webscraping target: the locations of all the North Sea Oil wells. Once you webcrawl through the index pages, the entries were pretty straightforward. There were dates, water depths (in feet or … Continue reading
Scraping guides: Dates and times
Working with dates and times in scrapers can get really tricky. So we’ve added a brand new scraping guide to the ScraperWiki documentation page, giving you copy-and-paste code to parse dates and times, and save them in the datastore. To … Continue reading
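As a rough illustration of the kind of date handling that excerpt describes, here is a minimal Python sketch using only the standard library. It is not the code from the guide itself: the `KNOWN_FORMATS` list and the `parse_date` and `to_iso` helpers are hypothetical names chosen for this example, and the idea of storing dates as ISO 8601 strings is an assumption rather than a statement about how the datastore works.

```python
from datetime import datetime

# Hypothetical list of formats; a real scraper would list whatever
# formats actually appear on the pages it reads.
KNOWN_FORMATS = ["%d %B %Y", "%Y-%m-%d", "%d/%m/%Y"]


def parse_date(text):
    """Return a datetime.date for the first matching format, else None."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


def to_iso(text):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or None."""
    parsed = parse_date(text)
    return parsed.isoformat() if parsed else None
```

Trying a fixed list of formats in order keeps ambiguous input predictable: with the list above, a string such as `03/04/2012` is always read day-first.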
New backend now fully rolled out
The new faster, safer sandbox that powers ScraperWiki is now fully rolled out to all users. You should find running and developing scrapers and views faster than before, and that you’re using much more recent versions of Ruby, Python and associated … Continue reading
Scraping guides: Parsing HTML using CSS selectors
We’ve added a new scraping copy-and-paste guide, so you can quickly get the lines of code you need to parse an HTML file using CSS selectors. Get to it from the documentation page: The HTML parsing guide is available in Ruby, Python … Continue reading
Start Talking to Your Data – Literally!
Because ScraperWiki has a SQL database and an API with SQL extraction, I can SQL inject (haha!) straight into the API URL and use the JSON output. So what does all that mean? I scraped the CSV files of Special … Continue reading