Author Archives: Francis Irving
$1 million to build a data platform
Sometimes the easiest way of being authentic is to just post an email that was written to be private… Date: Fri, 27 Jan 2012 14:29:57 +0000 From: Francis Irving <francis@scraperwiki.com> To: team@scraperwiki.com Subject: Capital! Today we closed our round of … Continue reading
Posted in business
Let's try ScraperWiki
Guest post by Makoto Inoue, a Japanese ScraperWiki user. Makoto works in London as a Web developer, a technical writer, and a translator. He has a Japanese blog and his Twitter account is @makoto_inoue. Introduction: Do you know the word "scrape"? Pulling specific data out of web pages is called scraping. Many websites these days offer an API (Application Programming Interface) that makes their data easy to obtain, so plenty of people may wonder why scraping is still needed. However, during the recent Great East Japan earthquake, the earthquake and power alerts, and the government statistics needed to assess the damage in each region, were not offered through APIs, and many developers probably ended up writing their own scrapers. It is a real shame when programs written out of so many developers' goodwill end up scattered across various sites or eventually go unmaintained. That is where ScraperWiki comes in. What is ScraperWiki? ScraperWiki is a UK startup that provides a site for sharing scraper code. Developers can edit and run code (Ruby, PHP, Python) directly on the site. Scrapers can also be scheduled to run periodically, and the data they collect is stored on ScraperWiki. Because ScraperWiki provides an API, that data can be reused on other sites through it. … Continue reading
Up in the Air with ScraperWiki and Tropo
We came across this blog post a few days ago from these cool guys at Tropo in Florida, and thought you’d be interested in how they’ve used ScraperWiki. Tropo is a simple API for adding voice and other goodies to … Continue reading
Our friendly competitors / partners
A few months ago I made this diagram (for VCs), which shows the world of online data collaboration and scraping from a ScraperWiki point of view. It shows the kind of companies and technologies that, if ScraperWiki were to not … Continue reading
Job advert: Lead programmer
Oil wells, marathon results, planning applications… ScraperWiki is a Silicon Valley style startup, in the North West of England, in Liverpool. We’re changing the world of open data, and how data science is done together on the Internet. We’re looking for a … Continue reading
Lots of new libraries
We’ve had lots of requests recently for new 3rd party libraries to be accessible from within ScraperWiki. For those of you who don’t know, yes, we take requests for installing libraries! Just send us word on the feedback form and … Continue reading
Scraping guides: Dates and times
Working with dates and times in scrapers can get really tricky. So we’ve added a brand new scraping guide to the ScraperWiki documentation page, giving you copy-and-paste code to parse dates and times, and save them in the datastore. To … Continue reading
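As a rough illustration of the kind of problem that guide addresses (this is not the guide's own code), here is a minimal Python sketch that tries a few date formats in turn and turns the result into an ISO 8601 string that is easy to store. The format list and the `parse_scraped_date` and `to_record` helpers are hypothetical names made up for this example, and it uses only the standard library rather than any ScraperWiki call.

```python
from datetime import datetime

# Hypothetical formats a scraped page might use; extend as needed.
KNOWN_FORMATS = [
    "%d %B %Y",           # 27 January 2012 (month names assume an English locale)
    "%d/%m/%Y",           # 27/01/2012
    "%Y-%m-%d %H:%M:%S",  # 2012-01-27 14:29:57
]


def parse_scraped_date(text):
    """Try each known format in turn; return a datetime, or None if none match."""
    cleaned = " ".join(text.split())  # collapse stray whitespace from the HTML
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


def to_record(title, raw_date):
    """Build a row dict with the date stored as an ISO 8601 string (or None)."""
    parsed = parse_scraped_date(raw_date)
    return {"title": title, "date": parsed.isoformat() if parsed else None}
```

Returning `None` for unrecognised input, rather than raising, lets a scraper keep the rest of the row and flag the bad date for later inspection.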
New backend now fully rolled out
The new faster, safer sandbox that powers ScraperWiki is now fully rolled out to all users. You should find running and developing scrapers and views faster than before, and that you’re using much more recent versions of Ruby, Python and associated … Continue reading
Scraping guides: Parsing HTML using CSS selectors
We’ve added a new scraping copy-and-paste guide, so you can quickly get the lines of code you need to parse an HTML file using CSS selectors. Get to it from the documentation page: The HTML parsing guide is available in Ruby, Python … Continue reading
Four data trends to rule them all, the data scientist king to bind them
My favourite soundbite from O’Reilly’s Strata data conference was a definition of big data. John Rauser, Amazon’s main data scientist, said to me that “data is big data when you can’t process it on one machine”. And naturally, small data is … Continue reading