Category Archives: developer
Big fat aspx pages for thin data
My work is more with the practice of webscraping, and less in the high-faluting business plans and product-market-fit leaning agility. At the end of the day, someone must have done some actual webscraping — and the harder it is the … Continue reading
Journalism Data Camp NY potential data sets
Here is a review of some of the datasets that have been submitted for the Columbia Journalism Data Camp this Friday. This list is only for backup in case not enough ideas show up with people on the day (never … Continue reading
Posted in developer, events, journalism
2 Comments
How to stop missing the good weekends
Far too often I get so stuck into the work week that I forget to monitor the weather for the weekend when I should be going off to play on my dive kayaks — an activity which is somewhat weather … Continue reading
Posted in developer, Scrapers
Tagged alerts, email alerts, emails, python, scraperwiki, weather
Let's try ScraperWiki
Guest post by Makoto Inoue, a Japanese ScraperWiki user. Makoto works in London as a Web developer, a technical writer, and a translator. He has a Japanese blog and his Twitter account is @makoto_inoue. Introduction: Do you know the word "scrape"? Pulling specific data out of web pages is called scraping. Many websites today offer an API (Application Programming Interface) that makes their data easy to get, so plenty of people may wonder why scraping is still needed. However, during the recent Great East Japan Earthquake, breaking news about the quake and the power supply, and the government statistics needed to understand the damage in each region, were not available through APIs, and many developers probably wrote their own scraper programs. It is a real pity when programs built through the goodwill of so many developers end up scattered across various sites or eventually go unmaintained. That is where ScraperWiki comes in. What is ScraperWiki? ScraperWiki is a UK startup that runs a site for sharing scraper code. Developers can edit and run code (Ruby, PHP, Python) directly on the site. Scrapers can also be run on a schedule, and the data they collect is stored on ScraperWiki; because ScraperWiki provides an API, that data can be reused on other sites through it. … Continue reading
Scraping the protests with Goldsmiths
Zarino here, writing from carriage A of the 10:07 London-to-Liverpool (the wonders of the Internet!). While our new First Engineer, drj, has been getting to grips with lots of the under-the-hood changes which’ll make ScraperWiki a lot faster and more … Continue reading
Posted in developer, events, journalism
Tagged API, google maps, occupy, protests, views, wikipedia
1 Comment
How to scrape and parse Wikipedia
Today’s exercise is to create a list of the longest and deepest caves in the UK from Wikipedia. Wikipedia pages for geographical structures often contain Infoboxes (that panel on the right hand side of the page). The first job was … Continue reading
ScraperWiki scrapers: now 53% more useful!
It’s Christmas come early at ScraperWiki HQ as we deliver—like elves popping boxes under the data digging Christmas tree—a bunch of great new improvements to the ScraperWiki site. We’ve been working on these for a while, so it’s great to … Continue reading
How to get along with an ASP webpage
Fingal County Council of Ireland recently published a number of sets of Open Data, in nice clean CSV, XML and KML formats. Unfortunately, the one set of Open Data that was difficult to obtain was the list of sets of … Continue reading
Posted in developer, Scrapers
Tagged ASP, Fingal County Council, Ireland, scraperwiki, scraping
5 Comments
Job advert: Lead programmer
Oil wells, marathon results, planning applications… ScraperWiki is a Silicon Valley style startup, in the North West of England, in Liverpool. We’re changing the world of open data, and how data science is done together on the Internet. We’re looking for a … Continue reading
Lots of new libraries
We’ve had lots of requests recently for new 3rd party libraries to be accessible from within ScraperWiki. For those of you who don’t know, yes, we take requests for installing libraries! Just send us word on the feedback form and … Continue reading