Is scraping legal?

Lots of people, when they hear about ScraperWiki, ask “Is scraping legal? How can you build a business off that?” They usually follow up by saying “we do it in our company, but we would never tell anyone”.

This is strange to us, as we have come from a world of good scraping: taking government data and making it easier for people to use for things that benefit all of society. We’re in favour of that kind of scraping.

It’s obviously a spectrum. At the other extreme, the most evil scraping would be to steal content that somebody else sells, and then republish it in a way that harms their business. We’re against that kind of scraping.

It’s not scraping itself which is good or bad, or legal or illegal, but the circumstances in which you’re doing it.

We’ve written up our full policy on legality in our FAQ, under ‘What’s your policy on what’s legal to scrape?’. It goes into detail about robots.txt and takedown notices, and about what our legal responsibility is and what yours is.
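As a rough illustration of the robots.txt side of this, here is a minimal sketch of checking a URL against a site's robots.txt before fetching it. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `example-bot` user agent and the `example.com` URLs are all made up for the example, and this is one common courtesy check rather than a statement of ScraperWiki's policy or of what the law requires.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without a network call.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/public/page.html",
                "http://example.com/private/data.html"):
        print(url, allowed(ROBOTS_TXT, "example-bot", url))
```

In a real scraper you would load the file from the site itself, for example with `set_url()` and `read()` on the parser, and skip any URL for which the check returns `False`.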

Finally, ScraperWiki isn’t just about scraping.

We’re a data hub, and you need to get data into a data hub. As well as scraping, lots of people make API calls to do that on ScraperWiki, or download their own files from their own servers.

This is much more profound than it sounds – when you are using data for a new purpose, even if it is already structured, you still need to get it and convert it to your new needs. How you do that is a detail that depends on the circumstances.

The difference between parsing HTML web pages, and using a JSON REST API is surprisingly small. As an example, Thomas scraped EventBrite even though it has an API (see the post at the end of that thread by Ryan who works at EventBrite!), because it was easier at the time for him.
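To make that comparison concrete, here is a small sketch that reads the same records from an HTML table and from a JSON response, using only the Python standard library. The event data, the table layout and the `events` key are invented for illustration; they are not EventBrite's actual pages or API.

```python
import json
from html.parser import HTMLParser

# Hypothetical inputs: the same two events as an HTML table and as a JSON response.
HTML_PAGE = """
<table>
  <tr><th>name</th><th>city</th></tr>
  <tr><td>Data Day</td><td>Liverpool</td></tr>
  <tr><td>Hack Night</td><td>London</td></tr>
</table>
"""

JSON_RESPONSE = (
    '{"events": [{"name": "Data Day", "city": "Liverpool"},'
    ' {"name": "Hack Night", "city": "London"}]}'
)


class TableParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a cell; ignore whitespace between tags.
        if self._in_cell:
            self.rows[-1][-1] += data


def records_from_html(page):
    """Scrape the table: the first row is the header, the rest are records."""
    parser = TableParser()
    parser.feed(page)
    header, *body = parser.rows
    return [dict(zip(header, (cell.strip() for cell in row))) for row in body]


def records_from_json(text):
    """Read the same fields from the JSON response."""
    return [{"name": e["name"], "city": e["city"]}
            for e in json.loads(text)["events"]]


if __name__ == "__main__":
    print(records_from_html(HTML_PAGE))
    print(records_from_json(JSON_RESPONSE))
```

Both functions return the same list of dictionaries, so everything downstream (cleaning, storing, publishing) is identical whichever source the data came from.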

What matters is getting the data, and converting it into a form where it can do something useful for the world. And doing that legally. Whether you’re using Nokogiri or Nestful.

12 Responses to “Is scraping legal?”

  1. jtownend April 3, 2012 at 8:46 am #

    Reblogged this on Media law and ethics and commented:

    ScraperWiki is a Liverpool-based data tools service and community I did some work for in 2010/11 and a winner of the Knight News Challenge 2011. In this post, its CEO Francis Irving looks at the legal issues around screen scraping.

  2. Joel April 5, 2012 at 12:33 pm #

    Very interesting post.
    In my opinion, if data is publicly viewable / indexed by search engines, expect it to be scraped. There are ways to prevent scraping from happening, and if one really wants scraping of data to be stopped, they should implement various methods within their website/service.

  3. Software Engineer April 7, 2012 at 10:49 pm #

    Although I understand the concern I find this question absurd considering the leading companies in technology make regular use of scraping. Google is nothing more than a very large scraping service that scours the internet for keywords. Need I say more?

  4. Mohit Sharma August 12, 2013 at 1:39 pm #

    Is viewing a website/site’s html in a browser illegal? Here’s our take on legality of web crawling/scraping – http://blog.promptcloud.com/2013/01/is-crawling-legal.html