Author Archives: Thomas Levine
Data Business Models
If it sometimes feels like the data business is full of buzzwords and hipster technical jargon, then that’s probably because it is. But don’t panic! I’ve been at loads of hip and non-hip data talks here and there and, buzzwords … Continue reading
Aside
February 27, 2013
Tagged analysis, business models, data hub, excel, hadoop, nosql, R, storage
Hip Data Terms
“Big Data” and “Data Science” tend to be terms whose meaning is defined the moment they are used. They are sometimes meaningful, but their meaning is dependent on context. Through the agendas of many hip and not-so-hip data talks we … Continue reading
Posted in thoughts
How to test shell scripts
Extreme hipster superheroes like me need tests for their shell. Here’s what’s available. YOLO: No automated testing Few shell scripts have any automated testing because shell programmers live life on the edge. Inevitably, this results in tedious manual ‘testing’. Loads … Continue reading
Posted in developer
2 Comments
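As a rough illustration of the alternative to "YOLO" testing mentioned in the excerpt above, here is a minimal sketch that checks a shell snippet's output and exit code automatically. It is not one of the tools the full post surveys; it assumes a POSIX `sh` on the PATH, and the helper name `run_shell` is made up for this example.

```python
import subprocess


def run_shell(script):
    """Run a shell snippet with sh and capture stdout, stderr and the exit code.

    Hypothetical helper for this sketch; assumes `sh` is available on the PATH.
    """
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True)


def test_greeting():
    # The snippet under test: set a variable and echo it.
    result = run_shell('name=world; echo "hello $name"')
    assert result.returncode == 0
    assert result.stdout == "hello world\n"


def test_failure_exit_code():
    # A failing command should surface its exit status to the test.
    result = run_shell("exit 3")
    assert result.returncode == 3


if __name__ == "__main__":
    test_greeting()
    test_failure_exit_code()
    print("all shell checks passed")
```

The same pattern extends to a real script by passing its path to `sh` instead of an inline snippet.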
The Big Clean
I’m just about to return from Prague, Czech Republic, where I gave a workshop at the Big Clean. What a nice little conference this was! It had two tracks: talks and the workshop. So I didn’t get to see many … Continue reading
Posted in events
2 Comments
Party
I went to a three-day party in Buenos Aires this past month. The first two days were talks and workshops; I gave a talk on how awesome I am and a workshop on cleaning data. The latter involved no computers … Continue reading
DumpTruck 0.0.3
I’ve added some new features to DumpTruck. Changes Dictionary case sensitivity I removed the dictionaries with case-insensitive keys because that just seemed to be delaying the conversion to case sensitivity. Ordered Dictionaries DumpTruck.execute now returns a collections.OrderedDict for each row … Continue reading
Posted in developer
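To make the DumpTruck 0.0.3 change above a bit more concrete, here is a minimal sketch of what "an OrderedDict for each row" with case-sensitive keys looks like. It does not use DumpTruck or claim to mirror its internals; it talks to the standard-library `sqlite3` module directly, and `ordered_row_factory`, `fetch_rows`, the `places` table and its values are all made up for illustration.

```python
import sqlite3
from collections import OrderedDict


def ordered_row_factory(cursor, row):
    """Build an OrderedDict keyed by column name, in the order of the SELECT.

    Hypothetical helper; this is an assumption about how such rows could be
    produced, not DumpTruck's actual code.
    """
    names = [column[0] for column in cursor.description]
    return OrderedDict(zip(names, row))


def fetch_rows(sql, setup=()):
    """Run optional setup statements, then return all rows for `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = ordered_row_factory
    try:
        for statement in setup:
            conn.execute(statement)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    rows = fetch_rows(
        "SELECT City, visits FROM places",
        setup=[
            "CREATE TABLE places (City TEXT, visits INTEGER)",
            "INSERT INTO places VALUES ('A', 1)",
        ],
    )
    row = rows[0]
    print(list(row.keys()))  # column order is preserved
    print("City" in row, "city" in row)  # keys are case sensitive
```

Because the keys are plain strings in an ordinary OrderedDict, a lookup with the wrong case raises `KeyError` rather than silently matching.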
Do all “analysts” use Excel?
We were wondering how common spreadsheets are as a platform for data analysis. It’s not something I’ve really thought about in a while; I find it way easier to clean numbers with real programming languages. But we suspected that virtually … Continue reading
Posted in thoughts
5 Comments
Twitter Scraper Python Library
I wanted to save the tweets from Transparency Camp. This prompted me to turn Anna’s basic Twitter scraper into a library. Here’s how you use it. Import it. (It only works on ScraperWiki, unfortunately.) from scraperwiki import swimport search = … Continue reading
Posted in events, Scrapers
3 Comments
Middle Names in the United States over Time
I was wondering what proportion of people have middle names, so I asked the Census. Recently you requested personal assistance from our on-line support center. Below is a summary of your request and our response. We will assume your issue … Continue reading
Local ScraperWiki Library
It quite annoyed me that you can only use the scraperwiki library on a ScraperWiki instance; most of it could work fine elsewhere. So I’ve pulled it out (well, for Python at least) so you can use it offline. How … Continue reading