Author Archives: Thomas Levine

Data Business Models

If it sometimes feels like the data business is full of buzzwords and hipster technical jargon, then that’s probably because it is. But don’t panic! I’ve been at loads of hip and non-hip data talks here and there and, buzzwords … Continue reading

Hip Data Terms

“Big Data” and “Data Science” tend to be terms whose meaning is defined the moment they are used. They are sometimes meaningful, but their meaning is dependent on context. Through the agendas of many hip and not-so-hip data talks we … Continue reading

Posted in thoughts | Leave a comment

How to test shell scripts

Extreme hipster superheroes like me need tests for their shell. Here’s what’s available.
YOLO: No automated testing
Few shell scripts have any automated testing because shell programmers live life on the edge. Inevitably, this results in tedious manual ‘testing’. Loads … Continue reading

Posted in developer | 2 Comments

The Big Clean

I’m just about to return from Prague, Czech Republic, where I gave a workshop at the Big Clean. What a nice little conference this was! It had two tracks: Talks and the workshop. So I didn’t get to see many … Continue reading

Posted in events | 2 Comments

Party

I went to a three-day party in Buenos Aires this past month. The first two days were talks and workshops; I gave a talk on how awesome I am and a workshop on cleaning data. The latter involved no computers … Continue reading

Posted in events | 1 Comment

DumpTruck 0.0.3

I’ve added some new features to DumpTruck.
Changes
Dictionary case sensitivity: I removed the dictionaries with case-insensitive keys because that just seemed to be delaying the conversion to case sensitivity.
Ordered Dictionaries: DumpTruck.execute now returns a collections.OrderedDict for each row … Continue reading

Posted in developer | Leave a comment

Do all “analysts” use Excel?

We were wondering how common spreadsheets are as a platform for data analysis. It’s not something I’ve really thought about in a while; I find it way easier to clean numbers with real programming languages. But we suspected that virtually … Continue reading

Posted in thoughts | 5 Comments

Twitter Scraper Python Library

I wanted to save the tweets from Transparency Camp. This prompted me to turn Anna’s basic Twitter scraper into a library. Here’s how you use it.
Import it. (It only works on ScraperWiki, unfortunately.)
from scraperwiki import swimport
search = … Continue reading

Posted in events, Scrapers | 3 Comments

Middle Names in the United States over Time

I was wondering what proportion of people have middle names, so I asked the Census.
Recently you requested personal assistance from our on-line support center. Below is a summary of your request and our response. We will assume your issue … Continue reading

Posted in opendata, research, Scrapers | 1 Comment

Local ScraperWiki Library

It quite annoyed me that you can only use the scraperwiki library on a ScraperWiki instance; most of it could work fine elsewhere. So I’ve pulled it out (well, for Python at least) so you can use it offline. How … Continue reading

Posted in developer | 4 Comments