Archive by Author

Data Business Models

If it sometimes feels like the data business is full of buzzwords and hipster technical jargon, then that’s probably because it is. But don’t panic! I’ve been at loads of hip and non-hip data talks here and there and, buzzwords aside, I’ve come across four actual categories of data business model in this hip data […]

Hip Data Terms

“Big Data” and “Data Science” tend to be terms whose meaning is defined the moment they are used. They are sometimes meaningful, but their meaning depends on context. From the agendas of many hip and not-so-hip data talks, we can piece together some of the definitions people have in mind, and we will try to describe how […]

How to test shell scripts

Extreme hipster superheroes like me need tests for their shell. Here’s what’s available. YOLO: No automated testing. Few shell scripts have any automated testing because shell programmers live life on the edge. Inevitably, this results in tedious manual ‘testing’. Loads of projects use this approach: git flow, homeshick, ievms, rbenv, z. Here are some more. […]

A "BIG CLEAN" logo that looks like a logo for soap

The Big Clean

I’m just about to return from Prague, Czech Republic, where I gave a workshop at the Big Clean. What a nice little conference this was! It had two tracks: Talks and the workshop. So I didn’t get to see many of the talks :(. But this meant I had the whole day to teach people […]

Party

I went to a three-day party in Buenos Aires this past month. The first two days were talks and workshops; I gave a talk on how awesome I am and a workshop on cleaning data. The latter involved no computers and no slides, so I held it outside! I modeled an analog version of the […]

DumpTruck 0.0.3

I’ve added some new features to DumpTruck. Changes. Dictionary case sensitivity: I removed the dictionaries with case-insensitive keys because that just seemed to be delaying the conversion to case sensitivity. Ordered dictionaries: DumpTruck.execute now returns a collections.OrderedDict for each row rather than a dict for each row. Also, order is respected on insert, so you […]
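
Here is a rough sketch of the ordered-rows idea, using only Python's standard sqlite3 module rather than DumpTruck itself. The helper names (ordered_row_factory, fetch_ordered_rows) and the example table are made up for illustration; this is not DumpTruck's API or its actual implementation, just one way a query could hand back a collections.OrderedDict per row with case-sensitive keys in column order.

```python
import sqlite3
from collections import OrderedDict


def ordered_row_factory(cursor, row):
    """Build an OrderedDict whose keys follow the column order of the query."""
    columns = [description[0] for description in cursor.description]
    return OrderedDict(zip(columns, row))


def fetch_ordered_rows(connection, sql, params=()):
    """Run a query and return each row as an OrderedDict (hypothetical helper)."""
    connection.row_factory = ordered_row_factory
    return connection.execute(sql, params).fetchall()


# Example data is invented purely for the demonstration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE pets (name TEXT, species TEXT, age INTEGER)")
connection.execute("INSERT INTO pets VALUES (?, ?, ?)", ("Tom", "cat", 3))
rows = fetch_ordered_rows(connection, "SELECT name, species, age FROM pets")
```

Each element of rows keeps its keys in the order the columns were selected, and the keys are ordinary case-sensitive strings.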

Do all “analysts” use Excel?

We were wondering how common spreadsheets are as a platform for data analysis. It’s not something I’ve really thought about in a while; I find it way easier to clean numbers with real programming languages. But we suspected that virtually everyone else used spreadsheets, and specifically Excel Spreadsheet, so we did a couple of things […]

Twitter Scraper Python Library

I wanted to save the tweets from Transparency Camp. This prompted me to turn Anna’s basic Twitter scraper into a library. Here’s how you use it. Import it. (It only works on ScraperWiki, unfortunately.) from scraperwiki import swimport; search = swimport('twitter_search').search Then search for terms. search(['picnic #tcamp12', 'from:TCampDC', '@TCampDC', '#tcamp12', '#viphack']) A separate search will […]

Local ScraperWiki Library

It quite annoyed me that you can only use the scraperwiki library on a ScraperWiki instance; most of it could work fine elsewhere. So I’ve pulled it out (well, for Python at least) so you can use it offline. How to use: pip install scraperwiki_local. You can then import scraperwiki in scripts run on your […]