Archive | Developer

Elasticsearch and elasticity: building a search for government documents

Based in Paris, the OECD is the Organisation for Economic Co-operation and Development. As the name suggests, the OECD’s job is to develop and promote new social and economic policies. One part of their work is researching how open countries trade. Their view is that fewer trade barriers benefit consumers, through lower prices, and companies, […]

Four specific things “agile” saved us from doing at ONS

There’s lots of both hype and cynicism around “agile”. Instead, look at this part of the original agile declaration: “We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: … Responding to change over following a plan.” That is, while there […]

Book review: How Linux Works by Brian Ward

It has been a while since my last book review, as I’ve been coding, rather than reading, on the commute into the ScraperWiki offices in Liverpool. Next up is How Linux Works by Brian Ward. In some senses this book follows on from Data Science at the Command Line by Jeroen Janssens. Data Science was about doing analysis […]

NewsReader – Hack 100,000 World Cup Articles

June 10, The Hub Westminster (@NewsReader) Ian Hopkinson has been telling you about our role in the NewsReader project.  We’re making a thing that crunches large volumes of news articles.  We’re combining natural language processing and semantic web technology.  It’s an FP7 project so we’re working with a bunch of partners across Europe. We’re 18 […]

ScraperWiki’s response to the Heartbleed security failure

Et tu, Heartbleed “Catastrophic” is the right word. On the scale of 1 to 10, this is an 11. ― Security expert, Bruce Schneier, responds to Heartbleed On Monday the 7th of April 2014, a software flaw was identified which exposed approximately two thirds of the web to the risk of catastrophic security failure. The flaw has […]

Scraping Spreadsheets with XYPath

Spreadsheets are great. They’re ubiquitously available, beaten only by web pages and word processor documents. Like the word processor, they’re easy to use and give the user a blank page, but they divide the page up into cells to make sure that the columns and rows all line up. And unlike more complicated […]

Underneath the hood of Government’s Performance Platform

In the previous post I described what the UK Government’s new Performance Platform (made by GDS) is for. Today’s question is, how does it work? I’ve found out two ways. Firstly, thanks to Alex Muller from GDS, who talked me through the platform. Secondly, all the code is freely available on GitHub, which is pretty. Component parts There […]

Git!

As a software company, use of some sort of software source control system is inevitable; indeed, our CEO wrote TortoiseCVS, a file system overlay for the early CVS source control system. For those uninitiated in the joys of software engineering: source control is a system for recording the history of file revisions allowing programmers to […]

It’s good to share…

As you may have gathered, I’m on a journey: I’ve worked as a physicist, a data scientist for 20 years, and now I’ve fallen amongst software engineers. There are obvious similarities in what we do: we write code to do stuff. I write code to analyse things and the software engineers write code to do […]