Archive by Author

Horizon 2020–Project TIMON

ScraperWiki are members of a new EU Horizon 2020 project: TIMON “Enhanced real time services for optimized multimodal mobility relying on cooperative networks and open data”. This is a 3.5-year project, which commenced in June 2015, whose objectives are: to improve road safety; to provide greater transport flexibility in terms of journey planning across multiple modes […]

Book review: Docker Up & Running by Karl Matthias and Sean P. Kane

This last week I have been reading Docker Up & Running by Karl Matthias and Sean P. Kane, a newly published book on Docker – a container technology which is designed to simplify the process of application testing and deployment. Docker is a very new product, first announced in March 2013, although it is based […]

Book review: Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia

Apache Spark is a system for doing data analysis that can run on a single machine or across a cluster. It is pretty new technology: initial work began in 2009 and Apache adopted it in 2013. There’s a lot of buzz around it, and I have a problem for which it might be […]

Book review: Mastering Gephi Network Visualisation by Ken Cherven

A little while ago I reviewed Ken Cherven’s book Network Graph Analysis and Visualisation with Gephi; it’s fair to say I was not very complimentary about it. It was rather short, and had quite a lot of screenshots. Its strength was in introducing every single element of the Gephi interface. This book, Mastering Gephi Network […]

Book review: Cryptocurrency by Paul Vigna and Michael J. Casey

Amongst hipster start-ups in the tech industry, Bitcoin has been a thing for a while. As one of the more elderly members of this community I wanted to understand a bit more about it. Cryptocurrency: How Bitcoin and Digital Money are Challenging the Global Economic Order by Paul Vigna and Michael Casey fits this […]

A tool to help with your next job move

A guest post from Jyl Djumalieva. During February and March this year I had a wonderful opportunity to share the workspace with the ScraperWiki team. As an aspiring data analyst, I found it very educational to learn how real-life data science happens. After observing a ScraperWiki data scientist do some analytical heavy lifting I was inspired to embark […]

Adventures in Kaggle: Forest Cover Type Prediction

Regular readers of this blog will know I’ve read quite a few machine learning books; now it’s time to put this learning into action. We’ve done some machine learning for clients, but I thought it would be good to do something I could share. The Forest Cover Type Prediction challenge on Kaggle seemed to fit the bill. Kaggle […]

Book review: How Linux Works by Brian Ward

There has been a break since my last book review because I’ve been coding, rather than reading, on the commute into the ScraperWiki offices in Liverpool. Next up is How Linux Works by Brian Ward. In some senses this book follows on from Data Science at the Command Line by Jeroen Janssens. Data Science was about doing analysis […]

Book review: Data Science at the Command Line by Jeroen Janssens

In the mixed environment of ScraperWiki we make use of a broad variety of tools for data analysis. Data Science at the Command Line by Jeroen Janssens covers tools available at the Linux command line for doing data analysis tasks. The book is divided thematically into chapters on Obtaining, Scrubbing, Modeling, Interpreting Data with “intermezzo” […]