Archive by Author

Digitally enhanced social research

Guest post by Dr Rebecca Sandover. The continued expansion of social media activity raises many questions about how this ever-changing digital life spreads ideas and how ‘contagious’ online events arise. Exeter University’s Contagion project has been running since September 2013, funded by the UK Economic and Social Research Council to explore how such events spread […]

NewsReader – one year on

ScraperWiki has been contributing to NewsReader, an EU FP7 project, for over a year now. In that time, we’ve discovered that all the TechCrunch articles would make a pile 4 metres high, and that’s just one relatively small site. The total volume of news published every day is enormous, but the tools we use to process it […]

Face ReKognition

I’ve previously written about social media and the popularity of our Twitter Search and Followers tools. But how can we make Twitter data more useful to our customers? Analysing the profile pictures of Twitter accounts seemed like an interesting thing to do since they are often the faces of the account holder and a face […]

Book review: Hadoop in Action by Chuck Lam

Hadoop in Action by Chuck Lam provides a brief, fairly technical introduction to the Hadoop Big Data ecosystem. Hadoop is an open source implementation of the MapReduce framework originally developed by Google to process huge quantities of web search data. The name MapReduce refers to dividing up jobs amongst multiple processors (“Mapping”) and then recombining […]

Getting sociable

The Search for Tweets and Get Twitter followers tools are the most popular on our platform. Why is this? In part, it is because we’re sociable creatures; platforms like Twitter get a lot of interaction time from a lot of people. A certain section of the population has a data packrat mentality. For them ScraperWiki […]

Book review: Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall

I’ve been doing more reading on machine learning, this time in the form of Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank and Mark A. Hall. This comes on the recommendation of my academic colleagues on the NewsReader project, who rely heavily on machine learning techniques to do natural language […]