The Accidental History of Hadoop

Image credit: Creative Commons Attribution, Flickr user Efecto Negativo

There are two very different types of collaboration: intentional and accidental.  Intentional collaboration is conducted by a defined team with a shared purpose.  Interactions are marked by introductions, updates, “take a look at this,” “please review…” and so on.  Most “collaboration” technology fits this category.  It is boring, line-cook, kitchen-model collaboration.  Read the recipe. Gather the ingredients. Cook, plate and serve.  There is efficient repetition but little or no innovation.

So where do new recipes come from?

The answer is accidental collaboration.  Accidental collaboration is time- and context-shifted. It subverts or ignores original intent (of authors, findings, content or audience).  It finds new uses and applications for old information.  It is disruptive, innovative and amazing.  Examples in technology include re-blogging (Tumblr, Pinterest), reporting, content curation, re-use, re-purposing and re-search. When information is available and accessible, new insights can occur, because each new re-combination of content allows different features to emerge.  A collection of events in a city becomes a holiday schedule.  A collection of medical journal articles reveals a new drug-delivery pathway.

A thread of an idea that started in 1676 with the mathematician Leibniz can be traced through history to David Hilbert (1928), Alonzo Church (1936), John McCarthy (1958), and Dean & Ghemawat (2004), and finally to Doug Cutting (2006), who stood on the shoulders of these giants to create Hadoop.  Hadoop is at the center of the “Big Data” buzz.  Big data is all about deriving insight from huge amounts of disparate data.  It is accidental collaboration.

The original intent of the data is largely irrelevant.  It’s the data, and the availability of that data, that is important. Leibniz wanted to create a language that could prove or disprove any proposition.  Hilbert came along to challenge the field to realize that idea.  Church created the lambda calculus to prove that Hilbert’s challenge was actually unsolvable.  McCarthy used the lambda calculus to create LISP.  Dean and Ghemawat used LISP programming ideas to create MapReduce.  Cutting read their research and combined MapReduce with Lucene to create Hadoop.
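The lambda-calculus link in that chain can be made concrete. Below is an illustrative sketch (in Python, with names of my own choosing, not anything from the original sources) of Church’s idea that numbers and arithmetic can be encoded as nothing but functions, the same idea McCarthy later carried into LISP’s `lambda`:

```python
# Church numerals: a number n is "apply a function f, n times".
# Purely illustrative; names here are my own, not Church's notation.
zero = lambda f: lambda x: x                       # apply f zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x))    # one more application
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Convert a Church numeral back to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(plus(two)(three)))  # 5
```

Everything here is a function of one argument; there are no built-in numbers at all, which is exactly the kind of result that made the lambda calculus a foundation for programming languages.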

Although McCarthy never worked on a project team with Church, the content Church created for the lambda calculus was indispensable in helping McCarthy create the programming language.  Similarly, the way LISP was created directly influenced Dean and Ghemawat at Google to create the map and reduce capabilities that allow massive distributed problem solving.  From that inspiration, a lot of hard work, and some help from Yahoo!, Hadoop was born.
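The map-and-reduce pattern itself is simple enough to sketch in a few lines. Here is a minimal, single-process illustration using word counting, the canonical example from Dean and Ghemawat’s paper; the function names are my own for illustration, not Hadoop’s actual API, and a real cluster would run the phases across many machines:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each word's counts into a total."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big insight", "big accident"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 1, 'insight': 1, 'accident': 1}
```

Because the map and reduce steps are independent pure functions, the framework can split the documents across thousands of machines and merge the results, which is what makes the pattern suit “big data” problems.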

The people involved, the content, and the approaches all came from different eras, but they came together to create something special, innovative and impactful.  If that information had not been available or accessible, Hadoop (and all the applications that rely upon it) would never have happened.

Accidental collaboration throughout history has been incredibly slow.  Modern information management technology like Hadoop or active content archives can speed it up and deliver amazing insight in incredibly short periods of time.

This post originally appeared on July 16

About billycripe (@billycripe)

