MapReduce Cloud Stack – Part Deux


Last week I was thinking through one of our big data projects, focusing on the MapReduce infrastructure. I also had a few e-mail discussions with experts in the field, and finally arrived at a few pragmatic conclusions and a couple of next actions.

First, the pragmatics:

  • There is a difference between language primitives and MapReduce. MapReduce has an inherent simplicity that also lends itself to a broader set of problem domains. Hence, for now, no "Dryad-like" language bindings.
  • There are a few related projects, including the widely used Pig and Hive. More relevant ones include Oozie, Azkaban and Cascading.
  • There is also the Whirr cloud framework, as well as jclouds, the library it relies on …
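
To make the simplicity point concrete, here is a minimal in-memory sketch of the MapReduce model as a word count. It is not Hadoop's API: `map_phase`, `reduce_phase` and `run_mapreduce` are hypothetical names, and the framework's shuffle step is simulated with a dictionary. The user only writes two small functions; grouping and parallelism are the framework's job.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every whitespace-separated token.
    for word in doc.lower().split():
        yield word, 1


def reduce_phase(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reducer: sum all partial counts for a single key.
    return word, sum(counts)


def run_mapreduce(
    docs: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    # Simulated shuffle: group every emitted value by its key,
    # which a real framework does across the network.
    groups: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    # Each key group is reduced independently, so these calls could run in parallel.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["the cloud stack", "the data cloud"]
    print(run_mapreduce(docs, map_phase, reduce_phase))
```

The same driver works for any problem that can be phrased as "emit key/value pairs, then aggregate per key", which is the breadth the bullet above refers to.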

Second, the next steps:

  • Opened a couple of relevant JIRA tickets:
    • WHIRR-118 is about leveraging the open cloud framework OpenStack.
    • WHIRR-119 is a little more involved. For a Hadoop cloud, one needs a control framework that carries a job end to end and also collects the relevant data, including chargeback data!
  • And there is the rack Combiner ticket, which could potentially add data cloud primitives …
  • So there is enough to start working on … Let us see where we get in six months …
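
The post does not spell out what the rack Combiner ticket proposes, so the following is only a sketch under an assumption: that it means applying combiner-style aggregation a second time, across all nodes in a rack, before map output crosses the rack boundary. The cluster layout, `combine` and `rack_level_word_count` are hypothetical, and word count is used because its combiner is simply the reducer's merge logic.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical cluster layout: rack -> node -> words emitted by mappers on that node.
Cluster = Dict[str, Dict[str, List[str]]]


def combine(partials: Iterable[Counter]) -> Counter:
    # Combiner for word count: merge partial counts (same logic as the reducer).
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def rack_level_word_count(cluster: Cluster) -> Tuple[Counter, int, int]:
    node_records = 0  # records that would cross racks with node-level combining only
    rack_outputs: List[Counter] = []
    for nodes in cluster.values():
        # Standard node-level combiner: aggregate each node's map output locally.
        node_outputs = [Counter(words) for words in nodes.values()]
        node_records += sum(len(c) for c in node_outputs)
        # Assumed rack-level combiner: merge node outputs inside the rack.
        rack_outputs.append(combine(node_outputs))
    rack_records = sum(len(c) for c in rack_outputs)
    # Final reduce across racks.
    return combine(rack_outputs), node_records, rack_records


if __name__ == "__main__":
    cluster = {
        "rack1": {"n1": ["a", "b", "a"], "n2": ["a", "c"]},
        "rack2": {"n3": ["b", "b"], "n4": ["c"]},
    }
    totals, node_records, rack_records = rack_level_word_count(cluster)
    print(dict(totals), node_records, rack_records)
```

In this toy layout the extra rack-level pass cuts the records sent across racks from 6 to 5 while leaving the final counts unchanged; the saving grows with the number of nodes per rack that share keys.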
