MapReduce Cloud Stack – Part Deux

Last week I was thinking through one of our big data projects, focusing on the MapReduce infrastructure. I also had a few e-mail discussions with experts in this field. I finally arrived at a few pragmatics and a couple of next actions.

First, the pragmatics:

- There is a difference between language primitives and MapReduce. MapReduce's simplicity also lets it address a broader set of problem domains. Hence, for now, no “Dryad-like” language bindings.
- There are a few related projects, including the widely used Pig and Hive. More relevant ones include Oozie [Link], Azkaban [Link] and Cascading [Link].
- There is also the Whirr [Link] cloud framework, as well as the jclouds library that it relies on …
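
To make the simplicity point concrete, here is a minimal in-memory sketch of the MapReduce shape in plain Python. It is not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and the word-count job is just the usual toy example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, which a real framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = run_job(["the quick brown fox", "the lazy dog"])
```

The user only writes the two small functions for map and reduce; everything else belongs to the framework, which is why the model stretches across so many problem domains without needing language-level bindings.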

Second, the next steps:

Opened a couple of relevant JIRA tickets:
- WHIRR-118 is about leveraging the open cloud framework OpenStack.
- WHIRR-119 is a little more involved. A Hadoop cloud needs a control framework that carries a job end to end and also collects relevant data, including chargeback.
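
As a rough picture of what such a control framework might track, here is a small Python sketch. Everything in it is hypothetical: the `JobController` and `JobRecord` names, the four stages, and the flat rate per node-hour are my assumptions for illustration, not anything Whirr provides or the ticket specifies.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    """What the controller remembers about one job (hypothetical fields)."""
    job_id: str
    owner: str
    nodes: int
    runtime_hours: float = 0.0
    events: list = field(default_factory=list)

    def node_hours(self):
        return self.nodes * self.runtime_hours


class JobController:
    """Carries a job through every stage and keeps data for chargeback."""

    # Assumed lifecycle; a real framework would call the cloud API at each stage.
    STAGES = ("provision", "run", "collect", "teardown")

    def __init__(self, rate_per_node_hour):
        self.rate = rate_per_node_hour
        self.records = {}

    def execute(self, job_id, owner, nodes, runtime_hours):
        record = JobRecord(job_id, owner, nodes)
        for stage in self.STAGES:
            record.events.append(stage)  # stand-in for the real work
        record.runtime_hours = runtime_hours
        self.records[job_id] = record
        return record

    def chargeback(self):
        """Total cost per owner, based on node-hours consumed."""
        totals = {}
        for record in self.records.values():
            cost = record.node_hours() * self.rate
            totals[record.owner] = totals.get(record.owner, 0.0) + cost
        return totals


controller = JobController(rate_per_node_hour=0.5)
controller.execute("job-1", "analytics", nodes=10, runtime_hours=2.0)
controller.execute("job-2", "analytics", nodes=4, runtime_hours=0.5)
controller.execute("job-3", "search", nodes=5, runtime_hours=1.0)
bill = controller.chargeback()
```

The point of the sketch is only that usage data has to be captured by the same component that owns the job lifecycle; otherwise chargeback ends up being reconstructed from logs after the cluster is already gone.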
There is also the rack Combiner ticket, which could potentially add data cloud primitives …
So there is enough to start working on … Let us see where we get to in six months …
