Last week I was thinking through one of our big data projects, focusing on the MapReduce infrastructure. I also had a few e-mail discussions with experts in this field. I finally arrived at a few pragmatics and a couple of next actions.
First, the pragmatics:
- There is a difference between language primitives and MapReduce. MapReduce has an associated simplicity, which also lets it address a broader set of problem domains. Hence, for now, no “Dryad-like” language bindings.
- There are a few related projects, including the widely used Pig and Hive. More relevant ones include Oozie [Link], Azkaban [Link] and Cascading [Link].
- There is also the Whirr [Link] cloud framework, as well as the jclouds library that it relies on …
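To make the simplicity point above a bit more concrete, here is a minimal single-process sketch of the MapReduce model in Python. It is only an illustration under my own assumptions, not how Hadoop or any of the projects above implement it: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count example are hypothetical and chosen just for this sketch.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all values collected for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Chain map, shuffle and reduce into one in-memory job."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical word-count job: the user only supplies these two functions.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


lines = ["big data big plans", "data moves"]
counts = run_job(lines, word_mapper, sum_reducer)
```

The user-facing surface is just the two small functions at the bottom; grouping and distribution stay inside the framework, which is roughly the simplicity argument made in the first bullet.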
Second, the next steps:
- Opened a couple of relevant JIRA tickets:
- And there is the rack Combiner ticket, which could potentially add data cloud primitives …
- So there is enough to start working on … Let us see where we get to in six months …