Hadoop 2.0 & OpenStack – PB&J?

New developments are happening in the world of Hadoop; I have written a few blogs about it myself!
The latest blog from Arun has more insight and ideas on the scheduler and resource management …

What caught my attention was that two of my worlds suddenly converged: the world of Hadoop and the world of OpenStack… And they go well together, like the proverbial PB and jelly!

Let us quickly look at a few synergies …

  • From a Hadoop Cluster to a Hadoop Cloud
    • Decoupling application management from resource allocation opens up a host of possibilities, from elasticity to enterprise extension and data adjacency. Now the Hadoop application (which, BTW, could be a directed acyclic graph of MapReduce jobs!) can ask for resources declaratively, and the OpenStack cloud resource manager can allocate them based on the cloud infrastructure primitives.
  • Another important capability is policy-based MapReduce for multi-tenancy, compliance and even just resource leveling. The same MapReduce application graph can run in multiple domains, each reflecting its own compliance, security and other infrastructure considerations.
  • Leverage the Swift storage platform: for example, Cloud Files is a great interface for managing data, while Hadoop is good at processing it. Mind you, I am still working through the implications, but I can see an analytic cloud infrastructure combining the strengths of Swift and Hadoop.
    • One thought is to run HDFS and its artifacts separately as part of the Swift layer, and then run MapReduce elastically as required. But this raises the data latency issue, since compute no longer sits right next to the data… nobody said this would be easy ;o)
  • Extending the thought further, an analytic cloud that combines Hadoop 2.0, the OpenStack cloud platform and something like R is not that far off…
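To make the declarative idea a bit more concrete, here is a minimal Python sketch. It assumes a made-up request format loosely modeled on a YARN container request (memory, vcores, count) and an illustrative flavor table named after the classic default Nova flavors. The names ResourceRequest, pick_flavor, allocate and TENANT_ZONES, and the zone names, are hypothetical and not Nova or YARN APIs; the point is only to show the application stating what it needs while a cloud-side allocator decides how, with a per-tenant policy standing in for the compliance domains mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ResourceRequest:
    """A declarative ask, loosely modeled on a YARN container request (hypothetical)."""
    memory_mb: int
    vcores: int
    count: int
    zone: Optional[str] = None  # optional placement constraint, e.g. for compliance


@dataclass(frozen=True)
class Flavor:
    name: str
    memory_mb: int
    vcpus: int


# Illustrative flavor table; a real cloud defines its own flavors.
FLAVORS = [
    Flavor("m1.small", 2048, 1),
    Flavor("m1.medium", 4096, 2),
    Flavor("m1.large", 8192, 4),
    Flavor("m1.xlarge", 16384, 8),
]

# Hypothetical per-tenant policy: the zones each tenant's jobs may run in.
TENANT_ZONES: Dict[str, List[str]] = {
    "finance": ["eu-secure"],
    "research": ["eu-secure", "us-general"],
}


def pick_flavor(req: ResourceRequest) -> Flavor:
    """Return the smallest flavor that can host one container of the request."""
    fits = [f for f in FLAVORS
            if f.memory_mb >= req.memory_mb and f.vcpus >= req.vcores]
    if not fits:
        raise ValueError(f"no flavor fits {req}")
    return min(fits, key=lambda f: (f.memory_mb, f.vcpus))


def allocate(tenant: str, req: ResourceRequest) -> List[dict]:
    """Translate one declarative request into a list of VM launch specs."""
    allowed = TENANT_ZONES.get(tenant, [])
    # Default to the tenant's first allowed zone when none is requested.
    zone = req.zone or (allowed[0] if allowed else None)
    if zone is None or zone not in allowed:
        raise PermissionError(f"tenant {tenant!r} may not run in zone {zone!r}")
    flavor = pick_flavor(req)
    return [{"flavor": flavor.name, "zone": zone} for _ in range(req.count)]


if __name__ == "__main__":
    print(allocate("finance", ResourceRequest(memory_mb=3000, vcores=2, count=3)))
```

Running it yields three m1.medium launch specs in eu-secure: 3,000 MB and 2 vcores do not fit m1.small, so the smallest flavor that does fit wins. Asking for the finance tenant to run in us-general raises PermissionError, which is where a compliance policy would bite.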

I can think of a few more ideas, and am sure you can too… The possibilities are interesting. But before we get ahead of ourselves, there is pragmatic work to do:

  • Obviously, a Hadoop-aware scheduler framework in Nova is the first step.
    • We need to figure out the best way to map the ApplicationMaster, the ResourceManager, the NodeManager and the container Arun is talking about.
    • We also need to capture the declarative directives, resource requirements and application characteristics.
  • Swift over HDFS & extending the data layer is next.
    • I really want to explore how we can address the latency effectively
    • while still managing a data cloud layer with tiered storage matching the data lifecycle
    • along with associated data services like replication, infrastructure redundancy, encryption and so forth.
  • Maybe a Hadoop execution platform and a Hadoop PaaS over OpenStack are on the horizon…
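For the tiered data layer, a placement rule could start as simply as the sketch below. It is a hypothetical Python policy, assuming three made-up tiers (hot data kept in HDFS close to compute, warm data in Swift, cold data in an archive-style Swift tier) with illustrative age and read-rate thresholds, replica counts and encryption flags; none of these values come from Hadoop or Swift defaults, and the names Tier, choose_tier and plan_moves are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    backend: str    # hypothetical backend label, not a real config key
    replicas: int
    encrypted: bool


# Illustrative tiers; thresholds and settings are assumptions for the sketch.
HOT = Tier("hot", "hdfs-near-compute", replicas=3, encrypted=False)
WARM = Tier("warm", "swift", replicas=3, encrypted=True)
COLD = Tier("cold", "swift-archive", replicas=2, encrypted=True)


def choose_tier(age_days: int, reads_per_day: float) -> Tier:
    """Pick a storage tier from data age and access frequency."""
    if age_days < 0 or reads_per_day < 0:
        raise ValueError("age and read rate must be non-negative")
    # Fresh or heavily read data stays next to the compute to avoid latency.
    if age_days <= 7 or reads_per_day >= 10:
        return HOT
    # Moderately old or occasionally read data lives in the object store.
    if age_days <= 90 or reads_per_day >= 1:
        return WARM
    return COLD


def plan_moves(datasets: Dict[str, Tuple[int, float]]) -> Dict[str, str]:
    """Map dataset name -> target tier name, given (age_days, reads_per_day)."""
    return {name: choose_tier(age, reads).name
            for name, (age, reads) in datasets.items()}


if __name__ == "__main__":
    print(plan_moves({
        "clickstream-today": (1, 50.0),
        "q1-logs": (45, 0.2),
        "2010-archive": (700, 0.01),
    }))
```

With these made-up numbers, today's clickstream lands in hot, the Q1 logs in warm and the 2010 archive in cold. The design choice worth noting is that access frequency can override age, so an old but popular dataset is kept close to the compute rather than paying the latency cost on every job.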

Again, these are just my first impressions… What says thee?


5 thoughts on “Hadoop 2.0 & OpenStack – PB&J?”

  1. Hi, I saw a presentation by Arun yesterday on “hadoop.new”, which I guess equates to Hadoop 2 in your blog here. The aspects you mention make it compelling, along with more flexible resource allocation (bye-bye slots...), versioning (which Hadoop version does a job want to run on...) and per-task management (the task is wrapped in an application manager that manages its lifecycle, including any required retries).

    Where is the Apache branch for this restructured/improved M/R? I looked (and searched) all over svn.apache.org/repos/asf/hadoop...

  2. Hi,
    I'm interested to know the pros and cons of Hadoop using Swift as its storage for both input and output; please do share your views on this. I have heard from many sources that using Swift instead of HDFS with Hadoop is advantageous. If so, please share whether any team has implemented it.
