Hadoop NextGen – From a Framework To a Big Data Analytics Platform

Exciting news: Hadoop is evolving! As I was reading Arun Murthy’s blog, multiple thoughts crossed my mind on the evolution of this lovable toy animal. My first impressions …

  • From Data at Scale To Data at Scale + with complexity – connected & contextual
    • This, I think, is the essence: with the new Hadoop platform we go beyond a generic, scalable computation framework and can process data at scale and with complexity – connected & contextual. For example, the Watson Jeopardy dataset [Link] [Link]
  • From (relatively) static MapReduce To (somewhat) dynamic analytic platform
    • While we might not see a real-time Hadoop soon, the proposed ideas do make the platform more dynamic
    • The “support for alternate programming paradigms to MapReduce”, achieved by decoupling the computation framework, is an excellent idea
    • I think it is still MapReduce at the core (I am not sure if it will deviate), but generic computation frameworks can choose their own patterns! I am looking forward to bioinformatics applications
    • The “Support for short-lived services” is interesting. I had blogged a little about this. Looking forward to how this evolves …
    • I am hoping that it would be possible via extensible programming models to interface with programming systems like R.
    • Embeddable, domain specific capabilities (for example algorithmics specific to bioinformatics) could be interesting

There are also a few things that might not be part of this evolution:

  • From Cluster to Cloud?
    • There is a proposed keynote by Dr. Todd Papaioannou, VP of Cloud Architecture at Yahoo, titled “Hadoop and the Future of Cloud Computing”.
    • Actually, I would prefer to see “Cloud Computing in the future of Hadoop” ;o) I blogged about this a few weeks ago … I was hoping for a Project Fluffy!

      We need to move from a cluster to an elastic framework (from a compute and storage perspective), especially as Hadoop moves to an analytic platform. “The separation of management of resources in the cluster from management of the life cycle of applications and their component tasks results” is a good first step; now the resources can be instantiated via different mechanisms, cloud being the premier one.
  • GPU
    • In the context of my coursework at JHU (BioInformatics), I had a couple of talks with the folks working on DataScope. They plan to run Hadoop as one of the applications on their GPU cluster!
    • GPU computing is accelerating, and the capability for Hadoop to run on a GPU cluster would be interesting
  • Streamlined logging, monitoring and metering?
    • One of the challenges we are facing in our Unified Fabric Big Data project is that it is difficult to parse the logs and make inferences that help us to qualify & quantify MapReduce jobs.
    • This will also help to create an analytic platform based on the Hadoop ecosystem. Right now, services like EMR most probably do second-order metering by charging for the cloud infrastructure, as they spin up separate VMs for every user (from my limited view)

In short, exciting times are ahead for Hadoop! There is a talk tomorrow at the Bay Area HUG (Hadoop User Group) on this topic … I plan to attend, and later contribute. This is exciting, and I cannot remain on the sidelines … I will blog on the points from tomorrow’s talk … [Update: HUG Presentations and Video Link]

I leave you with this picture from The Polar Express … time to jump aboard and enjoy the ride …

