Exciting news: Hadoop is evolving! As I was reading Arun Murthy’s blog, multiple thoughts crossed my mind on the evolution of this lovable toy animal. My first impressions:
- From Data at Scale to Data at Scale with complexity: connected & contextual
- From (relatively) static MapReduce to a (somewhat) dynamic analytic platform
- While we might not see a real-time Hadoop soon, the proposed ideas do make the platform more dynamic
- The “support for alternate programming paradigms to MapReduce” by decoupling the computation framework is an excellent idea
- I think it is still MapReduce at the core (I am not sure if it will deviate), but generic computation frameworks can choose their own patterns! I am looking forward to bioinformatics applications
- The “Support for short-lived services” is interesting. I had blogged a little about this. Looking forward to how this evolves …
- I am hoping that it would be possible via extensible programming models to interface with programming systems like R.
- Embeddable, domain specific capabilities (for example algorithmics specific to bioinformatics) could be interesting
There are also a few things that might not be part of this evolution:
- From Cluster to Cloud ?
- There is a proposed keynote by Dr. Todd Papaioannou (VP, Cloud Architecture at Yahoo), titled “Hadoop and the Future of Cloud Computing”.
- Actually I would prefer to see “Cloud Computing in the future of Hadoop” ;o) I had a blog post on this a few weeks ago … I was hoping for a project fluffy!
We need to move from a cluster to an elastic framework (from a compute and storage perspective), especially as Hadoop moves to an analytic platform. “The separation of management of resources in the cluster from management of the life cycle of applications and their component tasks” is a good first step: now the resources can be instantiated via different mechanisms, cloud being the premier one.
- Streamlined logging, monitoring and metering ?
- One of the challenges we are facing in our Unified Fabric Big Data project is that it is difficult to parse the logs and make inferences that help us to qualify & quantify MapReduce jobs.
- This will also help to create an analytic platform based on the Hadoop ecosystem. Right now, services like EMR most probably do second-order metering by charging for the cloud infrastructure, as they spin up separate VMs for every user (from my limited view).
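To make the "separation of management of resources from management of the life cycle of applications" idea a bit more concrete, here is a toy sketch in Python. It is not Hadoop's API: `ResourcePool`, `AppManager`, and the notion of "slots" are hypothetical names I made up for illustration. It only shows why the split matters for elasticity: the application side never needs to know where capacity comes from, so the pool can be grown (say, from a cloud) without touching application logic.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical cluster-wide manager: it only hands out capacity."""
    free_slots: int
    leases: dict = field(default_factory=dict)

    def acquire(self, app_id: str, slots: int) -> int:
        # Grant as many slots as are available, up to the request.
        granted = min(slots, self.free_slots)
        self.free_slots -= granted
        self.leases[app_id] = self.leases.get(app_id, 0) + granted
        return granted

    def release(self, app_id: str) -> None:
        # Return everything the application currently holds.
        self.free_slots += self.leases.pop(app_id, 0)

    def grow(self, slots: int) -> None:
        # Elastic step (assumption): capacity added from some outside source.
        self.free_slots += slots


@dataclass
class AppManager:
    """Hypothetical per-application manager: it owns the task life cycle."""
    app_id: str
    pending: list
    done: list = field(default_factory=list)

    def run(self, pool: ResourcePool) -> None:
        while self.pending:
            granted = pool.acquire(self.app_id, len(self.pending))
            if granted == 0:
                break  # no capacity; the caller can grow the pool and retry
            batch = self.pending[:granted]
            self.pending = self.pending[granted:]
            self.done.extend(task() for task in batch)
            pool.release(self.app_id)


# Usage: start with an empty pool, then add capacity and retry.
pool = ResourcePool(free_slots=0)
app = AppManager("squares", [lambda i=i: i * i for i in range(5)])
app.run(pool)   # nothing runs, there are no slots yet
pool.grow(2)    # capacity arrives from elsewhere
app.run(pool)   # tasks now run in batches of up to two
```

The application manager only ever calls `acquire` and `release`; whether the slots come from a fixed cluster or from freshly provisioned machines is the pool's business.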
In short, exciting times are ahead for Hadoop! There is a talk tomorrow at the Bay Area HUG (Hadoop User Group) on this topic … I plan to attend, and later contribute. This is exciting, and I cannot remain on the sidelines … Will blog on the points from tomorrow’s talk … [Update : HUG Presentations and Video Link]
I leave you with this picture from The Polar Express … time to jump aboard and enjoy the ride …