It is 2013 & Big Data is big news … Time to revisit my older (Nov’11) blog “Top 10 Steps to A Pragmatic Big Data Pipeline” … Some things have changed but many have remained the same …
5. Chuck the hype, embrace the concept …
This seems to be the first obvious step for organizations. Everyone from Edd Dumbill (“Big data” is an imprecise term...) to TechCrunch (“Perhaps it’s about the actual functionality of apps vs. the data”) agrees with the concept, but the terms and the marketing hype have hit the proverbial roof. The point is, there are many ponies in this pile & there is tremendous business value (so long as one is willing to discount the hype and think Big Data = All Data) …
I really like Mike Gualtieri’s very insightful definition of Big Data as
… the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers

4. Don’t implement a Technology, implement THE Big Data pipeline
Think of Big Data in multiple dimensions rather than as a point technology & evolve the pipeline, focussing on all aspects of every stage

The technologies, the skill sets and the tools are evolving, and so are the business requirements.
Chris Taylor addresses this very clearly (“Big Data must not be an elephant riding a bicycle”) – viz. one has to address the entire spectrum to get value …
Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle .. it just doesn’t make business sense — Chris Taylor
3. Think Hybrid – Big Data Apps, Appliances & Infrastructure
I had addressed this one in my earlier blog (“Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms”)
The moral of the story : Think out-of-the-box & inside-the-box.
Match the impedance of the use cases with appropriate technologies
2. Tell your stories, leveraging smart data, based on crisp business use cases & requirements
Evolve the systems incrementally focussing on the business values that determine the stories to tell, the inferences to derive, the feature sets to influence & the recommendations to make
Augment, not replace the current BI systems
Notice the comma (I am NOT saying “Augment not, Replace”!)
“Replace Teradata with Hadoop” is not a valid use case, given the current state of the technologies. In fact, integration with BI is an interesting challenge for Big Data …
No doubt Hadoop & NoSQL can add a lot of value, but make the case for co-existence, leveraging currently installed technologies & skill sets. Products like Hive also lower the barrier to entry for folks who are familiar with SQL
From a business perspective Patrick Keddy of Iron Mountain has a few excellent suggestions on managing Big Data:
Big data informs and enhances judgement and intuition, it should not replace them
Opt for progress over perfection
View the data in context
1. Apply the art of Data Science & Smart Data, paying attention to touch points
This still remains my #1. Data Science is the key differentiator resulting in new insights, new products, order of magnitude performance, new customer base et al – “a cohesive narrative from the numbers & statistics”
“Data science is about trying to create a process that allows you to create new ways of thinking about problems that are novel, or you are trying to use data to create or make something.” says D.J. Patil
Smart Data = Big Data + context + inference + declaratively interactive visualization

- Smart Data is (inference) model driven & declaratively interactive
- For example,
- Information like Wikipedia is big data; the in-memory representation Watson referred to is smart data
- Device logs from 1000 good mobile handsets and 1000 not-so-good phones are big data; a GAM or GLM over the log data, after running it through several stages of MapReduce, is smart data, because it could give you an insight into what factors or combinations of factors make a good phone a bad phone
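To make the handset example a little more concrete, here is a minimal sketch of the "GLM over the log data" step. Everything in it is an assumption made for illustration: the logs are synthetic, the feature names (`dropped_calls`, `battery_drain`, `screen_size`) are hypothetical, and the model is a plain logistic regression (a binomial GLM with a logit link) fitted by gradient ascent in pure Python. In practice the features would come out of the MapReduce stages and one would likely reach for a statistics package instead.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_logs(n=1000, seed=7):
    """Generate hypothetical per-handset features and a good/bad label.

    The 'true' relationship below is invented purely for this sketch.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        dropped_calls = rng.gauss(0, 1)   # standardized, hypothetical
        battery_drain = rng.gauss(0, 1)   # standardized, hypothetical
        screen_size = rng.gauss(0, 1)     # deliberately irrelevant
        true_logit = 2.0 * dropped_calls + 1.0 * battery_drain
        is_bad = 1 if rng.random() < sigmoid(true_logit) else 0
        rows.append([dropped_calls, battery_drain, screen_size])
        labels.append(is_bad)
    return rows, labels


def fit_logistic_glm(rows, labels, lr=1.0, epochs=300):
    """Fit a binomial GLM (logit link) by batch gradient ascent.

    Returns [intercept, w_1, ..., w_k].
    """
    k = len(rows[0])
    n = len(rows)
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            err = y - sigmoid(z)          # gradient of the log-likelihood
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights


rows, labels = make_synthetic_logs()
weights = fit_logistic_glm(rows, labels)
names = ["intercept", "dropped_calls", "battery_drain", "screen_size"]
for name, w in zip(names, weights):
    print(f"{name:>14}: {w:+.2f}")
```

The fitted coefficients are the "smart" part: a large positive weight flags a factor that pushes a handset towards the bad group, while a weight near zero suggests the factor does not matter much in this (made-up) data.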
Focus not only on the Vs (i.e. Volume, Velocity, Variability & Variety) but also on the Cs (i.e. Connectedness & Context)
The two main Big Data challenges in 2013 would be:
1st : Data integration across silos to get the comprehensive view &
2nd : Matching the real-time velocity of business viz. CEP, sense & respond et al.
For example, I have already seen folks looking outside Hadoop for CEP and near-realtime response
“.. 85% of respondents say the issue is not about the volume of data but the ability to analyze and act on data in real time” says Ryan Hollenbeck quoting a 2012 Cap Gemini study (Italics mine)
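For readers who have not met CEP, the core "sense & respond" idea can be sketched in a few lines: watch a stream of events and react the moment a pattern shows up inside a time window, rather than waiting for a batch job. The class name `SlidingWindowRule`, the device keys and the thresholds below are all hypothetical, and a real CEP engine offers far richer pattern languages than this single rule.

```python
from collections import defaultdict, deque


class SlidingWindowRule:
    """Fire when one key produces `threshold` events within `window_s` seconds.

    A toy stand-in for a single CEP rule; names and semantics are
    assumptions for illustration only.
    """

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self._recent = defaultdict(deque)  # key -> timestamps still in window

    def on_event(self, key, timestamp):
        """Record one event; return True if the rule fires for this key."""
        window = self._recent[key]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the time window.
        while window and timestamp - window[0] > self.window_s:
            window.popleft()
        return len(window) >= self.threshold


# Hypothetical stream of (device, seconds) error events, in time order.
rule = SlidingWindowRule(threshold=3, window_s=10)
stream = [("a", 0), ("a", 4), ("b", 5), ("a", 9), ("a", 30)]
for device, ts in stream:
    if rule.on_event(device, ts):
        print(f"respond: device {device} hit the threshold at t={ts}s")
```

The point of the design is that each event is handled once, as it arrives, with only a small per-key window kept in memory; that is what makes a response in near-realtime feasible.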