5 Steps to Pragmatic Data …er… Big Data

It is 2013 & Big Data is big news … Time to revisit my older (Nov’11) blog “Top 10 Steps to A Pragmatic Big Data Pipeline” … Some things have changed but many have remained the same …

5.  Chuck the hype, embrace the concept …

This seems to the first obvious step for organizations. From Ed Dumbill (“Big data” is an imprecise term...) to TechCrunch (“Perhaps it’s about the actual functionality of apps vs. the data“) agree with the concept, but the terms and marketing hypes have hit the proverbial roof. The point is, there are many ponies this pile & there is tremendous business value (so long as one is willing to discount the hype and think Big Data = All Data) …

I really like Mike Gualtieri’s very insightful definition of Big Data as

… the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers

Big Data 01

4. Don’t implement a Technology, implement THE Big Data pipeline

Think of Big Data in multiple dimensions than a point technology & evolve the pipeline focussing on all the aspects of the stages

Data Science 02

The technologies, the skill sets and the tools are evolving, so are the business requirements.

Chris Taylor addresses this very clearly (“Big Data must not be an elephant riding a bicycle“) – viz. One has to address the entire spectrum to get value …

Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle .. it just doesn’t make business sense — Chris Taylor

3. Think Hybrid – Big Data Apps, Appliances & Infrastructure

I had addressed this one in my earlier blog(“Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms“)

The morale of the story : Think out-of-the box & inside-the-box.

Match the impedence of the use cases with appropriate technologies

2. Tell your stories, leveraging smart data, based on crisp business use cases & requirements

Evolve the systems incrementally focussing on the business values that determine the stories to tell, the inferences to derive, the feature sets to influence & the recommendations to make

Augment, not replace the current BI systems

Notice the comma (I am NOT saying “Augment not, Replace”!)

“Replace Teradata with Hadoop” is not a valid use case, given the current state of the technologies. In fact, integration with BI is an interesting challenge for Big Data …

No doubt Hadoop & NOSQL can add a lot of value, but make the case for co-existence leveraging currently installed technologies & skill set. Products like Hive also minimizes barrier to entry for folks who are familiar with SQL

From a business perspective Patrick Keddy of Iron Mountain has a few excellent suggestions on managing Big Data: 

Big data informs and enhances judgement and intuition, it should not replace them

Opt for progress over perfection

View the data in context

1. Apply the art of Data Science & Smart Data, paying attention to touch points

This still remains my #1. Data Science is the key differentiator resulting in new insights, new products, order of magnitude performance, new customer base et al – “a cohesive narrative from the numbers & statistics”

Data science is about trying to create a process that allows you to create new ways of thinking about problems that are novel, or you are trying to use data to create or make something.” says D.J.Patil

Smart Data = Big Data + context + inference + declaratively interactive visualization


  • Smart Data is (inference) model driven & declaratively interactive
  • For example,
    • The information like Wikipedia is big data; the in-memory representation Watson referred to is smart data
    • Device logs from 1000 good mobile handsets and 1000 not-so-good phones is big data;  a gam or glm over the log data after running through several stages of MapReduce is smart data, because it could give you an insight as to what factors or combination of factors make a good phone a bad phone

Focus not only on the Vs (ie Volume,Velocity,Variability & variety) but also on the Cs (ie. Connectedness & Context)

The two main Big Data challenges in 2013 would be:

1st : Data integration across silos to get the comprehensive view &

2nd : Matching the real-time velocity of business viz. CEP, sense & respond et al.

 For example, I have already seen folking looking outside Hadoop for CEP and near-realtime response

“.. 85% of respondents say the issue is not about the volume of data but the ability to analyze and act on data in real timesays Ryan Hollenbeck quoting a 2012 Cap Gemini study (Italics mine)

About these ads

4 thoughts on “5 Steps to Pragmatic Data …er… Big Data

  1. Pingback: Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms | My missives

  2. Pingback: All the President’s Data Scientists | My missives

  3. Pingback: Big Data on the other side of the Trough of Disillusionment | My missives

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s