Big Data on the other side of the Trough of Disillusionment

5. Don’t implement a technology infrastructure; implement the end-to-end pipeline, a.k.a. Bytes To Business

Simple reason : The business doesn’t care about a shiny infrastructure, but about capabilities it can take to market …


4. Think Business Relevance and agility from multiple points of view

Aggregate Even Bigger Datasets, Scenarios and Use Cases

  • Be flexible, tell your stories, leveraging smart data, based on ever changing crisp business use cases & requirements

3. Big Data cuts across enterprise silos – facilitate organization change and adoption

  • Data has always been siloed, with each function having its own datasets – transactional as well as data marts
  • Big Data, by definition, is heterogeneous & multi-schema
  • Data refresh, source of truth, organizational politics and even fear come into the picture. Deal with them in a positive way

2. Build Data Products

1. tbd

  • One more for the road …

Jeff Dean : Lessons Learned While Building Infrastructure Software at Google

Last week I attended the XLDB Conference and the invited Workshop at Stanford. I am planning a series of blogs highlighting the talks. Of course, you should read through all the XLDB 2013 presentation slides.

Google’s Jeff Dean had an interesting presentation about his experience building GFS, MapReduce, BigTable & Spanner. For those interested in these papers, I have organized them in “A Path through NOSQL Reading”.

Highlights in pictures (Full slides at XLDB 2013 site):



The Big Data Convergence

As we scan the concepts, technologies, products and practices in the big data space, a lot of things get muddier.

Neither the progression nor the boundaries are clear. We are still in the descriptive stage in terms of the application of the analytics technologies.

I had a good conversation with Bob Friday yesterday – his question was “What prevents us from answering 80% of the questions via automatic inferences ?” And that is the “Adaptive” stage we need to be in …

I think a diagram is much better than me writing 100,000 words. So here it is :


In many ways, a lot of the underlying technologies are converging.

For example, A(rtificial) I(ntelligence) = NLP + N(atural) L(anguage) U(nderstanding) + ML + K(nowledge) R(epresentation) + Reasoning
Are Amazing Intelligent Machines in the works ?

Big Data State Of The Union

An informative study by TCS on the current state of Big Data : “The Emerging Big Returns on Big Data”



Of course, you should download and read the whole report. Some interesting highlights:

  • There’s a polarity in spending on Big Data, with a minority of companies
    spending massive amounts and a larger number spending very little
  • The business functions expecting the greatest ROI on Big Data are not the ones
    you may think – while Sales & Marketing have initiatives, finance & logistics are betting on big data for efficiencies & insights
  • The biggest challenges to getting business value from Big Data are as much
    cultural as they are technological
  • Nearly half the data (49%) is unstructured or semi-structured, while 51% is
    structured. The heavy use of unstructured data is remarkable given that
    just a few years ago it was nearly zero in most companies – Enterprises have gone multi-structured !
  • Monitoring how customers use their products to detect product and design
    flaws is seen as a critical application for Big Data

Cheers & Happy Reading …

5 Steps to Pragmatic Data …er… Big Data

It is 2013 & Big Data is big news … Time to revisit my older (Nov’11) blog “Top 10 Steps to A Pragmatic Big Data Pipeline” … Some things have changed but many have remained the same …

5.  Chuck the hype, embrace the concept …

This seems to be the first obvious step for organizations. Everyone from Edd Dumbill (“Big data” is an imprecise term...) to TechCrunch (“Perhaps it’s about the actual functionality of apps vs. the data”) agrees with the concept, but the terms and the marketing hype have hit the proverbial roof. The point is, there are many ponies in this pile & there is tremendous business value (so long as one is willing to discount the hype and think Big Data = All Data) …

I really like Mike Gualtieri’s very insightful definition of Big Data as

… the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers

Big Data 01

4. Don’t implement a Technology, implement THE Big Data pipeline

Think of Big Data in multiple dimensions rather than as a point technology & evolve the pipeline focussing on all aspects of each stage

Data Science 02

The technologies, the skill sets and the tools are evolving, so are the business requirements.

Chris Taylor addresses this very clearly (“Big Data must not be an elephant riding a bicycle”) – viz. one has to address the entire spectrum to get value …

Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle .. it just doesn’t make business sense — Chris Taylor

3. Think Hybrid – Big Data Apps, Appliances & Infrastructure

I had addressed this one in my earlier blog (“Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms”)

The moral of the story : Think out-of-the-box & inside-the-box.

Match the impedance of the use cases with appropriate technologies

2. Tell your stories, leveraging smart data, based on crisp business use cases & requirements

Evolve the systems incrementally focussing on the business values that determine the stories to tell, the inferences to derive, the feature sets to influence & the recommendations to make

Augment, not replace the current BI systems

Notice the comma (I am NOT saying “Augment not, Replace”!)

“Replace Teradata with Hadoop” is not a valid use case, given the current state of the technologies. In fact, integration with BI is an interesting challenge for Big Data …

No doubt Hadoop & NOSQL can add a lot of value, but make the case for co-existence, leveraging currently installed technologies & skill sets. Products like Hive also lower the barrier to entry for folks who are familiar with SQL

From a business perspective Patrick Keddy of Iron Mountain has a few excellent suggestions on managing Big Data: 

Big data informs and enhances judgement and intuition, it should not replace them

Opt for progress over perfection

View the data in context

1. Apply the art of Data Science & Smart Data, paying attention to touch points

This still remains my #1. Data Science is the key differentiator resulting in new insights, new products, order of magnitude performance, new customer base et al – “a cohesive narrative from the numbers & statistics”

“Data science is about trying to create a process that allows you to create new ways of thinking about problems that are novel, or you are trying to use data to create or make something.” says D.J. Patil

Smart Data = Big Data + context + inference + declaratively interactive visualization


  • Smart Data is (inference) model driven & declaratively interactive
  • For example,
    • Information like Wikipedia is big data; the in-memory representation that Watson referred to is smart data
    • Device logs from 1000 good mobile handsets and 1000 not-so-good phones are big data; a GAM or GLM fitted over the log data after running it through several stages of MapReduce is smart data, because it could give you insight into which factors, or combinations of factors, make a good phone a bad phone
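
To make the handset example a bit more concrete, here is a minimal sketch of the "smart data" step, written in Python. Everything in it is an assumption for illustration: the per-handset summaries are synthetic stand-ins for what the MapReduce stages would produce, the feature names (dropped_calls, brightness) are hypothetical, and the model is a plain logistic regression (a binomial GLM with a logit link) fitted by gradient descent rather than by a statistics package such as R's glm.

```python
import math
import random
import statistics


def make_handset_summaries(n_per_class, seed=0):
    """Synthetic per-handset features; a stand-in for MapReduce output (assumption)."""
    rng = random.Random(seed)
    rows = []
    for label, mean_drops in ((0, 1.0), (1, 4.0)):  # 0 = good phone, 1 = not-so-good
        for _ in range(n_per_class):
            dropped_calls = rng.gauss(mean_drops, 0.5)  # differs between the classes
            brightness = rng.gauss(0.0, 1.0)            # pure noise, same for both
            rows.append(([dropped_calls, brightness], label))
    return rows


def standardize(rows):
    """Rescale every feature column to zero mean and unit standard deviation."""
    columns = list(zip(*(x for x, _ in rows)))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in rows]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Estimated probability that a handset is a not-so-good one."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic_glm(rows, epochs=300, lr=0.5):
    """Batch gradient descent on the mean logistic loss."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in rows:
            err = predict(x, weights, bias) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def accuracy(rows, weights, bias):
    hits = sum((predict(x, weights, bias) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)


rows = standardize(make_handset_summaries(200))
weights, bias = fit_logistic_glm(rows)
print("weights (dropped_calls, brightness):", weights)
print("training accuracy:", accuracy(rows, weights, bias))
```

The fitted weights are the "insight" part : on this made-up data the weight on dropped_calls should dominate the weight on the noise feature, which is the kind of signal that tells you which factors separate a good phone from a bad one.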

Focus not only on the Vs (i.e. Volume, Velocity, Variability & Variety) but also on the Cs (i.e. Connectedness & Context)

The two main Big Data challenges in 2013 would be:

1st : Data integration across silos to get the comprehensive view &

2nd : Matching the real-time velocity of business viz. CEP, sense & respond et al.

For example, I have already seen folks looking outside Hadoop for CEP and near-realtime response

“.. 85% of respondents say the issue is not about the volume of data but the ability to analyze and act on data in real time” says Ryan Hollenbeck, quoting a 2012 Cap Gemini study (italics mine)

Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms

I have been following the 2013 predictions for Big Data. Naturally, there are lots of interesting predictions. Here are a few that I understand and (sort of) agree with :