When is ‘big data’ really ‘BIG DATA’ ?


See also my more recent blog post ‘Top 10 Steps to a Pragmatic Big Data Pipeline’, if you haven’t seen it. If you are coming from there, no need for a recursion ;o)

[Update 11/25/11: A copy of my guest lecture for Ph.D. students at the Naval Postgraduate School, The Art Of Big Data, is on Slideshare]

There seems to be a small difference of opinion on what ‘BIG DATA’ really is.

  • Curt Monash in his blog puts forward an argument that Big Data is really a bit bucket of data from a multitude of sources, nothing more nothing less.
  • Brian Hopkins is of the theory that Big Data = Volume + Variety + Variability + Velocity
  • Methinks:
    • I think ‘Big Data’ is a function of connectedness & context.
    • I read somewhere (I don’t want to misquote anybody, so let us keep it anonymous) that Hadoop excels in applying simple algorithms to large amounts of data
  • Makes sense – once one starts applying sophisticated algorithms to a mass of data, one would need chaining of MapReduce tasks and many times the algorithms do not even fit into a MapReduce paradigm. MapReduce NextGen addresses some of those, but not all
  • There is also the Big Data vs. Smart Data distinction, where the data carries its models and semantics.
    • Rather than the Vs, it is the Cs – Context-uality & Connected-ness that make big data ‘BIG DATA’. I call it ‘Smart Data’; monikers aside, most probably it is the ability to apply different models, and the capability to infer/predict and visualize, that makes it a little different from traditional usage. Maybe it is not the organization at all, but how we use it, that differentiates ‘big data’ from ‘BIG DATA’.
    • Curt also has a problem with ‘Big Data Analytics’. I kind of agree, in the sense that Big Data Analytics is a processing pipeline that spans collection, transformation, storage, analysis, inference/prediction and, most importantly, visualization/infographics.
    • Big Data Analytics, whatever we call it, is not a point function.
    • One point on which I agree with an article in SOA World is that there is also the dimension of semi-structured data.
    • Which, BTW, is not the same as Curt’s multi-vs-poly structure. I don’t think Curt got that right – while changing structure is a big thing (that is one of the reasons why NOSQL came into existence), it by itself doesn’t make data any bigger!
    • And Gartner’s ‘Extreme Data’ moniker is no better than ‘Big Data’ – it is still vague (or very general) … ‘Smart Data’ might be better …
    • As Merv Adrian says, where is Mr. Dundee when we need him!
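To make the point about chaining MapReduce tasks a little more concrete, here is a minimal sketch in plain Python. It is a toy, in-memory stand-in and not Hadoop code; the helper `map_reduce` and the function names `tokenize`, `sum_counts`, `invert`, `collect_words` and `words_by_frequency` are all made up for illustration. It shows that even a modest question (which words share the same frequency?) already needs two passes, with the output of the first feeding the second.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One toy map -> shuffle -> reduce pass, entirely in memory (not Hadoop)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    # Reduce each key's values; keys are sorted for a deterministic order.
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: the classic word count.
def tokenize(line):
    for word in line.lower().split():
        yield word, 1


def sum_counts(word, ones):
    return word, sum(ones)


# Stage 2: regroup the (word, count) pairs from stage 1 by count.
def invert(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    return count, sorted(words)


def words_by_frequency(lines):
    """Chain two passes: stage 2 consumes the output of stage 1."""
    counts = map_reduce(lines, tokenize, sum_counts)
    return map_reduce(counts, invert, collect_words)


if __name__ == "__main__":
    sample = ["big data big context", "smart data"]
    print(words_by_frequency(sample))
```

Anything iterative, such as a graph algorithm, would need many more such passes strung together, which is roughly where the simple-algorithms-on-lots-of-data sweet spot ends.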


    Update [10/23/11] Came across this link on Defining Big Data

    Update [10/24/11] “Last week there were several events that convinced me that one of the great tech bubbles inflating right now is around what people have agreed to call ‘Big Data.’” Ouch! NYTimes Bits

    5 thoughts on “When is ‘big data’ really ‘BIG DATA’ ?”

    1. Pingback: Big Data & NOSQL Nirvana : HPTS Day 1 « My missives

    2. Pingback: Top 10 Steps to a Pragmatic Big Data Pipeline « My missives

    3. Just to clarify the genesis of the 3Vs concept for big data. While others, not just Forrester, have laid claim to it, the framework was first defined in a Gartner research note I wrote 10 years ago. Happy to share a copy with anyone. -Doug Laney, VP Research, Gartner

    4. Pingback: Facebook Infrastructure @ New Years Eve – A study in Scalability « My missives

    5. Pingback: Quora
