An interesting confluence of events led to this blog …
I was thinking of spec-ing out what an Analytics Cloud would look like …
And came across Freeman Dyson’s review of James Gleick’s book, The Information: A History, a Theory, a Flood (from where I got the following picture)
“According to Gleick, the impact of information on human affairs came in three installments: first the history, the thousands of years during which people created and exchanged information without the concept of measuring it; second the theory; third the flood, in which we now live.”
In his blog The Information Palace, Gleick talks about the origin and the current state of the Information Age – currently “mental exhaustion arising from exposure to too much information”
“The explosive growth of information in our human society is a part of the slower growth of ordered structures in the evolution of life as a whole.” – Information Analytics can shed new light not only on our genes but also on the heavens ! Only a couple of months ago I had a good chat with Alex and other folks at Data-Scope at JHU on running Hadoop on a GPU cluster.
Finally a Gartner report says that “Data Warehousing Reaching Its Most Significant Inflection Point Since Its Inception“. Some good points from the report:
- ” … data warehouse platforms evolve from an information store supporting traditional BI to a broader analytics infrastructure …”
- ” … shifting from storage/access to delivery/comprehension, and that means context as depicted in metadata will become paramount.”
- ” … by 2013, data warehouse vendors will combine their offerings to … an information management platform, … an execution platform … supporting data management, integration & analysis execution …”
It is in this context that I think Analytics Clouds will become a pervasive PaaS …
- … merging the Analytics execution with the elasticity of clouds !
- … adding data models, computational models (like linear regression, clustering, and so forth) as a native part of the platform
- … adding vector structures to big data that include organization, named attributes, model parameters and results, as metadata for petabytes of data
- … similar to the Kele drums, the Analytics Cloud carries with it (and is capable of communicating) the data and the models that describe the data (the tonal language of data), and at scale !
- … scientists need not throw out data, but can store it in the cloud
- … and as models & data are in the cloud, analysis & inference can be done anywhere
- … and as models are already calculated, inferences can be faster
- … and as the data is available, new models can be derived …
- … as Freeman Dyson writes “an infinite playground, with an unending sequence of mysteries to be understood by an unending sequence of players exploring an unending supply of information...”
- … the possibilities are endless
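To make the idea of models living next to the data a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `DataFrameEntry`, `ModelRecord`, `fit_linear` and `predict` are hypothetical names, not part of any existing platform, and a real Analytics Cloud would keep this metadata in a distributed store rather than in memory. The point is only that the model is fitted once, its parameters are stored as metadata, and later inference reads those parameters without touching the raw data again.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ModelRecord:
    """Hypothetical metadata entry: a fitted model stored alongside the data."""
    kind: str
    params: dict


@dataclass
class DataFrameEntry:
    """Hypothetical catalog entry: named attributes plus the models that describe them."""
    name: str
    columns: dict                                # attribute name -> list of values
    models: dict = field(default_factory=dict)   # model name -> ModelRecord


def fit_linear(entry, x_col, y_col):
    """Fit y = slope * x + intercept by ordinary least squares and store it as metadata."""
    xs, ys = entry.columns[x_col], entry.columns[y_col]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    record = ModelRecord(
        kind="linear_regression",
        params={"x": x_col, "y": y_col, "slope": slope, "intercept": intercept},
    )
    entry.models[f"{y_col}~{x_col}"] = record
    return record


def predict(entry, model_name, x):
    """Inference from the stored parameters only; the raw columns are not read."""
    p = entry.models[model_name].params
    return p["slope"] * x + p["intercept"]


# Made-up sample data, purely for illustration.
entry = DataFrameEntry(
    name="sensor_readings",
    columns={"hour": [0, 1, 2, 3, 4], "temp": [10.0, 12.0, 14.0, 16.0, 18.0]},
)
fit_linear(entry, "hour", "temp")
print(predict(entry, "temp~hour", 6))
```

Because the parameters travel with the data set, anyone who can read the catalog entry can run `predict` without re-fitting, and a new model can be added to `models` later as long as the raw columns are still around.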
Now back to the Analytics Cloud spec … A combination of
- NOSQL data stores, massive data – public & private
- public data sets for reference; private data with proper compliance framework
- An analytic PaaS cloud – which understands data frames/model metadata,
- With an interface like R,
- [Update 2/23/xi : I just saw this discussion on Big Data & R ! Maybe there is a synergy between Big Data and nextGen R ! R's interactivity is very good, so Big Data behind an R veneer is the best of both worlds]
- Plus processing frameworks like the Hadoop NextGen,
- A business model that spans enterprises, government as well as the educational/research community, probably in collaboration with NSF (so that part of the infrastructure research money can be spent on the analytic cloud and the data can be shared with the scientific community), …
- … in fact, in a pragmatic sense, analytics is going mainstream – algorithms are getting more raunchy, tools – more potent & competitions – more intimate ! This is evident from the use of machine learning to predict crimes in Santa Cruz to the Kaggle competitions for freeway travel time prediction for NSW !
- … so the time has come for the Analytics Cloud
- MIT Special Report – Analytics: The New Path to Value
- [Update Feb 24,xi] An interesting interview of Stonebraker by WSJ
- “I’m a huge fan of purpose-built engines, which is to say one size does not fit all and that if you’re allowed to take advantage of the characteristics of specific vertical markets, you can go a factor of one to two orders of magnitude faster than a general purpose engine.“
- “The sea change that’s happening … in a lot of vertical markets, people are figuring out they want way more complicated analytic codes, things that you would talk about as, say, machine learning … There’s a whole bunch of markets where complex analytics drive what people want to do. And wherever that’s true, then things like Paradigm4 will do very well because they’re focused on going fast on machine-learning kinds of code.”
- I agree … that is what I think Analytics Clouds will become … models & machine learning code along with a lot of data … and one can interact with all of them – at scale !
I have a few more ideas, which may be a topic for the next blog …