Is our Neocortex a Giant Semantic Bloom Filter ? Of Natural Intelligence, Machine Learning & Jeff Hawkins


L’Apéritif:

Image

In a set of four lectures spanning about 3 years, Jeff Hawkins explains how & why big data can only be solved by evolutionary-adaptive-continuously-learning models incorporating principles from the workings of the neocortex.
It does make sense – especially for NLP, NLU & Knowledge Representation. I am a big fan of the Borgs and their coordinated intelligence.

These are my annotated picture-notes …

L’Entrée:

Let me begin at the beginning. The other day I came across 4 very interesting talks by Jeff Hawkins on Biologically Inspired Machine Intelligence.

Call it serendipity, because we have been looking for more effective ways to do Knowledge Representation (KR) & Natural Language Understanding (NLU).

For example, movie names are very easy for humans to understand, but a MaxEnt NER finds them very hard. Knowledge Representation & Association is even harder!

We are experimenting with a few techniques like word-based tries (i.e. spell-checking sentences by words), higher-order federated Bloom Filters and n-gram hashing. We are planning to incorporate some of Jeff’s ideas …
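To make the n-gram hashing idea a bit more concrete, here is a minimal sketch of a Bloom filter keyed on word n-grams. It is only an illustration, not our actual implementation: the class name `WordNgramBloom`, the filter size, the number of hashes and the use of seeded SHA-256 are all assumptions picked for the example.

```python
import hashlib


class WordNgramBloom:
    """Toy Bloom filter over word n-grams (hypothetical, for illustration only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4, n=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.n = n
        self.bits = bytearray(num_bits // 8 + 1)

    def _ngrams(self, text):
        # Split into lowercase words and slide a window of n words over them.
        words = text.lower().split()
        if len(words) < self.n:
            return [" ".join(words)] if words else []
        return [" ".join(words[i:i + self.n])
                for i in range(len(words) - self.n + 1)]

    def _positions(self, gram):
        # Derive num_hashes bit positions by seeding SHA-256 with a counter.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text):
        for gram in self._ngrams(text):
            for pos in self._positions(gram):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, text):
        # True means "possibly seen" (false positives can happen);
        # False means at least one n-gram was definitely never added.
        grams = self._ngrams(text)
        return bool(grams) and all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for gram in grams
            for pos in self._positions(gram)
        )


titles = WordNgramBloom()
titles.add("The Good the Bad and the Ugly")
print(titles.might_contain("the bad and the ugly"))  # True: all its bigrams were added
```

Because a fragment of a title shares all its bigrams with the full title, the filter recognizes it. That is the property we are after for fuzzy entity lookup, at the cost of occasional false positives.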

I digress … Topics for another day … back to Jeff & Machine Intelligence …

Very inspiring, extremely thought-provoking talks – as usual, the inimitable Jeff Hawkins at his best.

  1. Google Tech Talk : Jeff Hawkins, “Building Brains to Understand the World’s Data”
  2. UC Berkeley Graduate Lectures
  3. “Advances in Modeling Neocortex and its impact on Machine Intelligence” by Jeff Hawkins,  Smith Group Lecture presented at the Beckman Institute for Advanced Science & Technology at the University of Illinois at Urbana-Champaign

Le Plat Principal:

The four talks have a lot of depth and are packed. Moreover, Jeff talks very fast – I listened to the talks a few times – at least 3 hrs per one-hour talk. You should listen to them slowly & rewind as required. It takes a few hours to get one’s head around the various ideas.

Let me annotate a few of his slides – those I was able to internalize to some extent:

Focus & premise[3]:

Hawkins-100-02-01

The assertion that many problems can only be solved by incorporating principles from the workings of the neocortex is interesting.

BTW, it does make sense – especially for NLU & Knowledge Representation.

As Jeff mentions later, the behavior need not be human-like, but the representation, interpretation & “understanding” would be.

Neocortex Architecture[3]:

“Neocortex is just a sheet of cells  2mm thick, the size of a dinner napkin” – Amazing what it can do!

Hawkins-100-03-01

The Six Principal Essentials of Biological Intelligence

The picture says it all.

Hawkins-100-04-01

Learning involves training and adaptive connections

Hawkins-100-05-01

The concept of streaming events & the learning mechanisms

Patterns from complex data streams

Hawkins-100-06-01

The paper “Hierarchical Temporal memory” has the gory details about the Hierarchical Temporal Learning.

Future

Hawkins-100-09-01

Interesting observation: Emotion, the fundamental aspect of being human, is not a requirement for intelligence – reminds us of Spock, of course.

Machine intelligence is not about replicating human behavior or even passing the Turing test. I agree on this – we need the machines to think & do things we cannot do, thus augmenting us. Make us stronger where we are weak!

Le Digestif

What interested me most was the semantic knowledge representation, NLP & NLU. The ability to understand and store concepts, the capacity to generalize as well as the mechanisms of strengthening and weakening connections based on external signals – just beautiful …

Agree that the Sparse Distributed Representation could be the language of all the intelligent machines.

The SDR looks a lot like a giant Bloom Filter

Hawkins-100-10-01

Hawkins-100-11-01

The planes can be considered as rows and a column as the temporal dimension of the semantic mapping (the memory of sequences). This equates to a giant n-dimensional Bloom Filter – a data structure we can grok (pun intended, as Jeff’s product is called Grok!).

The bloom filter analogy, while extremely simplistic, is conceptually congruent, in the sense that “similar values have similar representation”, of course depending on the hash algorithm.
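A tiny sketch of what “similar values have similar representation” could look like in code. It treats an SDR as the set of indices of its active bits; the three SDRs below are hand-picked, hypothetical values (a real system would learn them), and `overlap` is just a name chosen for this example.

```python
def overlap(a, b):
    """Number of active bits two SDRs share; more shared bits means more similar."""
    return len(a & b)


# Hypothetical SDRs: indices of the few active bits in a 2048-bit space.
cat = frozenset({3, 17, 42, 256, 900, 1500})
dog = frozenset({3, 17, 42, 300, 901, 1500})
car = frozenset({5, 88, 640, 1024, 1777, 2000})

print(overlap(cat, dog))  # 4 shared bits: semantically close
print(overlap(cat, car))  # 0 shared bits: unrelated

# Bloom-filter-like membership: OR several SDRs together, then ask
# whether all active bits of a candidate are present in the union.
seen = cat | car
print(cat <= seen)  # True
print(dog <= seen)  # False: bits 300 and 901 were never set
```

The union test is where the Bloom filter analogy shows up: with many SDRs ORed together, a pattern that was never stored can still appear to be present if its bits happen to be covered, which is the same false-positive behavior a Bloom filter has.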

After listening to the talks and thinking them over, I have a thousand questions in many directions. I will post the answers as we develop this further for our needs. Please send in your insights as comments to this blog. I am sure it will help a few folks!

Hawkins-100-12-01

  1. How do we handle semantic categories ? 
  2. How do we build more sophisticated representations based on spatial patterns ?
  3. What is the hash function that maps a slice of semantic to this giant Bloom Filter ?
  4. How does it handle collision? Corruption ? Clustering for resiliency/self adjusting representation ?
    • Collision might be good and I think that is what Jeff calls semantic generalization
  5. How does the semantic slice mapping function differentiate between a search & computation to trigger appropriate actions?
    • For example the following two questions require different actions: 
      • “What is the stock price of IBM ?” vs.
      • “What is the volatility as reflected in the beta of IBM for this quarter ?”
      • The first one is a search while the second has computation …
  6. Is the hash function same for all of us or is it different for each person ?
    • Most probably the function is a learned artifact.
  7. Another interesting vector is the Hierarchy & higher patterns of temporal coalescence/slowness – the high-order capability, tweaking the learning rates across the layers.
    • How can this be modeled with the analytical data structures we have?
    • And what are the mechanics for stable representation of pattern sequences – because with dynamicity and temporality comes the difficulty of snapshots and consistency between them.
    • The unique representation of the same sequence, at a later time in context of the earlier invocation is interesting …
  8. How do we “put a classifier on the top” ?
    • Play with permanence? Probability?
  9. What are the algorithms to prevent run away prediction?
    • I agree that we could account for rapid state difference vs. slower state; we still will have to encapsulate it in some form of code

Finally, can we build “Amazingly Intelligent Machines?” Yes We can !

And I agree with Jeff that “It is essential, for the survival of the species, that we build them” …

The Big Data Convergence


As we scan the concepts, technologies, products and practices in the big data space, a lot of things get muddier.

Neither the progression nor the boundaries are clear. We are still in the descriptive stage in terms of the application of the analytics technologies.

I had a good conversation with Bob Friday yesterday – his question was “What prevents us from answering 80% of the questions via automatic inferences ?” And that is the “Adaptive” stage we need to be in …

I think a diagram is much better than me writing 100,000 words. So here it is :

Image

In many ways, a lot of the underlying technologies are converging.

For example, A(rtificial) I(ntelligence) = NLP + N(atural) L(anguage) U(nderstanding) + ML + K(nowledge) R(epresentation) + Reasoning
Are Amazing Intelligent Machines in the works ?

Big Data State Of The Union


An informative study by TCS on the current state of Big Data: “The Emerging Big Returns on Big Data”

Image

Of course, you should download and read the whole report. Some interesting highlights:

  • There’s a polarity in spending on Big Data, with a minority of companies
    spending massive amounts and a larger number spending very little
  • The business functions expecting the greatest ROI on Big Data are not the ones
    you may think – while Sales & Marketing have initiatives, finance & logistics are betting on big data for efficiencies & insights
  • The biggest challenges to getting business value from Big Data are as much
    cultural as they are technological
  • Nearly half the data (49%) is unstructured or semi-structured, while 51% is
    structured. The heavy use of unstructured data is remarkable given that
    just a few years ago it was nearly zero in most companies – Enterprises have gone multi-structured !
  • Monitoring how customers use their products to detect product and design
    flaws is seen as a critical application for Big Data

Cheers & Happy Reading …

5 Steps to Pragmatic Data …er… Big Data


It is 2013 & Big Data is big news … Time to revisit my older (Nov’11) blog “Top 10 Steps to A Pragmatic Big Data Pipeline” … Some things have changed but many have remained the same …

5.  Chuck the hype, embrace the concept …

This seems to be the first obvious step for organizations. Everyone from Edd Dumbill (“Big data” is an imprecise term...) to TechCrunch (“Perhaps it’s about the actual functionality of apps vs. the data”) agrees with the concept, but the terms and marketing hype have hit the proverbial roof. The point is, there are many ponies in this pile & there is tremendous business value (so long as one is willing to discount the hype and think Big Data = All Data) …

I really like Mike Gualtieri’s very insightful definition of Big Data as

… the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers

Big Data 01

4. Don’t implement a Technology, implement THE Big Data pipeline

Think of Big Data in multiple dimensions rather than as a point technology & evolve the pipeline, focussing on all the aspects of the stages.

Data Science 02

The technologies, the skill sets and the tools are evolving, so are the business requirements.

Chris Taylor addresses this very clearly (“Big Data must not be an elephant riding a bicycle”) – viz. one has to address the entire spectrum to get value …

Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle .. it just doesn’t make business sense — Chris Taylor

3. Think Hybrid – Big Data Apps, Appliances & Infrastructure

I had addressed this one in my earlier blog (“Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms”)

The moral of the story : Think out-of-the box & inside-the-box.

Match the impedance of the use cases with appropriate technologies

2. Tell your stories, leveraging smart data, based on crisp business use cases & requirements

Evolve the systems incrementally focussing on the business values that determine the stories to tell, the inferences to derive, the feature sets to influence & the recommendations to make

Augment, not replace the current BI systems

Notice the comma (I am NOT saying “Augment not, Replace”!)

“Replace Teradata with Hadoop” is not a valid use case, given the current state of the technologies. In fact, integration with BI is an interesting challenge for Big Data …

No doubt Hadoop & NOSQL can add a lot of value, but make the case for co-existence, leveraging currently installed technologies & skill sets. Products like Hive also lower the barrier to entry for folks who are familiar with SQL.

From a business perspective Patrick Keddy of Iron Mountain has a few excellent suggestions on managing Big Data: 

Big data informs and enhances judgement and intuition, it should not replace them

Opt for progress over perfection

View the data in context

1. Apply the art of Data Science & Smart Data, paying attention to touch points

This still remains my #1. Data Science is the key differentiator resulting in new insights, new products, order of magnitude performance, new customer base et al – “a cohesive narrative from the numbers & statistics”

“Data science is about trying to create a process that allows you to create new ways of thinking about problems that are novel, or you are trying to use data to create or make something.” says D.J.Patil

Smart Data = Big Data + context + inference + declaratively interactive visualization

smartData02

  • Smart Data is (inference) model driven & declaratively interactive
  • For example,
    • The information like Wikipedia is big data; the in-memory representation Watson referred to is smart data
    • Device logs from 1000 good mobile handsets and 1000 not-so-good phones is big data;  a gam or glm over the log data after running through several stages of MapReduce is smart data, because it could give you an insight as to what factors or combination of factors make a good phone a bad phone
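As a rough sketch of the handset example above: a logistic regression (a GLM with a logit link) fitted over per-phone features can hint at which factors separate good phones from bad ones. Everything here is an assumption made for illustration: the two features, the eight data rows and the plain gradient-descent fit are made up, and a real pipeline would use a proper statistics package on the MapReduce output.

```python
import math


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit a logistic regression by batch gradient descent (toy implementation)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical features per handset: [dropped calls per day, firmware variant (0 or 1)]
rows = [[0, 1], [1, 0], [0, 0], [1, 1], [4, 1], [5, 0], [4, 0], [5, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = not-so-good phone

weights, bias = fit_logistic(rows, labels)
print(weights)  # a clearly positive first weight points at dropped calls
```

The sign and size of each fitted weight is the “smart data” part: it summarizes thousands of raw log lines into a statement about which factor matters.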

Focus not only on the Vs (i.e. Volume, Velocity, Variability & Variety) but also on the Cs (i.e. Connectedness & Context)

The two main Big Data challenges in 2013 would be:

1st : Data integration across silos to get the comprehensive view &

2nd : Matching the real-time velocity of business viz. CEP, sense & respond et al.

For example, I have already seen folks looking outside Hadoop for CEP and near-realtime response.

“.. 85% of respondents say the issue is not about the volume of data but the ability to analyze and act on data in real time” says Ryan Hollenbeck quoting a 2012 Cap Gemini study (Italics mine)

Big Data Borgs, Rise of the Big Data Machines & Revenge of the Fallen Algorithms


I have been following the 2013 predictions for Big Data. Naturally, there are lots of interesting predictions. Here are a few that I understand and (sort of) agree with :

What or Who is a Data Scientist ?


DataScientist = Part Hacker + Part Technologist + Part Detective + Part Scientist + Part Business Analyst + Part  Visual Artist

[Update 12/22/12] GigaOm says : Data Science = Data Architecture + Machine Learning + Analytics. Makes sense. I have updated my diagram accordingly
Data Science 02
DataScienceTeam

All the President’s DevOps


On the heels of “All the President’s Data Scientists”, here is another interesting article on the Obama campaign’s cloud infrastructure.

Update : A similar article: The Atlantic’s “When the Nerds Go Marching In”

Update : Case Study from New Relic: How the Obama For America team improved resilience

Image

  • They realized the campaign needed a scalable system: “2008 was the ‘Jaws’ moment,” said Obama for America’s Chief Technology Officer Harper Reed. “It was, ‘Oh my God, we’re going to need a bigger boat.’”
  • They built a single shared data tier with APIs to build lots of interesting applications: “Being able to decouple all the apps from each other has such power; it allowed us to scale each app individually and to share a lot of data between the apps, and it really saved us a lot of time.”
  • They leveraged internet architecture: “We aggressively stood on the shoulders of giants like Amazon, and used technology that was built by other people,”
  • Doesn’t look like they used esoteric technologies. The system is built around Python APIs over RDS, SQS and so forth. Excellent! The fact that the systems can be built this way is a testament to the cloud capabilities – IaaS & PaaS
  • In short, Reed says it all: “When you break it down to programming, we didn’t build a data store or a faster queue. All we did was put these pieces together and arrange them in the right order to give the field organization the tools they needed to do their job. And it worked out. It didn’t hurt that we had a really great candidate and the best ground game that the world has ever seen.”

All the President’s Data Scientists


The Time Election Commemorative Edition has an interesting article on the role of Data Science: “Inside the Secret World of the Data Crunchers Who Helped Obama Win”. A few quick lessons (of course, you should read the full Time article):

[Update 2/14/13] InfoWorld has an interesting take on Big Data Analytics and the Obama Campaign. In addition to Time’s narration of 4 lessons, InfoWorld adds the following:

  • Combined efforts of Analysts & Engineers
  • Implemented in weeks rather than months
  • Built around an unconstrained, yet centralized environment (This is important for big data)
    • This enabled the analysts to ask questions irrespective of where the data originated
  • Continuous improvement, with a built-in feedback loop

Note : I discuss the 5 Pragmatic Steps for Data …. er… Big Data in another blog
[update 2/28/13] AWS case study “Obama For America” has interesting details

My blog “All the President’s DevOps” on the infrastructure side of this system

1. Elevate Data Science to a 1st class Citizen

  • Campaign manager Jim Messina had promised a totally different, metric-driven kind of campaign in which politics was the goal but political instincts might not be the means. “We are going to measure every single thing in this campaign” … And hired a team of Data Scientists headed by Rayid Ghani
  • Rayid had visited Stanford to recruit budding Data Scientists – I wanted to attend, but couldn’t; am sure they would have also visited other campuses
  • Exactly what that team of dozens of data crunchers was doing, however, was a closely held secret. “They are our nuclear codes,” as the campaign guarded what it believed to be its biggest institutional advantage over Mitt Romney’s campaign: its data.

2. Collect, Unify & Leverage Big Data

  • As I had written in one of my earlier blogs, the spectacular results of Data Science (inference and predictions) come from an effective data pipeline.
  • The Obama campaign has interesting pipelines of big data streams
  • While the 2008 campaign was very successful, the team realized that they had too many databases & “None of them talked to each other.”
  • So over the first 18 months, the campaign merged the information collected from pollsters, fundraisers, field workers and consumer databases as well as social-media and mobile contacts with the main Democratic voter files in the swing states. — Brings tears to the eyes of a data architect!
  • They actually built an awesome data mining infrastructure
  • This “megafile” was the foundation for simulation runs for contributions, “persuadability” analysis and so forth

3. Practice Metric-driven Data Science

  • Don’t be afraid to create bold models, but back them up with reality
  • The Data Scientists developed interesting models & predictions, but tested them by sending e-mails with different subjects, monitoring results from e-mail and phone campaigns et al.
  • … assumptions were rarely left in place without numbers to back them up

4. Effective Modeling comes from weaving Big Data & Live Data

  • “The analytics team used four streams of polling data to build a detailed picture of voters in key states”
  • The polling and voter-contact data were processed and reprocessed nightly to account for every imaginable scenario.
  • We ran the election 66,000 times every night
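The article does not say how those nightly runs were built, so the following is only a guess at the general technique: a Monte Carlo simulation that replays the election many times from per-state win probabilities. The state names, probabilities and vote counts are invented, and treating states as independent coin flips is a simplification a real model would not make.

```python
import random


def simulate_elections(states, votes_to_win, runs=66_000, seed=0):
    """Return the fraction of simulated elections in which the candidate wins.

    states maps a state name to (win probability, electoral votes).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        # Flip one biased coin per state and add up the votes that were won.
        total = sum(ev for p_win, ev in states.values() if rng.random() < p_win)
        if total >= votes_to_win:
            wins += 1
    return wins / runs


# Hypothetical swing states: (probability of winning, electoral votes)
swing_states = {
    "State A": (0.55, 29),
    "State B": (0.60, 18),
    "State C": (0.48, 13),
    "State D": (0.52, 10),
}

win_rate = simulate_elections(swing_states, votes_to_win=36)
print(f"Candidate wins in {win_rate:.1%} of simulated elections")
```

Re-running this every night with freshly updated probabilities from polling and voter contact is what turns a static forecast into something a campaign can steer resources by.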

And finally the article ends with an insightful statement,

In politics, the era of #BigData has arrived !

I rest my case with one more observation on Obama’s Digital Gurus

Inside The Cave report is a good read

Cheers

<k/>

Reference: