In a set of four lectures spanning about 3 years, Jeff Hawkins explains how & why big data problems can only be solved by evolutionary, adaptive, continuously learning models incorporating principles from the workings of the neocortex.
It does make sense – especially for NLP, NLU & Knowledge Representation. I am a big fan of the Borgs and their coordinated intelligence.
These are my annotated picture-notes …
Let me begin at the beginning. The other day I came across 4 very interesting talks by Jeff Hawkins on Biologically Inspired Machine Intelligence.
Call it serendipity, because we have been looking for more effective ways to do Knowledge Representation (KR) & Natural Language Understanding (NLU).
For example, movie names are very easy for humans to recognize, but a MaxEnt NER finds them very hard. Knowledge Representation & Association is even harder!
We are experimenting with a few techniques like word-based tries (i.e., spell-check sentences by words), higher order federated Bloom Filters and n-gram hashing. Planning to incorporate some of Jeff’s ideas …
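To make the n-gram hashing idea a bit more concrete, here is a minimal sketch of word n-grams hashed into a plain Bloom Filter. It is only an illustration, not our actual pipeline: the names `BloomFilter`, `word_ngrams`, `index_titles` and `title_score`, the filter size, the number of hashes and the sample titles are all assumptions made up for this example.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: k hash positions per item in an m-slot array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


def word_ngrams(text, n=2):
    """Return word-level n-grams of a lowercased, whitespace-split text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def index_titles(titles, n=2):
    """Add every word n-gram of every title to one Bloom filter."""
    bf = BloomFilter()
    for title in titles:
        for gram in word_ngrams(title, n):
            bf.add(gram)
    return bf


def title_score(bf, text, n=2):
    """Fraction of the text's n-grams that the filter (probably) contains."""
    grams = word_ngrams(text, n)
    if not grams:
        return 0.0
    return sum(gram in bf for gram in grams) / len(grams)


# Hypothetical usage with made-up sample titles.
bf = index_titles(["The Good the Bad and the Ugly", "Gone with the Wind"])
print(title_score(bf, "gone with the wind"))
```

A Bloom Filter never gives false negatives, so a known title always scores 1.0; an unknown phrase can still score above zero through false positives, which is why this is a cheap pre-filter and not a recognizer.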
I digress … Topics for another day … back to Jeff & Machine Intelligence …
Very inspiring, extremely thought-provoking talks – as usual, the inimitable Jeff Hawkins at his best.
- Google Tech Talk : Jeff Hawkins, “Building Brains to Understand the World’s Data”
- UC Berkeley Graduate Lectures
- “Advances in Modeling Neocortex and its impact on Machine Intelligence” by Jeff Hawkins, Smith Group Lecture presented at the Beckman Institute for Advanced Science & Technology at the University of Illinois at Urbana-Champaign
Le Plat Principal:
The four talks have a lot of depth and are packed. Moreover, Jeff talks very fast – I listened to the talks a few times, spending at least 3 hrs per one-hour talk. You should listen to them slowly & rewind as required. It takes a few hours to get one’s head around the various ideas.
Let me annotate a few of his slides – those I was able to internalize to some extent:
Focus & premise:
The assertion that many problems can only be solved by incorporating principles from the workings of the neocortex is interesting.
BTW, it does make sense – especially for NLU & Knowledge Representation.
As Jeff mentions later, the behavior need not be human-like, but the representation, interpretation & “understanding” would be.
“Neocortex is just a sheet of cells 2mm thick, the size of a dinner napkin” – Amazing what it can do!
The Six Principal Essentials of Biological Intelligence
The picture says it all.
Learning involves training and adaptive connections
The concept of streaming events & the learning mechanisms
Patterns from complex data streams
The paper “Hierarchical Temporal Memory” has the gory details about hierarchical temporal learning.
Interesting observation: Emotion, the fundamental aspect of being human, is not a requirement for intelligence – reminds us of Spock, of course.
Machine intelligence is not about replicating human behavior or even passing the Turing test. I agree on this – we need the machines to think & do things we cannot do, thus augmenting us. Make us stronger where we are weak !
What interested me most was the semantic knowledge representation, NLP & NLU. The ability to understand and store concepts, the capacity to generalize as well as the mechanisms of strengthening and weakening connections based on external signals – just beautiful …
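The strengthening and weakening of connections can be sketched with a single scalar "permanence" per connection, in the spirit of the HTM material. This is a toy under stated assumptions, not Jeff's algorithm: the class name `Connection`, the starting permanence, the threshold and the increment/decrement values are all invented for illustration.

```python
class Connection:
    """One potential connection with a scalar permanence in [0, 1]."""

    def __init__(self, permanence=0.3, threshold=0.5):
        self.permanence = permanence
        self.threshold = threshold

    @property
    def connected(self):
        # The connection only counts once its permanence reaches the threshold.
        return self.permanence >= self.threshold

    def update(self, input_active, increment=0.1, decrement=0.05):
        """Strengthen on an active input, weaken otherwise, clamp to [0, 1]."""
        if input_active:
            self.permanence = min(1.0, self.permanence + increment)
        else:
            self.permanence = max(0.0, self.permanence - decrement)


# Hypothetical usage: repeated activity forms the connection,
# prolonged inactivity erodes it again.
c = Connection()
for _ in range(3):
    c.update(input_active=True)
print(c.connected)
for _ in range(20):
    c.update(input_active=False)
print(c.connected)
```

The point of the threshold is that learning is gradual in the permanence but binary in its effect, so a few noisy signals do not immediately rewire anything.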
Agree that the Sparse Distributed Representation could be the language of all the intelligent machines.
The SDR looks a lot like a giant Bloom Filter
The planes can be considered as rows and a column as the temporal dimension of the semantic mapping (the memory of sequences). This equates to a giant n-dimensional Bloom Filter – a data structure we can grok (pun intended, as Jeff’s product is called Grok!).
The bloom filter analogy, while extremely simplistic, is conceptually congruent, in the sense that “similar values have similar representation”, of course depending on the hash algorithm.
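Here is a minimal sketch of that analogy, assuming we describe each concept by a handful of semantic features and hash every feature to one bit of a large, mostly empty bit space. This is my simplification, not how the SDRs in the talks are actually produced (those are learned); `SDR_SIZE`, `feature_bit`, `encode`, `overlap` and the sample features are hypothetical names and data.

```python
import hashlib

SDR_SIZE = 2048  # total bits; an assumption chosen only for illustration


def feature_bit(feature, size=SDR_SIZE):
    """Map one feature string to a bit index with a stable hash."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def encode(features, size=SDR_SIZE):
    """Encode a collection of features as the set of active bit indices."""
    return {feature_bit(f, size) for f in features}


def overlap(a, b):
    """Number of active bits two representations share."""
    return len(a & b)


# Hypothetical concepts described by made-up features.
cat = encode(["animal", "pet", "furry", "meows"])
dog = encode(["animal", "pet", "furry", "barks"])
car = encode(["vehicle", "wheels", "engine"])

print(overlap(cat, dog), overlap(cat, car))
```

Because the hash is applied per feature and not to the whole concept, concepts that share features share bits, so "cat" and "dog" should overlap much more than "cat" and "car" (barring hash collisions). Hashing the whole concept string at once would destroy exactly that property, which is the "depending on the hash algorithm" caveat.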
After listening to the talks and thinking them over, I have a thousand questions in many directions. I will post the answers as we work through this for our needs. Please send in your insights as comments to this blog. I am sure it will help a few folks !
- How do we handle semantic categories ?
- How do we build more sophisticated representations based on spatial patterns ?
- What is the hash function that maps a slice of semantic to this giant Bloom Filter ?
- How does it handle collision? Corruption ? Clustering for resiliency/self adjusting representation ?
- Collision might be good, and I think that is what Jeff calls semantic generalization
- How does the semantic slice mapping function differentiate between a search & computation to trigger appropriate actions?
- For example the following two questions require different actions:
- “What is stock price of IBM ?” vs.
- “What is the volatility as reflected in the beta of IBM for this quarter ?”
- The first one is a search while the second has computation …
- Is the hash function same for all of us or is it different for each person ?
- Most probably the function is a learned artifact.
- Another interesting vector is the Hierarchy & higher patterns of temporal coalescence/slowness – the high-order capability, tweaking the learning rates across the layers.
- How can this be modeled with the analytical data structures we have?
- And what are the mechanics for stable representation of pattern sequences – because with dynamicity and temporality comes the difficulty of snapshots and consistency between them.
- The unique representation of the same sequence, at a later time in context of the earlier invocation is interesting …
- How do we “put a classifier on the top” ?
- Play with permanence? Probability?
- What are the algorithms to prevent runaway prediction?
- I agree that we could account for rapid state difference vs. slower state; we still will have to encapsulate it in some form of code
Finally, can we build “Amazingly Intelligent Machines™?” Yes We can !
And I agree with Jeff that “It is essential, for the survival of the species, that we build them” …