Have been working on this architecture for a couple of years. The idea is to build an AI machine that augments human capabilities. I know IBM has Watson; Google and FB have their own versions that address different domains.
The diagram below is more for my own understanding and to clarify the thinking. I will write more as I get time. Hope you all find it useful.
It is always interesting to hear from Jeff and understand what he is up to. I have blogged about his earlier talks at XLDB and at Stanford. Jeff Dean's Keynote at RecSys2014 was no exception. The talk was interesting, the Q&A was stimulating and the links to papers … now we have more work! – I have a reading list at the end.
Of course, you should watch it (YouTube Link) and go thru his keynote slides at the ACM Conference on Information and Knowledge Management. Highlights of his talk, from my notes …
- Build a system with simple algorithms and then throw lots of data at it – let the system build the abstractions. Interesting line of thought.
- I remember hearing about it from Peter Norvig as well, i.e. Google is interested in algorithms that get better with data
- An effective recommendation system requires context, i.e. understanding the user's surroundings, the user's previous behavior, the aggregated previous behavior of many other users and, finally, textual understanding.
Interesting concept of embedding similar things such that they are nearby in a high dimensional space!
- Jeff then talked about using LSTM (Long Short-Term Memory) Neural Networks for translation.
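The "embedding similar things so they are nearby in a high dimensional space" idea can be sketched with toy vectors – the numbers below are entirely made up for illustration (real systems learn hundreds of dimensions from data), but they show what "nearby" means in practice:

```python
import numpy as np

# Toy 4-dimensional embeddings (made-up numbers, purely illustrative).
embeddings = {
    "porpoise": np.array([0.90, 0.80, 0.10, 0.00]),
    "dolphin":  np.array([0.85, 0.75, 0.15, 0.05]),
    "SUV":      np.array([0.05, 0.10, 0.90, 0.80]),
}

def cosine(u, v):
    """Cosine similarity: ~1.0 for vectors pointing the same way, ~0 for unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# "Nearby in the embedding space" = high cosine similarity.
print(cosine(embeddings["porpoise"], embeddings["dolphin"]))  # high
print(cosine(embeddings["porpoise"], embeddings["SUV"]))      # much lower
```

Similar things end up with similar vectors, so similarity queries reduce to geometry.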
- Notes from Q & A:
- The async training of the model and the random initialization mean that different runs will result in different models, but the results are within an epsilon of each other
- Currently, they are handcrafting the topology of these networks, i.e. how many layers, how many nodes, the connections et al. Evolving the architecture (for example, adding a neuron when an interesting feature is discovered) is still a research topic.
- Between the ages of 2 & 4, our brain creates 500K neurons/sec, and from 5 to 15, it starts pruning them!
- The models are opaque and do not have explainability. One way Google is approaching this is by building tools that introspect the models … interesting
- These models work well for classification as well as ranking. (Note : I should try this – may be for a Kaggle competition. 2015 RecSys Challenge !)
- Training the CTR system on a nightly basis?
- Connections & Scale of the models
- Vision : Billions of connections
- Language embeddings : 1000s of millions of connections
- If one has more parameters than data, the model will overfit
- Rule of thumb: for sparse representations, one parameter per record
- Paragraph vectors can capture granular levels, while a deep LSTM might be better at capturing the details – TBD
- Debugging is still an art. Check the modelling; factor into smaller problems; see if different data is required
- RBMs and energy-based models have not found their way into Google's production; NNs are finding applications
- Simplification & Complexity: NNs, once you get them working, form this nice "algorithmically simple computation mechanism" in a darkish-brown box! Fewer subsystems, less human engineering! At a different axis of complexity
- Embedding editorial policies is not easy, better to overlay them … [Note : We have an architecture where the pre and post processors annotate the recommendations/results from a DL system]
- There are some interesting papers on both the topics that Jeff mentioned (This my reading list for the next few months! Hope it is useful to you as well !):
- Efficient Estimation of Word Representations in Vector Space [Link]
- Paragraph vector : Distributed Representations of Sentences and Documents [Link]
- [Quoc V. Le's home page]
- Distributed Representations of Words and Phrases and their Compositionality [Link]
- Deep Visual-Semantic Embedding Model [Link]
- Sequence to Sequence Learning with Neural Networks [Link]
- Building high-level features using large scale unsupervised learning [Link]
- word2vec – tool for computing continuous distributed representations of words [Link]
- Large Scale Distributed Deep Networks [Link]
- Deep Neural Networks for Object Detection [Link]
- Playing Atari with Deep Reinforcement Learning [Link]
- Papers by Google’s Deep Learning Team [Link to Vincent Vanhoucke’s Page]
- And, last but not least, Jeff Dean’s Page
The talk was cut off after ~45 minutes. Am hoping they will publish the rest and the slides. Will add pointers when they are on-line. Drop me a note if you catch them …
Update [10/12/14 21:49]: They have posted the second half! Am watching it now!
Context: I couldn't attend RecSys 2014; luckily they have the sessions on YouTube. Plan to watch, take notes & blog the highlights; recommendation systems are one of my interest areas.
- Next : Netflix’s CPO Neal Hunt’s Keynote
- Next + 1 : Future of Recommender Systems
- Next + 2 : Interesting Notes from rest of the sessions
- Oh man, I really missed the RecSysTV session. We are working on some addressable recommendations and are already reading the papers, but didn't see videos for the RecSysTV sessions ;o(
I came across an interesting talk by Google’s Peter Norvig at NASA.
Of course, you should listen to the talk – let me blog about a couple of points that are of interest to me:
Algorithms that get better with Data
Peter had two good points:
- Algorithms behave differently as they churn thru more data. For example, in the figure, the blue algorithm was better at a million training samples. If one had stopped at that scale, one would be tempted to optimize that algorithm for better performance
- But as the scale increased, the purple algorithm started showing promise – in fact, the blue one starts deteriorating at larger scales. The old adage "don't do premature optimization" is true here as well.
- In general, Google prefers algorithms that get better with data. Not all algorithms are like that, but Google likes to go after the ones with this type of performance characteristic.
There is no serendipity in Google Search or Google Translate
- There is no serendipity in search – it is just rehashing. It is good for finding things, but not at all useful for understanding, interpolation & ultimately inference. I think Intelligent Search is an oxymoron ;o)
- Same with Google Translate. Google Translate takes all its cues from the web – it wouldn't help us communicate with either the non-human inhabitants of this planet or any life form from other planets/galaxies.
- In that sense, I am a little disappointed with Google's translation engines. OTOH, I have only a minuscule view of the work at Google.
The future of human-machine & Augmented Cognition
And, don’t belong to the B-Ark !
Data Science & the profession of a Data Scientist are being debated, rationalized, defined and refactored … I think the domain & the profession are maturing and our understanding of the Mythical Data Scientist is getting more pragmatic. Last year, I proposed the idea of a Data Science Engineer with similar thoughts, and elaborated more at "Who or what is a Data Scientist?", "Building a Data Organization that works with Business" & "The sense & sensibility of Data Science devOps". Things are getting more interesting …
Now to the highlights:
1. Data Scientist is multi-faceted & contextual
- Two points – It requires a multitude of skills & different skill sets at different situations; and definitely is a team effort.
- This tweet sums it all up
- Sometimes a Data Scientist has to tell a good business story to make an impact; other times the algorithm wins the day
- Harlan in his blog identifies four combinations – Data Business Person, Data Creative, Data Engineer & Data Researcher
- I don't fully agree with the diagram – it has a lot less programming & a little more math than I would like; the math is usually built into the ML algorithms, and the implementation is embedded in math libraries developed by optimization specialists. A Data Scientist shouldn't be twiddling with the math libraries
- The BAH Field Guide suggests the following mix:
- I would prefer to see more ML than M. ML is the higher form of applied Math and also includes Statistics
- Domain Expertise and the ability to identify the correct problems are very important skills of a Data Scientist, says John Forman.
- Or as Rachel Schutt at Columbia quotes:
- Josh Wills (Cloudera)
Data Scientist (noun): Person who is better at statistics than any software engineer & better at software engineering than any statistician
- Will Cukierski (Kaggle) retorts
Data Scientist (noun): Person who is worse at statistics than any statistician & worse at software engineering than any software engineer
2. The Data Scientist team should be building data products
3. To tell the data story effectively, the supporting cast is essential
- As Vishal puts it in his blog,
- Data must be there & processable – the story definitely depends on the data
- Processes & buy-in from management – many times, it is not the inference that is the bottleneck but the business processes that need to be changed to implement the inferences & insights
- As the BAH Field Guide says it:
4. Pay attention to how the Data Science team is organized
5. Data Science is a continuum of Sophistication & Maturity – more a marathon than a sprint
- I am sure organizations understand this intuitively, but many times the understanding is not reflected in their actions.
- Simply Put:
- Descriptive = What Happened
- Diagnostics = Why did it happen ?
- Reactive = Take corrective Actions for what happened
- Proactive = Take actions based on fixed predictions
- Adaptive = Dynamic actions based on learning Predictive Models, embedded business rules and augmented cognition
- Prescriptive = Actionable inferences based on Data Science Models
- In the words of my colleague Marc Isikoff,
Descriptive Analytics yields insight, Diagnostic Analytics yields hypothesis & Predictive Analytics yields assumption ! Wise words – worth their weight in gold coins (or gold bitcoins!)
- Jeff Bertolucci has a quick blog on the Descriptive, Predictive & Prescriptive Analytics.
- Michael Wu, Chief Scientist at Lithium has a series of blogs on this topic
Neither Jeff nor Michael talks about adaptiveness. For example, recommendation systems (like collaborative filtering) constantly incorporate new data and "tweak" the running models
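The adaptive idea – tweaking a running model as each new observation arrives, instead of retraining from scratch – can be sketched with a toy incremental recommender. This is my own illustration (an incremental per-item mean, nothing as fancy as real collaborative filtering):

```python
from collections import defaultdict

class IncrementalItemScores:
    """Toy adaptive recommender: a running mean rating per item,
    updated one observation at a time (no batch retraining)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def observe(self, item, rating):
        """Fold one new rating into the model: m += (x - m) / n."""
        self.count[item] += 1
        self.mean[item] += (rating - self.mean[item]) / self.count[item]

    def top(self):
        """Item with the best running score right now."""
        return max(self.mean, key=self.mean.get)

model = IncrementalItemScores()
for item, rating in [("A", 4), ("B", 5), ("A", 2), ("B", 4), ("B", 5)]:
    model.observe(item, rating)

print(model.top())           # "B" – mean 14/3 beats A's mean of 3.0
print(model.mean["A"], model.mean["B"])
```

Each `observe` call nudges the model; the "model" is never rebuilt, only adapted – the same shape of loop that an online collaborative filter runs at much larger scale.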
Let me stop here, I think the blog is getting long already …
- June 28,2014 : Google’s Ray Kurzweil is working on embedding AI into search.
- Good stuff. It is high time we added intelligence to search.
- June 25, 2014 : Dueling Definitions : Interesting take on the definition, use & context of AI at O’Reilly Radar !
- June 11, 2014
- [May 17, 'XIV] Yann LeCun (NYU/Facebook) @ reddit – lots of interesting insights on the state of AI
- Most probably I will summarize the discussion in a blog
- [May 17,’XIV] Prof.Andrew Ng moving to Baidu as Chief Scientist
- [Jan 19, 2014] An excellent article in Wired about Hinton, who is the undoubted pioneer in Deep Learning – "Meet the Man Google Hired to Make AI a Reality"
- [October 13,2013] Good post by Derrick Harris of GigaOm on work at Stanford on Sentiment Analysis and Deep learning
Back to the main feature …
An interesting blog in GigaOm by Derrick Harris on Deep Learning for the masses. What interested me most was Jeremy Howard from Kaggle.
- “…It’s going to enable whole new classes of products that have never existed before …”
- But there’s a catch: deep learning is really hard. So far, only a handful of teams in hundreds of Kaggle competitions have used it. Most of them have included Geoffrey Hinton or have been associated with him.
- Yep, it is hard. We are trying to bootstrap an application system and haven’t even scratched the surface – so it seems
- If data scientists in places outside Google could simply (a relative term if ever there was one) input their multidimensional data and train models to learn it, that could make other approaches to predictive modeling all but obsolete.
- Yep. Deep Learning is being applied in image recognition, translation et al. It would be interesting to see how these technologies can be applied to retail, banking, manufacturing et al
I also think the broader architecture of the three amigos viz. Interface, Inference & Intelligence needs to come together
Smarter Models = Smarter Apps – Yep, definitely !
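A feel for why "deep learning is really hard": even the tiniest neural network involves a forward pass, a hand-derived backward pass and a pile of knobs (layer size, learning rate, iteration count). Here is my own toy numpy sketch – nothing like Google-scale deep learning, just the core mechanism – learning XOR, the classic function a linear model cannot represent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units – a deliberately tiny model, hand-tuned knobs.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
lr = 2.0

for step in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the squared error through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)

loss = float(np.mean((out - y) ** 2))
print("final MSE:", loss)
print("predictions:", out.round(2).ravel())
```

Every choice above (8 units, lr = 2.0, 20000 steps, the seed) is handcrafted – which is exactly Jeff Dean's point about topologies being hand-designed, and Jeremy Howard's point about why so few teams pull this off at scale.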
- The other day I was thinking about how to reason about the Analytics & Big Data ecosystem & came up with a few monikers
- We do have a few interesting architectural artifacts connecting these monikers with the appropriate domains. May be I will share them in a future blog
- For now, back to the essential monikers …
- Syntactically Big Data has three Vs – the Volume, Velocity & Variety.
- A very useful viewpoint that helps us to manage the beast, … but it does nothing for deriving value …
- Semantically the 3 Cs – Context, Connectedness & the Convergence make a lot of sense
- Context is King. Naturally It has many faces:
- Personal Context, Social Context, Enterprise Context, Consumer Context and so forth
- Came across an interesting post on Context being the future – mobility and context would rule in terms of personal apps
- Connectedness is an essential step to mine Smart Data out of Big Data
- Architecturally, I like the Three Amigos : Interface, Intelligence & Inference
- Interface is key – whether it is interface with wearable devices or visualization of data
- Interface also includes Augmented Cognition as well as NLP/NLU
- Intelligence comes from applying Analytics, Machine Learning, Modern AI, Deep Learning et al to Big Data. The intelligence is the algorithms
- Inference, of course, is the piece that makes it all worthwhile – Models, Reasoning Engines, Learning Machines, Boltzmann Machines all fit in this category … This is where you overlay the practical/business significance boundary over statistical significance and develop an interesting service or feature …
In short, it is time we pay attention to the 3Cs & 3 Is of Analytics & Big Data
What says thee ? Am I making any sense ?
Ref: Thanks to http://www.american-buddha.com/cia.threeamigos10.htm for the image