Google’s Jeff Dean on Scalable Predictive Deep Learning – A Kibitzer’s Notes from RecSys 2014


It is always interesting to hear from Jeff and understand what he is up to. I have blogged about his earlier talks at XLDB and at Stanford. Jeff Dean’s keynote at RecSys 2014 was no exception. The talk was interesting, the Q&A was stimulating, and the links to papers … now we have more work! – I have a reading list at the end.

Of course, you should watch it (YouTube Link) and go thru his keynote slides at the ACM Conference on Information and Knowledge Management. Highlights of his talk, from my notes …

dean-recsys-01

  • Build a system with simple algorithms and then throw lots of data at it – let the system build the abstractions. Interesting line of thought;
  • I remember hearing about this from Peter Norvig as well, i.e. Google is interested in algorithms that get better with data
  • An effective recommendation system requires context, i.e. understanding the user’s surroundings, the previous behavior of the user, the previous aggregated behavior of many other users, and finally textual understanding
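Just to make the idea concrete, here is a minimal sketch of how such context signals might be blended into a single score – all the signal names and weights below are my own illustrative inventions, not anything from the talk:

```python
# Toy blend of the four context signals above into one recommendation
# score. Signal names and weights are invented for illustration.

SIGNAL_WEIGHTS = {
    "surroundings": 0.2,    # user's current context (location, device, time)
    "user_history": 0.3,    # previous behavior of this user
    "crowd": 0.3,           # aggregated behavior of many other users
    "text_match": 0.2,      # textual understanding of the item vs. the query
}

def recommend_score(item):
    """Weighted sum of per-item signal scores, each assumed in [0, 1]."""
    return sum(w * item.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())

items = [
    {"id": "A", "surroundings": 0.9, "user_history": 0.1, "crowd": 0.8, "text_match": 0.5},
    {"id": "B", "surroundings": 0.2, "user_history": 0.9, "crowd": 0.4, "text_match": 0.7},
]
for item in sorted(items, key=recommend_score, reverse=True):
    print(item["id"], round(recommend_score(item), 2))
```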

dean-recsys-02-01


  • He then elaborated on one of the areas they are working on – semantic embeddings, paragraph vectors and similar mechanisms

dean-recsys-03

Interesting concept of embedding similar things such that they are nearby in a high-dimensional space!
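“Nearby” here just means a small angular distance between vectors. A bare-bones sketch of the lookup mechanics (the vectors below are random stand-ins, so the printed neighbors are meaningless – real embeddings come out of training):

```python
import numpy as np

# Nearest-neighbor lookup in an embedding space via cosine similarity.
# Random vectors stand in for trained embeddings, so the output is
# meaningless -- only the mechanics are the point.
rng = np.random.default_rng(0)
vocab = ["paris", "london", "tokyo", "banana", "guitar"]
emb = {w: rng.normal(size=50) for w in vocab}

def neighbors(word, k=2):
    """Rank the rest of the vocabulary by cosine similarity to `word`."""
    v = emb[word]
    cos = lambda u: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return sorted((w for w in emb if w != word),
                  key=lambda w: cos(emb[w]), reverse=True)[:k]

print(neighbors("paris"))
```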

  • Jeff then talked about using LSTM (Long Short-Term Memory) Neural Networks for translation.
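For the curious, the heart of an LSTM is a gated recurrence that a translation model stacks into an encoder and a decoder. A bare-bones numpy sketch of a single step, with toy dimensions and random weights (nothing to do with Google’s production models):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: input/forget/output gates plus a candidate cell."""
    z = W @ np.concatenate([x, h_prev]) + b        # all four gates in one matmul
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sig(f) * c_prev + sig(i) * np.tanh(g)      # keep some memory, add some new
    h = sig(o) * np.tanh(c)                        # expose a gated view of the cell
    return h, c

n_in, n_h = 8, 16                                  # toy sizes
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4 * n_h, n_in + n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):               # run over a 5-step sequence
    h, c = lstm_step(x, h, c, W, b)
print(h[:4])                                       # final state summarizes the sequence
```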

pic09-01

  • Notes from Q & A:
    • The async training of the model and random initialization mean that different runs will result in different models, but the results are within epsilon of each other (see the toy sketch after the reading list below)
    • Currently, they are handcrafting the topology of these networks, i.e. how many layers, how many nodes, the connections, et al. Evolving the architecture (for example, adding a neuron when an interesting feature is discovered) is still a research topic.
      • Between the ages of 2 and 4, our brain creates 500K neurons/sec, and from 5 to 15, it starts pruning them!
    • The models are opaque and do not have explainability. One way Google is approaching this is by building tools that introspect the models … interesting
    • These models work well for classification as well as ranking. (Note : I should try this – maybe for a Kaggle competition. 2015 RecSys Challenge!)
    • Training the CTR system on a nightly basis?
    • Connections & Scale of the models
      • Vision : Billions of connections
      • Language embeddings : thousands of millions of connections
      • If one has less data, one should have fewer parameters; otherwise the model will overfit
      • Rule of thumb : for sparse representations, one parameter per record
    • A paragraph vector can capture things at a coarse granularity, while a deep LSTM might be better at capturing the details – TBD
    • Debugging is still an art. Check the modelling; factor into smaller problems; see if different data is required
    • RBMs and energy-based models have not found their way into Google’s production; NNs are finding applications
    • Simplification & Complexity : NNs, once you get them working, form this nice “algorithmically simple computation mechanism” in a darkish-brown box! Fewer subsystems, less human engineering! At a different axis of complexity
    • Embedding editorial policies is not easy, better to overlay them … [Note : We have an architecture where the pre and post processors annotate the recommendations/results from a DL system]
  • There are some interesting papers on both the topics that Jeff mentioned (This is my reading list for the next few months! Hope it is useful to you as well!):
    1. Efficient Estimation of Word Representations in Vector Space [Link]
    2. Paragraph vector : Distributed Representations of Sentences and Documents [Link]
    3. Quoc V. Le’s home page [Link]
    4. Distributed Representations of Words and Phrases and their Compositionality [Link]
    5. Deep Visual-Semantic Embedding Model [Link]
    6. Sequence to Sequence Learning with Neural Networks [Link]
    7. Building high-level features using large scale unsupervised learning [Link]
    8. word2vec : Tool for computing continuous distributed representations of words [Link]
    9. Large Scale Distributed Deep Networks [Link]
    10. Deep Neural Networks for Object Detection [Link]
    11. Playing Atari with Deep Reinforcement Learning [Link]
    12. Papers by Google’s Deep Learning Team [Link to Vincent Vanhoucke’s Page]
    13. And, last but not least, Jeff Dean’s Page
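On the first Q&A point above – different runs yield different models, but results within epsilon – here is a toy demonstration (my own illustration, obviously not Google’s async trainer): two random initializations of a tiny numpy neural net learn very different weights, since the hidden units land in different places, yet end at nearly the same loss.

```python
import numpy as np

# Two random inits of the same one-hidden-layer net: learned weights
# differ a lot, final loss is nearly identical ("within epsilon").
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 > 1).astype(float)

def train(seed, hidden=16, steps=2000, lr=0.1):
    g = np.random.default_rng(seed)
    W1 = g.normal(scale=0.5, size=(10, hidden))
    W2 = g.normal(scale=0.5, size=hidden)
    for _ in range(steps):
        H = np.tanh(X @ W1)                        # hidden layer
        p = 1 / (1 + np.exp(-(H @ W2)))            # output probability
        dout = (p - y) / len(y)                    # dLoss/dlogit for log loss
        W1 -= lr * X.T @ (np.outer(dout, W2) * (1 - H ** 2))
        W2 -= lr * H.T @ dout
    p = 1 / (1 + np.exp(-(np.tanh(X @ W1) @ W2)))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return W1, loss

(W1a, la), (W1b, lb) = train(seed=0), train(seed=1)
print("weight distance:", np.linalg.norm(W1a - W1b))   # large: different models
print("loss gap:", abs(la - lb))                       # small: within epsilon
```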

The talk was cut off after ~45 minutes. I am hoping they will publish the rest and the slides. Will add pointers when they are online. Drop me a note if you catch them …

Update [10/12/14 21:49] : They have posted the second half! Am watching it now!

Context : I couldn’t attend RecSys 2014; luckily they have the sessions on YouTube. Plan to watch, take notes & blog the highlights; recommendation systems are one of my interest areas.

  • Next : Netflix’s CPO Neal Hunt’s Keynote
  • Next + 1 : Future of Recommender Systems
  • Next + 2 : Interesting Notes from rest of the sessions
  • Oh man, I really missed the RecSysTV session – we are working on some addressable recommendations and are already reading the papers, but I didn’t see a video for the RecSysTV sessions ;o(

A Glimpse of Google, NASA & Peter Norvig + The Restaurant at the End of the Universe


I came across an interesting talk by Google’s Peter Norvig at NASA.

Of course, you should listen to the talk – let me blog about a couple of points that are of interest to me:

Algorithms that get better with Data

Peter had two good points:

Norvig-01

  • Algorithms behave differently as they churn thru more data. For example, in the figure, the blue algorithm was better with a million-example training dataset. If one had stopped at that scale, one would be tempted to optimize that algorithm for better performance
  • But as the scale increased, the purple algorithm started showing promise – in fact, the blue one starts deteriorating at larger scale. The old adage “don’t do premature optimization” is true here as well.
  • Norvig-02 In general, Google prefers algorithms that get better with data. Not all algorithms are like that, but Google likes to go after the ones with this type of performance characteristic (a toy comparison follows this list).
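A quick way to see this effect yourself is to compare two standard classifiers across training-set sizes – Naive Bayes often leads on small samples, and logistic regression typically overtakes it as data grows (the generative-vs-discriminative result of Ng & Jordan, 2001). This is an illustration only; whether and where the curves cross depends entirely on the data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Learning-curve comparison on synthetic data: train each model on
# progressively larger slices and score on a held-out test set.
X, y = make_classification(n_samples=50_000, n_features=20,
                           n_informative=10, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)
for n in [100, 1_000, 10_000, 40_000]:
    Xn, yn = X_pool[:n], y_pool[:n]
    for name, model in [("naive bayes ", GaussianNB()),
                        ("logistic reg", LogisticRegression(max_iter=1000))]:
        acc = model.fit(Xn, yn).score(X_test, y_test)
        print(f"n={n:>6}  {name}: {acc:.3f}")
```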

There is no serendipity in Google Search or Google Translate

  • There is no serendipity in search – it is just rehashing. It is good for finding things, but not at all useful for understanding, interpolation & ultimately inference. I think Intelligent Search is an oxymoron ;o)
  • Same with Google Translate. Google Translate takes all its cues from the web – it wouldn’t help us communicate with either the non-human inhabitants of this planet or any life form from other planets/galaxies.
    • In that sense, I am a little disappointed with Google’s Translation Engines.  OTOH, I have only a minuscule view of the work at Google.

The future of human-machine & Augmented Cognition

And, don’t belong to the B-Ark !

Jeff Dean : Lessons Learned While Building Infrastructure Software at Google


Image
Last week I attended the XLDB Conference and the invited Workshop at Stanford. I am planning on a series of blogs highlighting the talks. Of course, you should read thru all the XLDB 2013 presentation slides.

Google’s Jeff Dean had an interesting presentation about his experience building GFS, MapReduce, BigTable & Spanner. For those interested in these papers, I have organized them – A Path through NOSQL Reading 

Highlights in pictures (Full slides at XLDB 2013 site):

xldb-02

xldb-03

Deep Learning – The Next Frontier ?


Updates:

  1. [12/29/13] Forbes : A general article on the significance of Machine Learning, AI et al
  2. [12/28/13] NY Times : Brain-Like Computers, Learning from Experience
  3. [8/9/13] Kind of jumbled blog in Forbes – What is Deep Learning & Why should businesses care
  4. Deep Learning for the masses

Came across an informative talk by Jeff Dean on Deep Learning at Google.

Image

Leaving cat herding aside, I strongly believe that Deep Learning & AI can be effectively used in the context of Big Data to address interesting problems in the Enterprises, especially Retail & Banking.

Trying to build a few neurons to run some interesting big data use cases …

Hence my interest in this domain

Let me share some of the interesting points from Jeff’s talk, annotating the slides. Of course, you should listen to his talk – it is short & succinct …

But plan to spend some time to think it through as well as research the related papers … The DistBelief paper has the potential to be as influential as Jeff’s MapReduce paper.

Image

DL-Dean-12

DL-Dean-05

DL-Dean-07

DL-Dean-10

DL-Dean-11

Related Links:

  1. Jeremy Howard of Kaggle tags Deep Learning as The Biggest Data Science Breakthrough of the Decade ! What says thee ?
  2. Excellent overview from IEEE Computational Intelligence – “Deep Machine Learning—A New Frontier in Artificial Intelligence Research”
  3. AI Breakthrough – From MIT Technology Review
  4. New York Times has a good article on Deep Learning “Scientists See Promise in Deep-Learning Programs”
  5. Huffington Post talks about Big Data & Deep Learning

Book Review – In the Plex : How Google Thinks, Works and Shapes Our Lives


Prelude:

I liked the book a lot; it reads like a thriller – at least to me. I couldn’t put it down and was reading the book late at night, during work days – to the chagrin of the family!

Steven Levy has clearly chronicled Google’s ascent and the tribulations it encountered – internal and external – on the way. What is more interesting is the fact that he has written a set of very crisp & detailed explanations of the innovations that Google brought into the search & advertisement domains.

I agree with Steven that Google is a “clever internet-startup-named-after-a-100-digit-number turned into a corporate phenomenon”. It is very interesting to read of its agony on the way to IPO (and the ecstasy of the investors!) If Google had its way, it would have added a requirement of a minimum SAT score (and a Stanford PhD – at least an MMDS certificate) for buying its shares! Am forced to quote Scott Reeves (Forbes, Aug 2004) on Google’s targeted price of $108/$135: “Only those who were dropped on their head at birth [will] plunk down that kind of cash for an IPO” – ouch! (I myself was ready for around $50)

Google – A Sum of its Obsessions

Search (Of course!)

  • PageRank, of course, refers to Larry Page’s ranking algorithm! PageRank estimates the importance of a page from the web pages that link to it. “We convert the entire web into a big equation with several hundred million variables” (a toy power-iteration sketch follows this list)
  • The concept of signals – viz. factors like terms, capitalization, font size, position, et al. – layered on top of PageRank is the secret sauce that made Google’s search very effective.
  • The search engines get major and minor rewrites “like changing the components of a flying plane – without the passengers knowing about it, but the ride becomes more comfortable and they get there faster” – not a perfect analogy, but an effective simile!
  • The engineers fret about any queries that do not get answered on the first page – in many ways, clicking to the next page of a search result is a failure of the brilliant engineers behind the search engine. You have to read about the query “Audrey Fino” that vexed Amit Singhal, Google’s chief of search. The search showed lots of Audrey Hepburn results, and that bothered Amit – “There’s a person somewhere named Audrey Fino and we didn’t have the smarts in the system to know this” – and the remedy was, of course – to quote Steven – a multi-year name-detection and name-classifier “algorithmic therapy” with a dash of “bigram breakage” added to taste!
  • “Rokc” is rock, unless it has “little” in front of it (when it becomes the capital of a state) or, if preceded by “Noah”, becomes ark! Another such query was “Eika Kerzen”, which requires translation (to German in this case) to get to the right search result.
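A toy power-iteration version of that “big equation”, as promised above: each page’s rank is the damped sum of the ranks of the pages linking to it, iterated to a fixed point. Four pages here instead of several hundred million variables:

```python
import numpy as np

# PageRank by power iteration on a four-page toy web:
# r = d * M r + (1 - d)/N, iterated until r stops changing.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}    # page -> pages it links to
N, d = 4, 0.85                                    # page count, damping factor
M = np.zeros((N, N))
for src, outs in links.items():
    for dst in outs:
        M[dst, src] = 1.0 / len(outs)             # column-stochastic link matrix

r = np.full(N, 1.0 / N)                           # start with uniform rank
for _ in range(100):
    r_new = d * M @ r + (1 - d) / N
    done = np.abs(r_new - r).sum() < 1e-10        # converged?
    r = r_new
    if done:
        break
print(r / r.sum())                                # PageRank of pages 0..3
```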

Algorithmic purity & ubiquity

  • Google is an algorithmic company driven by computer science! We can see that everywhere – in successes and failures. For example, the size of the IPO ($2,718,281,828) echoes Napier’s constant e! And when bidding for Nortel’s patents, Google was bidding numbers like pi (in billions of dollars)
  • Even the Google ad sales people consider themselves mediators between Madison Avenue and algorithms – only Google could put both those words in the same sentence, make it sensible, and in the process create an industry where it makes billions of dollars – as one SEO chief puts it, “It is not that we want to put all our boxes in one basket, but there is only one basket in the industry”
  • The great lengths the Google team would go to in making search relevant is exemplified by the “running shoes gnome sculpture”. The engineers believed in algorithmic purity – and before the launch of the Froogle product search, “running shoes” would show a “garden gnome sculpture that happened to wear sneakers”. The team could not ship a product that failed to differentiate between lawn art and footwear. It seems that within a couple of days, the offending link disappeared! The team later learned that one of their teammates had gone ahead and bought the one-of-a-kind sculpture, taking it off the web site! “The algorithm started showing the right results, … and we launched!”
  • Search algorithmics sometimes had very strange effects – like showing the now-defunct main office of Bell Telephone for the query “weather.com Philadelphia” – the reason being that the telephone company used to tell the weather over the phone, and this factoid was unearthed by the search algorithm!
  • It is interesting to read how Google re-invented the “Vickrey second-price auction” system because the engineer (Eric Veach) wanted to avoid “bid shading”. In the end, like anything else that Google touches, they created an innovative system that combined factors like bidding and ad positioning, adding competition & customer satisfaction, and created a rolling revenue stream on the order of billions of dollars for Google – all in all a nifty feat! (a toy sketch follows this list)
  • The concept of compressing data to understand it was a brilliant stroke – the Google project called Phil (Probabilistic Hierarchical Inferential Learner) resulted in understanding the essence of web pages and … contextually matching ads with a web page’s content, a service called “Google content-targeted advertising” which later became AdSense (after acquiring the company Applied Semantics!)
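And a stripped-down sketch of the auction idea above: rank ads by bid times a quality factor and charge each winner the minimum needed to hold its slot (a generalized second-price scheme). The single quality factor and the numbers are toy simplifications of what the book describes:

```python
# Toy generalized-second-price ad auction: rank by bid * quality, then
# charge each slot winner just enough to keep beating the next bidder.
# A single quality factor is a big simplification of the real system.
def run_auction(bids, slots=2):
    """bids: list of (advertiser, bid, quality). Returns (winner, price) per slot."""
    ranked = sorted(bids, key=lambda b: b[1] * b[2], reverse=True)
    results = []
    # The last-ranked bidder gets no slot here: pricing needs a next bid below it.
    for i in range(min(slots, len(ranked) - 1)):
        name, bid, q = ranked[i]
        _, nxt_bid, nxt_q = ranked[i + 1]
        price = nxt_bid * nxt_q / q           # min bid that still wins slot i
        results.append((name, round(price, 2)))
    return results

print(run_auction([("a", 2.0, 0.9), ("b", 1.5, 1.0), ("c", 1.0, 0.8)]))
# -> [('a', 1.67), ('b', 0.8)]: each pays less than its bid, but enough to win
```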

Scale

  • The success of their algorithms (giving eigenvectors some credence) and the change of scale that came with it were what made Google Google! As Luiz Barroso observed, “There are programs that do not run on anything smaller than a 1000 machines, which means you are looking at the datacenter as a computer”
  • Google affects whatever it touches in unpredictable ways – for example, Google’s racks maxed out the power & cooling at Exodus, so Exodus drove an 18-wheeler up to the colo, punched 3 holes in the wall, and pumped cold air into Google’s cage through PVC pipes!

The movers

  • As I was reading the book, there were a few people I knew of who played prominent roles at Google – I was wondering when Hal Varian would show up; he did (p. 116) and stayed relevant across a lot of pages with his team of “econometricians”, a cross between statisticians and economists!
  • Was wondering when Sundar Pichai would show up; he did (p. 205) and remained relevant as Steven narrated eloquently the advent of Google Chrome and the JavaScript engine V8 … leveraging Google’s insistence on speed …
  • Steven has interviewed most of, if not all, the technology leaders, and we get to meet them at the relevant topics.

Trivia:

  • I think building 40 is called Building 0 or Nullplex. It is interesting, as I work next door – in the only non-Google building amid a sea of bicycle-trotting Googlers!
  • Page’s Law, according to Brin – “Every 18 months, software becomes twice as slow”!
  • Danger, which Andy Rubin cofounded, moved into the Palo Alto office when Google moved out of it in 1999 ! Eventually he left Danger and started Android …
  • Google was always structured like a PhD program in a university – as Andy Rubin puts it, “There is an implied grading on a 4.0 scale of the questions during an interview, and anybody less than 3.0 is rejected; the GPS (Google Product Strategy) meetings are run like PhD defenses”
  • As told by Alan Eustace to Andy Rubin “Google’s brain is like a baby’s – an omnivorous sponge that was always getting smarter from the information it soaked up”!
  • “We want Google to be that third half of your brain” – Sergey, p. 386
  • “It’s quite amazing how the horizon of impossibility is drifting these days” – Thrun
  • The locus and trajectory of Google – “put Google in the driver’s seat on many decisions – large and small – that people make in the course of a day and their lives!” [p. 68]

Epilogue:

  • In this review, I touched on only a minimal set of interesting points (interesting to me!). The book has a lot of good reading, from Google’s China syndrome to how the Googlers shaped the last presidential election and later worked for the Obama administration, to controversies like Street View and the struggle with digitizing books.
  • One important development that Steven couldn’t include, due to the timing of the book’s release, was Google+. But don’t despair – Steven has written that part of the story as an article in Wired! Best to read it after finishing the book.
  • ReadWriteWeb has an article on the data scientist behind Google+
  • And Steven’s blog on the Motorola Mobility purchase is another good read – again, an important step by Google.
  • I just now saw a write-up by InfoWorld on Google’s 5 biggest hits and misses.
  • Next book on my reading list: “I’m Feeling Lucky: The Confessions of Google Employee Number 59” by Douglas Edwards; it is on hold, 3 of 7, at the San Jose Public Library.