Google’s Jeff Dean on Scalable Predictive Deep Learning – A Kibitzer’s Notes from RecSys 2014


It is always interesting to hear from Jeff and understand what he is up to. I have blogged about his earlier talks at XLDB and at Stanford. Jeff Dean’s keynote at RecSys 2014 was no exception. The talk was interesting, the Q&A was stimulating and the links to papers … now we have more work ! – I have a reading list at the end.

Of course, you should watch it (YouTube Link). Highlights of his talk, from my notes …

dean-recsys-01

  • Build a system with simple algorithms and then throw lots of data at it – let the system build the abstractions. Interesting line of thought.
  • I remember hearing about this from Peter Norvig as well, i.e. Google is interested in algorithms that get better with data.
  • An effective recommendation system requires context, i.e. an understanding of the user’s surroundings, the previous behavior of the user, the previous aggregated behavior of many other users and, finally, textual understanding.

dean-recsys-02-01


  • He then elaborated on one of the areas they are working on – semantic embeddings, paragraph vectors and similar mechanisms

dean-recsys-03

Interesting concept of embedding similar things such that they are nearby in a high dimensional space!
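To make the embedding idea concrete, here is a tiny sketch of the word2vec tool from the reading list below: train vectors on a toy corpus and ask which words end up nearby. This is my own illustration, not from the talk; it assumes gensim 4.x, and the corpus is obviously far too small for meaningful neighbors.

    # Minimal word2vec sketch (assumes gensim 4.x). Words used in similar
    # contexts get nearby vectors: the "similar things are close together
    # in a high-dimensional space" idea.
    from gensim.models import Word2Vec

    corpus = [
        ["user", "watched", "movie", "last", "night"],
        ["user", "streamed", "movie", "on", "tv"],
        ["viewer", "watched", "show", "last", "night"],
        ["viewer", "streamed", "show", "on", "tv"],
    ]

    model = Word2Vec(sentences=corpus, vector_size=32, window=3,
                     min_count=1, epochs=200, seed=42)

    print(model.wv.most_similar("movie", topn=3))    # "show" should rank high
    print(model.wv.similarity("watched", "streamed"))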

  • Jeff then talked about using LSTM (Long Short-Term Memory) Neural Networks for translation.

pic09-01
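For reference, a bare-bones sketch of the encoder-decoder LSTM idea behind the sequence-to-sequence paper in the reading list: one LSTM reads the source sentence into a fixed state, a second LSTM generates the target sentence from that state. This is my own toy reconstruction in Keras, not Google’s setup; the vocabulary sizes and dimensions are made up.

    # Toy encoder-decoder LSTM for translation (assumes TensorFlow 2.x / Keras).
    from tensorflow.keras import layers, Model

    src_vocab, tgt_vocab, latent = 5000, 5000, 256

    # Encoder: read the source tokens, keep only the final LSTM state.
    enc_in = layers.Input(shape=(None,), name="source_tokens")
    enc_emb = layers.Embedding(src_vocab, latent)(enc_in)
    _, h, c = layers.LSTM(latent, return_state=True)(enc_emb)

    # Decoder: generate target tokens conditioned on the encoder state.
    dec_in = layers.Input(shape=(None,), name="target_tokens")
    dec_emb = layers.Embedding(tgt_vocab, latent)(dec_in)
    dec_seq, _, _ = layers.LSTM(latent, return_sequences=True,
                                return_state=True)(dec_emb, initial_state=[h, c])
    probs = layers.Dense(tgt_vocab, activation="softmax")(dec_seq)

    model = Model([enc_in, dec_in], probs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()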

  • Notes from Q & A:
    • The async training of the model and random initialization mean that different runs will result in different models, but the results are within epsilon of each other
    • Currently, they are handcrafting the topology of these networks, i.e. how many layers, how many nodes, the connections et al. Evolving the architecture (for example adding a neuron when an interesting feature is discovered) is still a research topic.
      • Between ages of 2 & 4, our brain creates 500K neurons / sec and from 5 to 15, starts pruning them !
    • The models are opaque and do not have explainability. One way Google is approaching this is by building tools that introspect the models … interesting
    • These models work well for classification as well as ranking. (Note : I should try this – maybe for a Kaggle competition. The 2015 RecSys Challenge !)
    • Training the CTR system on a nightly basis ?
    • Connections & Scale of the models
      • Vision : Billions of connections
      • Language embeddings : 1000s of millions of connections
      • One should not have more parameters than data; otherwise the model will overfit
      • Rule of thumb : for sparse representations, roughly one parameter per record
    • Paragraph vectors can capture granular levels, while a deep LSTM might be better at capturing the details – TBD
    • Debugging is still an art. Check the modelling; factor the problem into smaller ones; see if different data is required
    • RBMs and energy-based models have not found their way into Google’s production; NNs are finding applications
    • Simplification & Complexity : NNs, once you get them working, forms this nice “Algoritmically simple computation mechanisms” in a darkish-brown box ! Less sub systems, less human engineering ! At a different axis of complexity
    • Embedding editorial policies is not easy; it is better to overlay them … [Note : We have an architecture where pre- and post-processors annotate the recommendations/results from a DL system]
  • There are some interesting papers on both the topics that Jeff mentioned (this is my reading list for the next few months! Hope it is useful to you as well !):
    1. Efficient Estimation of Word Representations in Vector Space [Link]
    2. Paragraph vector : Distributed Representations of Sentences and Documents [Link]
    3. Quoc V. Le’s home page [Link]
    4. Distributed Representations of Words and Phrases and their Compositionality [Link]
    5. Deep Visual-Semantic Embedding Model [Link]
    6. Sequence to Sequence Learning with Neural Networks [Link]
    7. Building high-level features using large scale unsupervised learning [Link]
    8. word2vec : Tool for computing continuous distributed representations of words [Link]
    9. Large Scale Distributed Deep Networks [Link]
    10. Deep Neural Networks for Object Detection [Link]
    11. Playing Atari with Deep Reinforcement Learning [Link]
    12. Papers by Google’s Deep Learning Team [Link to Vincent Vanhoucke's Page]
    13. And, last but not least, Jeff Dean’s Page

The talk was cut off after ~45 minutes. Am hoping they will publish the rest and the slides. Will add pointers when they are on-line. Drop me a note if you catch them …

Update [10/12/14 21:49] : They have posted the second half ! Am watching it now !

Context : I couldn’t attend RecSys 2014; luckily they have the sessions on YouTube. I plan to watch, take notes & blog the highlights; Recommendation Systems are one of my areas of interest.

  • Next : Netflix’s CPO Neil Hunt’s Keynote
  • Next + 1 : Future of Recommender Systems
  • Next + 2 : Interesting Notes from rest of the sessions
  • Oh man, I really missed the RecSysTV session. We are working on some addressable recommendations. Already reading the papers. Didn’t see the video for the RecSysTV sessions ;o(

The Sense & Sensibility of a Data Scientist DevOps


The other day I was pondering the subject of a Data Scientist & model deployment at scale, as we are developing our data science layers consisting of Hadoop, HBase & Apache Spark. Interestingly, earlier today I came across two artifacts – a talk by Cloudera’s @josh_wills and a presentation by (again) Cloudera’s Ian Buss.

The talks made a lot of sense independently, but added a lot more insight collectively ! The context, of course, is the exposition of the curious case of data scientists as devops. Data products need an evolving data science layer …

It is well worth your time to follow the links above and listen to Josh as well as go thru Ian’s slides. Let me highlight some of the points that I was able to internalize …

Let me start with one picture that “rules them all” & summarizes the synergy – the “Shift In Perspective” slide from Josh & the Spark slide from Ian.

JW-01

The concept of Data Scientist devops is very relevant. It extends the curious case of the Data Scientist profession to the next level.

Data products live & breath in the wild, they cannot be developed and maintained with a static set of the data. Developing an R model and then throwing it over the wall for a developer to translate won’t work.  Secondly, we need models that can learn & evolve in their parameter space.

 

JW-02

I agree with the current wisdom that Apache Spark is a good framework that spans the reason, model & deploy stages of data.
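Roughly, the “reason, model & deploy” span looks like this with Spark’s spark.ml pipelines: explore the data, fit a pipeline, and persist one artifact that the serving side loads, so nothing gets re-implemented across the wall. The dataset path, column names and model choice below are hypothetical.

    # Reason / model / deploy on Spark (PySpark, DataFrame-based spark.ml API).
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("reason-model-deploy").getOrCreate()

    # Reason: explore the raw events (hypothetical dataset & columns).
    events = spark.read.parquet("hdfs:///data/events.parquet")
    events.groupBy("label").count().show()

    # Model: assemble features and fit a pipeline.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    fitted = Pipeline(stages=[assembler, lr]).fit(events)

    # Deploy: persist the fitted pipeline; the serving job loads the same
    # artifact with PipelineModel.load(...) instead of translating an R model.
    fitted.write().overwrite().save("hdfs:///models/events-lr")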

Other interesting insights from Josh’s talk.

Finally,

The virtues of being really smart are massively overrated; the virtues of being able to learn faster are massively underrated

Well said Josh.
P.S: Couldn’t find the video of Ian’s talk at the Data Science London meetup. Should be an interesting talk to watch …

AWS EC2 Price worksheet


It all started with a tweet [Image]

  • It so happens that I have been working on a similar worksheet for pricing & configuring our analytics infrastructure;
  • I modified the one I am working on (inspired by the original at ec2 pricing_and_capacity) & morphed it into the one Otis wanted
  • The Excel worksheet is hosted in GitHub. Feel free to modify it to fit your needs. Let me know as well …
  • I have four sets of prices viz. on-demand, reserved-light, reserved-medium and reserved-heavy usage. The prices are calculated for one year (8640 hrs) off of cell M1 – one has to prorate the upfront fees to get the effective $/hr rate
  • The worksheet has multiple uses – I use it to compute the price difference for different usage patterns: high memory for Spark, different sizes for an HBase cluster et al. As it is a spreadsheet, one can sort it on varying criteria; one can also change the numbers (say 6 months) and see which model makes sense.
  • BTW, it is interesting to see that Light-Reserved costs more in all cases except for the storage models.
  • Long time ago, I had a graphical representation, which has become very dated. I might resurrect it with the new prices …

The Spreadsheet :

The left columns have the attributes of the various EC2 models.

Image

 

The 8640 (hrs/year) is in M1. All the calculations are based on this cell. The reserved-light tier is interesting … it costs more !

Image

The reserved medium does save $. Moreover, one can stop the instances when not needed.

Image

I have calculated the yearly price, prorating the upfront fees et al. But for Heavy Reserved it is somewhat meaningless, as they charge for the whole year even if the instances are stopped. Still, changing the value in M1 gives a feel for the different tiers …
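For anyone who prefers code to cells, the arithmetic behind the worksheet is just this: prorate the upfront fee across the hours in M1 and compare the effective $/hr. The rates and fees below are placeholders, not actual AWS prices.

    # Effective $/hr = (upfront fee + hourly rate * hours) / hours, as in the sheet.
    HOURS = 8640  # the value in cell M1 (hrs/year); change it to model, say, 6 months

    def effective_cost(hourly_rate, upfront=0.0, hours=HOURS):
        """Return (effective $/hr, total $ for the period)."""
        total = upfront + hourly_rate * hours
        return total / hours, total

    # Placeholder rates for one instance type, not real AWS prices.
    tiers = {
        "on-demand": effective_cost(hourly_rate=0.350),
        "light reserved": effective_cost(hourly_rate=0.200, upfront=690),
        "heavy reserved": effective_cost(hourly_rate=0.123, upfront=1260),
    }

    for name, (per_hr, total) in tiers.items():
        print(f"{name:>15}: ${per_hr:.3f}/hr  ${total:,.0f} per period")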

Image

I would be happy to hear other inferences we can make and add columns to the worksheet …

Cheers

 

The Chronicles of Robotics at First Lego League – Day 1


This week I am at St. Louis, volunteering at the First Lego League World Robotics Competition. I have been involved with First Robotics since 2004. Usually my position is Robot Design Judge – a front-seat view of interesting & innovative ideas on robots.

For 3 days we have the Edward Jones Dome & America’s Center in St. Louis, MO.

Day 1 : Stardate : 91913.81

Judges’ on-site meeting & briefing, allocations & FLL opening ceremonies.

Some quick pictures … Full day of judging starts tomorrow early morning  … looking forward to it ….

  • View From my Hotel

  • FLL-01 FLL-02 RoomView-01RoomView-02
  • The Trophies

  • awards-01 awards-02
  • The Field 

  • The field occupies the stadium. It consists of six areas – Einstein (FLL), Galileo, Franklin, Newton, Edison & Curie. I have tried to capture the view of the fields from the ground and from the bleachers.
  • Galileo

  • Field-01-01 Field-01-02 Field-03 Field-04-01Field-04Field-01 Field-05
  • NASA Truck to beam the competitions live via satellite

  • Field-06 Field-07
  • Franklin & Newton

  • Field-02-01 Field-02 Fig-05-01 Field-08 Field-09
  • Einstein (FLL and the venue of opening ceremonies – below)

  • Field-10 Field-11
  • FLL Opening Ceremonies

  • Open-01 Open-02 Open-03-01 Open-03 Open-04
  • Tomorrow is a busy day – robot judging the whole day. Might not get time to take pictures.
  • Still have to cover the pits – the convention center hall filled with team stalls et al. One has to be there to understand the scale and the energy !

Data Science Folk Knowledge & Words of Wisdom


I have collected some words of wisdom on Data Science & Machine Learning for my PyCon 2014 tutorial “Data Wrangling”.
Posting them as a blog; the pdf is at Slideshare. Would appreciate comments, insights & corrections.

Slide01

Slide02

  • Slide11
  • Slide12
  • Slide13
  • Slide14
  • Slide15
  • Slide16
  • Slide17
  • Slide18
  • Slide19
  • Slide20
  • Slide21
  • Relevant Papers To Read

  • Slide22
  • An ordered list of MOOC links & books. This was my answer on Quora.

  • Slide23

A Glimpse of Google, NASA & Peter Norvig + The Restaurant at the End of the Universe


I came across an interesting talk by Google’s Peter Norvig at NASA.

Of course, you should listen to the talk – let me blog about a couple of points that are of interest to me:

Algorithms that get better with Data

Peter had two good points:

Norvig-01

  • Algorithms behave differently as they churn thru more data. For example, in the figure, the blue algorithm was better at the million-example scale. If one had stopped at that scale, one would be tempted to optimize that algorithm for better performance.
  • But as the scale increased, the purple algorithm started showing promise – in fact, the blue one starts deteriorating at larger scale. The old adage “don’t do premature optimization” is true here as well.
  • Norvig-02 In general, Google prefers algorithms that get better with data. Not all algorithms are like that, but Google likes to go after the ones with this type of performance characteristic (a tiny illustration below).
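Here is a small sketch of that effect on a synthetic dataset: train two arbitrary stand-in algorithms at increasing data sizes and print the accuracies. With real problems the leader can change as the data grows, which is exactly why optimizing the small-scale winner is premature.

    # Compare two algorithms as the training set grows (synthetic, illustrative).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=200_000, n_features=40,
                               n_informative=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    for n in [1_000, 10_000, 100_000]:
        for name, clf in [("naive bayes", GaussianNB()),
                          ("logistic regression", LogisticRegression(max_iter=1000))]:
            clf.fit(X_tr[:n], y_tr[:n])
            acc = accuracy_score(y_te, clf.predict(X_te))
            print(f"n={n:>7}  {name:<20} accuracy={acc:.3f}")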

There is no serendipity in Google Search or Google Translate

  • There is no serendipity in search – it is just rehashing. It is good for finding things, but not at all useful for understanding, interpolation & ultimately inference. I think Intelligent Search is an oxymoron ;o)
  • Same with Google Translate. Google Translate takes all its cues from the web – it wouldn’t help us communicate with either the non-human inhabitants of this planet or any life form from other planets/galaxies.
    • In that sense, I am a little disappointed with Google’s Translation Engines.  OTOH, I have only a minuscule view of the work at Google.

The future of human-machine & Augmented Cognition

And, don’t belong to the B-Ark !