We had a good Data Science training session at the Sheraton, Times Square, NY, on the second day of SparkSummit (East). It was my privilege to co-author and lead the Data Science track, along with Reza, Paco, Andy, Hossein, TD, Joseph and Xiangrui. I have shared the slide set on Slideshare as well as on the Databricks site.
[Update 4/12/15]: The video is posted on YouTube (5 hrs!)
This was the second time I was involved with a training run entirely on the Databricks cloud, and it worked out very well! The Databricks cloud was robust and resilient. Unfortunately, we had problems with the wireless at the Sheraton Hotel!
The training was a mixture of hands-on and lecture. We started out with a dataset of 30 records, then moved on to the Titanic dataset (900), then the MovieLens medium dataset (1,000,000), and finally the RecSys Challenge dataset (33,000,000!). What a progression in a day!
You can see the details in the slides. Ping me if you have any questions.
The training data consists of 33,003,944 clicks and 1,150,753 buys. Our mission, should we choose to accept it, is to predict the session-items bought from a test dataset of 8,251,791 clicks.
All at scale, in an elastic cloud, seamlessly moving between dev, model, stage and prod! The magic of Databricks Cloud!
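To make the RecSys Challenge task a bit more concrete, here is a minimal sketch in plain Python (not the Spark code from the training) of how clicks and buys can be joined into labeled session-item pairs. The toy records, the two-field tuple layout and the function name `label_session_items` are assumptions for illustration only; the real dataset has more columns and is far too large for this approach on a single machine.

```python
from collections import Counter

# Hypothetical toy records: (session_id, item_id).
# The real challenge files carry more fields (timestamps, prices, ...).
clicks = [
    (1, "A"), (1, "A"), (1, "B"),
    (2, "C"),
    (3, "A"), (3, "D"),
]
buys = [(1, "A"), (3, "D")]


def label_session_items(clicks, buys):
    """Map each clicked (session_id, item_id) pair to (click_count, bought)."""
    click_counts = Counter(clicks)   # how often each pair was clicked
    bought = set(buys)               # pairs that ended in a purchase
    return {pair: (n, pair in bought) for pair, n in click_counts.items()}


labeled = label_session_items(clicks, buys)
for (session_id, item_id), (n_clicks, was_bought) in sorted(labeled.items()):
    print(session_id, item_id, n_clicks, was_bought)
```

A table like this is one possible starting point for a classifier: the click count is a simple feature and the bought flag is the label. On the test set only the clicks are available, so the flag is what has to be predicted.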
BTW, we also explored the State of the Union speeches from Washington, Lincoln, FDR, Clinton, Bush & Obama. The graphs below show a succinct view of the mood of the nation in each period …
And finally, 100 slides later …!