The Sense & Sensibility of a Data Scientist DevOps


The other day I was pondering the subject of the Data Scientist & model deployment at scale, as we are developing our data science layers consisting of Hadoop, HBase & Apache Spark. Interestingly, earlier today I came across two artifacts: a talk by Cloudera’s @josh_wills and a presentation by (again) Cloudera’s Ian Buss.

The talks make a lot of sense independently, but offer a lot more insight collectively! The context, of course, is the exposition of the curious case of data scientists as devops. The data products need an evolving data science layer …

It is well worth your time to follow the links above, listen to Josh and go through Ian’s slides. Let me highlight some of the points that I was able to internalize …

Let me start with one picture that “rules them all” & summarizes the synergy: the “Shift In Perspective” from Josh & the Spark slide from Ian.

JW-01

The concept of Data Scientist devops is very relevant. It extends the curious case of the Data Scientist profession to the next level.

Data products live & breathe in the wild; they cannot be developed and maintained with a static set of data. Developing an R model and then throwing it over the wall for a developer to translate won’t work. Secondly, we need models that can learn & evolve in their parameter space.
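To make the second point a little more concrete, here is a minimal sketch of a model whose parameters evolve as new data arrives. It is my own illustration, not something taken from Josh’s talk or Ian’s slides: the class name `OnlineLogisticModel`, the learning rate and the toy data stream are all assumptions, and a real data product would lean on a framework rather than hand-rolled updates.

```python
import math


class OnlineLogisticModel:
    """Hypothetical logistic regression updated one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, not tuned

    def predict_proba(self, features):
        # Probability that the label is 1 under the current parameters.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # One stochastic gradient step on the log loss for a single example.
        error = self.predict_proba(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Toy stream standing in for data arriving "in the wild" (made-up values).
model = OnlineLogisticModel(n_features=2)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for features, label in stream:
    model.update(features, label)  # parameters shift with every observation
```

The point of the sketch is only that the model is never “finished”: each call to `update` nudges the parameters, so the same object can keep learning after it has been deployed instead of being retrained offline and thrown over the wall again.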

JW-02

I agree with the current wisdom that Apache Spark is a good framework that spans the reason, model & deploy stages of data.

Other interesting insights from Josh’s talk.

Finally,

“The virtues of being really smart is massively overrated; the virtues of being able to learn faster is massively underrated.”

Well said, Josh.
P.S.: I couldn’t find the video of Ian’s talk at the Data Science London meetup. It should be an interesting talk to watch …

7 thoughts on “The Sense & Sensibility of a Data Scientist DevOps”

  3. “Do the simplest thing that could possibly work”
    needs to be in quotations.

    It is the ever-repeated mantra created by Ward Cunningham, inventor of the wiki and one of the “sun sources” of agile methodologies. It is also how Ward lives on a daily basis.
