The Curious Case of the Data Scientist Profession


Data Science & the profession of a Data Scientist are being debated, rationalized, defined and refactored … I think the domain & the profession are maturing and our understanding of the Mythical Data Scientist is getting more pragmatic.

Now to the highlights:

1. Data Scientist is multi-faceted & contextual

  • Two points – It requires a multitude of skills & different skill sets in different situations; and it is definitely a team effort.
  • This tweet sums it all
  • DataScienceTeam
  • Sometimes a Data Scientist has to tell a good business story to make an impact; other times the algorithm wins the day
    • Harlan in his blog identifies four combinations – Data Business Person, Data Creative, Data Engineer & Data Researcher
      • I don’t fully agree with the diagram – it has a lot less programming & a little more math; math is usually built into the ML algorithms and the implementation is embedded in math libraries developed by the optimization specialists. A Data Scientist shouldn’t be twiddling with the math libraries
    • I had proposed the idea of a Data Science Engineer last year with similar thoughts; and elaborated more at “Who or what is a Data Scientist?”
    • The BAH Field Guide suggests the following mix:
    • Data Scienc 03
    • I would prefer to see more ML than M. ML is the higher form of applied M and also includes Statistics
  • Domain Expertise and the ability to identify the correct problems are very important skills of a Data Scientist, says John Forman.
  • Or as Rachel Schutt at Columbia quotes:
    • Josh Wills (Cloudera)
      • Data Scientist (noun): Person who is better at statistics than any software engineer & better at software engineering than any statistician

    • Will Cukierski (Kaggle) retorts
      • Data Scientist (noun): Person who is worse at statistics than any statistician & worse at software engineering than any software engineer

2. The Data Scientist team should be building data products

3.  To tell the data story effectively, the supporting cast is essential

  • As Vishal puts it in his blog,
    • Data must be there & processable – the story definitely depends on the data
    • Processes & buy-in from management – many times, it is not the inference that is the bottleneck but the business processes that need to be changed to implement the inferences & insights
    • As the BAH Field Guide says it:
    • Data Scienc 04
    • DS01

 4.  Pay attention to how the Data Science team is organized

5. Data Science is a continuum of Sophistication & Maturity – a marathon rather than a sprint

Let me stop here, I think the blog is getting long already …

 

 

Is it still “Artificial” Intelligence, if our Computers learn -to think- from the workings of our Brain ?


Image

  • In fact that would be Natural Intelligence ! Intelligence is intelligence – it is a way of processing information to arrive at inferences, recommendations, predictions and so forth …

Maybe it is that Contemporary AI is actually just NI !

Point #1 : Machines are thinking like humans rather than acting like Humans

  • Primitives inspired by Computational Neuroscience like DeepLearning are becoming mainstream. We are no longer enamored with Expert Systems that learn the rules & replace humans. We would rather have our machines help us chug through the huge amount of data.

We would rather interact with them via Google Glass – a two-way, highly interactive medium that acts as a sensor array as well as augments cognition with a digital overlay over the real world

  • In fact, till now, our computers were mere brutes, without the elegance and finesse of the human touch !
  • Now the computers are diverging from Newtonian determinism to probabilistic generative models.
  • Instead of using only greedy algorithms, the machines are now being introduced to Genetic Algorithms & Simulated Annealing. They now realize that local minima, which is where a greedy descent gets stuck, are not the answers for all problems.
  • They now have knowledge graphs and have the capability to infer based on graph traversals and associated logic
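
To make the greedy vs. Simulated Annealing point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy objective `bumpy`, the step sizes and the cooling schedule are made up, and this is not how any particular product does it. The greedy descent only ever accepts a better neighbour, so it stops in the nearest dip; the annealer sometimes accepts a worse move while the temperature is high, which lets it climb out.

```python
import math
import random


def bumpy(x):
    """Toy 1-D objective with several local minima (hypothetical example)."""
    return x * x + 10.0 * math.sin(3.0 * x)


def greedy_descent(f, x, step=0.01, iters=10_000):
    """Accept a neighbour only if it is strictly better; stops at the first local minimum."""
    for _ in range(iters):
        best_neighbour = min((x - step, x + step), key=f)
        if f(best_neighbour) >= f(x):
            break  # no neighbour improves, so we are stuck
        x = best_neighbour
    return x


def simulated_annealing(f, x, temp=10.0, cooling=0.999, iters=20_000, seed=0):
    """Accept worse moves with probability exp(-delta / temp); return the best point seen."""
    rng = random.Random(seed)
    best = x
    for _ in range(iters):
        candidate = x + rng.uniform(-1.0, 1.0)
        delta = f(candidate) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
            if f(x) < f(best):
                best = x
        temp = max(temp * cooling, 1e-9)  # cool down, but never reach zero
    return best


if __name__ == "__main__":
    start = 4.0
    g = greedy_descent(bumpy, start)
    s = simulated_annealing(bumpy, start)
    print("greedy   :", round(g, 3), round(bumpy(g), 3))
    print("annealing:", round(s, 3), round(bumpy(s), 3))
```

Starting both from the same point shows the difference: the greedy run settles in the dip next to the start, while the annealer keeps the best point it has visited anywhere.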

Of course, deterministic transactional systems have their important place – we don’t want a probabilistic bank balance!

Point #2 : We don’t even want our machines to be like us

  • The operative word is “Augmented Cognition” – our machines should help us where we are not strong and augment our capabilities. More later …
  • Taking a cue from the contemporary media, “Person Of Interest” is a better model than “I,Robot” or “Almost Human” – a Mr.Spock, rather than a Sonny; Logical but resorts to the improbable and the random, when the impossible has been eliminated !

Point #3 : Now we are able to separate Interface from Inference & Intelligence

AI-03

  • New Yorker asks, “Why can’t my computer understand me?” Finding answers to questions like “Can an alligator run the hundred-meter hurdles?” is syntax.
  • NLP (Natural Language Processing) and its first cousin NLU (Natural Language Understanding) are not intelligence, they are interface.
  • In fact, the team that built IBM Watson realized that “they didn’t need a genius, … but build the world’s most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency”.

Taking this line of thought to its extreme, one can argue that Google (Search) itself is a case in point of an ostentatious and elaborate infrastructure for what it does … no intelligence whatsoever – Artificial or Natural ! It should have been based on a knowledge graph rather than a referral graph. Of course, in a few years, they would have made huge progress, no doubt.

  • BTW, Stephen Baker has captured the “Philosophy of an Intelligent Machine” very well.
  • I have been & am keeping track of the progress by Watson.
  • Since then, IBM Watson itself has made rapid progress in the areas of Knowledge Traversal & Contextual Probabilistic Inferences i.e. ingesting large volumes of unstructured data/knowledge & reasoning about it
  • I am not trivializing the effort and the significance of machines understanding the nuances of human interactions (speech, sarcasm, slang, irony, humor, satire et al); but we need to realize that this is not an indication of intelligence or a measure of what machines can do.

Human Interface is not Human Intelligence, same with machines. They need not look like us, walk like us, or even talk like us. They just need to augment us where we are not strong … with the right interface, of course

  • Gary Marcus in the New Yorker article “Can Super Mario Save AI” says “Human brains are remarkably inefficient in some key ways: our memories are lousy; our grasp of logic is shallow, and our capacity to do arithmetic is dismal. Our collective cognitive shortcomings are so numerous … And yet, in some ways, we continue to far outstrip the very silicon-based computers that so thoroughly kick our carbon-based behinds in arithmetic, logic, and memory …”

Well said Gary. Humans & Machines should learn from each other and complement … not mimic each other … And there is nothing Artificial about it …

I really wish we take “Artificial” out of AI – Just incorporate what we are learning about ourselves into our computers & leave it at that !

Finally:

AI-04-01

The Art of an Insightful Recommendation


  • I have been working on multiple aspects of recommendation including AI & DeepLearning
  • Came across an insightful talk by Eric Colson of Stitch Fix at Strata 2013 titled “Committing to Recommendation Algorithms”
  • Short, succinct & very informative. Slides
  • It is only ~8 min. So I urge you all to watch it.
  • I took down some notes and created a couple of collages out of the presentation.
  • Strong Algorithms

  • StitchFix-01
  • Human Judgement

StitchFix-02

You see, their value proposition goes beyond convenience.
StitchFix-03
They provide a shopping experience beyond the casual encounter in a store or a browse on a web page - the ability to find things that one wouldn’t have found on one’s own – and that is priceless!

Of Building Data Products


  • [Update 11/28/13] Notes from the blog by Jon, “Data Driven Disruption at Shutterstock”, on what a data products company is
    1. Data is your product, regardless of what you sell
    2. Data is your lens into your business – Jon echoes Peter’s insights viz. invest in data access; feel the pulse of the business & iterate
    3. Data creates your growth
  • Back to the main feature, Peter’s talk
  • A very insightful & informative talk by Peter Skomoroch of LinkedIn via Zipfian Academy
  • It is short & succinct, only 37 minutes. I urge all to watch
  • The slides of the talk “Developing Data Products” are at slideshare
  • Quick Notes:
    • A Data Product understands the world through inferential probabilistic models built on data
      • So collecting right data through “thoughtful” data design is very important
      • The data determines & precedes the feature set & the intelligence of your app
        • LinkedIn is a prime example – as they get more data, the app has become more intelligent, intuitive and ultimately more useful
        • Offer progressively sophisticated products, leveraging the data & insights, across the different user population segments – customer segmentation & stratification is not just for retail !
    • While more data is good (see the “Unreasonable Effectiveness of Data” Distinguished Lecture by Peter Norvig), for complex models a deep understanding of the models and feature engineering would eventually be necessary (beyond the “black box”)
      • Data products about people, are usually complex, in terms of models as well as the data

Image

[Update 12/13/13] Remember, a data product usually has the three layers – Interface, Inference & Intelligence.

Big Data on the other side of the Trough of Disillusionment


5. Don’t implement a technology infrastructure but the end-to-end pipeline a.k.a. Bytes To Business

Simple Reason : Business doesn’t care about a shiny infrastructure, but about capabilities they can take to market …

AI-Arch-21-P199

4. Think Business Relevance and agility from multiple points of view

Aggregate Even Bigger Datasets, Scenarios and Use Cases

  • Be flexible, tell your stories, leveraging smart data, based on ever changing crisp business use cases & requirements

3. Big Data cuts across enterprise silos – facilitate organization change and adoption

  • Data has always been siloed, with each function having its own datasets – transactional as well as data marts
  • Big Data, by definition, is heterogeneous & multi-schema
  • Data refresh, source of truth, organizational politics and even fear come into the picture. Deal with them in a positive way

2. Build Data Products

1. tbd

  • One more for the road …

XLDB Conference at Stanford – Quotable Quotes


xldb-09
The Extremely Large Database/XLDB 2013 Conference & the invited Workshop at Stanford had lots of good speakers and extremely interesting view points. I was able to attend and participate this year.

Previously I wrote two blogs on presentations by Google’s Jeff Dean and NEA’s Greg Papadopoulos

Here are the highlights from the presentations. Of course, you should read thru all the XLDB 2013 presentation slides.

xldb-08

xldb-10

xldb-11

xldb-12

xldb-13

xldb-14

Jeff Dean : Lessons Learned While Building Infrastructure Software at Google


Image
Last week I attended the XLDB Conference and the invited Workshop at Stanford. I am planning on a series of blogs highlighting the talks. Of course, you should read thru all the XLDB 2013 presentation slides.

Google’s Jeff Dean had an interesting presentation about his experience building GFS, MapReduce, BigTable & Spanner. For those interested in these papers, I have organized them – A Path through NOSQL Reading 

Highlights in pictures (Full slides at XLDB 2013 site):

xldb-02

xldb-03

Deep Learning – The Next Frontier ?


Updates:

  1. [12/29/13] Forbes : A general article on the significance of Machine Learning, AI et al
  2. [12/28/13] NY Times Brain-Like Computers, Learning from Experience
  3. [8/9/13] Kind of jumbled blog in Forbes – What is DeepLearning & Why should businesses care
  4. DeepLearning for the masses

Came across an informative talk by Jeff Dean on Deep Learning at Google.

Image

Leaving cat herding aside, I strongly believe that Deep Learning & AI can be effectively used in the context of Big Data to address interesting problems in the Enterprises, especially Retail & Banking.

Trying to build a few neurons to run some interesting big data use cases …

Hence my interest in this domain

Let me share some of the interesting points from Jeff’s talk, annotating the slides. Of course, you should listen to his talk – it is short & succinct …

But plan to spend some time to think it through as well as research the related papers … The DistBelief paper has the potential to be as influential as Jeff’s MapReduce paper.

Image

DL-Dean-12

DL-Dean-05

DL-Dean-07

DL-Dean-10

DL-Dean-11

Related Links:

  1. Jeremy Howard of Kaggle tags Deep Learning as The Biggest Data Science Breakthrough of the Decade ! What says thee ?
  2. Excellent overview from IEEE Computational Intelligence – “Deep Machine Learning—A New Frontier in Artificial Intelligence Research”
  3. AI Breakthrough – From MIT Technology Review
  4. New York Times has a good article on Deep Learning “Scientists See Promise in Deep-Learning Programs”
  5. Huffington Post talks about Big Data & Deep Learning

Data Science Engineers – The new breed of Data Scientists ?


While there are lots of interesting discussions about Data Scientists, or the lack thereof, the role of Data Science in Big Data is well understood. I think the need is actually for Data Science Engineers. I had a set of pictures explaining this concept and interestingly came across a blog by HortonWorks on the topic of Data Scientists.

[Update Jan 25,2014] Harlan Harris has a similar theory, with a wider perspective. The Data Engineer in his model corresponds to the Data Science Engineer I had proposed !

[Update May 19, 2013] An informative blog in the Wall Street Journal about Data Scientists by IBM’s Irving Wladawsky-Berger – Data Science is a multidisciplinary evolution from business intelligence & analytics. In addition to having a solid foundation in statistics, math, data engineering and computer science, data scientists must also have domain expertise.

DataScienceTeam

Data Science Engineers 01-02

Data Science Engineers 01-01

What says thee ?

Is our Neocortex a Giant Semantic Bloom Filter ? Of Natural Intelligence, Machine Learning & Jeff Hawkins


L’Apéritif:

Image

In a set of four lectures spanning about 3 years, Jeff Hawkins explains how & why big data problems can only be solved by evolutionary-adaptive-continuously-learning models incorporating principles from the workings of the Neocortex.
It does make sense – especially for NLP, NLU & Knowledge Representation. I am a big fan of the Borgs and their coordinated intelligence.

These are my annotated picture-notes …

L’Entrée:

Let me begin at the beginning. The other day I came across 4 very interesting talks by Jeff Hawkins on Biologically Inspired Machine Intelligence.

Call it serendipity because we have been looking for more effective ways for Knowledge Representation (KR) & Natural Language Understanding (NLU)

For example, movie names are very easy for humans to understand, but a MaxEnt NER finds them very hard. Knowledge Representation & Association is even harder !

We are experimenting with a few techniques like word-based tries (i.e. spell-check sentences by words), higher order federated Bloom Filters and n-gram hashing. Planning to incorporate some of Jeff’s ideas …
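
To give a flavor of the Bloom Filter plus n-gram hashing idea, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not our actual implementation: the class `BloomFilter`, the helpers `word_ngrams`, `build_title_filter` and `title_like_spans`, the filter size and the sample titles are all hypothetical. The idea is to hash the word bigrams of known movie names into a Bloom Filter and then flag the bigrams of a sentence that hit the filter.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, a tunable rate of false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def word_ngrams(text, n=2):
    """Lower-cased word n-grams of a piece of text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_title_filter(titles, n=2):
    """Hash every word n-gram of every known title into one filter."""
    bf = BloomFilter()
    for title in titles:
        for gram in word_ngrams(title, n):
            bf.add(gram)
    return bf


def title_like_spans(sentence, bf, n=2):
    """Word n-grams of the sentence that (probably) occur in some known title."""
    return [gram for gram in word_ngrams(sentence, n) if gram in bf]


if __name__ == "__main__":
    known = build_title_filter(["The Dark Knight", "Almost Human"])
    print(title_like_spans("I watched the dark knight yesterday", known))
```

A hit only says "probably seen before", so in practice something would still have to confirm the candidate spans against the real list of titles.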

I digress … Topics for another day … back to Jeff & Machine Intelligence …

Very inspiring, extremely thought provoking talks – as usual the inimitable Jeff Hawkins at his best

  1. Google Tech Talk : Jeff Hawkins, “Building Brains to Understand the World’s Data”
  2. UC Berkeley Graduate Lectures
  3. “Advances in Modeling Neocortex and its impact on Machine Intelligence” by Jeff Hawkins,  Smith Group Lecture presented at the Beckman Institute for Advanced Science & Technology at the University of Illinois at Urbana-Champaign

Le Plat Principal:

The four talks have a lot of depth and are packed. Moreover Jeff talks very fast – I listened to the talks a few times – at least 3 hrs per one hour talk. You should listen to them slowly & rewind as required. It takes a few hours to get one’s head around the various ideas.

Let me annotate a few of his slides – those I was able to internalize to some extent:

Focus & premise[3]:

Hawkins-100-02-01

The assertion, that many problems can only be solved by incorporating principles from the workings of the Neocortex, is interesting.

BTW, it does make sense – especially for NLU & Knowledge Representation.

As Jeff mentions later, the behavior need not be human-like, but the representation, interpretation & “understanding” would be.

Neocortex Architecture[3]:

“Neocortex is just a sheet of cells  2mm thick, the size of a dinner napkin” – Amazing what it can do!

Hawkins-100-03-01

The Six Principal Essentials of Biological Intelligence

The picture says it all.

Hawkins-100-04-01

Learning involves training and adaptive connections

Hawkins-100-05-01

The concept of streaming events & the learning mechanisms

Patterns from complex data streams

Hawkins-100-06-01

The paper “Hierarchical Temporal Memory” has the gory details about Hierarchical Temporal Learning.

Future

Hawkins-100-09-01

Interesting observation: Emotion, the fundamental aspect of being human, is not a requirement for intelligence – reminds us of Spock, of course.

Machine intelligence is not about replicating human behavior or even passing the Turing test. I agree on this – we need the machines to think & do things we cannot do, thus augmenting us. Make us stronger where we are weak !

Le Digestif

What interested me most was the semantic knowledge representation, NLP & NLU. The ability to understand and store concepts, the capacity to generalize as well as the mechanisms of strengthening and weakening connections based on external signals – just beautiful …

Agree that the Sparse Distributed Representation could be the language of all the intelligent machines.

The SDR looks a lot like a giant Bloom Filter

Hawkins-100-10-01

Hawkins-100-11-01

The planes can be considered as rows and a column as the temporal dimension of the semantic mapping (the memory of sequences). This equates to a giant n-dimensional Bloom Filter – a data structure we can grok (Pun intended as Jeff’s product is called Grok!).

The bloom filter analogy, while extremely simplistic, is conceptually congruent, in the sense that “similar values have similar representation”, of course depending on the hash algorithm.
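
Here is a small Python sketch of why the hash algorithm matters for “similar values have similar representation”. It is my own simplification, not Jeff’s or Grok’s actual encoder: `encode_scalar`, `encode_hashed`, `overlap` and all the bit counts are hypothetical. The first encoder slides a window of active bits along the bit array, so nearby numbers share most of their bits; the second picks bits with a cryptographic hash, Bloom Filter style, so nearby numbers end up with unrelated bits.

```python
import hashlib


def encode_scalar(value, min_value=0.0, max_value=100.0,
                  total_bits=128, active_bits=16):
    """Sliding-window encoder: the set of active bit indices for a number."""
    clipped = min(max(value, min_value), max_value)
    span = total_bits - active_bits
    start = int(round((clipped - min_value) / (max_value - min_value) * span))
    return frozenset(range(start, start + active_bits))


def encode_hashed(value, total_bits=128, active_bits=16):
    """Hash-based encoder: bit positions come from SHA-256, so similarity is lost."""
    bits = set()
    counter = 0
    while len(bits) < active_bits:  # keep drawing until we have enough distinct bits
        digest = hashlib.sha256(f"{value}:{counter}".encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % total_bits)
        counter += 1
    return frozenset(bits)


def overlap(a, b):
    """Number of shared active bits; a crude similarity measure between two codes."""
    return len(a & b)


if __name__ == "__main__":
    print("window 50 vs 51:", overlap(encode_scalar(50), encode_scalar(51)))
    print("window 50 vs 90:", overlap(encode_scalar(50), encode_scalar(90)))
    print("hashed 50 vs 51:", overlap(encode_hashed(50), encode_hashed(51)))
```

With the sliding window, the overlap count itself works as a similarity score; with the hashed encoder any overlap is just chance, which is exactly the caveat about the hash algorithm.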

After listening to the talks and thinking them over, I have a thousand questions in many directions. I will post the answers as we develop this further for our needs. Please send in your insights as comments to this blog. I am sure it will help a few folks !

Hawkins-100-12-01

  1. How do we handle semantic categories ? 
  2. How do we build more sophisticated representations based on spatial patterns ?
  3. What is the hash function that maps a slice of semantic to this giant Bloom Filter ?
  4. How does it handle collision? Corruption ? Clustering for resiliency/self adjusting representation ?
    • Collision might be good and I think that is what Jeff calls semantic generalization
  5. How does the semantic slice mapping function differentiate between a search & computation to trigger appropriate actions?
    • For example the following two questions require different actions: 
      • “What is the stock price of IBM ?” vs.
      • “What is the volatility as reflected in the beta of IBM for this quarter ?” 
      • The first one is a search while the second has computation …
  6. Is the hash function same for all of us or is it different for each person ?
    • Most probably the function is a learned artifact.
  7. Another interesting vector is the Hierarchy & higher patterns of temporal coalescence/slowness – the high-order capability, tweaking the learning rates across the layers.
    • How can this be modeled with the analytical data structures we have?
    • And what are the mechanics for stable representation of pattern sequences – because with dynamicity and temporality comes the difficulty of snapshots and consistency between them.
    • The unique representation of the same sequence, at a later time in context of the earlier invocation is interesting …
  8. How do we “put a classifier on the top” ?
    • Play with permanence? Probability?
  9. What are the algorithms to prevent run away prediction?
    • I agree that we could account for rapid state difference vs. slower state; we still will have to encapsulate it in some form of code
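
To illustrate the search vs. computation distinction in question 5 above, here is a tiny Python sketch. The names `lookup_price` and `beta` and all the numbers are made up for illustration. The first question is a plain lookup; the second needs a calculation over two return series, using the usual definition of beta as the covariance of the stock and market returns divided by the variance of the market returns.

```python
def lookup_price(ticker, prices):
    """Search: the answer is already stored, we only need to fetch it."""
    return prices[ticker]


def beta(stock_returns, market_returns):
    """Computation: beta = Cov(stock, market) / Var(market) over the same periods."""
    if len(stock_returns) != len(market_returns) or len(stock_returns) < 2:
        raise ValueError("need two return series of equal length (at least 2 points)")
    mean_s = sum(stock_returns) / len(stock_returns)
    mean_m = sum(market_returns) / len(market_returns)
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


if __name__ == "__main__":
    prices = {"IBM": 180.0}                  # made-up price
    market = [0.01, -0.02, 0.03, 0.00]       # made-up periodic returns
    stock = [0.02, -0.04, 0.06, 0.00]        # moves twice as much as the market
    print(lookup_price("IBM", prices))
    print(beta(stock, market))
```

The interesting part, of course, is not either function but how a semantic mapping would decide which of the two to trigger.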

Finally, can we build “Amazingly Intelligent Machines?” Yes We can !

And I agree with Jeff that “It is essential, for the survival of the species, that we build them” …