Of Building Data Products

  • [Update 11/28/13] Notes from the blog by Jon, “Data Driven Disruption at Shutterstock”, on what a data products company is
    1. Data is your product, regardless of what you sell
    2. Data is your lens into your business – Jon echoes Peter’s insights, viz. invest in data access; feel the pulse of the business & iterate
    3. Data creates your growth
  • Back to the main feature, Peter’s talk
  • A very insightful & informative talk by Peter Skomoroch of LinkedIn via Zipfian Academy
  • It is short & succinct, only 37 minutes. I urge all to watch
  • The slides of the talk “Developing Data Products” are at slideshare
  • Quick Notes:
    • A Data Product understands the world through inferential probabilistic models built on data
      • So collecting right data through “thoughtful” data design is very important
      • The data determines & precedes the feature set & the intelligence of your app
        • LinkedIn is a prime example – as they get more data, the app has become more intelligent, intuitive and ultimately more useful
        • Offer progressively sophisticated products, leveraging the data & insights, across the different user population segments – customer segmentation & stratification is not just for retail !
    • More data is good (see Peter Norvig’s Distinguished Lecture, “Unreasonable Effectiveness of Data”), but for complex models a deep understanding of the models and of feature engineering eventually becomes necessary (beyond the “black box”)
      • Data products about people are usually complex, in terms of the models as well as the data


[Update 12/13/13] Remember, a data product usually has three layers – Interface, Inference & Intelligence.

XLDB Conference at Stanford – Quotable Quotes

The Extremely Large Database/XLDB 2013 Conference & the invited Workshop at Stanford had lots of good speakers and extremely interesting viewpoints. I was able to attend and participate this year.

Previously I wrote two blogs on the presentations by Google’s Jeff Dean and NEA’s Greg Papadopoulos.

Here are the highlights from the presentations. Of course, you should read thru all the XLDB 2013 presentation slides.







Greg Papadopoulos : Make it Big by Working Fast and Small

Last week I attended the XLDB Conference and the invited Workshop at Stanford. I am planning on a series of blogs highlighting the talks. Of course, you should read thru all the XLDB 2013 presentation slides.

NEA’s Greg Papadopoulos had a viewpoint on innovation and startups. Highlights in pictures. Of course, you should read thru the full presentation.



I really liked the “Common Characteristics Of Success”. Golden words indeed !

Scaling Big Data – Impermium

Came across an informative blog on scaling big data – “Built to Scale: How does Impermium process data?” Quick notes from the blog:

  1. Don’t fall in love with a technology so much that you cannot be separated – Be flexible in scaling as you grow

    • “Parting is such sweet sorrow”, but change is an essential component of an infrastructure at scale
    • The technology selection and consumption should be a continuous process, introducing new technologies as needed by the growth. I found Impermium’s path from grep to Solr to Elastic Search very illuminating; I have done the same before.  
  2. Technology needs are not static

    • A corollary of #1 above – Growth on all parts of the stack will not be uniform.
    • For example Impermium found scaling challenges in search and they moved to Solr & then to Elastic Search
  3. There are no perfect technologies

    • If you are doing interesting work, be ready to tango with open source code. This is essential – I also found this to be true.
    • Even if you don’t plan to change the code, many times deep understanding comes from reading the code
  4. Select technologies that you can dance with

    • The flip side is that one should select technologies that one is comfortable working with under the hood.
    • In my case, while I love Erlang, I am not that comfortable with that language. So given a chance, I will go with Java or Scala
  5. A benchmark is nothing but a story in a specific context

    • So true. Benchmarks are transitory & personal.
    • Understand them, but they need not be true for your transforms, your data model and your processing.
    • Benchmark early & benchmark often … with your scenarios, models, transformations, mapreduces & data
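To make the “benchmark with your own data” point concrete, here is a minimal sketch of a timing harness. Everything in it is an assumption for illustration (the `benchmark` and `summarize` helpers and the `word_count` workload are hypothetical, not something from the Impermium blog); the idea is only that you plug in your own transformation and your own sample data.

```python
import statistics
import time


def benchmark(fn, data, repeats=5):
    """Run fn(data) `repeats` times and return each run's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def word_count(lines):
    """Hypothetical workload: replace with your own transformation."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; use a slice of your real data instead.
    sample = ["spam ham spam eggs"] * 10_000
    print(summarize(benchmark(word_count, sample)))
```

Reporting the median alongside min and max keeps one noisy run from telling the whole story, which is the spirit of treating a benchmark as a story in a specific context.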

Thanks Young for the short but very interesting blog. Keep up the good work …



All the President’s DevOps

On the heels of “All the President’s Data Scientists”, another interesting article on the Obama campaign’s cloud infrastructure.

Update: A similar article, The Atlantic’s “When the Nerds Go Marching In”

Update: Case study from New Relic, “How the Obama For America team improved resilience”


  • They realized the campaign needed a scalable system: “2008 was the ‘Jaws’ moment,” said Obama for America’s Chief Technology Officer Harper Reed. “It was, ‘Oh my God, we’re going to need a bigger boat.’”
  • They built a single shared data tier with APIs to build lots of interesting applications: “Being able to decouple all the apps from each other has such power; it allowed us to scale each app individually and to share a lot of data between the apps, and it really saved us a lot of time.”
  • They leveraged internet architecture: “We aggressively stood on the shoulders of giants like Amazon, and used technology that was built by other people.”
  • It doesn’t look like they used esoteric technologies. The system is built around Python APIs over RDS, SQS and so forth. That systems can be built this way is a testament to the cloud’s capabilities – IaaS & PaaS
  • In short, Reed says it all: “When you break it down to programming, we didn’t build a data store or a faster queue. All we did was put these pieces together and arrange them in the right order to give the field organization the tools they needed to do their job. And it worked out. It didn’t hurt that we had a really great candidate and the best ground game that the world has ever seen.”

Facebook Infrastructure @ New Years Eve – A study in Scalability

Another interesting article on how Facebook is preparing for New Year’s Eve, this time from our own San Jose Mercury News, by Mike Swift.

Interesting points:

  • New Year is one of the busiest times for social network sites as people post pictures & exchange best wishes

CEO Mark Zuckerberg has long been focused on having the digital horsepower to support unbridled growth — are a key reason behind the .. network’s success

  • It received > 1 B photo uploads during Halloween 2010
  • Since then Facebook has added 200 million more members, so New Year’s Eve 2012 could see more than 1.5 B uploads!
  • My favorite quote from the article:

The primary reason Friendster died was because it couldn’t handle the volume of usage it had. … They (Mark, Dustin and Sean) always talked about not wanting to be ‘Friendstered,’ and they meant not being overwhelmed by excess usage that they hadn’t anticipated

  • The engineers at Facebook just finished a preflight checklist and are geared up for the scale
  • In terms of scale “Facebook now reaches 55 percent of the global Internet audience, according to Internet metrics firm comScore and accounts for one in every seven minutes spent online around the world.”
  • From a Big Data perspective, Facebook data has all the essential properties, viz. Connected & Contextual in addition to large scale – Volume & Velocity (see my earlier blog on big data)
  • Facebook has the “Emergency Parachutes” which let the site degrade gracefully  (for example display smaller photos when the site is heavily loaded)
  • Their infrastructure instrumentation is legendary (for example, the MySQL talk here)
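The “Emergency Parachutes” idea of graceful degradation can be sketched in a few lines. This is only an illustration under my own assumptions: the function name, the size labels, the thresholds and the notion of a single load number between 0 and 1 are all hypothetical, and not how Facebook actually implements it.

```python
def choose_photo_size(load, thresholds=((0.9, "thumbnail"), (0.7, "medium"))):
    """Pick a photo size for the current load (0.0 = idle, 1.0 = saturated).

    `thresholds` is ordered from the highest load limit to the lowest; the
    first limit the load reaches decides how far the response is degraded.
    All names and numbers here are illustrative assumptions.
    """
    for limit, size in thresholds:
        if load >= limit:
            return size
    return "full"


if __name__ == "__main__":
    for load in (0.2, 0.75, 0.95):
        print(load, choose_photo_size(load))
```

The point of the design is that the site keeps answering under heavy load, just with a cheaper response, instead of failing outright.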

To manage Facebook’s data infrastructure, you kind of need to have this sense of amnesia. Nothing you learned or read about earlier in your career applies here …

And finally, our New Year wishes to all readers & well-wishers of this blog

The Jobs Logs : An ode to an icon

My heart aches, and a drowsy numbness pains
My sense, as though of hemlock I had drunk,
Or emptied some dull opiate to the drains
One minute past, and Lethe-wards had sunk:

Steven Paul Jobs is no more … Hard to believe, harder to accept and almost impossible to imagine a future without him … Instead of lamenting I decided to draw inspiration …

For me his Stanford speech, aptly titled “How to live before you die”, personifies Steve

“It is more fun to be a pirate than to join the Navy”! and other quotes from this Slideshare

“Work hard to make simple” and other quotes from Marko Saric’s blog

Huffington Post got it right when it quoted Steve – “… focus and simplicity. Simple can be harder than complex …”

In many ways the essence of Steve is the very famous quote from Gizmodo and others:

When you’re a carpenter making a beautiful chest of drawers, you’re not going to use a piece of plywood on the back, even though it faces the wall and nobody will ever see it. You’ll know it’s there, so you’re going to use a beautiful piece of wood on the back. For you to sleep well at night, the aesthetic, the quality, has to be carried all the way through.—Playboy, 1987

On a technology level, “Jobs refused to accept that software and hardware were best designed and engineered separately. For him, the venerable insight summarized by Thomas Hughes, the grand historian of American technology, as ‘the system must be first’ …” – A good POV in IEEE Spectrum by Pascal

[Update 11/7/11] Malcolm Gladwell’s review, “The Tweaker”, in The New Yorker is an excellent read. ‘Jobs’s sensibility was more editorial than inventive. “I’ll know it when I see it,” …’.

“Was Steve Jobs a Samuel Crompton (inventor) or was he a Richard Roberts (the tinkerer)?” asks Malcolm …

As the Washington Post says, “He (Jobs) would be getting off here; we were to proceed without him into the unknown. Let it go and look ahead was the message all along.”

Other sources of good Jobs quotes: 


How to Embrace Failure & Influence Scalability

As we continue our experiences with 10X scalability with our object store layer and get deeper into the design and development, it dawned on us that our first and foremost criterion is to befriend failures and architect for them! We have heard these ideas before, but it always becomes real when one feels one’s own growing pains.

We now understand very well what it means to “design a control plane for failure and tune the data for normal ops.”

… Read more at my Egnyte Engineering blog …

In the next blogs, we will talk about specific examples of how we embrace failures and influence scalability. The principles of Carnegie are not just for humans anymore – they are equally applicable to the machines we make, even when they are asleep and dreaming of electronic sheep! Or do they?

The Power of Curiosity and Inspiration – Jack Dorsey at Stanford

The little one was out on a sleepover & so I spent an hour of the free time watching (and noting down points from) the video from Stanford’s Entrepreneurship Corner

It is just beautiful. I urge all to watch it. My notes:

  • First, a couple of very insightful points
    • You have to make every single detail perfect and you have to limit the number of details. If you pay attention to the smallest things while knowing what’s important, then everything else takes care of itself
    • “Expect the unexpected and whenever possible, be the unexpected.” – marvelous & well said!
    • Apple is a theater company and Jack draws inspiration from Apple’s mode of operation!
      • Apple, I think, is run like a theater company. It has a great sense of pacing, a great sense of story & a great sense of execution. It’s all event-driven .. & stage-driven …
  • Birth Of Twitter :
    • He developed early version of Twitter in early 2000, but shelved it – “Wrong time, good idea, put it on the shelf”.
    • In 2005 he tried it again – “ .. was given two weeks and one other programmer in Biz Stone to write the software.”
    • “So, that’s how that sort of visualization and early desire to see the world led into Twitter … And we did it”, and the rest of course is …
  • Origin of Square :
  • Power of a working product:
    • “The thing that really inspires people is a working product. When you’re pitching someone, the best thing you can do is show them something that works.” – Very good point
  • Payment is a form of communication:
    • Focusing on the user experience of money rather than the mechanics of transferring the value
  • Instrument Everything
    • Another good view point – Log, measure and test your infrastructure. They have an inference team focusing on the infrastructure instrumentation
    • “You have to instrument everything. For the first two years of Twitter’s life, we were flying blind. We had no idea what was going on with the network….”
  • Power of Story Telling & User Narratives:
    • Tell the story from a user perspective – like a play. One epic cohesive story, not a chain of short stories – solve a big problem
    • The product features fall out naturally from the user story
  • CEO As Chief Editor of the company’s Story:
    • Every day the company generates “1000s of things that we could be doing but there’s only one or two that are important. As an editor, [the CEO is] constantly taking all these inputs and deciding on that one or that intersection of a few that makes sense for what we’re doing.”
      1. Editing People in and out
        • “… it’s always minding that team dynamic because at the end of the day, we’re just a group of people working on one single goal. If we can’t step in a cohesive coordinated fashion, then we’re going to trip all over the place…”
      2. Internal & External Communication stories
        • “If you have that sort of high-level, this is where we’re going, this is the vision, this is the next 30 days … , it makes it very, very easy to set priorities and for all of the edges of the company to set their own priorities to do the right thing …”
      3. Editing the money (Revenue, Investors,…)
  • The Q & A had a few good insights as well.
    • Marketing Strategy
      • “ .. trying to do now is identify the key influencers in those merchant areas and make them distribution points.”
      • “A lot of the way I think about marketing is through the product itself. So, I think the marketing function, the best aspect and the best it can do is surface the product as much as possible.”
      • Understand the product introduction & adoption cycle
        • [They] have about three to five seconds to inspire someone to take action to actually get Square
        • then, [they] have about a week to get them to participate more – that’s by taking in transaction.
        • then, about a month to get them to be users forevermore.
      • Consumer Internet : “The more you can minimize the thinking around the mechanics in the moment, then more people are going to use it, more people are going to feel good about it.”
    • The importance of getting an idea out of one’s head & the cycle it follows -
      • “ … you need to get it out of your head. The reason you have to get it out of your head is you need to be able to see it on a surface that is not in your mind.”
      • “Once you can see it and once you can step back from it, then you can also decide to share it with others”
      • the idea either gathers momentum
      • or you can decide to shelve it
    • Square is “focused on the payment experience and all the information and all the platform around payments…. It’s building that cohesive story end to end.”
    • “… I think of Square as a startup with many startups inside of it. That’s how we’re organizing the company internally. We’re going to have a lot of different projects. They’ll be coordinated by this one cohesive unit outside.” – interesting
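Going back to the “Instrument Everything” note above, here is a minimal sketch of what that can look like in code: a decorator that records call counts, error counts and total time per function. The `METRICS` store, the `instrumented` decorator and the `post_message` function are hypothetical names of my own; the talk does not describe how Twitter or Square actually do it, and a real system would ship these numbers to a metrics service rather than keep them in a dict.

```python
import functools
import time
from collections import defaultdict

# In-memory metrics store, keyed by function name (illustrative assumption).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(fn):
    """Wrap fn so every call updates its call, error and timing counters."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        stats = METRICS[fn.__name__]
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            # Runs on success and on failure, so no call goes unmeasured.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def post_message(text):
    """Hypothetical operation worth measuring."""
    if not text:
        raise ValueError("empty message")
    return len(text)


if __name__ == "__main__":
    post_message("hello")
    print(dict(METRICS))
```

Because the measurement lives in a decorator, adding it to another function is a one-line change, which makes “instrument everything” cheap enough to actually do.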

GigaOm has a good article – Jack Dorsey on Square, How It Works & Why It Disrupts

Another one in Technology Review – The New Money

My Next stop : Software For Data Analysis by Chambers

And after that: Battlestar Galactica – The Mini Series (relevant especially in light of the new Computer Overlords!)

And a little