The Agonies & Ecstasies of Cloud Storage

“To cloud or not to cloud … ”  That is the question many are asking in the wake of the news surrounding the seizure of MegaUpload last week. Aside from being a pun on Hamlet’s soliloquy, this is a very poignant question, because cloud storage is becoming an inseparable part of modern life (be it Apple’s iCloud, Microsoft’s SkyDrive, Amazon’s Cloud Drive, Egnyte or the ‘box’es …) for consumers as well as enterprises.

In many ways the question is not whether to use cloud storage, but how to use it effectively while minimizing the risk of business disruption.

I have a few pointers in that direction …

  • Don’t mix Consumer Cloud Services & Enterprise Class Services
  • Use hybrid storage cloud rather than a pure cloud-only service
  • Manage the data lifecycle effectively
  • Match the business requirements and the domain impedance
  • Pay attention to data interoperability

Please allow me to explain … The gory details are at my Egnyte blog.

Facebook Infrastructure @ New Years Eve – A study in Scalability

Another interesting article on how Facebook is preparing for New Year’s Eve, this time from our own San Jose Mercury News, by Mike Swift.

Interesting points:

  • New Year is one of the busiest times for social network sites as people post pictures & exchange best wishes

CEO Mark Zuckerberg has long been focused on having the digital horsepower to support unbridled growth — are a key reason behind the .. network’s success

  • It received > 1 B photo uploads during Halloween 2010
  • Since then Facebook has added 200 million more members, so New Year’s Eve 2012 could see more than 1.5 B uploads !
  • My favorite quote from the article:

The primary reason Friendster died was because it couldn’t handle the volume of usage it had. … They (Mark,Dustin and Sean) always talked about not wanting to be ‘Friendstered,’ and they meant not being overwhelmed by excess usage that they hadn’t anticipated

  • The engineers at Facebook just finished a preflight checklist and are geared up for the scale
  • In terms of scale “Facebook now reaches 55 percent of the global Internet audience, according to Internet metrics firm comScore and accounts for one in every seven minutes spent online around the world.”
  • From a Big Data perspective, Facebook data has all the essential properties viz. Connected & Contextual in addition to large scale – Volume & Velocity (see my earlier blog on big data)
  • Facebook has the “Emergency Parachutes” which let the site degrade gracefully  (for example display smaller photos when the site is heavily loaded)
  • Their infrastructure instrumentation is legendary (for example, the MySQL talk here)
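The “Emergency Parachute” idea from the list above can be sketched in a few lines. This is only an illustration of graceful degradation, not Facebook’s code: the function name, the rendition names and the load thresholds are all hypothetical.

```python
def pick_photo_size(load: float) -> str:
    """Choose a photo rendition for the current site load (0.0 = idle, 1.0 = saturated).

    The thresholds are made-up values for illustration only.
    """
    if load < 0.70:
        return "full"       # normal operation: serve full-size photos
    if load < 0.90:
        return "medium"     # getting busy: serve a cheaper rendition
    return "thumbnail"      # parachute open: smallest photos, but the site stays up


for load in (0.35, 0.80, 0.97):
    print(f"load={load:.2f} -> {pick_photo_size(load)}")
```

The point of the design is that the site gets worse in small, controlled steps instead of falling over all at once.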

To manage Facebook’s data infrastructure, you kind of need to have this sense of amnesia. Nothing you learned or read about earlier in your career applies here …

 
And finally, Our New Year Wishes to all readers & well wishers of this blog 

This blog – 2011 in review

The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 27,000 times in 2011. If it were a concert at Sydney Opera House, it would take about 10 sold-out performances for that many people to see it.

MySQL at Facebook – Current & Future

Thanks to Todd’s High Scalability, I came across the talk about MySQL at Facebook – Current & Future.

Around 45 min. Good talk. I jotted down some quick notes and a couple of slides that captured my interest:

  • UDB – Universal DB (Was called User database before)
  • Shards:
    • Remote references in both shards
    • Backout mechanisms
    • Async job that fixes problems
  • Replication
    • Binlog consumers eg. memcache invalidators for geographical replication
    • Backup consumers for binary logs
  • Facebook has a single cluster!
    • Ease of setup
    • Uniformity of Access patterns
  • Custom Facebook build of 5.1 (we can get it, open source), InnoDB
    • Admission Control Feature (Specific to their build) looks interesting
  • Hardware with more cores, but I/O bound
  • Looking at newer Compression Algorithms
    • qpress (quickLZ)
    • snappy compression alg (Google)
  • InnoDB purge lag – Happens if the DB is busy all the time, e.g. a running backup doesn’t give purge a chance to clean up fully
  • Custom Tooling
    • Per-Table restore
    • Truly incremental backup
    • Online schema changes
  • Monitoring
  • When you have 10,000 servers, averages hide things. So you need top N monitoring
  • They track and analyze stalls
    • Stall tools like dogpiled, aspera
    • Started with multi-second stalls, now track subsecond stalls
    • Kernel mutex stall – see it constantly
  • MySQL never a solved problem!
  • The Top N monitoring and stall monitoring is applicable outside MySQL as well
  • Have attached a few slides that I found interesting …
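To see why averages hide things at 10,000 servers, here is a small sketch with made-up numbers. The host names and latencies are hypothetical, and `top_n` is just an illustrative helper, not one of the Facebook tools mentioned in the talk.

```python
import heapq
from statistics import mean


def top_n(latencies_ms: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n slowest hosts as (host, latency) pairs, worst first."""
    return heapq.nlargest(n, latencies_ms.items(), key=lambda kv: kv[1])


# Hypothetical fleet: 10,000 hosts at 2 ms, except two that are stalling badly.
fleet = {f"db{i:05d}": 2.0 for i in range(10_000)}
fleet["db00042"] = 900.0
fleet["db07777"] = 450.0

avg = mean(fleet.values())   # barely moves: the two sick hosts are averaged away
worst = top_n(fleet, 2)      # the top N view shows them immediately

print(f"fleet average: {avg:.2f} ms")
print("worst hosts:", worst)
```

The fleet average stays close to 2 ms while two hosts are hundreds of milliseconds slow, which is exactly the situation a top N view is meant to catch.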

Is Hadoop the new stored procedure ?

Two things happened today that made me ask this question, and I am not offering any serious answer, yet !

  • First, I had a quick chat with folks at MongoDB, which got me thinking about MapReduce in Mongo and where it could go. MongoDB also has the new declarative aggregation framework.
  • My thesis is that, while the MongoDB aggregation framework is now JSON semantics + $ keywords, it could look a lot like a functional programming language – with higher-order declarative functions like map/reduce, discriminated unions (like F#) and currying.
  • And later in the day I read Edd’s blog “5 Big Data Predictions”, also in Forbes. (While both are the same blog, there might be interesting comments in each)
  • Lots of interesting observations from Edd. He is predicting better programming language support, but maybe we are looking at it the wrong way – what we need is better stored procedure support in the data layer. It could also tie into the next point Edd was talking about – streaming data processing ! Where better could we have that feature than at the data layer ?
  • Would we be able to write a social science data platform using the MongoDB aggregators ? Would MongoDB mapReduce fit the bill now ? If not, what would it take to make it so ?
  • There are two obvious paths – a connector to an application artifact (for example a Hadoop connector), or embedding map/reduce in the data layer. Both have their advantages and disadvantages. With the connector the mapReduce can scale orthogonally, but with the embedded feature one can achieve real-time processing (within limits). Maybe this is the time for an application data store !
  • Would the datastores like MongoDB gain features like the Twitter Storm, Real-Time map reduce, hierarchical iterative functional aggregators  and so forth ?
  • GreenPlum’s Chorus is interesting – Can NOSQL datastores gain some of the relevant capabilities that Chorus has?
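For readers who want the map/reduce shape in front of them, here is a minimal in-memory sketch in plain Python. It is neither MongoDB’s mapReduce API nor Hadoop; the function names and the sample documents are hypothetical and only show the map, group and reduce steps that either path would have to run.

```python
from collections import defaultdict
from functools import reduce


def map_phase(docs, mapper):
    """Apply the mapper to every document and yield its (key, value) pairs."""
    for doc in docs:
        yield from mapper(doc)


def shuffle(pairs):
    """Group the mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's values down to a single result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical documents, as they might sit in a document store.
docs = [
    {"user": "a", "bytes": 10},
    {"user": "b", "bytes": 5},
    {"user": "a", "bytes": 7},
]

pairs = map_phase(docs, lambda d: [(d["user"], d["bytes"])])
totals = reduce_phase(shuffle(pairs), lambda x, y: x + y)
print(totals)
```

Whether these three steps run next to the data (embedded) or in a separate cluster (connector) is the trade-off discussed above; the code itself looks the same either way.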

Finally, the beginning as the end,

  1. Is hadoop the new stored procedure or would the new stored procedures look like Hadoop ?
  2. Is Data and Application becoming inseparable at scale ?
  3. What says thee?

Top 10 Steps to a Pragmatic Big Data Pipeline

As you know Big Data is capturing lots of press time. Which is good, but what does it mean to the person in the trenches ? Some thoughts … as a Top 10 List :

[update 11/25/11 : Copy of my The Art Of Big Data is at Slideshare]

10. Think of the data pipeline in multiple dimensions rather than as a point technology & evolve the pipeline with focus on all aspects of the stages

  • While technologies are interesting, they do not work in isolation and neither should you think that way
  • Dimension 1 : Big Data (I had touched upon this in my earlier blog “What is Big Data anyway“) One should not only look at the Volume-Velocity-Variety-Variability but also at the Connectedness – Context dimensions.
  • Dimension 2 : stages – The degrees of separation as in collect, store, transform, model/reason & infer stages
  • Dimension 3 : technology – This is the discussion SQL vs. NOSQL, mapreduce vs Dryad, BI vs other forms et al
  • I have captured the dimensions in the picture. Did I succeed ? Let me know

9. Evolve incrementally focussing on the business values – stories to tell, inferences to derive, feature sets to influence & recommendations to make

Don’t get into the technologies & pipeline until there are valid business cases. The use cases are not hard to find, but they won’t come if you are caught up in the hype and forget to do the homework and due diligence …

8. Augment, not replace the current BI systems

Notice the comma (I am NOT saying “Augment not, Replace”!)

“Replace Teradata with Hadoop” is not a valid use case, given the current state of the technologies. No doubt Hadoop & NOSQL can add a lot of value, but make the case for co-existence, leveraging currently installed technologies & skill sets. Products like Hive also minimize the barrier to entry for folks who are familiar with SQL.

7. Match the impedance of the use case with the technologies

The stack in my diagram earlier is not required for all cases:

  • for example if you want to leverage big data for a Product Metrics from logs in Splunk, you might only need a modest hadoop infrastructure plus an interface to existing dashboard plus Hive for analysts who want to perform analytics
  • But if you want Behavioral Analytics with A/B testing with a 10min latency, a full fledged Big Data infrastructure with say hadoop, HDFS, HBase plus some modeling interfaces, would be appropriate
  • I had written an earlier blog about the Hadoop infrastructure as a function of the degrees of separation from the analytics end point

6. Don’t be afraid to jump the chasm when the time is appropriate

Big Data systems have a critical mass at each stage – that means lots of storage or maybe a few fast machines for analytics, depending on the proposed project. If you have done your homework from a business and technology perspective, and have proven your chops with effective projects on a modest budget, this would be a good time to make your move for a higher budget. And when the time is right, be ready to get the support for a dramatic increase & make the move …

5. Trust But Verify

True for working with teenagers, arms treaties between superpowers, a card game and, closer to our discussion, Big Data Analytics. In fact, one of the core competencies of a Data Scientist is a healthy dose of skepticism - said John Rauser [here & here] . I would add that as you delegate more and more inferences to a big data infrastructure across the stages, make sure there are checks and balances, and independent verification of some of the stuff the big data is telling you.

Another side note along the same lines is oscillation – as the feedback rate, volume and velocity increase, there is also a tendency to overreact. Don’t equate the feedback velocity to the response velocity – for example, don’t change your product feature set based on high velocity big data based product metrics at a faster rate than the users can consume. Have a healthy respect for the cycles involved. For example I came across an article that talks about fast & slow big data – interesting. OTOH, be ready to make dramatic changes when you get faster feedback that indicates things are not working well, for whatever reason.

4.   Morph from Reactive to Predictive & Adaptive Analytics, thus simplifying and leveraging the power of Big Data

As I was writing this blog, came across Vinod Khosla’s speech at Nasscom meeting. A must read – here & here. His #1 and #2 in ‘cool dozen’ were about Big Data! The ability to infer the essentials from an onslaught of data is in fact the core of a big data infrastructure. Always make sure you can make a few fundamental succinct inferences that matter, out of your infrastructure. In short deliver “actionable” …

3. Pay attention to How and the Who

Edd wrote about this in Google+. Traditional IT builds the infrastructure for Collect and Store stages in a Big Data Pipeline. It also builds and maintains infrastructure for analytics processing, like Hadoop, and a visualization layer like Tableau. But the realm of Analyze, Model, Reason and the rest requires a business view, which a Data Analyst or a Data Scientist would provide. Pontificating further, it makes sense for IT to move in this direction by providing a ‘landing zone’ for the business savvy Data Scientists & Analysts and thus lead the new way of thinking about computing, computing resources and talents …

<This is WIP. Am collecting the thoughts and working thru the list – deliberately keeping two slots open (and maybe one more to make a baker’s dozen!), pl bear with me … But I know how it ends ;o)>

1. And, develop the Art of Data Science

As Edd points out in Google+, big data is also about exploration and the art of Data Science is an essential element. IMHO this involves more work in the contextual, modeling and inference space, with R and so forth – resulting in new insights, new products, order of magnitude performance, new customer base et al.  While this stage is effortless and obvious in some domains, it is not that easy in others …

BigData Counts

During work and play, many times one has to make ‘back of the envelope’ calculations. Sometimes it is hard to get a perspective on scale and various aspects of big data – numbers like millions and billions or even GB/TB/PB et al. So I have started collecting a few representative numbers that can add perspective to any calculation. Please suggest more …

Seconds

  • In a day ~86,400, or roughly 100,000 (~10^5)
  • In a month ~ 2.5 * 10^6
  • In a year ~ 30 * 10^6
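The exact values behind these round numbers are easy to check. The snippet below is just that arithmetic; a 30-day month and a 365-day year are assumptions, and `per_second` is a hypothetical helper for turning a daily count into a rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # 2,592,000 (assuming a 30-day month)
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # 31,536,000 (assuming a 365-day year)


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


print(SECONDS_PER_DAY, SECONDS_PER_MONTH, SECONDS_PER_YEAR)
print(f"1 billion events/day is about {per_second(1e9):,.0f} events/sec")
```

Rounding a day up to 10^5 seconds overstates it by about 16%, which is usually fine for an envelope calculation.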

Tweets per day

Visits per week

  • Google + -> Max 15 Million visits per week, steady state ~ 6 million per week [Ref Link]

Analytics Data Estimates

  • eBay adds 50 TB/day [Link to my HPTS 2011 blog]
  • facebook adds 15 TB/day to its Hadoop infrastructure [Link]
  • facebook messaging growing @250TB/month (Oct 2011)
  • facebook messages numbers

Total Data

  • facebook messaging – 6 PB (without compression)/ 2 PB (LZO compressed)
  • eBay Analytics
    • 40 Nodes/260 TB (Now Nov 2011)
    • Adding 20 nodes to 800 TB
    • Next Quarter 80 nodes ~1PB

Largest

  • Teradata – (eBay) 84 PB capacity, 250 nodes
  • Cassandra (Netflix?) 300 TB in 400 nodes

Object Stores

  • [2008] 40 Billion objects
  • [2011] 600 Billion objects
  • Salesforce
    • [2011] ~30 Billion

Storage & Network Infrastructure

  • 1 Gig ethernet saturates at ~122 MB/sec
  • 4 U filer can get ~40 TB with 24 spindles, 2 TB disks and Raid 6
  • 1,359,804 – Total number of EC2 Public IP, calculated from the allocated ranges (Tweet from Jeff Barr)
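The first two numbers in this list can be sanity-checked with simple arithmetic. The sketch below makes some assumptions of its own: RAID 6 is taken to cost exactly two disks’ worth of parity, disks are sized in decimal TB, and the helper names are hypothetical.

```python
def gbe_line_rate_mb_per_s(gbits_per_s: float = 1.0) -> float:
    """Raw line rate in decimal MB/s, before any protocol overhead."""
    return gbits_per_s * 1e9 / 8 / 1e6


def raid6_usable_tb(disks: int, disk_tb: float) -> float:
    """Usable capacity in decimal TB; RAID 6 spends two disks on parity."""
    return (disks - 2) * disk_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes)."""
    return tb * 1e12 / 2**40


print(f"1 GbE raw line rate: {gbe_line_rate_mb_per_s():.0f} MB/s")
usable = raid6_usable_tb(24, 2)
print(f"24 x 2 TB in RAID 6: {usable:.0f} TB, about {tb_to_tib(usable):.1f} TiB")
```

The raw line rate comes out at 125 MB/s, so a saturation figure a little below that is plausible once framing and protocol overhead are taken off. Likewise 44 TB decimal is roughly 40 TiB, which matches the filer number if it was quoted in binary units.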

Updates

From HadoopWorld2011:

Facebook’s HBase cluster has over 1 PB of storage, and they lose a hard drive every 30 minutes. (Thanks to )

In 2009, Yahoo lost 19 blocks out of 329M on their 20k Hadoop clusters. That’s 7 9s availability (Thanks to @herberts)
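The “7 9s” claim can be checked from the two numbers in the quote. The helper below is a hypothetical one-liner; note that strictly speaking it measures the fraction of blocks that survived, which is closer to durability than to availability.

```python
import math


def nines(lost: int, total: int) -> int:
    """Count the leading nines in the surviving fraction 1 - lost/total."""
    return math.floor(-math.log10(lost / total))


lost_blocks, total_blocks = 19, 329_000_000
print(f"loss fraction: {lost_blocks / total_blocks:.2e}")
print(f"nines: {nines(lost_blocks, total_blocks)}")
```

19 out of 329 million is a loss fraction of roughly 5.8 in 100 million, so the surviving fraction starts with seven nines.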

  • Please suggest more numbers that you use or found helpful …

Big Data & NOSQL Nirvana : HPTS 2011 Day 1

This week I am attending the biennial High Performance Transaction Systems Workshop – HPTS 2011 (Agenda). I was expecting exciting discussions, insightful wisdom and, overall, stimulating company – and was not disappointed.

IMHO, the highlights till Day 1½ (Stardate -311188.01369863015) were the NOSQL & Big Data discussions by Netflix (Adrian), Facebook(Kannan), eBay(Tom Faster) & Microsoft(Ed Harris). There were other good presentations (James Hamilton/Amazon, Ike Nassi/SAP, Charles Lamb/Oracle,…) which I will discuss in another blog.

Highlights:

  • We heard about Netflix use of aws, FaceBook Messaging Infrastructure, eBay Analytics Platform & Microsoft OSD (On-line Services Division)
  • Facebook Message Infrastructure:
    • 6 B+ Messages/day
    • Average write 16 records across multiple column families
    • 2+PB LZO compressed (6+PB uncompressed) in HBase
    • Growing 250 TB/Month
  • eBay Analytics
    • >100 PB
    • >50TB new data/day
    • eBay sacrificed concurrency for capacity & speed with Teradata
    • They can do a full table scan across PB of data in 32s !
  • eBay has a private network across  the Vegas and Phoenix datacenter with 20-40GB bandwidth
    • Each datacenter has the full Teradata, Singularity, Hadoop stack
  • The NOSQL datastore, the deployment topology, DR and the HA practices reflect the  path chosen by the respective companies
    • Netflix wanted cross DC replication with Availability as the main criteria. So they are using Cassandra
    • Facebook has a cell architecture with users sharded to one cell; their goal was strong consistency, automatic failover, MapReduce and so forth. So they chose HBase
    • eBay has a very systematic way of looking at the continuum as Structured, Semi Structured and Unstructured.
      • Structured – Analyze & Report (6PB data, compressed to 1.6PB)
      • Unstructured – Discover & Explore (20 PB data)
      • Semi-structured – both in some mixture (40 PB data)
      • So they chose Teradata for structured & Hadoop for unstructured
    • Microsoft is using Dryad and a declarative layer called SCOPE on a virtual cluster architecture for their analytics platform called Cosmos
  • Netflix is ~100% cloud-based
  • eBay has the largest Teradata installation – 256 active nodes w/ a capacity of 84PB & the 3rd largest Hadoop Installation!
  • Most probably Netflix is among the top three largest Cassandra installations
  • Largest Cassandra installation (known) is 400 nodes, 300TB (I have a strong doubt it is Netflix!)
  • Oracle NOSQL is CA (w.r.t CAP) because it is on top of BDB.
    • The consensus is that AP or CP is more interesting from a NOSQL perspective
  • The basic sources of behavioral analytics are (Microsoft):
    • Web Pages,
    • Search Log,
    • Browser Log &
    • Advertisement Log
    • Connected,Contextual Big Data as I had written before
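To put the headline numbers above into per-second terms, here is a quick back-of-the-envelope sketch. The inputs are the figures quoted in the highlights; the 30-day month, the decimal units and the assumption that the eBay scan covers exactly 1 PB are mine, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def messages_per_second(messages_per_day: float) -> float:
    """Average message rate implied by a daily message count."""
    return messages_per_day / SECONDS_PER_DAY


def growth_mb_per_second(tb_per_month: float, days: int = 30) -> float:
    """Sustained write rate in decimal MB/s implied by monthly growth in TB."""
    return tb_per_month * 1e12 / (days * SECONDS_PER_DAY) / 1e6


def scan_tb_per_second(petabytes: float, seconds: float) -> float:
    """Aggregate scan throughput in decimal TB/s."""
    return petabytes * 1e15 / seconds / 1e12


print(f"6 B messages/day is about {messages_per_second(6e9):,.0f} messages/sec")
print(f"250 TB/month is about {growth_mb_per_second(250):.1f} MB/sec sustained")
print(f"1 PB in 32 s is about {scan_tb_per_second(1, 32):.2f} TB/sec aggregate")
```

These are averages, so peak rates (New Year’s Eve, for instance) would be well above them.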

Gory Details (a.k.a Guided tour through the slides):

  • This section is WIP. The presentations are not yet up. I will point to the presentations when they are available
  • No SQL Eco System <- Good slides with a couple of good observations
  • Storage Infrastructure for Facebook messages [Slides]
    • Slide #3 – Why they chose HBase is interesting
    • Slide #11 – Shadow Testing strategy is informative. Testing at scale is always a challenge
    • Slide #28 – Scares & Scars – a must read
  • Some slides to study (I will point them out in the set once the presentations are on-line)
    • The Netflix Cloud backup & DR topology covering all failure scenarios – even aws account malfunction ! (Hint: There is a Read-Only copy in S3 with a different account!)
      • They have done what I call “Design a control plane for failure and tune the data for normal ops.” in one of my blogs
    • eBay’s table structure that has characteristics of SQL & NOSQL
    • examples of Path Analysis extension to Teradata by eBay
    • eBay’s Platform Metrics comparison of Teradata, Singularity and Hadoop
      • While Hadoop has some good qualities, it also consumes more resources than Teradata
  • Many more …
  • Facebook presentation notes from James Hamilton
  • Microsoft Cosmos notes from James Hamilton

And Finally Some interesting Remarks:

  • The number of options for (NOSQL) persistence doubles every 1.5 years
  • If it is not in memory, it is not data
  • Analytics – combine data in surprising ways (Microsoft)
  • Datacenter exothermic incident by running analytics applications which run at 85% CPU (Microsoft)
  • One way or another, we are all part of some experiment - A/B Testing or analytics based preference - and all our actions end up in one of these platforms !
  • Believe it or not, even I ended up presenting on Precision Time Synchronization on Day 1! Last minute fill-in, Thanks to Pranta. In Hadoop terms “speculative execution” !
The Jobs Logs : An ode to an icon

    My heart aches, and a drowsy numbness pains
    My sense, as though of hemlock I had drunk,
    Or emptied some dull opiate to the drains
    One minute past, and Lethe-wards had sunk:

    Steven Paul Jobs is no more … Hard to believe, harder to accept and almost impossible to imagine a future with out him … Instead of lamenting I decided to draw inspiration …

    For me his Stanford speech, aptly titled “How to live before you die”, personifies Steve

    “It is more fun to be a pirate than to join the Navy”! and other quotes from this Slideshare

    “Work Hard to make simple” and other quotes from Marko Saric’s blog

    Huffington Post got it right when it quoted Steve – “… focus and simplicity. Simple can be harder than complex …”

    In many ways the essence of Steve is the very famous quote from Gizmodo and others:

    When you’re a carpenter making a beautiful chest of drawers, you’re not going to use a piece of plywood on the back, even though it faces the wall and nobody will ever see it. You’ll know it’s there, so you’re going to use a beautiful piece of wood on the back. For you to sleep well at night, the aesthetic, the quality, has to be carried all the way through.—Playboy, 1987

    On a technology level, “Jobs refused to accept that software and hardware were best designed and engineered separately. For him, the venerable insight summarized by Thomas Hughes, the grand historian of American technology, as ‘the system must be first’ …” – A good POV in IEEE spectrum by Pascal

    [Update 11/7/11] Malcolm Gladwell’s review, The Tweaker, in The New Yorker is an excellent read. ‘Jobs’s sensibility was more editorial than inventive. “I’ll know it when I see it,” …’.

    “Was Steve Jobs a Samuel Crompton (inventor) or was he a Richard Roberts (the tinkerer)?” asks Malcolm …

    As the Washington Post says: He (Jobs) would be getting off here; we were to proceed without him into the unknown. Let it go and look ahead was the message all along.


    Book Review – In the Plex : How Google Thinks, works and shapes our lives

    Prelude:

    I liked the book a lot, it reads like a thriller- at least to me. I couldn’t put it down and was reading the book late night, during work days – to the chagrin of the family !

    Steven Levy has clearly chronicled Google’s ascent and the tribulations it encountered – internal and external – on the way. What is more interesting is that he has written a set of very crisp & detailed explanations of the innovations that Google brought into the search & advertisement domains.

    I agree with Steven that Google is a “clever internet-startup-named-after-a-100-digit-number turned into a corporate phenomenon”. It is very interesting to read about its agony on the way to the IPO (and the ecstasy of the investors!) If Google had its way it would have added a requirement of a min SAT score (and a Stanford PhD – at least an MMDS Certificate) for buying its shares ! Am forced to quote Scott Reeves (Forbes Aug 2004) on Google’s targeted price of $108/$135 “Only those who were dropped on their head at birth [will] plunk down that kind of cash for an IPO” – ouch ! (I myself was ready for around $50)

    Google – A sum of its Obsessions

    Search (Of course!)

    • PageRank, of course, refers to Larry Page’s Ranking Algorithm ! The PageRank estimates the importance of a page by the web pages that link to it. “We convert the entire web into a big equation with several hundred million variables”
    • The concept of signals – viz factors like terms, capitalization, font size, position et al – as traits added with PageRank is the secret sauce that made Google’s search very effective.
    • The search engines get major and minor rewrites “like changing the components of a flying plane – without the passengers knowing about it, but the ride becomes more comfortable and they get there faster” – not a perfect analogy but an effective simile!
    • The engineers fret about any queries that do not get answered in the first page – in many ways clicking next page in a search result is a failure of the brilliant engineers behind the search engine. You have to read about the query “Audrey Fino” that vexed Amit Singhal, Google’s search engine chief. The search showed lots of Audrey Hepburn and that bothered Amit – “There’s a person somewhere named Audrey Fino and we didn’t have the smarts in the system to know this” and the remedy was of course – to quote Steven – a multi-year name detection and name classifier “algorithmic therapy” with a dash of “bigram breakage” added to taste !
    • Rock is rock unless it has little in front of it (when it becomes the capital of a state) or if preceded by Noah becomes ark ! Another such query was “Eika Kerzen” which requires translation (to German in this case) to get to the right search result.
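The PageRank idea in the first bullet above (a page is important if important pages link to it) can be sketched as a simple power iteration. This is the textbook formulation, not Google’s production algorithm; the damping factor of 0.85, the iteration count and the four-page toy web are assumptions for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to; every link target
    must itself appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy web: "c" is linked to by three pages, "d" by none.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The “big equation with several hundred million variables” is this same system, just with one variable per web page instead of four.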

    Algorithmic purity & ubiquity

    • Google is an algorithmic company driven by computer science ! We can see that everywhere – successes and failures. For example, the amount Google set out to raise at its IPO ($2,718,281,828) spells out the digits of Napier’s constant e ! During the bidding for Nortel’s patents, Google was bidding numbers like pi
    • Even the Google ad sales people consider themselves mediators between Madison Avenue and algorithms – only Google can use both words in the same sentence, make it sensible and in the process create an industry where it makes billions of dollars – as one SEO chief puts it “It is not that we want to put all our boxes in one basket, but there is only one basket in the industry”
    • The great lengths the Google team would go to make search relevant are exemplified by the “running shoes gnome sculpture”. The engineers believed in algorithmic purity – and before the launch of the Froogle product search, “running shoes” would show a “garden gnome sculpture that happened to wear sneakers”. The team could not ship a product that fails to differentiate between lawn art and footwear. It seems within a couple of days the offending link disappeared ! And the team learned that one of their teammates had gone ahead and bought the one-of-a-kind sculpture, thus taking it off the web site ! “The algorithm started showing the right results, … and we launched!”
    • Search algorithmics sometimes had very strange effects – like showing the now defunct main office of bell telephone for a query “weather.com Philadelphia” – reason being the telephone company used to tell weather over the phone and this factoid was unearthed by the search algorithm !
    • It is interesting to read how Google re-invented the bidding system as a “Vickrey second-price auction system” because the engineer (Eric Veach) wanted to avoid “bid shading”. In the end, like anything else that Google touches, they created an innovative system that combined a few factors like bidding and ad-positioning, adding competition & customer satisfaction, and creating a rolling revenue stream in the order of billions of dollars for Google – all in all a nifty feat!
    • The concept of compressing data to understand it was a brilliant stroke – the Google project called Phil (Probabilistic Hierarchical Inferential Learner) resulted in understanding the essence of web pages and …. Contextual matching ads with the web page’s content service called “Google content-targeted advertising” which later became AdSense (after acquiring the company Applied Semantics!)
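The second-price idea in the bidding bullet above is short enough to show in code. This is the plain single-item Vickrey auction, not Google’s actual ad system (which, as the book describes, folds in more factors than the bid alone); the function name and the sample bids are hypothetical.

```python
def second_price_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Sealed-bid second-price (Vickrey) auction for a single item.

    The highest bidder wins but pays only the second-highest bid, so there
    is no incentive to shade a bid below what the item is really worth.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]   # the runner-up's bid sets the price
    return winner, price


winner, price = second_price_auction({"acme": 3.00, "globex": 2.00, "initech": 1.25})
print(f"{winner} wins and pays {price:.2f}")
```

Because the winner’s own bid never sets the price, bidding your true value is the best strategy, which is why this design removes the temptation to shade bids.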

    Scale

    • The success of their algorithms (which gave eigenvectors some credence) and the change of scale that came with it was what made Google Google ! As Luiz Barroso observed “There are programs that do not run on anything smaller than a 1000 machines, which means you are looking at a datacenter as a computer”
    • Google affects whatever it touches in unpredictable ways – for example, Google’s racks maxed out (power & cooling) at Exodus so badly that Exodus drove an 18-wheeler up to the colo, punched 3 holes in the wall and pumped cold air into Google’s cage through PVC pipes!

    The movers

    •  As I was reading the book, there were a few people I knew who played prominent roles in Google – was wondering when Hal Varian would show up – he did in (P.116) and stayed relevant in a lot of pages with his team of “econometricians” cross between statisticians and economists !
    • Was wondering when Sundar Pichai would show up, he did (P.205) and remained relevant as Steven narrated eloquently the advent of Google Chrome and the JavaScript engine V8 … leveraging Google’s insistence on speed …
    • Steven has interviewed most of, if not all, the technology leaders and we get to meet them at the relevant topics.

    Trivia:

    • I think building 40 is called Building 0 or Nullplex. It is interesting as I work nextdoor – the only non-google building among the sea of bicycle trotting Googlers !
    • Page’s Law according to Brin – “Every 18 months, software becomes twice as slow” !
    • Danger, which Andy Rubin cofounded, moved into the Palo Alto office when Google moved out of it in 1999 ! Eventually he left Danger and started Android …
    • Google was always structured like a PhD program dorm in a university – as Andy Rubin puts it “There is an implied grading on a 4.0 scale of the questions during interview and anybody less than 3.0 is rejected; the GPS (Google Product Strategy) meetings are run like PhD defenses”
    • As told by Alan Eustace to Andy Rubin “Google’s brain is like a baby’s – an omnivorous sponge that was always getting smarter from the information it soaked up”!
    • “We want Google to be that third half of your brain” – Sergey, P.386
    • “It’s quite amazing how the horizon of impossibility is drifting these days” – Thrun
    • The locus and trajectory of Google – “put Google in the driver’s seat on many decisions – large and small – that people make in the course of a day and their lives!” [P.68]

    Epilogue:

    • In this review I touched only a minimal set of interesting points (interesting to me!). The book has a lot of good reads, from Google’s China syndrome to how the Googlers shaped the last presidential election and later worked for the Obama administration, to controversies like Google view and the struggle with digitizing books.
    • One important development that Steven couldn’t include, due to the timing of the release of the book, was Google+. But don’t despair – Steven has written that part of the story as an article in Wired ! Best to read it after finishing the book.
    • Readwriteweb has an article on the data scientist behind Google+
    • And Steven’s blog on the Motorola Mobility purchase is another good read, again an important step by Google.
    • I just now saw a write up by InfoWorld on Google’s 5 biggest hits and misses.
    • Next book on my reading list: “I’m Feeling Lucky: The Confessions of Google Employee Number 59” by Douglas Edwards; it is on hold 3 of 7 from San Jose public library.