BigData Counts


At work and at play, one often has to make ‘back of the envelope’ calculations. It can be hard to get a sense of scale with big data – numbers like millions and billions, or GB/TB/PB and beyond. So I have started collecting a few representative numbers that add perspective to any calculation. Please suggest more …

PetaBytes vs ExaBytes vs ZettaBytes vs YottaBytes

Good examples from Todd Hoff at High Scalability
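
To keep the prefixes straight, here is a minimal Python sketch – each step is 1,000x the last, and even a single petabyte takes a surprisingly long time to move over a saturated gigabit link (the ~122 MB/sec figure from the infrastructure section below):

```python
# Decimal (SI) byte prefixes -- each step is 1,000x the previous one.
UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12,
         "PB": 1e15, "EB": 1e18, "ZB": 1e21, "YB": 1e24}

# Back of the envelope: pushing 1 PB through saturated gigabit
# ethernet at ~122 MB/sec (figure from the infrastructure section below).
seconds = UNITS["PB"] / (122 * UNITS["MB"])
print(f"1 PB over 1 GbE: ~{seconds / 86_400:.0f} days")   # ~95 days
```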

Seconds

  • In a day: 86,400 – call it 100,000, or ~10^5
  • In a month: ~2.6 * 10^6
  • In a year: ~31.5 * 10^6, or ~3 * 10^7
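
These round numbers make rate conversions quick. As a sketch, here is the classic one – how much downtime a year “five nines” of availability actually allows:

```python
SECONDS_PER_DAY   = 86_400        # call it 10^5 for mental math
SECONDS_PER_MONTH = 2_592_000     # 30 days, ~2.6 * 10^6
SECONDS_PER_YEAR  = 31_536_000    # 365 days, ~3 * 10^7

# Example: the downtime budget implied by "five nines" of availability.
print(SECONDS_PER_YEAR * (1 - 0.99999) / 60)   # ~5.3 minutes/year
```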

Tweets per day & Tweets Per Second (TPS)

  • [June 2011] 200 million tweets/day
  • Records:
  • [Feb 2011] 4,000/second at the Super Bowl
  • [June 2011] 7,000/second at the soccer World Cup final
  • [October 2011] 10,000/second
  • [Feb 2012] 12,233 TPS max; 10,000/second during the last 3 minutes of Super Bowl XLVI
  • [July 2012] 15,358 TPS during the fourth goal of the Euro 2012 final, Spain over Italy
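
Worth noting how far those record peaks sit above the average rate implied by the daily total – a quick sketch using the June 2011 figure:

```python
tweets_per_day = 200e6               # the June 2011 figure above
avg_tps  = tweets_per_day / 86_400   # ~2,300 tweets/sec on average
peak_tps = 15_358                    # the Euro 2012 record above

print(f"average ~{avg_tps:.0f} TPS; peak is ~{peak_tps / avg_tps:.1f}x average")
```

A peak nearly 7x the mean is why you provision for the spike, not the average.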

Visits per week

  • Google+ – max 15 million visits per week, steady state ~6 million per week [Ref Link]

Analytics Data Estimates

  • eBay adds 50 TB/day [Link to my HPTS 2011 blog]
  • facebook adds 15 TB/day to its Hadoop infrastructure [Link]
  • facebook messaging is growing at ~250 TB/month (Oct 2011)
  • facebook Messages numbers
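
Compounding those daily ingest rates over a year puts the growth in perspective (a rough sketch that ignores growth of the daily rate itself):

```python
# Yearly ingest implied by the daily rates above, in PB/year.
print(50e12 * 365 / 1e15)   # eBay: ~18 PB/year
print(15e12 * 365 / 1e15)   # facebook Hadoop: ~5.5 PB/year
```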

Total Data

  • facebook messaging – 6 PB (uncompressed) / 2 PB (LZO compressed) – see the sketch after this list
  • eBay analytics:
  • 40 nodes / 260 TB (as of Nov 2011)
  • Adding 20 nodes, growing to ~800 TB
  • Next quarter: 80 nodes, ~1 PB
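
Two quick ratios hiding in that list – the LZO compression factor for messaging, and the per-node density of the eBay cluster as it grows (a sketch; the 1 PB figure is the stated next-quarter target):

```python
# LZO compression ratio implied by the facebook messaging numbers.
print(6 / 2)       # 3.0x -- uncompressed PB over compressed PB

# Per-node density of the eBay analytics cluster as it grows.
print(260 / 40)    # ~6.5 TB/node as of Nov 2011
print(1000 / 80)   # ~12.5 TB/node at the 80-node / ~1 PB target
```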

Largest

  • Teradata (eBay) – 84 PB capacity, 250 nodes
  • Cassandra (Netflix?) – 300 TB across 400 nodes

Object Stores

  • Amazon S3:
  • [2008] 40 billion objects
  • [2011] 600 billion objects
  • [Update Q4 2011] 762 billion [Ref: here]
  • Salesforce:
  • [2011] ~30 billion
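
That object-count curve implies a remarkably steady multiplier. A sketch of the implied annual growth, assuming roughly 3.5 years between the first and last data points:

```python
# Implied compound growth: 40 billion objects (2008) -> 762 billion (Q4 2011).
start, end, years = 40e9, 762e9, 3.5   # assuming ~3.5 years between data points
print((end / start) ** (1 / years))    # ~2.3x per year
```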

Storage & Network Infrastructure

  • 1 Gig ethernet saturates at ~122 MB/sec
  • A 4U filer gets ~40 TB from 24 spindles of 2 TB disks under RAID 6
  • 1,359,804 – total number of EC2 public IPs, calculated from the allocated ranges (tweet from Jeff Barr)
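
For reference, the first two figures fall out of simple arithmetic (a sketch; the overhead percentages are ballpark):

```python
# Gigabit ethernet: 10^9 bits/sec over 8 bits/byte = 125 MB/sec raw;
# framing and protocol headers shave a few percent, hence ~122 MB/sec.
print(1e9 / 8 / 1e6)    # 125.0 MB/sec theoretical ceiling

# RAID 6: 24 spindles x 2 TB, minus two drives' worth of parity.
print((24 - 2) * 2)     # 44 TB raw; formatting overhead lands near ~40 TB
```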

Updates

From Hadoop World 2011:

Facebook’s HBase cluster has over 1 PB of storage, and they lose a hard drive every 30 minutes. (Thanks to )

In 2009, Yahoo lost 19 blocks out of 329M on their 20k-node Hadoop clusters. That’s seven nines of durability (thanks to @herberts)
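
The nines claim checks out – a quick verification sketch:

```python
import math

lost, total = 19, 329e6
print(1 - lost / total)                       # 0.99999994...
print(math.floor(-math.log10(lost / total)))  # 7 -> "seven nines"
```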

AWS – estimated 454,400 servers and 1,720,246 IP addresses

Facebook data warehouse – 45 PB of data

eBay Marketplace – 10 PB of data, >100 million active users
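
Dividing those two numbers gives a handy per-user footprint (a rough sketch; “active users” and raw bytes are as stated above):

```python
# Per-user data footprint implied by the eBay Marketplace numbers above.
print(10e15 / 100e6 / 1e6)   # ~100 MB per active user
```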

  • Please suggest more numbers that you use or have found helpful …

2 thoughts on “BigData Counts”

  1. Pingback: Top 10 Steps to a Pragmatic Big Data Pipeline « My missives

  2. Google’s crawler stored about 850 TB of data in 2006 (see the BigTable paper). According to comScore’s survey, Google served about 11.8 billion explicit core searches in January 2012.
