During work and play, one often has to make ‘back of the envelope’ calculations. Sometimes it is hard to get a perspective on the scale of big data: numbers like millions and billions, or GB/TB/PB and beyond. So I have started collecting a few representative numbers that can add perspective to any calculation. Please suggest more …
Petabytes vs Exabytes vs Zettabytes vs Yottabytes
Good examples from Todd at High Scalability
Seconds
- In a day: 86,400, or roughly 100,000 (~10^5)
- In a month: ~2.5 * 10^6
- In a year: ~30 * 10^6
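The rounded figures above can be checked exactly. A minimal sketch in Python, assuming a 30-day month and a 365-day year (no leap years):

```python
# Exact seconds per period, for sanity-checking the rounded figures.
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # assumes a 30-day month
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # ignores leap years

if __name__ == "__main__":
    print(f"day:   {SECONDS_PER_DAY:,}")    # day:   86,400
    print(f"month: {SECONDS_PER_MONTH:,}")  # month: 2,592,000
    print(f"year:  {SECONDS_PER_YEAR:,}")   # year:  31,536,000
```

So a month is closer to 2.6 * 10^6 seconds and a year to 3.2 * 10^7, which is still comfortably within back-of-the-envelope precision.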
Tweets per day & Tweets Per Second (TPS)
[June 2011] 200 million tweets/day. Record TPS:
- [Feb 2011] 4,000/second at the Super Bowl
- [June 2011] 7,000/second at the soccer World Cup final
- [October 2011] 10,000/second
- [Feb 2012] 12,233 TPS max, 10,000/second during the last 3 minutes of Super Bowl XLVI
- [July 2012] 15,358 TPS during Spain’s 4th goal against Italy in the Euro 2012 final
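Dividing the daily volume by the seconds in a day gives the average rate, which shows how far the records sit above a typical second. A rough sketch assuming tweets are spread evenly over the day. Note that it pairs the June 2011 daily volume with the July 2012 peak, and daily volume grew in between, so the ratio overstates the real peak-to-average gap:

```python
SECONDS_PER_DAY = 86_400

def average_rate(events_per_day: float) -> float:
    """Mean events per second, assuming a uniform spread over the day."""
    return events_per_day / SECONDS_PER_DAY

avg_tps = average_rate(200e6)   # June 2011 daily volume, ~2,315 TPS
peak_tps = 15_358               # Euro 2012 record
ratio = peak_tps / avg_tps      # ~6.6x above the average

if __name__ == "__main__":
    print(f"average ~{avg_tps:,.0f} TPS, peak/average ~{ratio:.1f}x")
```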
Visits per week
- Google+: max 15 million visits per week, steady state ~6 million per week [Ref Link]
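Converting weekly visits to a per-second rate makes them comparable with the TPS numbers above. A minimal sketch assuming visits are spread evenly over the week:

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

def per_second(count_per_week: float) -> float:
    """Average rate per second for a weekly count (uniform spread assumed)."""
    return count_per_week / SECONDS_PER_WEEK

peak = per_second(15e6)    # ~24.8 visits/s
steady = per_second(6e6)   # ~9.9 visits/s
```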
Analytics Data Estimates

- eBay adds 50 TB/day [Link to my HPTS 2011 blog]
- Facebook adds 15 TB/day to its Hadoop infrastructure [Link]
- Facebook messaging is growing at 250 TB/month (Oct 2011)
- Facebook message numbers
- [April 2011] 135 billion messages/month [Here & Here]
- [Oct 2011] 6 billion/day, 180 billion/month [Link to my HPTS 2011 blog]
- 250 TB / month
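Daily ingest figures are easier to reason about as sustained bandwidth. A sketch assuming decimal units (1 TB = 10^12 bytes) and a perfectly even arrival rate. The last line simply divides the Oct 2011 storage growth by the monthly message count, so it is an average of stored bytes per message (including whatever else that growth contains), not the payload size:

```python
TB = 10**12             # decimal terabyte (assumption: SI units)
MB = 10**6
SECONDS_PER_DAY = 86_400

def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Sustained MB/s needed to ingest a daily volume evenly."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB

ebay = tb_per_day_to_mb_per_s(50)      # ~579 MB/s sustained
facebook = tb_per_day_to_mb_per_s(15)  # ~174 MB/s sustained

# Oct 2011: 250 TB/month of growth spread over 180 billion messages/month
bytes_per_message = 250 * TB / 180e9   # ~1.4 KB per message
```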
Total Data
- Structured data store (Teradata): 6 PB (1.5 PB compressed)
- Semi-structured: 40 PB
- Unstructured (Hadoop): 20 PB
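Summing the tiers and working out the Teradata compression ratio. A minimal sketch that treats the 6 PB figure as uncompressed and uses it in the total:

```python
tiers_pb = {
    "structured (Teradata)": 6.0,   # uncompressed; 1.5 PB compressed
    "semi-structured": 40.0,
    "unstructured (Hadoop)": 20.0,
}

total_pb = sum(tiers_pb.values())           # 66 PB
teradata_ratio = 6.0 / 1.5                   # 4x compression
structured_share = tiers_pb["structured (Teradata)"] / total_pb  # ~9% of the total
```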
- CBS Interactive (CBS, CBS Sports, ZDNet, CNET, TechRepublic, et al.)
- 40 nodes / 260 TB (as of Nov 2011)
- Adding 20 nodes, growing to 800 TB
- Next quarter: 80 nodes, ~1 PB
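Per-node capacity shows how the cluster’s density changes as it grows. A sketch assuming that "adding 20 nodes to 800 TB" means 60 nodes holding 800 TB, and taking "~1 PB" as 1,000 TB:

```python
# (nodes, total TB) per stage. The 60-node stage assumes "adding 20 nodes"
# means growing from 40 to 60 nodes; "~1 PB" is taken as 1,000 TB.
stages = {
    "Nov 2011": (40, 260),
    "after adding 20 nodes": (60, 800),
    "next quarter": (80, 1000),
}

def tb_per_node(nodes: int, total_tb: float) -> float:
    return total_tb / nodes

densities = {name: tb_per_node(n, tb) for name, (n, tb) in stages.items()}
# Nov 2011 -> 6.5 TB/node; 60 nodes -> ~13.3 TB/node; 80 nodes -> 12.5 TB/node
```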
Largest
- Teradata (eBay): 84 PB capacity, 250 nodes
- Cassandra (Netflix?): 300 TB on 400 nodes
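Dividing by node count gives a rough per-node density. The Teradata figure is capacity while the Cassandra figure is stored data, so this is not an apples-to-apples comparison; decimal units are assumed:

```python
TB_PER_PB = 1_000  # decimal units

teradata_tb_per_node = 84 * TB_PER_PB / 250   # 336 TB of capacity per node
cassandra_tb_per_node = 300 / 400              # 0.75 TB of data per node
```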
Object Stores
- Amazon S3 (Link)
- [2008] 40 Billion objects
- [2011] 600 Billion objects
- [Update Q4 2011] 762 billion [Ref: here]
- Salesforce
- [2011] ~30 Billion
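The S3 object counts imply a compound annual growth rate. A minimal sketch assuming the 2008 and 2011 data points are exactly three years apart:

```python
import math

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two observations."""
    return (end / start) ** (1 / years) - 1

def doubling_time_years(rate: float) -> float:
    """Years to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)

s3_rate = cagr(40e9, 600e9, 3)              # ~1.47, i.e. ~147% per year
s3_doubling = doubling_time_years(s3_rate)  # ~0.77 years (~9 months)
```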
Storage & Network Infrastructure
- 1 Gigabit Ethernet saturates at ~122 MB/s
- A 4U filer can hold ~40 TB with 24 spindles of 2 TB disks in RAID 6
- 1,359,804: total number of EC2 public IPs, calculated from the allocated ranges (tweet from Jeff Barr)
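Two quick checks on these numbers. The sketch assumes MB and TB are decimal (10^6 and 10^12 bytes) and that the 24 disks form a single RAID 6 group with no hot spares. RAID 6 gives up two disks to parity, and 44 TB decimal is about 40 TiB, which may be where the ~40 TB figure comes from. At 122 MB/s, copying that much data over a single 1 GbE link takes days:

```python
TB = 10**12
TIB = 2**40
MB = 10**6

def raid6_usable_bytes(disks: int, disk_bytes: int) -> int:
    """Usable capacity of a single RAID 6 group: two disks' worth go to parity."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_bytes

usable = raid6_usable_bytes(24, 2 * TB)   # 44 TB (decimal)
usable_tib = usable / TIB                  # ~40.0 TiB

def transfer_hours(nbytes: float, mb_per_s: float = 122) -> float:
    """Hours to move nbytes at a sustained rate (default: saturated 1 GbE)."""
    return nbytes / (mb_per_s * MB) / 3600

hours_to_copy_40tb = transfer_hours(40 * TB)  # ~91 hours (~3.8 days)
```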
- …
Updates
From Hadoop World 2011:
Facebook’s HBase cluster has over 1 PB of storage, and they lose a hard drive every 30 minutes. (Thanks to )
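At one drive every 30 minutes, failures add up quickly. A sketch assuming a constant failure rate over a 365-day year:

```python
MINUTES_PER_DAY = 24 * 60

def failures(interval_minutes: float, days: float) -> float:
    """Expected failures over a period, at one failure per interval."""
    return days * MINUTES_PER_DAY / interval_minutes

per_day = failures(30, 1)      # 48 drives/day
per_year = failures(30, 365)   # 17,520 drives/year
```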
In 2009, Yahoo lost 19 blocks out of 329M on their 20k-node Hadoop clusters. That’s seven 9s (99.99999%) availability (Thanks to @herberts)
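The "7 9s" figure follows from the loss fraction: the number of nines is -log10(lost / total). A minimal sketch:

```python
import math

def nines(lost: float, total: float) -> float:
    """Number of 9s in the surviving fraction, i.e. -log10(loss fraction)."""
    return -math.log10(lost / total)

yahoo = nines(19, 329e6)   # ~7.2, i.e. "seven 9s"
```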
AWS: an estimated 454,400 servers and 1,720,246 IP addresses
Facebook data warehouse: 45 PB of data
eBay Marketplace: 10 PB of data, >100 million active users
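Two per-unit ratios from the figures above, as a rough sketch: IP addresses per server from the AWS estimates, and data per active user for eBay Marketplace. Decimal units are assumed, and since eBay reports more than 100 million users, the per-user figure is an upper bound:

```python
PB = 10**15
MB = 10**6

ips_per_server = 1_720_246 / 454_400    # ~3.8 IPs per AWS server (both are estimates)
ebay_bytes_per_user = 10 * PB / 100e6   # at most ~100 MB per active user
```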
- Please suggest more numbers that you use or have found helpful …

Google’s crawler stored about 850 TB of data in 2006 (see the Bigtable paper). According to comScore’s survey, Google served about 11.8 billion explicit core searches in January 2012.
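Converting the monthly search count into a rate, assuming 31 days in January and an even spread over the month:

```python
SECONDS_IN_JANUARY = 31 * 24 * 60 * 60   # 2,678,400

searches_per_sec = 11.8e9 / SECONDS_IN_JANUARY   # ~4,406 searches/s on average
```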