It is clear that a big data revolution is under way – as Jeff Dean of Google pointed out, there are lots of big data sets, and machine learning systems are becoming mainstream. Naturally, throwing more computers and big data frameworks like Hadoop and NoSQL at the problem would have solved it, except that the laws of data center energy physics are catching up. The gating factor is no longer physical space but energy! Electricity, cooling and all that sort of stuff!
I had just read about some work going on at Johns Hopkins in this area [here] and [here]. I also remember Facebook doing something along these lines months ago.
- The Data-Scope work at JHU is interesting because Alex Szalay focuses on IOPS rather than FLOPS. As we all know, FLOPS have reached a ceiling [an old blog of mine].
- Suleiman’s blogs have some very good points. In addition to the use of GPUs for a much higher computing-power-to-electrical-power ratio, what caught my attention was the observation that “… what is really required is a middleware runtime that provides similar software functionality that the ‘Big data’ community has come to love and associate with Hadoop MapReduce.”
- A big data analytic processing layer that glues them together with essential control plane mechanisms & the right knobs and dials.
- I can see a few components:
- A submission framework,
- Intelligent workload allocation (scalable, loosely coupled & dynamic – might even be capable of learning and be adaptive!),
- Storage & search substrate,
- Lightweight monitoring & metering,
- and maybe a dashboard portal of some sort.
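The components above can be sketched as a minimal set of cooperating classes. This is purely an illustrative sketch – all class and method names (`SubmissionFramework`, `WorkloadAllocator`, `Monitor`, etc.) are hypothetical assumptions, not any real middleware API:

```python
# Hypothetical sketch of the middleware components listed above.
# All names are illustrative assumptions, not a real framework's API.
import heapq
import time
import uuid


class Job:
    """A unit of work handed to the submission framework."""
    def __init__(self, payload, priority=0):
        self.id = str(uuid.uuid4())
        self.payload = payload
        self.priority = priority
        self.submitted_at = time.time()


class Monitor:
    """Lightweight monitoring & metering: here it just records events."""
    def __init__(self):
        self.events = []

    def record(self, kind, job_id):
        self.events.append((kind, job_id))


class WorkloadAllocator:
    """Naive priority-queue allocation; a real allocator would be
    adaptive and aware of which nodes have GPUs free."""
    def __init__(self):
        self._queue = []

    def enqueue(self, job):
        # job.id breaks ties so Job objects are never compared directly
        heapq.heappush(self._queue, (-job.priority, job.submitted_at, job.id, job))

    def next_job(self):
        return heapq.heappop(self._queue)[3] if self._queue else None


class SubmissionFramework:
    """Accepts jobs, meters them, and hands them to the allocator."""
    def __init__(self, allocator, monitor):
        self.allocator = allocator
        self.monitor = monitor

    def submit(self, job):
        self.monitor.record("submitted", job.id)
        self.allocator.enqueue(job)
        return job.id


# Usage: wire the pieces together and push one job through.
monitor = Monitor()
allocator = WorkloadAllocator()
framework = SubmissionFramework(allocator, monitor)
job_id = framework.submit(Job({"task": "wordcount"}, priority=5))
print(allocator.next_job().id == job_id)  # the submitted job comes back out
```

The interesting design problems are of course hidden inside `WorkloadAllocator` – that is where the “learning and adaptive” behavior would live.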
- I think these are the little chips that could !
- Their specification is very appropriate – “Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing …”.
- The instances are connected with a low-latency, high-throughput 10 Gb network,
- And have 22 GB memory,
- 33.5 EC2 Compute Units,
- 2 x NVIDIA Tesla “Fermi” M2050 GPUs,
- 1690 GB of local instance storage,
- 64-bit platform !
- Priced at $2.10/hr; only Linux is available.
- I saw somewhere that each instance is a single machine, not virtualized. I could be wrong.
- Jeff calls them nuclear-powered “bulldozers” (#18 Bryon)!
- 448 cores per GPU @ 550 gFLOPS gives over a teraFLOP per instance, for $2.10 per hour. Lots of compute power. (BTW, the human brain is clocked at 100 teraFLOPS!)
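A quick back-of-the-envelope check of the figures above (2 GPUs at the quoted 550 gFLOPS each):

```python
# Sanity-check the per-instance throughput and price-per-teraFLOP-hour.
gpus_per_instance = 2
gflops_per_gpu = 550          # figure quoted above
price_per_hour = 2.10         # USD, from the pricing above

total_gflops = gpus_per_instance * gflops_per_gpu
print(total_gflops)                                      # 1100 GFLOPS, i.e. ~1.1 teraFLOPS
print(round(price_per_hour / (total_gflops / 1000), 2))  # ~1.91 USD per teraFLOP-hour
```

Roughly two dollars buys you a teraFLOP for an hour – a number that would have been unthinkable a few years earlier.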
- And the instances can run Elastic MapReduce!
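Elastic MapReduce runs Hadoop streaming jobs, where the mapper and reducer are plain scripts reading stdin and writing stdout. The classic word-count example can be sketched in a few lines of Python (shown here as two functions driven in-process; in a real streaming job each would be its own script, with Hadoop doing the sort/shuffle between them):

```python
# Word-count in the Hadoop streaming style: the mapper emits
# "word<TAB>1" lines, and the reducer sums the counts per word
# (streaming delivers the reducer's input sorted by key).
from itertools import groupby


def mapper(lines):
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(sorted_pairs):
    # sorted_pairs: "word<TAB>count" lines, already sorted by word
    parsed = (line.split("\t") for line in sorted_pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    text = ["big data big compute", "big gpu"]
    shuffled = sorted(mapper(text))  # stand-in for Hadoop's sort/shuffle
    for line in reducer(shuffled):
        print(line)                  # big=3, compute=1, data=1, gpu=1
```

The point of the GPU instances is that the heavy lifting inside each map or reduce task could now be pushed onto the two Tesla cards rather than the CPU.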
[Update 11/15/10] Good blog post from James Hamilton at AMZ
[Update 11/15/10] Gartner observes “At the heart of the change in the next 20 years will be intelligence drawn from information,” Peter Sondergaard, senior vice president at Gartner and global head of Research, said. “Information will be the ‘oil of the 21st century’. It will be the resource running our economy in ways not possible in the past.” – which also means that we need systems and frameworks that process tons of information that is connected, contextual, …