In our lab, we are working on a few ideas around Big Data, Cloud and Dataset Storage Infrastructure … I spent this weekend working on one slice of it …
- First, buy four good-sized machines (C200-M2 with 48 GB RAM & dual quad-core Intel E5520, 2.26 GHz)
- Add a dual-port 10Gb hardware iSCSI-4 w/ToE (TCP Offload Engine) card (Broadcom 57711)
- In short, iSCSI and TCP in hardware!
- Add a SAS mezzanine card (LSI 1064E, 4-port SAS)
- The routing of the SAS cables is a little tricky …
- Open up the case & take out the baffles
- Now comes the hard part – disconnect the short SATA cable, connect the long SAS cable to the LSI 1064E card & install the card
- Then route the cable properly, connect & tie the cables
- Looks good, except for the black tie wrap
- Baffles back in, and the new machine is ready to rumble HDFS: fast & furious!
Voilà: you have the satisfaction of hacking together a set of mean storage machines (4 × 8 TB) that can host HDFS in a cloud, either as pure Data Nodes (over iSCSI) or as Data/Task Nodes (w/ local storage).
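For a rough feel of what 4 × 8 TB means once HDFS is on it, here is a back-of-the-envelope sketch in Python. It assumes the HDFS default replication factor of 3 (`dfs.replication`). The post does not say which setting we use, and the sketch ignores filesystem overhead and any space reserved for non-DFS use.

```python
# Back-of-the-envelope HDFS capacity for the four storage nodes.
# Assumption: replication factor 3 (the HDFS default for dfs.replication);
# filesystem overhead and reserved non-DFS space are ignored.

NODES = 4
RAW_TB_PER_NODE = 8
REPLICATION = 3  # assumed; change to match your dfs.replication setting


def usable_capacity_tb(nodes: int, raw_tb_per_node: float, replication: int) -> float:
    """Return how much logical (pre-replication) data fits in the raw pool."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return nodes * raw_tb_per_node / replication


raw = NODES * RAW_TB_PER_NODE
usable = usable_capacity_tb(NODES, RAW_TB_PER_NODE, REPLICATION)
print(f"raw: {raw} TB, usable at replication {REPLICATION}: {usable:.1f} TB")
# prints "raw: 32 TB, usable at replication 3: 10.7 TB"
```

In other words, three-way replication turns 32 TB of raw disk into roughly 10.7 TB of data you can actually store.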
And off to the proving ground …
I finished building 4 machines, installed Ubuntu 10.10, assigned IP addresses for the Integrated Management Controller and the data Ethernet ports, and checked everything (one port not working, …); they are ready for Hadoop tomorrow. All in all, good work for a long weekend!
And then to the C3L Lab … which has actually doubled in size since then …
For those inquiring minds, the architecture is:
- Mean compute blades: bare metal or virtualized doesn’t matter, but definitely in an elastic infrastructure
- With Top-of-Rack intermediate nodes hanging off hardware-assisted storage/network (iSCSI/ToE)
- Thus decoupling the storage & compute to form a Hadoop Cloud
- This is still only IaaS; a multi-tenant PaaS also needs a Hadoop Cloud framework on top
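The data nodes hang off Top-of-Rack switches, so HDFS needs to know which rack each node sits in before it can spread block replicas across racks. Hadoop learns this from a rack-awareness topology script, configured via `net.topology.script.file.name` (`topology.script.file.name` in older releases). Hadoop calls the script with one or more IPs or hostnames and reads back one rack path per argument. The post does not say how our racks are mapped, so the Python sketch below uses a hypothetical table with one /24 subnet per ToR switch. It does not resolve hostnames.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-awareness topology script.

Hadoop invokes this with host names or IP addresses as arguments and
reads one rack path per argument from stdout. The subnet-to-rack table
is hypothetical; replace it with your own ToR layout.
"""
import ipaddress
import sys

# Hypothetical mapping: one /24 per Top-of-Rack switch.
RACKS = {
    ipaddress.ip_network("10.0.1.0/24"): "/tor1",
    ipaddress.ip_network("10.0.2.0/24"): "/tor2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Map an IP address to its rack path; anything unrecognized gets the default."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # hostnames are not resolved in this sketch
    for net, rack in RACKS.items():
        if addr in net:
            return rack
    return DEFAULT_RACK


def main(argv: list) -> None:
    # Emit one rack path per argument, in the order Hadoop passed them.
    print(" ".join(rack_for(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run by hand as `./topology.py 10.0.1.17 10.0.2.5`, it prints `/tor1 /tor2`.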
I will post our benchmarks across multiple dimensions (we have built a unified monitoring system using Netflow, Ganglia, UCS and Hadoop monitoring to qualify and quantify across these dimensions):
- virtualized vs. bare metal
- virtualized with VIC cards
- different application archetypes – I/O bound vs memory hogs vs. hybrid
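Taken together, these dimensions form a test matrix. Below is a minimal Python sketch that enumerates it. It assumes every deployment mode is run against every application archetype, which the post does not state. The run fields and the `repeats` parameter are hypothetical. The monitoring sources are the ones named above.

```python
from itertools import product

# Dimensions from the list above; the run layout itself is hypothetical.
DEPLOYMENTS = ("bare metal", "virtualized", "virtualized with VIC")
ARCHETYPES = ("I/O bound", "memory hog", "hybrid")
# Monitoring sources named in the post; each run is observed by all of them.
MONITORS = ("Netflow", "Ganglia", "UCS", "Hadoop")


def benchmark_matrix(repeats: int = 1) -> list:
    """Enumerate every deployment x archetype combination, `repeats` trials each."""
    runs = []
    for deployment, archetype in product(DEPLOYMENTS, ARCHETYPES):
        for trial in range(1, repeats + 1):
            runs.append({
                "deployment": deployment,
                "archetype": archetype,
                "trial": trial,
                "monitors": MONITORS,
            })
    return runs


for run in benchmark_matrix():
    print(f"{run['deployment']:>22} | {run['archetype']}")
```

Three deployment modes crossed with three archetypes give nine runs per trial. Running each cell more than once helps separate real differences from run-to-run noise.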