In our lab, we are working on a few ideas around Big Data, Cloud and Dataset Storage Infrastructure … I was working on a slice of it this weekend …
- First, buy four good-sized machines (C200-M2 with 48 GB RAM & dual quad-core Intel E5520, 2.26 GHz)
- Add dual port 10Gb hardware iSCSI-4 w/ToE (TCP Offload Engine) Card (Broadcom 57711)
- In short iSCSI and TCP in hardware!
- Add SAS mezzanine card (LSI 1064E – 4 Port SAS)
- The routing of the SAS cables is a little tricky …
- Open up the case & take out the baffles

- Now comes the hard part – disconnect the short SATA cable, connect the long SAS cable to the LSI 1064E card & install the card
- Then route the cable properly, connect & tie the cables

- Looks good, except for the black tie wrap
- Baffles back on & the new machine is ready to rumble with HDFS – fast & furious!

Voilà – you have the satisfaction of hacking together a set of mean storage machines (4 X 8 TB) that can host HDFS in a cloud, either as pure Data Nodes (over iSCSI) or as Data/Task Nodes (w/ local storage).
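For a rough sense of what that storage means for HDFS, here is a back-of-the-envelope sketch. It reads "4 X 8 TB" as four machines with 8 TB of raw disk each, and it assumes the stock HDFS replication factor of 3 (`dfs.replication`); neither the replication setting nor any filesystem overhead is stated in this post, so treat the numbers as illustrative only.

```python
# Back-of-the-envelope HDFS capacity for the storage machines.
NODES = 4              # from the post: four machines
RAW_TB_PER_NODE = 8    # from the post: 8 TB each (my reading of "4 X 8 TB")
REPLICATION = 3        # assumption: HDFS default dfs.replication


def usable_tb(nodes, raw_tb_per_node, replication):
    """Raw capacity divided by the number of copies HDFS keeps of each block."""
    return nodes * raw_tb_per_node / replication


raw_tb = NODES * RAW_TB_PER_NODE
print(f"raw capacity: {raw_tb} TB")
print(f"usable at {REPLICATION}x replication: "
      f"{usable_tb(NODES, RAW_TB_PER_NODE, REPLICATION):.1f} TB")
```

Space reserved for the OS, logs and intermediate job output is ignored here, so the real usable figure would be lower.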
And off to the proving ground …
I finished building the 4 machines, installed Ubuntu 10.10, assigned IP addresses for the Integrated Management Controller and the data Ethernet ports, and checked everything (one port not working, …). They are ready for Hadoop tomorrow. All in all, good work for a long weekend!
And then to the C3L Lab … which has actually doubled since then …
Note:
For those inquiring minds, the architecture is:
- Mean compute blades – it doesn’t matter whether bare metal or virtualized, but definitely in an elastic infrastructure
- W/ Top Of Rack intermediate nodes hanging off of hardware assisted storage/network (iSCSI/ToE)
- Thus decoupling the storage & compute to form a Hadoop Cloud
- This is still IaaS as we also need a Hadoop Cloud framework for a multi-tenant PaaS
I will post our benchmarks on multiple dimensions (we have built a unified monitoring system using Netflow, Ganglia, UCS and Hadoop monitoring to qualify and quantify across these dimensions):
- virtualized vs. baremetal
- local/iSCSI/FC/FCoE
- virtualized with VIC cards
- different application archetypes – I/O bound vs memory hogs vs. hybrid
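To get a feel for how many runs these dimensions imply, here is a minimal sketch that enumerates the combinations. The grouping is my own assumption, not our actual test plan: I treat "virtualized with VIC cards" as a third platform option next to bare metal and plain virtualized, and the names `PLATFORMS`, `STORAGE`, `WORKLOADS` and `benchmark_matrix` are made up for illustration.

```python
from itertools import product

# Hypothetical grouping of the benchmark dimensions listed above.
PLATFORMS = ["bare metal", "virtualized", "virtualized + VIC"]
STORAGE = ["local", "iSCSI", "FC", "FCoE"]
WORKLOADS = ["I/O bound", "memory hog", "hybrid"]


def benchmark_matrix():
    """Return one dict per (platform, storage, workload) combination."""
    return [
        {"platform": p, "storage": s, "workload": w}
        for p, s, w in product(PLATFORMS, STORAGE, WORKLOADS)
    ]


runs = benchmark_matrix()
print(f"{len(runs)} benchmark configurations")
for run in runs[:3]:
    print(run)
```

Even this small grid multiplies out quickly, which is one reason a unified monitoring setup helps when comparing results across runs.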



Interesting article. I just purchased two C200M2s to try something similar. Do you have the reference for the SAS cable you used, and what length of cable did you need?
Was it difficult to get the Seagate drives installed in the disk carriages? You didn’t need to buy new carriages from Cisco?
Erik,
Unfortunately, you need to get the disks from Cisco. You might be able to mount the Seagate drives, but if I remember correctly, the thin light bar is missing, as well as a custom adaptor. The LSI card comes with the required cable. The cable has a single connector at the card end and, at the other end, multiple connectors for the disks as well as for the lights.
Cheers & good luck. The C2XXs are good machines with space for lots of memory.