The Art of a HDFS node

In our lab, we are working on a few ideas around Big Data, Cloud and Dataset Storage Infrastructure. I was working on a slice of it this weekend …

  • First buy four good-sized machines (C200-M2 with 48 GB RAM & dual quad-core Intel E5520, 2.26 GHz)
  • Add dual port 1 Gb NIC (Broadcom 5709)
  • Add dual port 10 Gb hardware iSCSI-4 w/ToE (TCP Offload Engine) card (Broadcom 57711)
    • In short iSCSI and TCP in hardware!
  • Add SAS mezzanine card (LSI 1064E – 4-port SAS)
    • The routing of the SAS cables is a little tricky …
    • Open up the case & take out the baffles
    • Now comes the hard part – disconnect the short SATA cable, connect the long SAS cable to the LSI 1064E card & install the card
    • Then route the cable properly, connect & tie the cables
    • Looks good, except for the black tie wrap
    • Baffles back & the new machine is ready to rumble HDFS – fast & furious!
  • And finally add 4 x 2 TB Seagate ST32000444SSS Constellation ES disks
  • Close & move on to next …

Voilà – you have the satisfaction of hacking together a set of mean storage machines (4 x 8 TB) that can host HDFS in a cloud, either as pure Data Nodes (over iSCSI) or as Data/Task Nodes (w/ local storage).
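For the curious, here is the back-of-the-envelope arithmetic behind that 4 x 8 TB figure, as a small Python sketch. The node, disk and size counts come from the build above; the replication factor of 3 is my assumption (it is the HDFS default), and the function names are just illustrative. It ignores filesystem overhead and any space reserved for non-HDFS use, so treat the usable number as an upper bound.

```python
# Back-of-the-envelope capacity for the build described above.
NODES = 4            # machines built (from the post)
DISKS_PER_NODE = 4   # drives per machine (from the post)
DISK_TB = 2          # nominal size of each drive in TB (from the post)
REPLICATION = 3      # assumption: the HDFS default replication factor


def raw_capacity_tb(nodes=NODES, disks=DISKS_PER_NODE, disk_tb=DISK_TB):
    """Total raw disk capacity across all nodes, in TB."""
    return nodes * disks * disk_tb


def usable_capacity_tb(raw_tb, replication=REPLICATION):
    """Rough space left for unique data once every block is replicated."""
    return raw_tb / replication


raw = raw_capacity_tb()
print(f"per node: {DISKS_PER_NODE * DISK_TB} TB, cluster raw: {raw} TB")
print(f"usable at replication {REPLICATION}: about {usable_capacity_tb(raw):.1f} TB")
```

With the defaults this reports 8 TB per node and 32 TB raw across the four machines, which shrinks to a bit under 11 TB of unique data if every block is stored three times.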

And off to the proving ground …

I finished building 4 machines, installed Ubuntu 10.10, assigned IP addresses for the Integrated Management Controller and the data Ethernet ports, and checked everything (one port not working, …). They are ready for Hadoop tomorrow. All in all, good work for a long weekend!
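Checking ports by hand across four machines gets old quickly. A minimal sketch of how the link check could be scripted, assuming a Linux host that exposes `/sys/class/net/<interface>/operstate` and a reasonably recent Python 3 (neither is what I actually used that weekend; the function names are made up for illustration):

```python
from pathlib import Path


def link_states(sys_net="/sys/class/net"):
    """Return {interface: operstate} as reported by the Linux kernel.

    Returns an empty dict if the sysfs directory does not exist
    (for example on a non-Linux machine).
    """
    root = Path(sys_net)
    if not root.is_dir():
        return {}
    states = {}
    for iface in sorted(root.iterdir()):
        state_file = iface / "operstate"
        if state_file.is_file():
            states[iface.name] = state_file.read_text().strip()
    return states


def ports_not_up(states, ignore=("lo",)):
    """List interfaces whose state is anything other than 'up'."""
    return [name for name, state in states.items()
            if name not in ignore and state != "up"]


states = link_states()
for name in ports_not_up(states):
    print(f"{name}: {states[name]} (check cable, switch port or NIC)")
```

The loopback interface is skipped because it normally reports `unknown` rather than `up`. Run on each node, this prints one line per port that needs a second look.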

And then to the C3L Lab … which has actually doubled since then …


For those inquiring minds, the architecture is:

  • Mean compute blades – bare metal or virtualized doesn’t matter, but definitely in an elastic infrastructure
  • With Top-of-Rack intermediate nodes hanging off hardware-assisted storage/network (iSCSI/ToE)
  • Thus decoupling storage & compute to form a Hadoop Cloud
  • This is still IaaS: a multi-tenant PaaS would also need a Hadoop Cloud framework

I will post our benchmarks on multiple dimensions (we have built a unified monitoring system using NetFlow, Ganglia, UCS and Hadoop monitoring to qualify and quantify across these dimensions):

  • virtualized vs. bare metal
  • local/iSCSI/FC/FCoE
  • virtualized with VIC cards
  • different application archetypes – I/O bound vs. memory hogs vs. hybrid
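To get a feel for how quickly those dimensions multiply, here is a rough Python sketch that enumerates the combinations. The grouping into three axes (platform, storage, workload) is my simplification of the list above, and we may well not run the full cross product; the names are illustrative only.

```python
from itertools import product

# Axes taken from the list above; the grouping into three axes is an assumption.
PLATFORMS = ["bare metal", "virtualized", "virtualized + VIC"]
STORAGE = ["local", "iSCSI", "FC", "FCoE"]
WORKLOADS = ["I/O bound", "memory hog", "hybrid"]


def benchmark_matrix(platforms=PLATFORMS, storage=STORAGE, workloads=WORKLOADS):
    """Every (platform, storage, workload) combination as a dict."""
    return [
        {"platform": p, "storage": s, "workload": w}
        for p, s, w in product(platforms, storage, workloads)
    ]


runs = benchmark_matrix()
print(len(runs), "candidate runs")
for run in runs[:3]:
    print(run)
```

Three platforms times four storage options times three workload types gives 36 candidate runs, which is why a unified monitoring setup matters before the first job is even submitted.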

5 thoughts on “The Art of a HDFS node”

  1. Pingback: Six Degrees of Hadoop Hardware « My missives

  2. Interesting article. Just purchased two C200M2 to try something similar. Do you have the reference of the SAS cable you used, and what length of cable did you need?

    Was it difficult to get the Seagate drives installed with the disk carriages? You didn’t need to buy new carriages from Cisco?

    • Erik,
      Unfortunately you need to get the disks from Cisco. You might be able to mount the Seagate drives, but if I remember correctly, the thin light bar is missing, as well as a custom adaptor. The LSI card comes with the required cable. The cable has a single connector at the card end and multiple connectors at the other end, for the disks as well as the lights.
      Cheers & good luck. The C2XXs are good machines with space for lots of memory.
